
pyspark dataframe save into hive
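The snippets below assume a SparkSession bound to the name spark with Hive
support enabled; the pyspark shell on a Hive-configured cluster provides one
automatically. A minimal sketch for a standalone script (the app name is an
arbitrary placeholder):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("pyspark-hive-demo")  # hypothetical app name
         .enableHiveSupport()  # required for saveAsTable to reach the Hive metastore
         .getOrCreate())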


# First, define the data type of each DataFrame column.
# Note that pyspark.sql.types also exports abstract base classes
# (AtomicType, NumericType, IntegralType, FractionalType) that cannot be
# instantiated in a StructField, that ArrayType/MapType must be given their
# element and key/value types, and that a NullType column cannot be saved
# into a Parquet-backed Hive table, so the schema below sticks to the
# concrete, writable types.

from pyspark.sql.types import *
schema = StructType([
    StructField("b", BooleanType(), True),
    StructField("c", ByteType(), True),
    StructField("d", ShortType(), True),
    StructField("e", IntegerType(), True),
    StructField("f", LongType(), True),
    StructField("g", FloatType(), True),
    StructField("h", DoubleType(), True),
    StructField("i", DecimalType(), True),  # defaults to decimal(10,0)
    StructField("j", StringType(), True),
    StructField("k", BinaryType(), True),
    StructField("l", DateType(), True),
    StructField("m", TimestampType(), True),
    StructField("n", ArrayType(StringType()), True),
    StructField("o", MapType(StringType(), IntegerType()), True)])

# Create an empty DataFrame from the schema defined above
df = spark.createDataFrame(spark.sparkContext.emptyRDD(), schema)

The same steps in the pyspark shell, with output:
>>> from pyspark.sql.types import *
>>> schema = StructType([
... StructField("b", BooleanType(), True),
... StructField("c", ByteType(), True),
... StructField("d", ShortType(), True),
... StructField("e", IntegerType(), True),
... StructField("f", LongType(), True),
... StructField("g", FloatType(), True),
... StructField("h", DoubleType(), True),
... StructField("i", DecimalType(), True),
... StructField("j", StringType(), True),
... StructField("k", BinaryType(), True),
... StructField("l", DateType(), True),
... StructField("m", TimestampType(), True),
... StructField("n", ArrayType(StringType()), True),
... StructField("o", MapType(StringType(), IntegerType()), True)])
>>> df = spark.createDataFrame(spark.sparkContext.emptyRDD(), schema)
>>> df.show()
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
|  b|  c|  d|  e|  f|  g|  h|  i|  j|  k|  l|  m|  n|  o|
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

>>> df.printSchema()
root
 |-- b: boolean (nullable = true)
 |-- c: byte (nullable = true)
 |-- d: short (nullable = true)
 |-- e: integer (nullable = true)
 |-- f: long (nullable = true)
 |-- g: float (nullable = true)
 |-- h: double (nullable = true)
 |-- i: decimal(10,0) (nullable = true)
 |-- j: string (nullable = true)
 |-- k: binary (nullable = true)
 |-- l: date (nullable = true)
 |-- m: timestamp (nullable = true)
 |-- n: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- o: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = true)
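
An empty table is not much to query, so, as a sketch, the same schema can also
be populated with a made-up sample row before saving (every value below is
invented for illustration):

>>> import datetime
>>> from decimal import Decimal
>>> row = (True, 1, 2, 3, 4, 5.0, 6.0, Decimal("7"), "str", bytearray(b"bin"),
...        datetime.date(2021, 1, 18), datetime.datetime(2021, 1, 18, 11, 29, 36),
...        ["x", "y"], {"k": 1})
>>> df2 = spark.createDataFrame([row], schema)
>>> df2.count()
1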

# Save the DataFrame to Hive; without a database qualifier it lands in the default database
>>> df.write.saveAsTable("pysparkdf")
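
By default saveAsTable writes in Spark's default data source format (parquet,
unless spark.sql.sources.default says otherwise) and raises an error if the
table already exists. A sketch of the common variations, all standard
DataFrameWriter calls:

>>> df.write.mode("overwrite").saveAsTable("pysparkdf")   # replace the table
>>> df.write.mode("append").saveAsTable("pysparkdf")      # append rows to it
>>> df.write.saveAsTable("test.pysparkdf")                # target a specific database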

Verify the table from beeline:

beeline -u jdbc:hive2://hdp-node3:10000 -n hadoop
0: jdbc:hive2://hdp-node3:10000> show databases;
+------------------------+--+
|     database_name      |
+------------------------+--+
| da_component_instance  |
| default                |
| fileformatdb           |
| ods                    |
| test                   |
+------------------------+--+
5 rows selected (0.6 seconds)
0: jdbc:hive2://hdp-node3:10000> use default;
No rows affected (0.493 seconds)
0: jdbc:hive2://hdp-node3:10000> show tables;
+------------------------------+--+
|           tab_name           |
+------------------------------+--+
| liutest                      |
| pysparkdf                    |
+------------------------------+--+
51 rows selected (0.523 seconds)
0: jdbc:hive2://hdp-node3:10000> desc pysparkdf;
+-----------+------------------+----------+--+
| col_name  | data_type        | comment  |
+-----------+------------------+----------+--+
| b         | boolean          |          |
| c         | tinyint          |          |
| d         | smallint         |          |
| e         | int              |          |
| f         | bigint           |          |
| g         | float            |          |
| h         | double           |          |
| i         | decimal(10,0)    |          |
| j         | string           |          |
| k         | binary           |          |
| l         | date             |          |
| m         | timestamp        |          |
| n         | array<string>    |          |
| o         | map<string,int>  |          |
+-----------+------------------+----------+--+
14 rows selected (0.61 seconds)
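
The same check can be run from the PySpark side, without beeline:

>>> spark.sql("show tables in default").show()
>>> spark.table("default.pysparkdf").printSchema()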


Original post: https://www.cnblogs.com/songyuejie/p/14289283.html
