Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
871 views
in Technique by (71.8m points)

scala - How to create a Hive table from a Spark DataFrame, using its schema?

I want to create a Hive table using my Spark DataFrame's schema. How can I do that?

For fixed columns, I can use:

val CreateTable_query = "Create Table my_table(a string, b string, c double)"
sparksession.sql(CreateTable_query)

But my DataFrame has many columns, so is there a way to generate such a query automatically?



1 Answer

0 votes
by (71.8m points)

Assuming you are using Spark 2.1.0 or later and my_DF is your dataframe:

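// imports needed by the snippet
import java.util.Arrays;
import java.util.stream.Collectors;
import org.apache.spark.sql.types.StructType;

// build a comma-separated list of "name type" pairs from the schema;
// simpleString() yields SQL type names such as int, bigint, string
// (typeName() would give e.g. "long", which is not a valid Hive column type)
StructType my_schema = my_DF.schema();
String columns = Arrays.stream(my_schema.fields())
                       .map(field -> field.name() + " " + field.dataType().simpleString())
                       .collect(Collectors.joining(","));

// drop the table if it already exists
spark.sql("drop table if exists my_table");
// create the table using the dataframe schema
spark.sql("create table my_table(" + columns + ") "
    + "row format delimited fields terminated by '|' location '/my/hdfs/location'");
// write the dataframe data to the HDFS location of the created Hive table
// (in Spark 2.x the built-in short name "csv" refers to the same format)
my_DF.write()
    .format("com.databricks.spark.csv")
    .option("delimiter", "|")
    .mode("overwrite")
    .save("/my/hdfs/location");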
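If some column names contain spaces or clash with reserved words, it may help to quote them when building the statement. Here is a minimal plain-Java sketch of the string-building step only, with no Spark dependency. The class and method names (HiveDdlSketch, quote, createTableDdl) are made up for illustration, and the map stands in for the (field.name(), SQL type string) pairs you would read from the dataframe schema. It assumes the names themselves contain no backticks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark or Hive.
public class HiveDdlSketch {

    // Wraps an identifier in backticks.
    // Assumption: the identifier does not itself contain a backtick.
    static String quote(String identifier) {
        return "`" + identifier + "`";
    }

    // Builds "create table <table>(<col> <type>, ...)" from ordered
    // column-name -> type-string pairs (iteration order is preserved).
    static String createTableDdl(String table, Map<String, String> columns) {
        String columnList = columns.entrySet().stream()
                .map(e -> quote(e.getKey()) + " " + e.getValue())
                .collect(Collectors.joining(", "));
        return "create table " + quote(table) + "(" + columnList + ")";
    }

    public static void main(String[] args) {
        // Stand-in for the schema; with Spark these pairs would come
        // from the dataframe's StructType fields.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("a", "string");
        columns.put("b", "string");
        columns.put("c", "double");
        // prints: create table `my_table`(`a` string, `b` string, `c` double)
        System.out.println(createTableDdl("my_table", columns));
    }
}
```

A LinkedHashMap is used so the columns come out in the same order they were added, which should match the column order of the data written to the table location.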

Another method is to use a temporary view and let Spark derive the table schema from it:

my_DF.createOrReplaceTempView("my_temp_table");
spark.sql("drop table if exists my_table");
spark.sql("create table my_table as select * from my_temp_table");
