Since Spark 2.3.0 this is an option when overwriting a table. To use it, set the spark.sql.sources.partitionOverwriteMode setting to dynamic, make sure the dataset is partitioned, and write with mode overwrite.

Example in Scala:
// With dynamic mode, only the partitions present in `data` are replaced;
// the rest of the table is left untouched
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
data.write.mode("overwrite").insertInto("partitioned_table")
I recommend repartitioning on your partition column before writing, so you won't end up with 400 files per folder.
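As a sketch, assuming the table is partitioned by a column named date (the column name here is hypothetical), the repartition could look like this:

import org.apache.spark.sql.functions.col

// Hash-partition by the partition column: all rows sharing a date value
// land in the same task, so each partition folder gets a single file
// instead of one file per write task
data
  .repartition(col("date"))
  .write
  .mode("overwrite")
  .insertInto("partitioned_table")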
Before Spark 2.3.0, the best solution would be to launch SQL statements to delete those partitions and then write them with mode append.
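A minimal sketch of that approach, assuming a Hive-backed table partitioned by a date column; the table name and partition values below are illustrative:

// Drop the partitions being rewritten, then append the new data.
// `partitioned_table`, the `date` column, and the values are hypothetical.
val partitionsToReplace = Seq("2018-01-01", "2018-01-02")
partitionsToReplace.foreach { d =>
  spark.sql(s"ALTER TABLE partitioned_table DROP IF EXISTS PARTITION (date = '$d')")
}
data.write.mode("append").insertInto("partitioned_table")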