Since Spark 2.3.0 this is an option when overwriting a table. To use it, set the spark.sql.sources.partitionOverwriteMode setting to dynamic, make sure the dataset is partitioned, and write with mode overwrite.

Example in Scala:
// With dynamic mode, only the partitions present in `data` are replaced;
// the rest of the table is left untouched
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
data.write.mode("overwrite").insertInto("partitioned_table")
I recommend repartitioning on your partition column before writing, so you won't end up with 400 files per folder.
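As a sketch, assuming the table is partitioned by a column named date (the column name here is hypothetical), the repartition could look like this:

import org.apache.spark.sql.functions.col

// Hash-partition by the partition column: all rows sharing a date value
// land in the same task, so each partition folder gets a single file
// instead of one file per write task
data
  .repartition(col("date"))
  .write
  .mode("overwrite")
  .insertInto("partitioned_table")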
Before Spark 2.3.0, the best solution would be to launch SQL statements to delete those partitions and then write them with mode append.
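A minimal sketch of that approach, assuming a Hive-backed table partitioned by a date column; the table name and partition values below are illustrative:

// Drop the partitions being rewritten, then append the new data.
// `partitioned_table`, the `date` column, and the values are hypothetical.
val partitionsToReplace = Seq("2018-01-01", "2018-01-02")
partitionsToReplace.foreach { d =>
  spark.sql(s"ALTER TABLE partitioned_table DROP IF EXISTS PARTITION (date = '$d')")
}
data.write.mode("append").insertInto("partitioned_table")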