scala - Specifying the filename when saving a DataFrame as a CSV

1 Answer

深蓝 · Answer 1 · 2021-10-06T17:41:08+0000

It's not possible to do this directly with Spark's save.

Spark uses the Hadoop file format, which requires data to be partitioned; that's why the output is a directory of part- files. You can easily rename the file after processing, just like in this question.

In Scala it will look like:

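import org.apache.hadoop.fs._
// sc is the active SparkContext; "csvDirectory" is the directory Spark wrote to
val fs = FileSystem.get(sc.hadoopConfiguration)
// Name of the first part file inside the output directory
val file = fs.globStatus(new Path("csvDirectory/part*"))(0).getPath().getName()

// Move the part file to its final name, then remove the leftover output directory
fs.rename(new Path("csvDirectory/" + file), new Path("mydata.csv"))
fs.delete(new Path("csvDirectory"), true)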

or just:

import org.apache.hadoop.fs._
val fs = FileSystem.get(sc.hadoopConfiguration)
// Only works when the part file name is known in advance; adjust it to match the actual file
fs.rename(new Path("csvDirectory/data.csv/part-0000"), new Path("csvDirectory/newData.csv"))
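
For a DataFrame, the whole flow (write, locate the part file, rename, clean up) could be wrapped in one helper. This is only a sketch under some assumptions: Spark 2.x or later with a SparkSession, and a DataFrame small enough to fit in a single partition. The names SingleCsvWriter, writeSingleCsv and the "-temp" suffix are hypothetical, not part of Spark.

```scala
import java.io.IOException

import org.apache.hadoop.fs.Path
import org.apache.spark.sql.{DataFrame, SparkSession}

object SingleCsvWriter {
  // Hypothetical helper: writes `df` as a single CSV file at `target`.
  def writeSingleCsv(spark: SparkSession, df: DataFrame, target: String): Unit = {
    val targetPath = new Path(target)
    val tmpDir = new Path(target + "-temp") // assumed scratch directory name
    val fs = targetPath.getFileSystem(spark.sparkContext.hadoopConfiguration)

    // coalesce(1) collapses the data into one partition, so Spark writes one part file
    df.coalesce(1)
      .write
      .mode("overwrite")
      .option("header", "true")
      .csv(tmpDir.toString)

    // Locate the single part file inside the temporary output directory
    val parts = fs.globStatus(new Path(tmpDir, "part-*"))
    require(parts != null && parts.length == 1, s"expected exactly one part file in $tmpDir")

    // Remove any existing target file first, since rename does not overwrite
    fs.delete(targetPath, false)
    if (!fs.rename(parts(0).getPath, targetPath)) {
      throw new IOException(s"could not rename ${parts(0).getPath} to $targetPath")
    }

    // Drop the temporary directory together with _SUCCESS and any checksum files
    fs.delete(tmpDir, true)
  }
}
```

A call would then look like SingleCsvWriter.writeSingleCsv(spark, df, "csvDirectory/mydata.csv"). Note that coalesce(1) funnels all rows through one task, so this is probably only reasonable for modest output sizes.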

Edit: As mentioned in the comments, you can also write your own OutputFormat; see the documentation for details on using this approach to set the file name.
