Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

amazon s3 - Rename a written CSV file in Spark

I'm running Spark 2.1 and I want to write a CSV file with results into Amazon S3. After repartitioning, the CSV file has a long, cryptic name, and I want to change it to a specific filename.

I'm using the Databricks CSV library (com.databricks.spark.csv) to write to S3.

dataframe
    .repartition(1)
    .write
    .format("com.databricks.spark.csv")
    .option("header", "true")
    .save("folder/dataframe/")

Is there a way to rename the file afterwards, or even to save it directly with the correct name? I've already looked for solutions and haven't found much.

Thanks

1 Answer

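After writing, you can rename the output file with the Hadoop FileSystem API:

dataframe.repartition(1).write.format("com.databricks.spark.csv").option("header", "true").save("folder/dataframe/")

import org.apache.hadoop.fs._

val filePath = "folder/dataframe/"

// Get the FileSystem from the path itself, so an s3a:// output path uses the S3 filesystem
// instead of the cluster's default one (sc is the SparkContext provided by spark-shell)
val fs = new Path(filePath).getFileSystem(sc.hadoopConfiguration)

// With repartition(1) the output folder holds exactly one part-* data file
val fileName = fs.globStatus(new Path(filePath + "part*"))(0).getPath.getName

fs.rename(new Path(filePath + fileName), new Path(filePath + "file.csv"))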
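If you want a file with an exact path, rather than a renamed file left inside the output folder, you can wrap the same idea in a small helper. This is a sketch, not part of the original answer, and it rests on a few assumptions. writeSingleCsv, SingleCsvWriter and the "_tmp_" directory prefix are hypothetical names. It uses the CSV writer built into Spark 2.x, which Spark 2.x also uses for com.databricks.spark.csv. It uses coalesce(1) instead of repartition(1) to avoid a full shuffle. On S3 a "rename" is a copy followed by a delete, so it is neither atomic nor cheap for large files.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.DataFrame

object SingleCsvWriter {

  /** Hypothetical helper: writes `df` as one CSV file at exactly `targetFile`,
    * e.g. "s3a://my-bucket/folder/file.csv". Spark itself always writes a
    * directory of part files, so we write to a temp directory and move the file.
    */
  def writeSingleCsv(df: DataFrame, targetFile: String): Unit = {
    val target = new Path(targetFile)
    // Temporary output directory next to the target (name prefix is an assumption)
    val tmpDir = new Path(target.getParent, "_tmp_" + target.getName)

    // coalesce(1) produces a single part file without a full shuffle
    df.coalesce(1)
      .write
      .mode("overwrite")
      .option("header", "true")
      .csv(tmpDir.toString)

    // Resolve the FileSystem from the path so s3a:// URIs get the S3 client
    val conf = df.sparkSession.sparkContext.hadoopConfiguration
    val fs: FileSystem = tmpDir.getFileSystem(conf)

    val parts = fs.globStatus(new Path(tmpDir, "part-*"))
    require(parts != null && parts.length == 1,
      s"expected exactly one part file in $tmpDir")

    // rename fails if the destination exists, so remove any previous target first.
    // On S3 this rename is a copy followed by a delete, not an atomic move.
    if (fs.exists(target)) fs.delete(target, false)
    if (!fs.rename(parts(0).getPath, target))
      throw new java.io.IOException(s"could not rename ${parts(0).getPath} to $target")

    // Clean up the temp directory, including _SUCCESS and checksum files
    fs.delete(tmpDir, true)
  }
}
```

Usage, with a hypothetical bucket name: SingleCsvWriter.writeSingleCsv(dataframe, "s3a://my-bucket/folder/file.csv"). Because coalesce(1) sends all rows through a single task, this approach only suits results small enough for one executor to write.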
