Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
460 views
in Technique[技术] by (71.8m points)

java - How to append data to an existing parquet file

I'm using the following code to create ParquetWriter and to write records to it.

ParquetWriter<GenericRecord> parquetWriter = new ParquetWriter(path, writeSupport, CompressionCodecName.SNAPPY, BLOCK_SIZE, PAGE_SIZE);

final GenericRecord record = new GenericData.Record(avroSchema);

parquetWriter.write(record);

But it only allows to create new files(at the specfied path). Is there a way to append data to an existing parquet file (at path)? Caching parquetWriter is not feasible in my case.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

There is a Spark API SaveMode called append: https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/SaveMode.html which I believe solves your problem.

Example of use:

df.write.mode('append').parquet('parquet_data_file')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...