Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
562 views
in Technique[技术] by (71.8m points)

hdfs - Python: save pandas data frame to parquet file

Is it possible to save a pandas data frame directly to a parquet file? If not, what would be the suggested process?

The aim is to be able to send the parquet file to another team, which they can use scala code to read/open it. Thanks!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Pandas has a core function to_parquet(). Just write the dataframe to parquet format like this:

df.to_parquet('myfile.parquet')

You still need to install a parquet library such as fastparquet. If you have more than one parquet library installed, you also need to specify which engine you want pandas to use, otherwise it will take the first one to be installed (as in the documentation). For example:

df.to_parquet('myfile.parquet', engine='fastparquet')

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...