pandas - Python - Efficient way to add rows to dataframe

深蓝 · Answer 1 · 2021-10-17T01:38:27+0000

I used this answer's df.loc[i] = [new_data] suggestion, but I have more than 500,000 rows and it was very slow.

While the answers given are good for the OP's question, I found it more efficient, when dealing with a large number of rows up front (instead of the trickling in described by the OP), to use csv.writer to add the data to an in-memory CSV object and then call pandas.read_csv on it to generate the desired DataFrame.

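from csv import writer
from io import StringIO

import pandas as pd

def rows_to_dataframe(iterable_object):  # function name is illustrative
    # On Python 3, csv.writer needs a text buffer, so use StringIO, not BytesIO
    output = StringIO()
    csv_writer = writer(output)

    for row in iterable_object:
        csv_writer.writerow(row)

    output.seek(0)  # we need to get back to the start of the buffer
    # read_csv treats the first row as the header unless header=None is passed
    return pd.read_csv(output)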

For ~500,000 rows this was about 1000x faster in my case, and the speed improvement will only get larger as the row count grows (df.loc[i] = [data] gets a lot slower by comparison).
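If you want to check the difference on your own data, here is a rough timing sketch. The helper names, the column names, and the 1,000-row sample are all made up for illustration, and the timings will vary with machine and pandas version, so treat it as a way to measure rather than as a benchmark result.

```python
import csv
import time
from io import StringIO

import pandas as pd


def build_with_loc(rows, columns):
    """Append rows one at a time with df.loc[i] (the slow approach)."""
    df = pd.DataFrame(columns=columns)
    for i, row in enumerate(rows):
        df.loc[i] = list(row)
    return df


def build_with_csv_buffer(rows, columns):
    """Write all rows to an in-memory CSV, then parse it once."""
    buffer = StringIO()
    csv_writer = csv.writer(buffer)
    csv_writer.writerow(columns)  # header row, consumed by read_csv
    csv_writer.writerows(rows)
    buffer.seek(0)  # rewind before reading
    return pd.read_csv(buffer)


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical sample data; increase the row count to see the gap grow.
columns = ["id", "value", "label"]
rows = [(i, i * 0.5, f"item{i}") for i in range(1000)]

df_loc, t_loc = time_call(build_with_loc, rows, columns)
df_csv, t_csv = time_call(build_with_csv_buffer, rows, columns)

print(f"df.loc: {t_loc:.3f}s, csv buffer: {t_csv:.3f}s")
```

One thing to keep in mind: read_csv infers column dtypes from the text, so the dtypes of the two results may differ even when the values match.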

Hope this helps someone who needs efficiency when dealing with more rows than the OP.
