pd.read_csv(iterator=True) returns an iterator of type TextFileReader. I need to call TextFileReader.get_chunk in order to specify the number of rows to return for each call.
import random
import pandas as pd

chunks = pd.read_csv('file.csv', iterator=True)
try:
    while True:
        chunk = chunks.get_chunk(random.randint(1, 3))
        print(chunk)
except StopIteration:
    pass
Question: Is there a way to get rid of the try construct in this code? In other words, is there a condition I can put in the while statement to indicate that the iterator has no more rows to deliver?
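One possible way to keep the try out of the calling loop is to hide it in a small generator, so that the caller uses a plain for loop. This is only a sketch: iter_variable_chunks and size_fn are hypothetical names, not part of pandas, and it assumes get_chunk signals exhaustion by raising StopIteration, as in the code above.

```python
import random


def iter_variable_chunks(reader, size_fn):
    """Yield reader.get_chunk(size_fn()) until the reader is exhausted.

    `reader` is any object with a get_chunk(size) method that raises
    StopIteration when no rows are left (assumed behaviour of the
    TextFileReader returned by pd.read_csv(..., iterator=True)).
    `size_fn` is a hypothetical zero-argument callable returning the
    number of rows wanted for the next chunk.
    """
    while True:
        try:
            chunk = reader.get_chunk(size_fn())
        except StopIteration:
            # The StopIteration must be caught here: letting it escape
            # from a generator body would turn into a RuntimeError.
            return
        yield chunk


def demo(path="file.csv"):
    # pandas is imported here so the helper above has no hard dependency.
    import pandas as pd

    reader = pd.read_csv(path, iterator=True)
    for chunk in iter_variable_chunks(reader, lambda: random.randint(1, 3)):
        print(chunk)
```

The try/except has not disappeared, it has only moved into a reusable helper, so whether this counts as "getting rid of it" is a matter of taste.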
Here is some csv content for tests:
"Year", "Score", "Title"
1968, 86, "Greetings"
1970, 17, "Bloody Mama"
1971, 40, "Born to Win"
1973, 98, "Mean Streets"
1973, 88, "Bang the Drum Slowly"
1976, 41, "The Last Tycoon"
1976, 99, "Taxi Driver"
Notes
I know the for loop is designed to catch the StopIteration signal, and there is a way to iterate over the TextFileReader returned by pd.read_csv, but in that case I don't think I can vary the number of rows returned; it must be fixed:
chunks = pd.read_csv('file.csv', chunksize=3)
for chunk in chunks:
    print(chunk)
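If reading one row at a time is acceptable, the fixed-size iteration above could be regrouped into variable-size batches, which gives a while loop whose condition is simply "the batch is not empty". A minimal sketch, assuming Python 3.8+ for the := operator; variable_batches and size_fn are hypothetical helpers, size_fn is assumed to always return at least 1, and row-by-row reading is likely slower than get_chunk on large files.

```python
import random
from itertools import islice


def variable_batches(iterable, size_fn):
    """Group items from `iterable` into lists of varying length.

    `size_fn` is a hypothetical zero-argument callable giving the size
    of the next batch; it must return at least 1, because an empty
    batch is what ends the loop.
    """
    it = iter(iterable)
    # islice never raises at the end of the data: it just returns fewer
    # items, so an empty list means the source is exhausted.
    while batch := list(islice(it, size_fn())):
        yield batch


def demo(path="file.csv"):
    # pandas is imported here so the helper above has no hard dependency.
    import pandas as pd

    # chunksize=1 yields one-row DataFrames, regrouped and concatenated below.
    single_rows = pd.read_csv(path, chunksize=1)
    for batch in variable_batches(single_rows, lambda: random.randint(1, 3)):
        print(pd.concat(batch))
```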
Difficulties with the documentation:
For some reason the pandas documentation doesn't document pandas.io.parsers.TextFileReader; the only pseudo-documentation I found is on the Kite site, and it is mostly an empty shell.
It also seems that TextFileReader was a context manager at some point, which could have been another solution. However, this no longer seems to be the case, even though the documentation still says it is one and provides examples that don't work, such as:
with pd.read_csv("tmp.sv", sep="|", iterator=True) as reader:
    reader.get_chunk(5)
question from:
https://stackoverflow.com/questions/65850627/pd-read-csv-using-variable-size-chunks-how-to-stop-without-try-except