I am trying to save OHLCV (stock pricing) data from a dataframe into a single zipped csv file as follows. My test data is ohlcvData.csv, which I read into a dataframe with
import pandas as pd
df = pd.read_csv('ohlcvData.csv', header=None, names=['datetime', 'open', 'high', 'low', 'close', 'volume'], index_col='datetime')
and when I try to write it to a zip file like so (following stackoverflow.com/questions/55134716):
df.to_csv('ohlcvData.zip', header=False, compression=dict(method='zip', archive_name='ohlcv.csv'))
I get the following warning ...
C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python37_64\lib\zipfile.py:1473: UserWarning: Duplicate name: 'ohlcv.csv'
return self._open_to_write(zinfo, force_zip64=force_zip64)
and the resultant ohlcvData.zip file contains two files, both named ohlcv.csv, each containing a portion of the results.
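For reference, the duplicate entries can be confirmed without pandas using the standard-library zipfile module. This is only a minimal sketch; list_zip_members is a helper name made up for illustration and is not part of pandas or zipfile.

```python
import zipfile


def list_zip_members(zip_path):
    """Return (name, uncompressed_size) for every entry in the archive.

    infolist() reports each entry separately, so two entries that share
    a name (as in the warning above) both show up in the result.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]
```

Calling list_zip_members('ohlcvData.zip') on the failing archive should list 'ohlcv.csv' twice, with the two sizes showing how the rows were divided between the entries.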
When I try to read the zip file back into a dataframe ...
dfRead = pd.read_csv('ohlcvData.zip', header=None, names=['datetime', 'open', 'high', 'low', 'close', 'volume'], index_col='datetime')
... I get the following error...
*File "C:\Users\jeffm\AppData\Roaming\Python\Python37\site-packages\pandas\io\common.py", line 618, in get_handle
"Multiple files found in ZIP file. "
ValueError: Multiple files found in ZIP file. Only one file per ZIP: ['ohlcv.csv', 'ohlcv.csv']*
However, when I reduce the number of rows in the input file from 200 to around 175 (the exact number of lines I have to remove varies slightly with the data), it works: the zip file contains a single csv file, which loads back into a dataframe without error. I have tried many different files with different data and formats, and I get the same result every time: any file with more than roughly 175 lines fails, and any file with fewer works fine. So it looks like the output is being split once it passes a certain size, but I can't find any such setting in the docs. Any help on this would be appreciated. Thanks.
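One workaround that sidesteps the question of where the split happens is to build the CSV text in memory and add it to the archive in a single call. The sketch below is an assumption-laden illustration, not a confirmed fix: write_single_member_zip is a hypothetical helper name, and it relies only on the standard-library zipfile module, where one writestr call produces exactly one archive entry.

```python
import zipfile


def write_single_member_zip(csv_text, zip_path, member_name):
    """Write csv_text into zip_path as exactly one entry named member_name.

    Hypothetical helper for illustration: a single writestr() call adds
    a single entry, so the archive cannot end up with duplicate names.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member_name, csv_text)
```

Since df.to_csv returns the CSV as a string when no path is given, the call would be write_single_member_zip(df.to_csv(header=False), 'ohlcvData.zip', 'ohlcv.csv'). The trade-off is that the whole CSV is held in memory at once, which is fine for a few hundred rows but may matter for large files.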