I have a DataFrame that I saved to a pickle file. When I load it with pd.read_pickle,
it fails on roughly one in ten runs with the following error:
ValueError: Level values must be unique: [Timestamp('2020-06-03 15:59:59.999999+0000', tz='UTC'), datetime.date(2020, 6, 3), datetime.date(2020, 6, 4), datetime.date(2020, 6, 5)] on level 0
What is causing this stochastic behaviour?
The issue can be reproduced with the following:
from datetime import timedelta, date
import pandas as pd
import pytz
from pandas import Timestamp
utc = pytz.UTC
data = {
    "date": [
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).replace(minute=59, second=59, microsecond=999999),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date(),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date(),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=1),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=1),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=2),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=2),
    ],
    "status": ["in_progress", "in_progress", "done", "in_progress", "done", "in_progress", "done"],
    "issue_count": [20, 18, 2, 14, 6, 10, 10],
    "points": [100, 90, 10, 70, 30, 50, 50],
    "stories": [0, 0, 0, 0, 0, 0, 0],
    "tasks": [100, 100, 100, 100, 100, 100, 100],
    "bugs": [0, 0, 0, 0, 0, 0, 0],
    "subtasks": [0, 0, 0, 0, 0, 0, 0],
    "assignee": ["Name", "Name", "Name", "Name", "Name", "Name", "Name"],
}
df = pd.DataFrame(data).groupby(["date", "status"]).sum()
df.to_pickle("~/failing_df.pkl")
pd.read_pickle("~/failing_df.pkl")
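Note that the "date" column in the repro mixes two element types: the first entry is a tz-aware pandas Timestamp, while the rest are plain datetime.date objects, so the grouped MultiIndex ends up with a mixed-type object level. The sketch below is a hypothetical trimmed-down version of the data above (fewer rows and columns than the original) that makes the mix visible; it passes sort=False to groupby to sidestep ordering comparisons between the two types:

```python
from datetime import timedelta

import pandas as pd
import pytz

utc = pytz.UTC
ts = pd.Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc)
dates = [
    ts.replace(minute=59, second=59, microsecond=999999),  # tz-aware Timestamp
    ts.date(),                                             # plain datetime.date
    ts.date() + timedelta(days=1),                         # plain datetime.date
]
df = pd.DataFrame({
    "date": dates,
    "status": ["in_progress", "done", "done"],
    "points": [100, 10, 30],
})

# The column holds two distinct types side by side:
print(sorted({type(v).__name__ for v in df["date"]}))  # ['Timestamp', 'date']

# sort=False avoids ordering comparisons between the mixed types; the
# grouped index still carries both types into any pickle round-trip.
grouped = df.groupby(["date", "status"], sort=False).sum()
print(sorted({type(v).__name__ for v in grouped.index.get_level_values(0)}))
```

Both prints show the same mix of Timestamp and date objects, which is the same mix that appears in the level-0 values listed in the ValueError above.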
Question from: https://stackoverflow.com/questions/65937092/read-pickle-failing-stochastically