Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - read_pickle failing stochastically

I have a DataFrame that I saved to a pickle file. When I load it with read_pickle, it fails with the following error on roughly one run in ten:

ValueError: Level values must be unique: [Timestamp('2020-06-03 15:59:59.999999+0000', tz='UTC'), datetime.date(2020, 6, 3), datetime.date(2020, 6, 4), datetime.date(2020, 6, 5)] on level 0

What is causing this stochastic behaviour?

The issue can be reproduced with the following:

from datetime import timedelta
import pandas as pd
import pytz
from pandas import Timestamp

utc = pytz.UTC

data = {
    "date": [
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).replace(minute=59, second=59, microsecond=999999),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date(),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date(),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=1),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=1),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=2),
        Timestamp("2020-06-03 15:00:00").replace(tzinfo=utc).date() + timedelta(days=2),
    ],
    "status": ["in_progress", "in_progress", "done", "in_progress", "done", "in_progress", "done"],
    "issue_count": [20, 18, 2, 14, 6, 10, 10],
    "points": [100, 90, 10, 70, 30, 50, 50],
    "stories": [0, 0, 0, 0, 0, 0, 0],
    "tasks": [100, 100, 100, 100, 100, 100, 100],
    "bugs": [0, 0, 0, 0, 0, 0, 0],
    "subtasks": [0, 0, 0, 0, 0, 0, 0],
    "assignee": ["Name", "Name", "Name", "Name", "Name", "Name", "Name"],
}
df = pd.DataFrame(data).groupby(["date", "status"]).sum()

df.to_pickle("~/failing_df.pkl")
pd.read_pickle("~/failing_df.pkl")
Question from: https://stackoverflow.com/questions/65937092/read-pickle-failing-stochastically


1 Answer

Try to_csv(), or convert the DataFrame with to_dict() first:

# write it to CSV
df.to_csv('temp.csv')
# read it back from CSV (the 'date' values come back as strings unless parsed)
df2 = pd.read_csv('temp.csv')
# restore the two-level index
df2.set_index(['date', 'status'], inplace=True)
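
As a rough illustration of the CSV round trip above, the sketch below uses an in-memory buffer instead of temp.csv and a small made-up frame (both are assumptions for the example, not part of the original post). It suggests one caveat worth checking: the 'date' level comes back as plain strings, so the original types are not preserved.

```python
import datetime as dt
import io

import pandas as pd

# Small illustrative frame (made-up values), grouped like the frame in the question.
df = pd.DataFrame({
    "date": [dt.date(2020, 6, 3), dt.date(2020, 6, 3), dt.date(2020, 6, 4)],
    "status": ["in_progress", "done", "in_progress"],
    "points": [90, 10, 70],
}).groupby(["date", "status"]).sum()

# Write to an in-memory CSV buffer, then read it back.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
df2 = pd.read_csv(buf, index_col=["date", "status"])

# The index names survive, but the dates are now strings rather than date objects.
first_date = df2.index.get_level_values("date")[0]
print(type(first_date).__name__, list(df2.index.names))
```

If the date types matter downstream, they would need to be converted again after reading.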

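or, optionally, convert it to a dict and pickle that

import pickle

df_dict = df.to_dict()

# pickle the dict (not the DataFrame itself)
with open('temp.pkl', 'wb') as f:
    pickle.dump(df_dict, f)

# unpickle it
with open('temp.pkl', 'rb') as f:
    df_dict2 = pickle.load(f)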
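
Another thing that may be worth trying, though the question does not confirm it as the cause, is to avoid mixing tz-aware Timestamp and datetime.date objects in the same index level. The sketch below converts every value to a UTC Timestamp before grouping, so level 0 holds a single type. normalize_dates is a hypothetical helper, the data is a trimmed version of the reproducer, and treating plain dates as midnight UTC is an assumption that changes their meaning slightly.

```python
import datetime as dt
import os
import tempfile

import pandas as pd


def normalize_dates(values):
    """Hypothetical helper: turn Timestamps and datetime.date objects into UTC Timestamps."""
    result = []
    for value in values:
        ts = pd.Timestamp(value)
        # Naive values (plain dates) are assumed to be UTC; aware ones are converted.
        ts = ts.tz_localize("UTC") if ts.tzinfo is None else ts.tz_convert("UTC")
        result.append(ts)
    return result


raw_dates = [
    pd.Timestamp("2020-06-03 15:59:59.999999", tz="UTC"),
    dt.date(2020, 6, 3),
    dt.date(2020, 6, 3),
    dt.date(2020, 6, 4),
]
frame = pd.DataFrame({
    "date": normalize_dates(raw_dates),
    "status": ["in_progress", "in_progress", "done", "in_progress"],
    "points": [100, 90, 10, 70],
})
grouped = frame.groupby(["date", "status"]).sum()

# Round-trip through a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "normalized_df.pkl")
    grouped.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.equals(grouped))
```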
