Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
489 views
in Technique by (71.8m points)

python - Elegant and efficient way to retain date values as is without an out-of-bounds (OOB) error

I have a dataframe like the one shown below:

import pandas as pd

df1_new = pd.DataFrame({'person_id': [1, 1, 3, 3, 5, 5],'obs_date': ['7/23/2377  12:00:00 AM', 'NA-NA-NA NA:NA:NA', 'NA-NA-NA NA:NA:NA', '7/27/2277  12:00:00 AM', '7/13/2077  12:00:00 AM', 'NA-NA-NA NA:NA:NA']})

As you can see, a few of my date values are out of bounds for pandas timestamps. However, I would still like to retain them as they are. Unfortunately, I couldn't because of the OOB issue.

I tried the following:

pd.to_datetime(df1_new['obs_date'], format='%m/%d/%Y %I:%M:%S %p', errors='coerce')
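
For context, here is a minimal sketch of why some of these values count as out of bounds, assuming the default nanosecond-resolution pandas timestamps. The `years` list is simply copied from the sample data above and is not part of the original attempt.

```python
import pandas as pd

# Bounds of nanosecond-resolution pandas timestamps
print(pd.Timestamp.min)
print(pd.Timestamp.max)

# Years taken from the sample data in the question
years = [2377, 2277, 2077]

# Any year past the upper bound cannot be stored as a nanosecond timestamp
out_of_bounds = [y for y in years if y > pd.Timestamp.max.year]
print(out_of_bounds)
```

With `errors='coerce'`, values that cannot be represented are replaced by `NaT` rather than kept, which is why this attempt does not retain them.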

Is there any other efficient way to retain the date values as they are, changing only the format? I am fine with a string column/dtype.

My expected output was shown in a screenshot (not reproduced here).

Update: a screenshot of my try/except attempt was also attached (not reproduced here).

1 Answer

0 votes
by (71.8m points)

You can convert the values to datetimes and then to daily Periods, which is the only representation in pandas that supports out-of-bounds values.

If you omit the conversion to Period, you are working with Python datetime objects rather than pandas datetimes (timestamps).
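
As a rough illustration of that difference, the sketch below parses one value from the question both ways. The names `raw`, `fmt`, `as_objects` and `as_periods` are made up for this example, and the dtypes are passed explicitly so that nothing depends on pandas type inference.

```python
from datetime import datetime

import pandas as pd

raw = '7/23/2377  12:00:00 AM'
fmt = '%m/%d/%Y %I:%M:%S %p'

# Plain Python datetime: not limited by pandas timestamp bounds,
# but a Series of these is a generic object column
py_dt = datetime.strptime(raw, fmt)
as_objects = pd.Series([py_dt], dtype=object)

# Daily Period: a pandas-native type that can hold the same date
as_periods = pd.Series([pd.Period(py_dt, 'D')], dtype='period[D]')

print(as_objects.dtype)
print(as_periods.dtype)
```

The object column keeps the values, but the Period column is the one that pandas treats as date-like.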

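from datetime import datetime

import numpy as np
import pandas as pd

def str2time(x):
    try:
        # Parse with Python's datetime, then wrap the result in a daily Period
        return pd.Period(datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'), 'D')
    except (ValueError, TypeError):
        # Unparseable values such as 'NA-NA-NA NA:NA:NA' become missing
        return np.nan

df1_new['obs_date'] = df1_new['obs_date'].apply(str2time)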
print(df1_new)
   person_id    obs_date
0          1  2377-07-23
1          1         NaT
2          3         NaT
3          3  2277-07-27
4          5  2077-07-13
5          5         NaT

print(df1_new['obs_date'].dtype)
period[D]
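
Since the question says a string column is acceptable, a Period column can also be turned back into text. This is only a sketch: the small `periods` Series is built by hand to stand in for the converted column, and the `'%Y-%m-%d'` pattern is an arbitrary choice.

```python
from datetime import datetime

import pandas as pd

# Hand-built stand-in for the converted obs_date column
periods = pd.Series(
    [pd.Period(datetime(2377, 7, 23), 'D'), pd.NaT, pd.Period(datetime(2077, 7, 13), 'D')],
    dtype='period[D]',
)

# Default string form of each Period
as_text = periods.astype(str)

# Explicit output pattern via the .dt accessor
formatted = periods.dt.strftime('%Y-%m-%d')

print(as_text.iloc[0])
print(formatted.iloc[0])
```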

If multiple formats are possible:

def str2time(x):
    try:
        # MM/DD/YYYY II:MM:SS pp, e.g. 7/23/2377  12:00:00 AM
        return pd.Period(datetime.strptime(x, '%m/%d/%Y %I:%M:%S %p'), 'D')
    except (ValueError, TypeError):
        try:
            # YYYY-MM-DD HH:MM:SS, e.g. 2377-07-23 00:00:00
            return pd.Period(datetime.strptime(x, '%Y-%m-%d %H:%M:%S'), 'D')
        except (ValueError, TypeError):
            return np.nan

df1_new['obs_date'] = df1_new['obs_date'].apply(str2time)
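
If more formats turn up, the nested try/except can be flattened into a loop. The following is one possible variant, not the code from the answer: `FORMATS` and `str2period` are hypothetical names, and it returns `pd.NaT` instead of `np.nan` for values that match no format.

```python
from datetime import datetime

import pandas as pd

# Hypothetical list of candidate formats; extend it to match your data
FORMATS = ('%m/%d/%Y %I:%M:%S %p', '%Y-%m-%d %H:%M:%S')

def str2period(x, formats=FORMATS):
    """Return a daily Period for the first format that parses x, else NaT."""
    for fmt in formats:
        try:
            return pd.Period(datetime.strptime(x, fmt), 'D')
        except (ValueError, TypeError):
            # This format did not match; try the next one
            continue
    return pd.NaT

values = ['7/23/2377  12:00:00 AM', '2377-07-23 00:00:00', 'NA-NA-NA NA:NA:NA']
parsed = [str2period(v) for v in values]
print(parsed)
```

It can be applied the same way as above, for example with `df1_new['obs_date'].apply(str2period)`.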
