Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Issue converting a dtype('O') column to date format

Hi, I have dates (column name: 'earliest_cr_line') in a raw data set. When I check the dtype in Jupyter it is dtype('O'), so I used the following code to convert the column to datetime format:

pd.to_datetime(final_3['earliest_cr_line'], format='%m/%d/%Y')

But this code raises the error: ValueError: time data 'Jan-85' does not match format '%m/%Y' (match)

How do I convert the entire column to date format and then create another column showing the difference in months between each date and June 30th, 2015?

question from:https://stackoverflow.com/questions/65936893/issue-in-converting-a-dtypeoto-date-format

1 Answer

dtype('O') means the column holds Python objects, here strings. If every value has the format MMM-YY, where MMM is the first three letters of the month name, use format='%b-%y':

print (final_3)
        id earliest_cr_line
0  1077501           Jan-85
1  1077430           Apr-99
2  1077175           Nov-01
3  1076863           Feb-96
4  1075358           Jan-96

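# %b = abbreviated month name, %y = two-digit year; the day defaults to 1
final_3['earliest_cr_line'] = pd.to_datetime(final_3['earliest_cr_line'], format='%b-%y')

# Difference to 2015-06-30 in days (not months)
final_3['diff'] = (pd.to_datetime('2015-06-30') - final_3['earliest_cr_line']).dt.days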
print (final_3)
        id earliest_cr_line   diff
0  1077501       1985-01-01  11137
1  1077430       1999-04-01   5934
2  1077175       2001-11-01   4989
3  1076863       1996-02-01   7089
4  1075358       1996-01-01   7120
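
The question asks for the difference in months, while the 'diff' column above holds days. A minimal sketch of one way to get whole calendar months follows; it assumes the same sample rows as above, and the names `ref` and `diff_months` are only illustrative. Because '%b-%y' always yields the first day of the month, year and month arithmetic is enough here; dates with arbitrary days would need an extra adjustment.

```python
import pandas as pd

# Hypothetical sample mirroring the rows printed above.
final_3 = pd.DataFrame({
    'id': [1077501, 1077430, 1077175, 1076863, 1075358],
    'earliest_cr_line': ['Jan-85', 'Apr-99', 'Nov-01', 'Feb-96', 'Jan-96'],
})
final_3['earliest_cr_line'] = pd.to_datetime(final_3['earliest_cr_line'], format='%b-%y')

# Reference date from the question (the name `ref` is an assumption).
ref = pd.Timestamp('2015-06-30')

# Whole calendar months: 12 * year difference + month difference.
# Every parsed date falls on day 1, so no partial-month correction is applied.
final_3['diff_months'] = (
    (ref.year - final_3['earliest_cr_line'].dt.year) * 12
    + (ref.month - final_3['earliest_cr_line'].dt.month)
)
print(final_3)
```

If fractional months are preferred instead, dividing the day count by an average month length is a common approximation, but the result then depends on the average chosen.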

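EDIT: With %y, two-digit years 00-68 are parsed as 2000-2068, so a value such as 'Jan-63' becomes 2063-01-01. You can subtract 100 years from every date above some threshold; here the threshold is year > 2021: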

print (final_3)
        id earliest_cr_line
0  1077501           Jan-63
1  1077430           Apr-99
2  1077175           Nov-01
3  1076863           Feb-96
4  1075358           Jan-96

final_3['earliest_cr_line'] = pd.to_datetime(final_3['earliest_cr_line'], format='%b-%y')

# Rows parsed into the future (here: after 2021) are shifted back one century
mask = final_3['earliest_cr_line'].dt.year > 2021
h = pd.DateOffset(years=100)
final_3.loc[mask, 'earliest_cr_line'] = final_3['earliest_cr_line'] - h

final_3['diff'] = (pd.to_datetime('2015-06-30') - final_3['earliest_cr_line']).dt.days

print (final_3)
        id earliest_cr_line   diff
0  1077501       1963-01-01  19173
1  1077430       1999-04-01   5934
2  1077175       2001-11-01   4989
3  1076863       1996-02-01   7089
4  1075358       1996-01-01   7120
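
Since the data is measured against June 30th 2015, the reference date itself may be a more natural threshold than a fixed year: no credit line can have been opened after it. The sketch below wraps that idea in a helper; `parse_mon_yy` and `cutoff` are hypothetical names, and it assumes English month abbreviations and that no genuine date lies after the cutoff.

```python
import pandas as pd

def parse_mon_yy(values, cutoff):
    """Parse 'Mon-YY' strings; dates later than `cutoff` are moved back 100 years."""
    parsed = pd.to_datetime(pd.Series(values), format='%b-%y')
    # %y maps 00-68 to 2000-2068, so an old year such as '63' comes back as 2063.
    too_late = parsed > cutoff
    # Replace only the flagged rows with the same date one century earlier.
    return parsed.mask(too_late, parsed - pd.DateOffset(years=100))

ref = pd.Timestamp('2015-06-30')
s = pd.Series(['Jan-63', 'Apr-99', 'Nov-01', 'Feb-96', 'Jan-96'])
dates = parse_mon_yy(s, cutoff=ref)
print(dates)
```

With this cutoff a value such as 'Jan-16' would also be read as 1916 rather than 2016, which is the intended behaviour only if the data really ends at the reference date.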
