I have a DataFrame with hundreds of millions of rows, and I want to convert a datetime column to Unix timestamps efficiently. How can I do it?
My sample df:
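import datetime as dt
import pandas as pd
# Hourly datetimes covering one day, moved from the index into a 'datetime' column
df = (pd.DataFrame(index=pd.date_range(start=dt.datetime(2016,1,1,0,0,1),
                                       end=dt.datetime(2016,1,2,0,0,1), freq='h'))
      .reset_index().rename(columns={'index':'datetime'}))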
df.head()
datetime
0 2016-01-01 00:00:01
1 2016-01-01 01:00:01
2 2016-01-01 02:00:01
3 2016-01-01 03:00:01
4 2016-01-01 04:00:01
Right now I convert datetime to timestamp value by value with .apply(), but that takes a very long time (several hours) with hundreds of millions of rows:
df['ts'] = df[['datetime']].apply(lambda x: x[0].timestamp(), axis=1).astype(int)
df.head()
datetime ts
0 2016-01-01 00:00:01 1451602801
1 2016-01-01 01:00:01 1451606401
2 2016-01-01 02:00:01 1451610001
3 2016-01-01 03:00:01 1451613601
4 2016-01-01 04:00:01 1451617201
The above result is what I want.
If I try to use the .dt accessor of pandas.Series, I get an error message:
df['ts'] = df['datetime'].dt.timestamp
AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'
If I extract, e.g., the date part of the datetimes with the .dt accessor, it is much faster than using .apply():
df['date'] = df['datetime'].dt.date
df.head()
datetime ts date
0 2016-01-01 00:00:01 1451602801 2016-01-01
1 2016-01-01 01:00:01 1451606401 2016-01-01
2 2016-01-01 02:00:01 1451610001 2016-01-01
3 2016-01-01 03:00:01 1451613601 2016-01-01
4 2016-01-01 04:00:01 1451617201 2016-01-01
I want something similar with timestamps...
But I don't really understand the official documentation: it talks about "Converting to Timestamps", but I don't see any timestamps there; it only covers converting to datetime with pd.to_datetime(), not converting to a timestamp.
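One vectorized route that avoids .apply() is plain datetime arithmetic: subtract the Unix epoch from the column and floor-divide by one second. The sketch below is a minimal illustration on a three-row stand-in for the sample frame, not a benchmarked solution. The names epoch, one_second, local_zone, ts_utc and ts_local are made up for the example. Note that the arithmetic treats naive datetimes as UTC, whereas the ts values shown above look consistent with a UTC+1 local zone; that zone is only an inference from the numbers, so the second half of the sketch is an assumption.

```python
import datetime as dt
import pandas as pd

# Small stand-in for the sample frame (three hourly rows).
df = pd.DataFrame({'datetime': pd.to_datetime([
    '2016-01-01 00:00:01',
    '2016-01-01 01:00:01',
    '2016-01-01 02:00:01',
])})

epoch = pd.Timestamp('1970-01-01')
one_second = pd.Timedelta(seconds=1)

# Vectorized: subtract the epoch, then floor-divide by one second.
# Naive datetimes are interpreted as UTC here.
df['ts_utc'] = (df['datetime'] - epoch) // one_second

# Assumption: the naive values are wall-clock times in a fixed UTC+1 zone.
local_zone = dt.timezone(dt.timedelta(hours=1))
# Attach the zone, then convert to UTC and drop the zone again.
as_utc = df['datetime'].dt.tz_localize(local_zone).dt.tz_convert(None)
df['ts_local'] = (as_utc - epoch) // one_second

print(df)
```

Both columns come out as plain integers, and the whole column is processed in one operation rather than one Python call per row.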
The pandas.Timestamp constructor doesn't work either (it raises the error below):
df['ts2'] = pd.Timestamp(df['datetime'])
TypeError: Cannot convert input to Timestamp
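Part of the confusion may be terminology: in pandas, a Timestamp is the library's scalar datetime object, not a Unix epoch number, and the pd.Timestamp constructor builds one value at a time. A minimal sketch of the difference between the scalar constructor and the vectorized pd.to_datetime follows; the variable names single and many are illustrative only.

```python
import pandas as pd

# pd.Timestamp builds a single scalar datetime value.
single = pd.Timestamp('2016-01-01 00:00:01')

# pd.to_datetime is the vectorized counterpart; it accepts a list or a Series.
many = pd.to_datetime(pd.Series(['2016-01-01 00:00:01',
                                 '2016-01-01 01:00:01']))

print(type(single).__name__)  # name of the scalar type
print(many.dtype)             # a datetime64 dtype for the whole column
```

So passing a whole Series to pd.Timestamp fails, while each element of a datetime64 column is already a Timestamp in the pandas sense.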
pandas.Series.to_timestamp also does something totally different from what I want:
df['ts3'] = df['datetime'].to_timestamp
df.head()
datetime ts ts3
0 2016-01-01 00:00:01 1451602801 <bound method Series.to_timestamp of 0 2016...
1 2016-01-01 01:00:01 1451606401 <bound method Series.to_timestamp of 0 2016...
2 2016-01-01 02:00:01 1451610001 <bound method Series.to_timestamp of 0 2016...
3 2016-01-01 03:00:01 1451613601 <bound method Series.to_timestamp of 0 2016...
4 2016-01-01 04:00:01 1451617201 <bound method Series.to_timestamp of 0 2016...
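Two things seem to be going on in that output. The attribute was referenced without parentheses, so the bound method itself was stored in every row. And even when called, Series.to_timestamp converts a PeriodIndex into a DatetimeIndex; it does not produce epoch numbers. A small sketch with hypothetical monthly data (the names monthly and converted are made up):

```python
import pandas as pd

# A Series indexed by monthly periods (hypothetical example data).
monthly = pd.Series([10, 20],
                    index=pd.period_range('2016-01', periods=2, freq='M'))

# to_timestamp() has to be called with parentheses; it turns the
# PeriodIndex into a DatetimeIndex (start of each period by default).
converted = monthly.to_timestamp()

print(converted.index)
```

The values are untouched; only the index type changes, which is why this method is not a fit for the conversion asked about here.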
Thank you!!
Question from: https://stackoverflow.com/questions/40881876/python-pandas-convert-datetime-to-timestamp-effectively-through-dt-accessor