Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


python - filter pandas dataframe by time

I have a pandas dataframe that I want to subset by time of day: rows later than 12pm and rows earlier than 12pm. First I convert my datetime strings to a datetime64[ns] column in pandas.

segments_data['time'] = pd.to_datetime(segments_data['time'])

Then I extract the time, date, month, year and day of week as shown below.

import datetime as dt

segments_data['date'] = segments_data.time.dt.date
segments_data['year'] = segments_data.time.dt.year
segments_data['month'] = segments_data.time.dt.month
segments_data['dayofweek'] = segments_data.time.dt.dayofweek
segments_data['time'] = segments_data.time.dt.time

My time column looks like the following.

segments_data['time']
Out[1906]: 
  07:43:00
  07:52:00
  08:00:00
  08:42:00
  09:18:00
  09:18:00
  09:18:00
  09:23:00
  12:32:00
  12:43:00
  12:55:00
  Name: time, dtype: object

Now I want to split the dataframe into rows with time greater than 12pm and rows with time less than 12pm.

segments_data.time[segments_data['time'] < 12:00:00]

It doesn't work because the time column now has dtype object rather than a datetime dtype.


1 Answer


Update

The example below is from the pandas docs at https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.between_time.html. Thanks to Frederick in the comments.

Create dataframe with datetimes in it:

i = pd.date_range('2018-04-09', periods=4, freq='1D20min')
ts = pd.DataFrame({'A': [1, 2, 3, 4]}, index=i)
ts
                     A
2018-04-09 00:00:00  1
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3
2018-04-12 01:00:00  4

Use between_time:

ts.between_time('0:15', '0:45')
                     A
2018-04-10 00:20:00  2
2018-04-11 00:40:00  3

To get the rows whose times fall outside a range, set start_time later than end_time:

ts.between_time('0:45', '0:15')
                     A
2018-04-09 00:00:00  1
2018-04-12 01:00:00  4
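
Applied to data shaped like the question's, the same approach might look like the sketch below. It assumes the raw timestamps are kept in a datetime column named ts; the sample values and the value column are illustrative only. between_time needs a DatetimeIndex, so the column is set as the index first. Both endpoints are included by default, so a row at exactly 12:00:00 would appear in both subsets.

```python
import pandas as pd

# Illustrative sample resembling the question's data (assumed values).
segments_data = pd.DataFrame({
    "ts": pd.to_datetime([
        "2016-01-28 07:43:00",
        "2016-01-28 09:18:00",
        "2016-01-28 12:32:00",
        "2016-01-28 12:55:00",
    ]),
    "value": [1, 2, 3, 4],  # hypothetical payload column
})

# between_time operates on a DatetimeIndex, so index by the timestamp column.
indexed = segments_data.set_index("ts")

# Rows from midnight up to and including 12:00.
before_noon = indexed.between_time("00:00", "12:00")

# Rows from 12:00 up to and including 23:59:59
# (sub-second timestamps after 23:59:59 would be excluded).
after_noon = indexed.between_time("12:00", "23:59:59")

print(before_noon)
print(after_noon)
```

If the original column layout is needed afterwards, calling reset_index() on either result turns ts back into a regular column.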

Old Answer

Keep a column that holds the raw datetime; call it ts:

segments_data['ts'] = pd.to_datetime(segments_data['time'])

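Next, you can format the datetime as an H:M:S string and filter with between(start, end), which includes both endpoints. Zero-padded time strings sort in chronological order, so the string comparison works: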

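In [227]:
# x is the list of datetime strings shown in the output below (its definition is not shown)
segments_data = pd.DataFrame(x, columns=['ts'])
segments_data['ts'] = pd.to_datetime(segments_data['ts'])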
segments_data
Out[227]:
                     ts
0   2016-01-28 07:43:00
1   2016-01-28 07:52:00
2   2016-01-28 08:00:00
3   2016-01-28 08:42:00
4   2016-01-28 09:18:00
5   2016-01-28 09:18:00
6   2016-01-28 09:18:00
7   2016-01-28 09:23:00
8   2016-01-28 12:32:00
9   2016-01-28 12:43:00
10  2016-01-28 12:55:00

In [228]:
segments_data[segments_data.ts.dt.strftime('%H:%M:%S').between('00:00:00', '12:00:00')]
Out[228]:
                     ts
0   2016-01-28 07:43:00
1   2016-01-28 07:52:00
2   2016-01-28 08:00:00
3   2016-01-28 08:42:00
4   2016-01-28 09:18:00
5   2016-01-28 09:18:00
6   2016-01-28 09:18:00
7   2016-01-28 09:23:00
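
As an alternative to casting to strings, a boolean mask built from the .dt accessor avoids the string round trip. This is only a sketch, assuming the timestamps live in a datetime64[ns] column named ts; the sample values are illustrative. Unlike between, the strict < comparison leaves a row at exactly 12:00:00 out of the morning subset.

```python
import datetime as dt

import pandas as pd

# Illustrative sample (assumed values), including a row at exactly noon.
segments_data = pd.DataFrame({
    "ts": pd.to_datetime([
        "2016-01-28 07:43:00",
        "2016-01-28 09:18:00",
        "2016-01-28 12:00:00",
        "2016-01-28 12:32:00",
    ])
})

noon = dt.time(12, 0)

# .dt.time yields an object column of datetime.time values,
# which can be compared against a datetime.time scalar.
times = segments_data["ts"].dt.time

before_noon = segments_data[times < noon]   # strictly earlier than 12:00
from_noon = segments_data[times >= noon]    # 12:00 and later

# The same split for a whole-hour boundary, using the hour component.
before_noon_by_hour = segments_data[segments_data["ts"].dt.hour < 12]

print(before_noon)
print(from_noon)
```

Comparing on the ts column directly also means the original time column does not have to be overwritten with .dt.time, so the datetime dtype stays available for later filtering.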
