Question
I would like to slice the dataframe (customer_df
) based on other two dataframes (start_time_df
and end_time_df
) without using for loop.
- The goal is to know which customers are handled by the certain employee.
customer_df
: showing the names of customers, locations and the time where and when they bought products.
start_time_df
& end_time_df
: A time sheet of the certain employee's start and end time of work.
I have managed to do this using for loop, but if the dataframe gets large, it would take much time using for loop. As such I am looking for how to do this without using the for loop.
Example and my solution(for loop ver.) is provided below.
Example
import pandas as pd
from datetime import datetime
customer_df = pd.DataFrame({'customer': ['A','B','C','A','D','E','K','A','D','F','P','J'],
'location': ['NY','TX','NY','UT','MA','NV','NY','TX','NY','UT','MA','NV']},
index = [datetime(2020,5,1,9), datetime(2020,5,1,11), datetime(2020,5,1,12),
datetime(2020,5,1,18), datetime(2020,5,2,5), datetime(2020,5,2,10),
datetime(2020,5,2,19), datetime(2020,5,3,2), datetime(2020,5,3,10),
datetime(2020,5,3,18), datetime(2020,5,4,20), datetime(2020,5,4,22)])
start_time_df = pd.DataFrame({'start_time':[datetime(2020,5,1,8), datetime(2020,5,2,8), datetime(2020,5,3,5)]})
end_time_df = pd.DataFrame({'end_time':[datetime(2020,5,1,17), datetime(2020,5,2,17), datetime(2020,5,3,20)]})
customer_df
>>> customer location
2020-05-01 09:00:00 A NY
2020-05-01 11:00:00 B TX
2020-05-01 12:00:00 C NY
2020-05-01 17:00:00 A UT
2020-05-02 05:00:00 D MA
2020-05-02 10:00:00 E NV
2020-05-02 19:00:00 K NY
2020-05-03 02:00:00 A TX
2020-05-03 08:00:00 D NY
2020-05-03 15:00:00 F UT
2020-05-04 20:00:00 P MA
2020-05-04 22:00:00 J NV
- I have managed to slice the
customer_df
using for loop, but is there a way to do this without using for loop?
sliced_df_list = [] # to store slices
start_time_series = start_time_df.loc[:,'start_time']
end_time_series = end_time_df.loc[:,'end_time']
for start_time, end_time in zip(start_time_series, end_time_series):
sliced_df_list.append(customer_df.loc[start_time:end_time,:])
pd.concat(sliced_df_list)
>>> customer location
2020-05-01 09:00:00 A NY
2020-05-01 11:00:00 B TX
2020-05-01 12:00:00 C NY
2020-05-02 10:00:00 E NV
2020-05-03 10:00:00 D NY
2020-05-03 18:00:00 F UT
question from:
https://stackoverflow.com/questions/65642366/how-to-slice-pd-dataframe-several-times-without-using-for-loop