Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
490 views
in Technique[技术] by (71.8m points)

python - Pandas Dataframe keep rows where date is between two dates (seperate columns)

I have a dataframe that looks similar to this :

Price        From           To
 300€        2020-01-01     2020-01-07
 250€        2020-01-04     2020-01-08
 150€        2020-02-01     2020-02-04
 350€        2020-02-04     2020-02-08

And then I have a list of dates. For example: list = [2020-01-03, 2020-02-04]

I would like to keep only the rows of the dataframe where the dates are in between the From column and the To column.

So, after transformation I would have the following dataframe.

Price        From           To
 300€        2020-01-01     2020-01-07
 150€        2020-02-01     2020-02-04
 350€        2020-02-04     2020-02-08

First I thought of using a lambda with an apply but I thought it was not very efficient because my dataset is very large. Is there a simpler way to do this with pandas ?

The result would be contained in one single dataframe


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Let's try with numpy broadcasting:

x, y = df[['From', 'To']].values.T
a = np.array(['2020-01-03', '2020-02-04'], dtype=np.datetime64)
mask = ((x[:, None] <= a) & (y[:, None] >= a)).any(1)

df[mask]

  Price       From         To
0  300€ 2020-01-01 2020-01-07
2  150€ 2020-02-01 2020-02-04
3  350€ 2020-02-04 2020-02-08

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...