Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Sum of true values over past n dates in pandas

I have a dataframe of several thousand rows with columns for geography, response_date, and in_compliance (True/False).

import pandas as pd

df = pd.DataFrame({
    "geography": ["Baltimore", "Frederick", "Annapolis", "Hagerstown", "Rockville", "Salisbury", "Towson", "Bowie"],
    "response_date": ["2018-03-31", "2018-03-30", "2018-03-28", "2018-03-28", "2018-04-02", "2018-03-30", "2018-04-07", "2018-04-02"],
    "in_compliance": [True, True, False, True, False, True, False, True],
})

I want to add a column that represents the number of True values for the most recent four dates in the response_date column, including the response_date for that row.

An example of the desired output:

geography  response_date  in_compliance  Past_4_dates_sum_of_true
Baltimore  2018-03-24     True           1
Baltimore  2018-03-25     False          1
Baltimore  2018-03-26     False          1
Baltimore  2018-03-27     False          1
Baltimore  2018-03-30     False          0
Baltimore  2018-03-31     True           1
Baltimore  2018-04-01     True           2
Baltimore  2018-04-02     True           3
Baltimore  2018-04-03     False          3
Baltimore  2018-04-06     True           3
Baltimore  2018-04-07     True           3
Baltimore  2018-04-08     False          2

I've tried different approaches to groupby and rolling.

But I get results that are not what I expect and need.

df.groupby('geography').resample('d').sum().fillna(0).groupby('geography').rolling(4,min_periods=1).sum()

This was another approach I took:

    df1 = df.groupby(['geography']).apply(lambda x: x.set_index('response_date').resample('1D').first())
    df2 = (df1.groupby(level=0)['in_compliance']
              .apply(lambda x: x.shift().rolling(min_periods=1, window=4).count())
              .reset_index(name='Past_4_dates_sum_of_true'))
Asked by JamesMiller (translated from Stack Overflow)
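
One possible reason the second attempt gives unexpected numbers, offered as a sketch rather than a diagnosis of the real data: rolling(...).count() tallies the non-missing entries in each window regardless of whether they are True or False, and shift() pushes every value down one row so the current row drops out of its own window. The series name flags below is hypothetical; its values are the first eight in_compliance entries from the desired output in the question.

```python
import pandas as pd

# Hypothetical series: first eight in_compliance values from the desired output.
flags = pd.Series([True, False, False, False, False, True, True, True])
as_int = flags.astype(int)

# count() reports how many non-missing entries the window holds, True or False.
window_count = as_int.rolling(4, min_periods=1).count().astype(int)

# sum() adds up the 1s, which is the number of True values in the window.
window_sum = as_int.rolling(4, min_periods=1).sum().astype(int)

# shift() moves each value down one row, so the current row is excluded
# and the first row has nothing to aggregate (NaN).
shifted_sum = as_int.shift().rolling(4, min_periods=1).sum()

print(pd.DataFrame({
    "in_compliance": flags,
    "count": window_count,
    "sum": window_sum,
    "shifted_sum": shifted_sum,
}))
```

Only the sum column lines up with the Past_4_dates_sum_of_true values requested for those rows.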

1 Answer

It's much simpler:

df['Past_4_dates_sum_of_true'] = df.rolling(4, min_periods=1)['in_compliance'].sum().astype(int)

Output:

       geography response_date  in_compliance  Past_4_dates_sum_of_true
0   Baltimore    2018-03-24           True                         1
1   Baltimore    2018-03-25          False                         1
2   Baltimore    2018-03-26          False                         1
3   Baltimore    2018-03-27          False                         1
4   Baltimore    2018-03-30          False                         0
5   Baltimore    2018-03-31           True                         1
6   Baltimore    2018-04-01           True                         2
7   Baltimore    2018-04-02           True                         3
8   Baltimore    2018-04-03          False                         3
9   Baltimore    2018-04-06           True                         3
10  Baltimore    2018-04-07           True                         3
11  Baltimore    2018-04-08          False                         2
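
The one-liner above rolls over rows in their existing order, which is enough when the frame holds a single geography already sorted by date. With several geographies mixed together, a minimal sketch of one way to keep each window inside its own geography is shown below. It assumes "the most recent four dates" means the four most recent rows for that geography (not four calendar days). The Baltimore rows come from the output above; the Towson rows are hypothetical and added only to show the grouping.

```python
import pandas as pd

baltimore_dates = [
    "2018-03-24", "2018-03-25", "2018-03-26", "2018-03-27",
    "2018-03-30", "2018-03-31", "2018-04-01", "2018-04-02",
    "2018-04-03", "2018-04-06", "2018-04-07", "2018-04-08",
]
baltimore_flags = [True, False, False, False, False, True,
                   True, True, False, True, True, False]

# Hypothetical second geography, not part of the original question.
towson_dates = ["2018-03-28", "2018-03-30", "2018-04-02"]
towson_flags = [True, True, False]

df = pd.DataFrame({
    "geography": ["Baltimore"] * 12 + ["Towson"] * 3,
    "response_date": pd.to_datetime(baltimore_dates + towson_dates),
    "in_compliance": baltimore_flags + towson_flags,
})

# Rolling windows follow row order, so sort by date within each geography.
df = df.sort_values(["geography", "response_date"]).reset_index(drop=True)

# Cast the flags to 0/1, then take a 4-row rolling sum inside each geography.
rolling_true = (
    df["in_compliance"].astype(int)
    .groupby(df["geography"])
    .transform(lambda s: s.rolling(4, min_periods=1).sum())
)
df["Past_4_dates_sum_of_true"] = rolling_true.astype(int)

print(df)
```

For the Baltimore rows this gives the same column as the output above, and the Towson window starts fresh instead of picking up Baltimore's last rows.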
