Proceed as follows:
Define a function that computes the feature means for a particular
row of the current group, over the required period:
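import pandas as pd

def getRowMeans(row, grp):
    dTo = row.Date  # end of the window: the current row's date
    # Rows dated within the 3 years up to and including dTo
    window = grp[grp.Date.between(dTo - pd.DateOffset(years=3), dTo)]
    return window.loc[:, 'Feature 1':'Feature 3'].mean()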
The idea is:
- the current object is a row from a group of rows (for some
member_id),
- the operations below need the whole group as well, so it is
passed as a second parameter of this function (grp),
- from grp, take the rows dated within the 3 years up to the Date of
the current row (both ends of the range are included),
- from these rows, take the three Feature columns and return their means.
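To make the window selection concrete, here is a minimal sketch for a single row. The group below is hypothetical (the values are made up for illustration), and the names d_to, d_from and in_window are introduced only for this example.

```python
import pandas as pd

# Hypothetical single-member group; values are made up for illustration.
grp = pd.DataFrame({
    'member_id': [1, 1, 1, 1],
    'Feature 1': [0.1, 0.2, 0.4, 0.3],
    'Date': pd.to_datetime(
        ['2020-01-02', '2018-01-02', '2016-01-02', '2012-01-02']),
})

row = grp.iloc[0]                        # the "current" row, dated 2020-01-02
d_to = row.Date
d_from = d_to - pd.DateOffset(years=3)   # start of the window: 2017-01-02

# Series.between is inclusive on both ends by default
in_window = grp.Date.between(d_from, d_to)
window = grp[in_window]                  # keeps the 2020 and 2018 rows only

print(window)
print(window['Feature 1'].mean())
```

Only the rows dated 2020-01-02 and 2018-01-02 fall inside the window, so the mean for this row is taken over those two rows.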
Define a function to be called for each group of rows (for some
member_id). It returns a copy of the group in which every Feature
column is replaced with the means generated by getRowMeans:
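def FeaturesToMeans(grp):
    # Per-row feature means over each row's 3-year window
    means = grp.apply(getRowMeans, axis=1, grp=grp)
    rv = grp.copy()   # leave the original group untouched
    rv.update(means)  # in place; update() returns None
    return rv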
The first step is to compute the feature means for every row.
So as not to alter the original group, the object to be returned
(rv) is created as a copy of the original group and then updated
with the just computed means.
Note, however, that update operates in place and returns None,
so its return value must not be assigned back to rv.
The function returns the updated copy.
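The in-place behaviour of update can be checked in isolation. The following is a small sketch on a hypothetical two-column frame (the names original, means and ret are made up for this example):

```python
import pandas as pd

# Hypothetical data, unrelated to the sample data in the question.
original = pd.DataFrame({'a': [1.0, 2.0], 'b': [10.0, 20.0]})
means = pd.DataFrame({'a': [1.5, 1.5]})  # only column 'a', same index

rv = original.copy()
ret = rv.update(means)  # modifies rv in place

print(ret)        # update() has no return value
print(rv)         # column 'a' overwritten, column 'b' untouched
print(original)   # unchanged, because rv is a copy
```

Columns of rv that are absent from the second frame (here 'b') are left as they were, which is why the non-Feature columns survive in FeaturesToMeans.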
Generate the actual result, as a new DataFrame:
result = df.groupby('member_id', group_keys=False).apply(FeaturesToMeans)
The result, for your sample data, is:
   member_id  Feature 1  Feature 2  Feature 3       Date
0          1   0.133333   0.333333       0.20 2020-01-02
1          1   0.200000   0.233333       0.20 2018-01-02
2          1   0.300000   0.200000       0.20 2016-01-02
3          1   0.200000   0.200000       0.15 2017-01-04
4          2   0.450000   0.100000       0.30 2018-01-02
5          2   0.500000   0.100000       0.20 2015-01-02
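For reference, the steps can be assembled into one runnable sketch. The data below is hypothetical (it is not the sample data from the question), and the per-group loop with pd.concat is an assumed substitute for groupby(...).apply, chosen here because it does not depend on how a given pandas version treats the grouping column inside apply.

```python
import pandas as pd

def getRowMeans(row, grp):
    dTo = row.Date
    window = grp[grp.Date.between(dTo - pd.DateOffset(years=3), dTo)]
    return window.loc[:, 'Feature 1':'Feature 3'].mean()

def FeaturesToMeans(grp):
    means = grp.apply(getRowMeans, axis=1, grp=grp)
    rv = grp.copy()
    rv.update(means)
    return rv

# Hypothetical data: two rows for member 1, one row for member 2.
df = pd.DataFrame({
    'member_id': [1, 1, 2],
    'Feature 1': [0.1, 0.3, 0.5],
    'Feature 2': [0.2, 0.4, 0.6],
    'Feature 3': [0.3, 0.5, 0.7],
    'Date': pd.to_datetime(['2020-01-02', '2018-01-02', '2018-01-02']),
})

# Process each member's rows separately, then stitch the pieces together.
pieces = [FeaturesToMeans(grp) for _, grp in df.groupby('member_id')]
result = pd.concat(pieces)

print(result)
```

In this made-up data, the 2020 row of member 1 averages over both of that member's rows, while the two 2018 rows each have only themselves in their window and so keep their own values.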