Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python 3.x - How to calculate aggregated mean based on time window in Pandas?

I want to calculate, for each member, the mean of every feature over a time window of 3 years. For example, given the following data frame:

member_id  Feature 1  Feature 2  Feature 3  Date
1          0.1        0.5        0.2        1/2/20
1          0.2        0.3        0.3        1/2/18
1          0.3        0.2        0.2        1/2/16
1          0.1        0.2        0.1        1/4/17
2          0.4        0.1        0.4        1/2/18
2          0.5        0.1        0.2        1/2/15
question from: https://stackoverflow.com/questions/65920966/how-to-calculate-aggregated-mean-based-on-time-window-in-pandas


1 Answer

Proceed as follows:

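  1. Define a function that computes the feature means for a particular row of the current group over the required period (this assumes the Date column has already been converted to datetime, e.g. with pd.to_datetime):

    import pandas as pd

    def getRowMeans(row, grp):
        # The window ends at the date of the current row.
        dTo = row.Date
        # Rows within the 3 years up to dTo (inclusive), averaged per Feature column.
        return (grp[grp.Date.between(dTo - pd.DateOffset(years=3), dTo)]
                .loc[:, 'Feature 1':'Feature 3'].mean())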
    

    The idea is:

    • the current object is a row, from a group of rows (for some member_id),
    • for the operations below the whole group must also be known, so it is another parameter of this function (grp),
    • from grp, take the rows dated within the 3 years preceding the Date of the current row (that date included),
    • from these rows take all 3 Feature columns and return their means.
  2. Define a function, to be called for each group of rows (for some member_id), returning a copy of this group with all Feature columns replaced with their means (generated by getRowMeans):

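    def FeaturesToMeans(grp):
        # One row of windowed means per row of the group.
        means = grp.apply(getRowMeans, axis=1, grp=grp)
        # Work on a copy so the original group is left untouched.
        rv = grp.copy()
        # update() works in place and overwrites only the Feature columns.
        rv.update(means)
        return rv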
    

    The first step is to compute feature means.

    In order not to alter the original group, the object to be finally returned (rv) must be created as a copy of the original group.

    Then it is updated with the just computed means. Note however that update operates in place and does not return any result.

    The returned object is the updated group.

  3. Generate the actual result, as a new DataFrame:

    result = df.groupby('member_id', group_keys=False).apply(FeaturesToMeans)
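
To try the three steps end to end, the sketch below rebuilds the sample data from the question and runs them. It assumes the Date strings are month-first (which is what the dates in the result table imply), and it selects the window with a boolean mask passed to .loc rather than chained indexing, which is only a stylistic variation. Newer pandas releases may warn about, or exclude, the grouping column inside groupby.apply, so member_id may be missing from the output there.

```python
import pandas as pd

# Sample data from the question; Date is month/day/two-digit-year text.
df = pd.DataFrame({
    'member_id': [1, 1, 1, 1, 2, 2],
    'Feature 1': [0.1, 0.2, 0.3, 0.1, 0.4, 0.5],
    'Feature 2': [0.5, 0.3, 0.2, 0.2, 0.1, 0.1],
    'Feature 3': [0.2, 0.3, 0.2, 0.1, 0.4, 0.2],
    'Date': ['1/2/20', '1/2/18', '1/2/16', '1/4/17', '1/2/18', '1/2/15'],
})
# Assumption: month-first dates, so '1/2/20' is 2 January 2020.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%y')


def getRowMeans(row, grp):
    # Window: the 3 years up to and including the current row's date.
    dTo = row.Date
    in_window = grp.Date.between(dTo - pd.DateOffset(years=3), dTo)
    return grp.loc[in_window, 'Feature 1':'Feature 3'].mean()


def FeaturesToMeans(grp):
    # Replace each row's features with the means over its window.
    means = grp.apply(getRowMeans, axis=1, grp=grp)
    rv = grp.copy()
    rv.update(means)
    return rv


result = df.groupby('member_id', group_keys=False).apply(FeaturesToMeans)
print(result)
```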
    

The result, for your sample data, is:

   member_id  Feature 1  Feature 2  Feature 3       Date
0          1   0.133333   0.333333       0.20 2020-01-02
1          1   0.200000   0.233333       0.20 2018-01-02
2          1   0.300000   0.200000       0.20 2016-01-02
3          1   0.200000   0.200000       0.15 2017-01-04
4          2   0.450000   0.100000       0.30 2018-01-02
5          2   0.500000   0.100000       0.20 2015-01-02
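
As a sanity check on the first row of this table, the window for member 1 on 2020-01-02 can be worked out by hand. The sketch below uses only member 1's Feature 1 values; the names m1, start, end and window are illustrative and not part of the answer above.

```python
import pandas as pd

# Member 1's rows from the sample data, with dates already parsed.
m1 = pd.DataFrame({
    'Feature 1': [0.1, 0.2, 0.3, 0.1],
    'Date': pd.to_datetime(['2020-01-02', '2018-01-02',
                            '2016-01-02', '2017-01-04']),
})

end = pd.Timestamp('2020-01-02')
start = end - pd.DateOffset(years=3)      # 2017-01-02
window = m1[m1.Date.between(start, end)]  # both ends inclusive

print(window.index.tolist())                 # [0, 1, 3]
print(round(window['Feature 1'].mean(), 6))  # 0.133333
```

The 2016-01-02 row falls just outside the window, so the mean is (0.1 + 0.2 + 0.1) / 3, which agrees with the 0.133333 shown for row 0.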
