r - dplyr, lubridate : how to aggregate a dataframe by week?

Question

Welcome To Ask or Share your Answers For Others

r - dplyr, lubridate : how to aggregate a dataframe by week?

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - dplyr, lubridate : how to aggregate a dataframe by week?

Consider the following example

library(tidyverse)
library(lubridate)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week = wday(time, label = TRUE))

Source: local data frame [25 x 3]

         time values day_of_week
       <date>  <dbl>      <fctr>
1  2014-02-24     30         Mon
2  2014-02-25     45        Tues
3  2014-02-26     30         Wed
4  2014-02-27     50       Thurs
5  2014-02-28     50         Fri
6  2014-03-01     20         Sat
7  2014-03-02     35         Sun
8  2014-03-03     50         Mon
9  2014-03-04     35        Tues
10 2014-03-05     35         Wed

I would like to aggregate this dataframe by week.

That is, suppose I define a week as starting on Monday morning and ending on Sunday evening, which we will call a Monday to Monday cycle. (importantly, I want to be able to choose other conventions, such as Friday to Friday for instance).

Then, I would simply like to count the mean of values for each week.

For instance, in the example above, one would compute the average of values between Monday February 24th to Sunday March 2nd, and so on.

How can I do that?

Thanks!

EDIT: thanks to all of you who contributed an idea. Somewhat unusual, I think my late solution is probably more appropriate here. Thanks again!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-17T00:10:15+0000

In the tidyverse,

df2 %>% group_by(week = week(time)) %>% summarise(value = mean(values))

## # A tibble: 5 × 2
##    week    value
##   <dbl>    <dbl>
## 1     8 37.50000
## 2     9 38.57143
## 3    10 38.57143
## 4    11 36.42857
## 5    12 45.00000

Or use isoweek instead:

df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##    week    value
##   <int>    <dbl>
## 1     9 37.14286
## 2    10 40.71429
## 3    11 35.00000
## 4    12 42.50000

Or cut.Date:

df2 %>% group_by(week = cut(time, "week")) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-24 37.14286
## 2 2014-03-03 40.71429
## 3 2014-03-10 35.00000
## 4 2014-03-17 42.50000

which you can tell to start on Sunday, if you prefer:

df2 %>% group_by(week = cut(time, "week", start.on.monday = FALSE)) %>% 
    summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-23 37.50000
## 2 2014-03-02 40.00000
## 3 2014-03-09 33.57143
## 4 2014-03-16 44.00000

If you want to shift to, say, Tuesday start, add one to your dates:

df2 %>% group_by(week = cut(time + 1, "week")) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-24 37.50000
## 2 2014-03-03 40.00000
## 3 2014-03-10 33.57143
## 4 2014-03-17 44.00000

Labels will be off, though. If using cut, consider the implications of its include.lowest and right parameters, documented at ?cut.

Categories

r - dplyr, lubridate : how to aggregate a dataframe by week?

r - dplyr, lubridate : how to aggregate a dataframe by week?

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags