Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
652 views
in Technique[技术] by (71.8m points)

r - dplyr, lubridate : how to aggregate a dataframe by week?

Consider the following example

library(tidyverse)
library(lubridate)
time <- seq(from =ymd("2014-02-24"),to= ymd("2014-03-20"), by="days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data_frame(time, values)
df2 <- df2 %>% mutate(day_of_week = wday(time, label = TRUE))

Source: local data frame [25 x 3]

         time values day_of_week
       <date>  <dbl>      <fctr>
1  2014-02-24     30         Mon
2  2014-02-25     45        Tues
3  2014-02-26     30         Wed
4  2014-02-27     50       Thurs
5  2014-02-28     50         Fri
6  2014-03-01     20         Sat
7  2014-03-02     35         Sun
8  2014-03-03     50         Mon
9  2014-03-04     35        Tues
10 2014-03-05     35         Wed

I would like to aggregate this dataframe by week.

That is, suppose I define a week as starting on Monday morning and ending on Sunday evening, which we will call a Monday to Monday cycle. (importantly, I want to be able to choose other conventions, such as Friday to Friday for instance).

Then, I would simply like to count the mean of values for each week.

For instance, in the example above, one would compute the average of values between Monday February 24th to Sunday March 2nd, and so on.

How can I do that?

Thanks!

EDIT: thanks to all of you who contributed an idea. Somewhat unusual, I think my late solution is probably more appropriate here. Thanks again!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

In the tidyverse,

df2 %>% group_by(week = week(time)) %>% summarise(value = mean(values))

## # A tibble: 5 × 2
##    week    value
##   <dbl>    <dbl>
## 1     8 37.50000
## 2     9 38.57143
## 3    10 38.57143
## 4    11 36.42857
## 5    12 45.00000

Or use isoweek instead:

df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##    week    value
##   <int>    <dbl>
## 1     9 37.14286
## 2    10 40.71429
## 3    11 35.00000
## 4    12 42.50000

Or cut.Date:

df2 %>% group_by(week = cut(time, "week")) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-24 37.14286
## 2 2014-03-03 40.71429
## 3 2014-03-10 35.00000
## 4 2014-03-17 42.50000

which you can tell to start on Sunday, if you prefer:

df2 %>% group_by(week = cut(time, "week", start.on.monday = FALSE)) %>% 
    summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-23 37.50000
## 2 2014-03-02 40.00000
## 3 2014-03-09 33.57143
## 4 2014-03-16 44.00000

If you want to shift to, say, Tuesday start, add one to your dates:

df2 %>% group_by(week = cut(time + 1, "week")) %>% summarise(value = mean(values))

## # A tibble: 4 × 2
##         week    value
##       <fctr>    <dbl>
## 1 2014-02-24 37.50000
## 2 2014-03-03 40.00000
## 3 2014-03-10 33.57143
## 4 2014-03-17 44.00000

Labels will be off, though. If using cut, consider the implications of its include.lowest and right parameters, documented at ?cut.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...