This is a tricky one. The solution below uses non-equi joins to aggregate by 36 hours periods, dcast()
to reshape from long to wide format, and a second join with the original dat
. There can be an arbitrary number of brands.
library(data.table)
library(lubridate)
setDT(dat)[, shoptime := as_datetime(shoptime)]
setorder(dat, shoptime) # not required, just for convenience of observers
dat[.(lb = shoptime - hours(36), ub = shoptime), on = .(shoptime >= lb, shoptime < ub),
nomatch = 0L, by = .EACHI,
.SD[, .N, by = brand][, rank := frank(-N, ties.method="dense")]][
, dcast(unique(.SD[, -1]), shoptime ~ brand, value.var = "rank")][
dat, on = "shoptime"]
shoptime brand1 brand2 brand3 brand5 brand6 brand
1: 2013-09-01 08:35:00 NA NA NA NA NA brand1
2: 2013-09-01 08:54:00 1 NA NA NA NA brand1
3: 2013-09-01 09:07:00 1 NA NA NA NA brand2
4: 2013-09-01 09:08:00 1 2 NA NA NA brand3
5: 2013-09-01 09:11:00 1 2 2 NA NA brand5
6: 2013-09-01 09:14:00 1 2 2 2 NA brand2
7: 2013-09-01 09:26:00 1 1 2 2 NA brand6
8: 2013-09-01 09:26:00 1 1 2 2 NA brand2
9: 2013-09-01 09:29:00 2 1 3 3 3 brand2
10: 2013-09-01 09:32:00 2 1 3 3 3 brand4
Explanation
dat[.(lb = shoptime - hours(36), ub = shoptime), on = .(shoptime >= lb, shoptime < ub),
nomatch = 0L, by = .EACHI,
.SD[, .N, by = brand][, rank := frank(-N, ties.method="dense")]]
returns the aggregated results per 36 hours periods:
shoptime shoptime brand N rank
1: 2013-08-30 20:54:00 2013-09-01 08:54:00 brand1 1 1
2: 2013-08-30 21:07:00 2013-09-01 09:07:00 brand1 2 1
3: 2013-08-30 21:08:00 2013-09-01 09:08:00 brand1 2 1
4: 2013-08-30 21:08:00 2013-09-01 09:08:00 brand2 1 2
5: 2013-08-30 21:11:00 2013-09-01 09:11:00 brand1 2 1
6: 2013-08-30 21:11:00 2013-09-01 09:11:00 brand2 1 2
7: 2013-08-30 21:11:00 2013-09-01 09:11:00 brand3 1 2
8: 2013-08-30 21:14:00 2013-09-01 09:14:00 brand1 2 1
9: 2013-08-30 21:14:00 2013-09-01 09:14:00 brand2 1 2
10: 2013-08-30 21:14:00 2013-09-01 09:14:00 brand3 1 2
11: 2013-08-30 21:14:00 2013-09-01 09:14:00 brand5 1 2
12: 2013-08-30 21:26:00 2013-09-01 09:26:00 brand1 2 1
13: 2013-08-30 21:26:00 2013-09-01 09:26:00 brand2 2 1
14: 2013-08-30 21:26:00 2013-09-01 09:26:00 brand3 1 2
15: 2013-08-30 21:26:00 2013-09-01 09:26:00 brand5 1 2
16: 2013-08-30 21:26:00 2013-09-01 09:26:00 brand1 2 1
17: 2013-08-30 21:26:00 2013-09-01 09:26:00 brand2 2 1
18: 2013-08-30 21:26:00 2013-09-01 09:26:00 brand3 1 2
19: 2013-08-30 21:26:00 2013-09-01 09:26:00 brand5 1 2
20: 2013-08-30 21:29:00 2013-09-01 09:29:00 brand1 2 2
21: 2013-08-30 21:29:00 2013-09-01 09:29:00 brand2 3 1
22: 2013-08-30 21:29:00 2013-09-01 09:29:00 brand3 1 3
23: 2013-08-30 21:29:00 2013-09-01 09:29:00 brand5 1 3
24: 2013-08-30 21:29:00 2013-09-01 09:29:00 brand6 1 3
25: 2013-08-30 21:32:00 2013-09-01 09:32:00 brand1 2 2
26: 2013-08-30 21:32:00 2013-09-01 09:32:00 brand2 4 1
27: 2013-08-30 21:32:00 2013-09-01 09:32:00 brand3 1 3
28: 2013-08-30 21:32:00 2013-09-01 09:32:00 brand5 1 3
29: 2013-08-30 21:32:00 2013-09-01 09:32:00 brand6 1 3
shoptime shoptime brand N rank
Then, this intermediate result is reshaped from long to wide format:
dat[.(lb = shoptime - hours(36), ub = shoptime), on = .(shoptime >= lb, shoptime < ub),
nomatch = 0L, by = .EACHI,
.SD[, .N, by = brand][, rank := frank(-N, ties.method="dense")]][
, dcast(unique(.SD[, -1]), shoptime ~ brand, value.var = "rank")]
shoptime brand1 brand2 brand3 brand5 brand6
1: 2013-09-01 08:54:00 1 NA NA NA NA
2: 2013-09-01 09:07:00 1 NA NA NA NA
3: 2013-09-01 09:08:00 1 2 NA NA NA
4: 2013-09-01 09:11:00 1 2 2 NA NA
5: 2013-09-01 09:14:00 1 2 2 2 NA
6: 2013-09-01 09:26:00 1 1 2 2 NA
7: 2013-09-01 09:29:00 2 1 3 3 3
8: 2013-09-01 09:32:00 2 1 3 3 3
The final right join with the original dat
data frame completes the missing rows and columns (see code and result above).
Data
dat <- data.frame(
shoptime = c("2013-09-01 08:35:00 UTC", "2013-09-01 08:54:00 UTC", "2013-09-01 09:07:00 UTC" ,"2013-09-01 09:08:00 UTC", "2013-09-01 09:11:00 UTC", "2013-09-01 09:14:00 UTC",
"2013-09-01 09:26:00 UTC", "2013-09-01 09:26:00 UTC" ,"2013-09-01 09:29:00 UTC", "2013-09-01 09:32:00 UTC"),
brand = c("brand1", "brand1", "brand2", "brand3", "brand5", "brand2", "brand6", "brand2" , "brand2" , "brand4" ),
stringsAsFactors = FALSE)