I have a dataset from a PostgreSQL
DB, which I have two columns:
id
= the ID of a participant
time_stamp
= the time stamp of the recorded measurement
I need to work with dbplyr
in order to mutate a new column based on the sequence of time_stamp
. In other words, if the time_stamp
is in sequence (meaning that it is in one minute intervals), this is recognized as one event.
For example, this is my dataset:
library(dplyr)
library(dbplyr)
library(lubridate)
mf <- memdb_frame(
id = "id001",
time_stamp = c(
seq(from = as_datetime("2021-01-01 08:00:00"), to = as_datetime("2021-01-01 08:03:00"), by = "1 min"),
seq(from = as_datetime("2021-01-01 08:05:00"), to = as_datetime("2021-01-01 08:08:00"), by = "1 min"),
seq(from = as_datetime("2021-01-01 08:12:00"), to = as_datetime("2021-01-01 08:18:00"), by = "1 min")
)
)
mf %>%
collect() %>%
mutate(time_stamp = as_datetime(time_stamp))
#> # A tibble: 15 x 2
#> id time_stamp
#> <chr> <dttm>
#> 1 id001 2021-01-01 08:00:00
#> 2 id001 2021-01-01 08:01:00
#> 3 id001 2021-01-01 08:02:00
#> 4 id001 2021-01-01 08:03:00
#> 5 id001 2021-01-01 08:05:00
#> 6 id001 2021-01-01 08:06:00
#> 7 id001 2021-01-01 08:07:00
#> 8 id001 2021-01-01 08:08:00
#> 9 id001 2021-01-01 08:12:00
#> 10 id001 2021-01-01 08:13:00
#> 11 id001 2021-01-01 08:14:00
#> 12 id001 2021-01-01 08:15:00
#> 13 id001 2021-01-01 08:16:00
#> 14 id001 2021-01-01 08:17:00
#> 15 id001 2021-01-01 08:18:00
Now, I need to recognize the events. That means, find time_stamps
that happened in sequence (sequence = 1 minute interval). As an example, this would be my expected output:
#> # A tibble: 15 x 3
#> id time_stamp events
#> <chr> <dttm> <chr>
#> 1 id001 2021-01-01 08:00:00 event_1
#> 2 id001 2021-01-01 08:01:00 event_1
#> 3 id001 2021-01-01 08:02:00 event_1
#> 4 id001 2021-01-01 08:03:00 event_1
#> 5 id001 2021-01-01 08:05:00 event_2
#> 6 id001 2021-01-01 08:06:00 event_2
#> 7 id001 2021-01-01 08:07:00 event_2
#> 8 id001 2021-01-01 08:08:00 event_2
#> 9 id001 2021-01-01 08:12:00 event_3
#> 10 id001 2021-01-01 08:13:00 event_3
#> 11 id001 2021-01-01 08:14:00 event_3
#> 12 id001 2021-01-01 08:15:00 event_3
#> 13 id001 2021-01-01 08:16:00 event_3
#> 14 id001 2021-01-01 08:17:00 event_3
#> 15 id001 2021-01-01 08:18:00 event_3
Note that from row 4
to 5
there was an interval of 2 minutes – causing the next event to start. Same thing from row 8
to 9
: there was an interval of 4 minutes and then the next event started.
PS: I need it to work fully in dbplyr
, that is: without using collect()
Any ideas would be very appreciated!
Thank you!
question from:
https://stackoverflow.com/questions/65835364/mutate-new-column-based-on-sequence-of-time-stamp-in-dbplyr 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…