Left join the input DF
to itself joining on other persons in the same household and on the overlap condition. Then group by row concatenating the matched persons into a comma separated string.
In the absence of an explanation of what constitutes overlap we try three different definitions of overlap. The third is the closest to the output shown in the question.
if end_time < start_time
then everything before end_time
and after start_time
are in the interval to be checked for overlap. The overlap condition then decomposes into 4 cases according to whether the left and right hand sides of the join satisfy this or not.
if start_time > end_time
on either the left or right hand side then we regard the two as not overlapping
If end_time > start_time then reverse them and perform overlap as before.
First overlap definition of overlap
library(sqldf)
sqldf("select a.*, group_concat(distinct b.person) as overlap
from DF a
left join DF b
on a.household = b.household and
a.person != b.person and
(case
when a.start_time <= a.end_time and b.start_time <= b.end_time then
(a.start_time between b.start_time and b.end_time or
b.start_time between a.start_time and a.end_time)
when a.start_time <= a.end_time and b.start_time > b.end_time then
not (a.start_time between b.end_time and b.start_time and
a.end_time between b.end_time and b.start_time)
when a.start_time > a.end_time and b.start_time <= b.end_time then
not (b.start_time between a.end_time and a.start_time and
b.end_time between a.end_time and a.start_time)
else 1 end)
group by a.rowid")
giving:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2
2 1 2 09:45:00 17:45:00 1,4
3 1 3 22:45:00 23:45:00 4
4 1 4 08:45:00 01:45:00 2,3
5 1 1 06:45:00 19:45:00 2
6 2 1 07:45:00 21:45:00 2
7 2 2 016:45:00 22:45:00 1
Second overlap definition of overlap
library(sqldf)
sqldf("select a.*, group_concat(distinct b.person) as overlap
from DF a
left join DF b
on a.household = b.household and
a.person != b.person and
(case
when a.start_time <= a.end_time and b.start_time <= b.end_time then
(a.start_time between b.start_time and b.end_time or
b.start_time between a.start_time and a.end_time)
else 0 end)
group by a.rowid")
giving:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2
2 1 2 09:45:00 17:45:00 1
3 1 3 22:45:00 23:45:00 <NA>
4 1 4 08:45:00 01:45:00 <NA>
5 1 1 06:45:00 19:45:00 2
6 2 1 07:45:00 21:45:00 2
7 2 2 016:45:00 22:45:00 1
Third definition of overlap
sqldf("with DF2(rowid, household, person, start_time, end_time, st, en) as (
select rowid, *,
min(start_time, end_time) as st,
max(start_time, end_time) as en
from DF)
select a.household, a.person, a.start_time, a.end_time,
group_concat(distinct b.person) as overlap
from DF2 a
left join DF2 b
on a.household = b.household and
a.person != b.person and
(a.st between b.st and b.en or
b.st between a.st and a.en)
group by a.rowid")
giving:
household person start_time end_time overlap
1 1 1 07:45:00 21:45:00 2,4
2 1 2 09:45:00 17:45:00 1
3 1 3 22:45:00 23:45:00 <NA>
4 1 4 08:45:00 01:45:00 1
5 1 1 06:45:00 19:45:00 2,4
6 2 1 07:45:00 21:45:00 2
7 2 2 16:45:00 22:45:00 1
Note
We assume that the input DF
in reproducible form is:
DF <- structure(list(household = c(1L, 1L, 1L, 1L, 1L, 2L, 2L), person = c(1L,
2L, 3L, 4L, 1L, 1L, 2L), start_time = c("07:45:00", "09:45:00",
"22:45:00", "08:45:00", "06:45:00", "07:45:00", "16:45:00"),
end_time = c("21:45:00", "17:45:00", "23:45:00", "01:45:00",
"19:45:00", "21:45:00", "22:45:00")), class = "data.frame", row.names = c(NA,
-7L))