r - Intersect all possible combinations of list elements

Question

Welcome To Ask or Share your Answers For Others

r - Intersect all possible combinations of list elements

asked Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Intersect all possible combinations of list elements

I have a list of vectors:

> l <- list(A=c("one", "two", "three", "four"), B=c("one", "two"), C=c("two", "four", "five", "six"), D=c("six", "seven"))

> l
$A
[1] "one"   "two"   "three" "four"

$B
[1] "one" "two"

$C
[1] "two"  "four" "five" "six"

$D
[1] "six"   "seven"

I would like to calculate the length of the overlap between all possible pairwise combinations of the list elements, i.e. (the format of the result doesn't matter):

AintB 2
AintC 2
AintD 0
BintC 1
BintD 0
CintD 1

I know combn(x, 2) can be used to get a matrix of all possible pairwise combinations in a vector and that length(intersect(a, b)) would give me the length of the overlap of two vectors, but I can't think of a way to put the two things together.

Any help is much appreciated! Thanks.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Answer

深蓝 · Answer 1 · 2021-10-17T03:05:45+0000

If I understand correctly, you can look at crossprod and stack:

crossprod(table(stack(l)))
#    ind
# ind A B C D
#   A 4 2 2 0
#   B 2 2 1 0
#   C 2 1 4 1
#   D 0 0 1 2

You can extend the idea if you want a data.frame of just the relevant values as follows:

Write a spiffy function

listIntersect <- function(inList) {
  X <- crossprod(table(stack(inList)))
  X[lower.tri(X)] <- NA
  diag(X) <- NA
  out <- na.omit(data.frame(as.table(X)))
  out[order(out$ind), ]
}

Apply it

listIntersect(l)
#    ind ind.1 Freq
# 5    A     B    2
# 9    A     C    2
# 13   A     D    0
# 10   B     C    1
# 14   B     D    0
# 15   C     D    1

Performance seems pretty decent.

Expand the list:

L <- unlist(replicate(100, l, FALSE), recursive=FALSE)
names(L) <- make.unique(names(L))

Set up some functions to test:

fun1 <- function(l) listIntersect(l)
fun2 <- function(l) apply( combn( l , 2 ) , 2 , function(x) length( intersect( unlist( x[1]) , unlist(x[2]) ) ) )
fun3 <- function(l) {
  m1 <- combn(names(l),2)
  val <- sapply(split(m1, col(m1)),function(x) {x1 <- l[[x[1]]]; x2 <- l[[x[2]]]; length(intersect(x1, x2))})
  Ind <- apply(m1,2,paste,collapse="int")
  data.frame(Ind, val, stringsAsFactors=F) 
}

Check out the timings:

system.time(F1 <- fun1(L))
#    user  system elapsed 
#    0.33    0.00    0.33
system.time(F2 <- fun2(L))
#    user  system elapsed 
#    4.32    0.00    4.31 
system.time(F3 <- fun3(L))
#    user  system elapsed 
#    6.33    0.00    6.33

Everyone seems to be sorting the result differently, but the numbers match:

table(F1$Freq)
# 
#     0     1     2     4 
# 20000 20000 29900  9900 
table(F2)
# F2
#     0     1     2     4 
# 20000 20000 29900  9900 
table(F3$val)
# 
#     0     1     2     4 
# 20000 20000 29900  9900

Categories

r - Intersect all possible combinations of list elements

r - Intersect all possible combinations of list elements

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags