Here are a few approaches, demonstrated on this sample data:
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))
Update, years later....
You can use the mtabulate function from "qdapTools" or the unexported charMat function from my "splitstackshape" package.
Syntax would be:
library(qdapTools)
mtabulate(movies$genre_list)
#   action crime drama family indie mystery romance thriller
# 1      1     0     1      0     0       0       1        0
# 2      0     1     1      0     0       0       0        0
# 3      0     1     1      0     0       1       0        0
# 4      0     0     0      0     1       0       0        1
# 5      0     0     0      0     0       0       0        1
# 6      0     0     1      1     0       0       0        0
or
splitstackshape:::charMat(movies$genre_list, fill = 0)
#      action crime drama family indie mystery romance thriller
# [1,]      1     0     1      0     0       0       1        0
# [2,]      0     1     1      0     0       0       0        0
# [3,]      0     1     1      0     0       1       0        0
# [4,]      0     0     0      0     1       0       0        1
# [5,]      0     0     0      0     0       0       0        1
# [6,]      0     0     1      1     0       0       0        0
Update: A couple of more direct approaches
Improved option 1: Use table somewhat directly:
table(rep(1:nrow(movies), sapply(movies$genre_list, length)),
      unlist(movies$genre_list, use.names=FALSE))
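To make it clearer why that one-liner works, here is a rough step-by-step sketch of the same idea. The names row_id, genre and genre_counts are only illustrative, and lengths() (available in reasonably recent versions of R) stands in for sapply(..., length). The sample data is repeated so the block runs on its own.

```r
# Sample data, repeated from the top of this answer
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

# One row index per genre entry: 1 1 1 2 2 3 3 3 4 4 5 6 6
row_id <- rep(seq_len(nrow(movies)), lengths(movies$genre_list))

# All genres flattened into one vector, in the same order as row_id
genre <- unlist(movies$genre_list, use.names = FALSE)

# Cross-tabulate row index against genre, then drop the "table" class
# to get a plain integer matrix (one column per distinct genre)
genre_counts <- unclass(table(row_id, genre))
genre_counts
```

Note that this produces counts rather than strict 0/1 indicators, so a genre listed twice for the same movie would show up as 2.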
Improved option 2: Use a for loop.
x <- unique(unlist(movies$genre_list, use.names=FALSE))
m <- matrix(0, ncol = length(x), nrow = nrow(movies), dimnames = list(NULL, x))
for (i in seq_len(nrow(m))) {
  m[i, movies$genre_list[[i]]] <- 1
}
m
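If you would rather avoid the explicit loop, the same matrix can probably be filled in one step with matrix indexing. This is a different technique from the loop above, sketched under the assumption that the data looks like the sample; genres, m2, row_id and col_id are illustrative names, not anything from a package.

```r
# Sample data, repeated from the top of this answer
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

# Distinct genres, in order of first appearance
genres <- unique(unlist(movies$genre_list, use.names = FALSE))

# Empty indicator matrix: one row per movie, one column per genre
m2 <- matrix(0, nrow = nrow(movies), ncol = length(genres),
             dimnames = list(NULL, genres))

# Row and column position of every (movie, genre) pair
row_id <- rep(seq_len(nrow(movies)), lengths(movies$genre_list))
col_id <- match(unlist(movies$genre_list, use.names = FALSE), genres)

# Indexing with a two-column matrix sets all the pairs at once
m2[cbind(row_id, col_id)] <- 1
m2
```

As with the loop, the columns come out in order of first appearance rather than alphabetically, and a repeated genre within one movie still gives 1, not 2.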
Below is the OLD answer
Convert the list to a list of tables (in turn converted to data.frames):
tables <- lapply(seq_along(movies$genre_list), function(x) {
  temp <- as.data.frame.table(table(movies$genre_list[[x]]))
  names(temp) <- c("Genre", paste("Record", x, sep = "_"))
  temp
})
Use Reduce to merge the resulting list. If I understand your end goal correctly, this results in the transposed form of the result you are interested in.
merged_tables <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)
merged_tables
#      Genre Record_1 Record_2 Record_3 Record_4 Record_5 Record_6
# 1   action        1       NA       NA       NA       NA       NA
# 2    drama        1        1        1       NA       NA        1
# 3  romance        1       NA       NA       NA       NA       NA
# 4    crime       NA        1        1       NA       NA       NA
# 5  mystery       NA       NA        1       NA       NA       NA
# 6    indie       NA       NA       NA        1       NA       NA
# 7 thriller       NA       NA       NA        1        1       NA
# 8   family       NA       NA       NA       NA       NA        1
Transposing and converting NA to 0 is pretty straightforward. Just drop the first column and re-use it as the column names for the new data.frame:
movie_genres <- setNames(data.frame(t(merged_tables[-1])), merged_tables[[1]])
movie_genres[is.na(movie_genres)] <- 0
movie_genres