r - Converting a column of type 'list' to multiple columns in a data frame

Question

asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a data frame with one column which is a list, like so:

> head(movies$genre_list)
[[1]]
[1] "drama"   "action"  "romance"
[[2]]
[1] "crime" "drama"
[[3]]
[1] "crime"   "drama"   "mystery"
[[4]]
[1] "thriller" "indie"  
[[5]]
[1] "thriller"
[[6]]
[1] "drama"  "family"

I want to convert this one column to multiple binary (0/1) columns, one for each unique element across the lists (in this case, genres). I'm looking for an elegant solution that doesn't involve first finding out how many genres there are, then creating a column for each, and then checking each list element to populate the genre columns. I tried unlist, but it doesn't work with a vector of lists in the way I want.

Thanks!

1 Answer

深蓝 · Answer 1 · 2021-10-23T21:27:29+0000

Here are a few approaches, all using the following sample data:

movies <- data.frame(genre_list = I(list(
   c("drama",   "action",  "romance"),
   c("crime", "drama"),
   c("crime",   "drama",   "mystery"),
   c("thriller", "indie"),  
   c("thriller"),
   c("drama",  "family"))))

Update, years later....

You can use the mtabulate function from "qdapTools" or the unexported charMat function from my "splitstackshape" package.

Syntax would be:

library(qdapTools)
mtabulate(movies$genre_list)
#   action crime drama family indie mystery romance thriller
# 1      1     0     1      0     0       0       1        0
# 2      0     1     1      0     0       0       0        0
# 3      0     1     1      0     0       1       0        0
# 4      0     0     0      0     1       0       0        1
# 5      0     0     0      0     0       0       0        1
# 6      0     0     1      1     0       0       0        0

or

splitstackshape:::charMat(movies$genre_list, fill = 0)
#      action crime drama family indie mystery romance thriller
# [1,]      1     0     1      0     0       0       1        0
# [2,]      0     1     1      0     0       0       0        0
# [3,]      0     1     1      0     0       1       0        0
# [4,]      0     0     0      0     1       0       0        1
# [5,]      0     0     0      0     0       0       0        1
# [6,]      0     0     1      1     0       0       0        0

Update: A couple of more direct approaches

Improved option 1: Use table somewhat directly:

table(rep(seq_len(nrow(movies)), lengths(movies$genre_list)),
      unlist(movies$genre_list, use.names = FALSE))
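
The table approach returns a contingency table rather than data frame columns. The following is a minimal sketch of one way to turn it into binary columns and attach them to the original data frame. The names row_id, genre, genre_df and movies_wide are illustrative, not part of the original answer, and wrapping the row index in a factor is an added assumption so that rows with an empty genre list are not dropped.

```r
# Sample data, as defined in the answer
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

# Row index repeated once per genre; the factor levels keep every row,
# even one whose genre list is empty (assumption, not in the original)
row_id <- factor(rep(seq_len(nrow(movies)), lengths(movies$genre_list)),
                 levels = seq_len(nrow(movies)))
genre <- unlist(movies$genre_list, use.names = FALSE)

# as.data.frame.matrix() keeps the wide layout of the table;
# plain as.data.frame() on a table would return long format instead
genre_df <- as.data.frame.matrix(table(row_id, genre))

# Clamp counts to 0/1 in case a genre is repeated within one row
genre_df[] <- lapply(genre_df, function(col) as.integer(col > 0))

# Attach the binary columns to the original data frame
movies_wide <- cbind(movies, genre_df)
```

The genre columns come out in alphabetical order because table sorts the factor levels.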

Improved option 2: Use a for loop.

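x <- unique(unlist(movies$genre_list, use.names = FALSE))
# One row per movie, one zero-filled column per genre
m <- matrix(0, ncol = length(x), nrow = nrow(movies), dimnames = list(NULL, x))
for (i in seq_len(nrow(m))) {
  # Set the columns named after this movie's genres to 1
  m[i, movies$genre_list[[i]]] <- 1
}
m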
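
If the explicit loop is a concern, the same matrix can probably be filled in one step with matrix indexing. This is a sketch of a swapped-in technique rather than something from the original answer: it assigns through a two-column matrix of (row, column) positions. The names genres, i and j are illustrative.

```r
# Sample data, as defined in the answer
movies <- data.frame(genre_list = I(list(
  c("drama", "action", "romance"),
  c("crime", "drama"),
  c("crime", "drama", "mystery"),
  c("thriller", "indie"),
  c("thriller"),
  c("drama", "family"))))

genres <- unique(unlist(movies$genre_list, use.names = FALSE))
m <- matrix(0L, nrow = nrow(movies), ncol = length(genres),
            dimnames = list(NULL, genres))

# Row position of every genre occurrence
i <- rep(seq_len(nrow(movies)), lengths(movies$genre_list))
# Column position of every genre occurrence
j <- match(unlist(movies$genre_list, use.names = FALSE), genres)

# Indexing with a two-column matrix addresses one cell per (row, column) pair
m[cbind(i, j)] <- 1L
m
```

Here the columns follow the order in which genres first appear, the same as in the loop version.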

Below is the OLD answer

Convert the list to a list of tables (in turn converted to data.frames):

tables <- lapply(seq_along(movies$genre_list), function(x) {
  temp <- as.data.frame.table(table(movies$genre_list[[x]]))
  names(temp) <- c("Genre", paste("Record", x, sep = "_"))
  temp
})

Use Reduce to merge the resulting list. If I understand your end goal correctly, this results in the transposed form of the result you are interested in.

merged_tables <- Reduce(function(x, y) merge(x, y, all = TRUE), tables)
merged_tables
#      Genre Record_1 Record_2 Record_3 Record_4 Record_5 Record_6
# 1   action        1       NA       NA       NA       NA       NA
# 2    drama        1        1        1       NA       NA        1
# 3  romance        1       NA       NA       NA       NA       NA
# 4    crime       NA        1        1       NA       NA       NA
# 5  mystery       NA       NA        1       NA       NA       NA
# 6    indie       NA       NA       NA        1       NA       NA
# 7 thriller       NA       NA       NA        1        1       NA
# 8   family       NA       NA       NA       NA       NA        1

Transposing and converting NA to 0 is straightforward: drop the first column and reuse it as the column names for the new data.frame.

movie_genres <- setNames(data.frame(t(merged_tables[-1])), merged_tables[[1]])
movie_genres[is.na(movie_genres)] <- 0
movie_genres
