Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

What is the best way to turn a text variable that contains many variables into separate variables in R?

I am working with a dataset where one column (Extra) packs several variables into a single text string. I need to parse that text in R and extract each value into its own column.

Current:

|-------------|-------------|-----------------------------------------|
|  Customer   |    Group    |    Extra                                |
|-------------|-------------|-----------------------------------------|
|     1       |    A1       | {"Field1":"A","Field2":"B","Field3":"C"}|
|-------------|-------------|-----------------------------------------|
|     2       |    A2       | {"Field1":"D","Field2":"E","Field3":"F"}|
|-------------|-------------|-----------------------------------------|
|     3       |    A3       | {"Field1":"A","Field2":"G","Field3":"D"}|
|-------------|-------------|-----------------------------------------|

Desired:

|-------------|-------------|------------|-----------|-----------|
|  Customer   |    Group    |    Field1  |   Field2  |   Field3  |
|-------------|-------------|------------|-----------|-----------|
|     1       |    A1       |    A       |   B       |   C       |
|-------------|-------------|------------|-----------|-----------|
|     2       |    A2       |    D       |   E       |   F       |
|-------------|-------------|------------|-----------|-----------|
|     3       |    A3       |    A       |   G       |   D       |
|-------------|-------------|------------|-----------|-----------|

This seems like it should be easy, but I haven't been able to find a solution. Thank you for any help you can provide! I'm still new to R :)

Question from: https://stackoverflow.com/questions/65867301/what-is-the-best-way-to-turn-a-text-variable-that-contains-many-variables-into-s

1 Answer

We can do some data cleaning first, then split each string on commas (,) into separate rows, split each piece on the colon (:) into a key column and a value column, and finally reshape the data to wide format.

library(dplyr)
library(tidyr)

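df %>%
  # drop the braces and any double quotes around keys and values
  mutate(Extra = gsub('[{}"]', '', Extra)) %>%
  # one row per "key:value" pair
  separate_rows(Extra, sep = ',') %>%
  # split each pair into a key column and a value column
  separate(Extra, c('col', 'value'), sep = ':') %>%
  # one column per key
  pivot_wider(names_from = col, values_from = value)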

#  Customer Group Field1 Field2 Field3
#     <int> <chr> <chr>  <chr>  <chr> 
#1        1 A1    A      B      C     
#2        2 A2    D      E      F     
#3        3 A3    A      G      D     
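If you would rather avoid the tidyverse, the same steps (strip braces and quotes, split on commas, split on colons, reshape wide) can be sketched in base R. This is a minimal sketch, not part of the original answer; it assumes each Extra string is a flat object with no nested braces and no commas, colons or double quotes inside keys or values. The helper name parse_extra is hypothetical.

```r
# Input as in the question: Extra holds a flat JSON-style string
df <- data.frame(
  Customer = 1:3,
  Group = c("A1", "A2", "A3"),
  Extra = c('{"Field1":"A","Field2":"B","Field3":"C"}',
            '{"Field1":"D","Field2":"E","Field3":"F"}',
            '{"Field1":"A","Field2":"G","Field3":"D"}'),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn one Extra string into a named character vector,
# e.g. c(Field1 = "A", Field2 = "B", Field3 = "C").
# Assumes flat objects with no commas, colons or quotes inside keys/values.
parse_extra <- function(s) {
  s <- gsub('[{}"]', '', s)                      # drop braces and double quotes
  pairs <- strsplit(s, ",", fixed = TRUE)[[1]]   # "Field1:A", "Field2:B", ...
  kv <- strsplit(pairs, ":", fixed = TRUE)       # list of c(key, value)
  setNames(vapply(kv, `[`, character(1), 2),     # values
           vapply(kv, `[`, character(1), 1))     # keys become names
}

parsed <- lapply(df$Extra, parse_extra)

# Union of keys across rows, so a field missing from one row becomes NA
keys <- unique(unlist(lapply(parsed, names)))
wide <- do.call(rbind, lapply(parsed, function(p) p[keys]))
colnames(wide) <- keys

result <- cbind(df[c("Customer", "Group")],
                as.data.frame(wide, stringsAsFactors = FALSE))
print(result)
```

Indexing each row by the full set of keys (p[keys]) is a small design choice: it keeps the columns aligned even if some rows lack a field, at the cost of filling those cells with NA.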

Data (the Extra strings here omit the double quotes that appear in the question):

df <- structure(list(Customer = 1:3, Group = c("A1", "A2", "A3"), 
Extra = c("{Field1:A,Field2:B,Field3:C}", 
"{Field1:D,Field2:E,Field3:F}", "{Field1:A,Field2:G,Field3:D}"
)), class = "data.frame", row.names = c(NA, -3L))
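Since the Extra strings in the question are JSON, another option (not from the original answer) is to let a JSON parser handle the quoting. This is a sketch assuming the jsonlite package is installed and that every row contains the same keys.

```r
library(jsonlite)

# Input exactly as in the question, including the double quotes
df <- data.frame(
  Customer = 1:3,
  Group = c("A1", "A2", "A3"),
  Extra = c('{"Field1":"A","Field2":"B","Field3":"C"}',
            '{"Field1":"D","Field2":"E","Field3":"F"}',
            '{"Field1":"A","Field2":"G","Field3":"D"}'),
  stringsAsFactors = FALSE
)

# fromJSON() turns one flat JSON object into a named list,
# e.g. list(Field1 = "A", Field2 = "B", Field3 = "C")
parsed <- lapply(df$Extra, fromJSON)

# Make each list a one-row data frame and stack them
# (rbind assumes all rows share the same field names)
fields <- do.call(rbind, lapply(parsed, as.data.frame, stringsAsFactors = FALSE))

result <- cbind(df[c("Customer", "Group")], fields)
print(result)
```

A real JSON parser also copes with escaped quotes, commas or colons inside values, which the string-splitting approach would mis-split.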

...