Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
728 views
in Technique[技术] by (71.8m points)

r - Transform from Wide to Long without sorting columns

I want to convert a dataframe from wide format to long format.

Here it is a toy example:

mydata <- data.frame(ID=1:5, ZA_1=1:5, 
            ZA_2=5:1,BB_1=rep(3,5),BB_2=rep(6,5),CC_7=6:2)

ID ZA_1 ZA_2 BB_1 BB_2 CC_7
1    1    5    3    6    6
2    2    4    3    6    5
3    3    3    3    6    4
4    4    2    3    6    3
5    5    1    3    6    2

There are some variables that will remain as is (here only ID) and some that will be transformed to long format (here all other variables, all ending with _1, _2 or _7)

In order to transform it to long format I'm using data.table melt and dcast, a generic way able to detect the variables automatically. Other solutions are welcome too.

library(data.table)
setDT(mydata)
idvars =  grep("_[1-7]$",names(mydata) , invert = TRUE)
temp <- melt(mydata, id.vars = idvars)  
nuevo <- dcast(
  temp[, `:=`(var = sub("_[1-7]$", '', variable),
  measure = sub('.*_', '', variable), variable = NULL)],  
  ... ~ var, value.var='value') 



ID measure BB  CC  ZA
 1      1   3  NA   1
 1      2   6  NA   5
 1      7  NA   6  NA
 2      1   3  NA   2
 2      2   6  NA   4
 2      7  NA   5  NA
 3      1   3  NA   3
 3      2   6  NA   3
 3      7  NA   4  NA
 4      1   3  NA   4
 4      2   6  NA   2
 4      7  NA   3  NA
 5      1   3  NA   5
 5      2   6  NA   1
 5      7  NA   2  NA

As you can see the columns are reoredered alphabetically, but I would prefer to keep the original order as far as possible, for example taking into account the order of the first appearance of the variable.

ID ZA_1 ZA_2 BB_1 BB_2 CC_7

Should be

ID ZA BB CC

I don't mind if the idvars columns come alltogether at the beginning or if they also stay in their original position.

ID ZA_1 ZA_2 TEMP BB_1 BB_2 CC_2 CC_1

would be

ID ZA TEMP BB CC

or

ID TEMP ZA BB CC

I prefer the last option.

Another problem is that everything gets transformed to character.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

You can melt several columns simultaneously, if you pass a list of column names to the argument measure =. One approach to do this in a scalable manner would be to:

  1. Extract the column names and the corresponding first two letters:

    measurevars <- names(mydata)[grepl("_[1-9]$",names(mydata))]
    groups <- gsub("_[1-9]$","",measurevars)
    
  2. Turn groups into a factor object and make sure levels aren't ordered alphabetically. We'll use this in the next step to create a list object with the correct structure.

    split_on <- factor(groups, levels = unique(groups))
    
  3. Create a list using measurevars with split(), and create vector for the value.name = argument in melt().

    measure_list <- split(measurevars, split_on)
    measurenames <- unique(groups)
    

Bringing it all together:

melt(setDT(mydata), 
     measure = measure_list, 
     value.name = measurenames,
     variable.name = "measure")
#    ID measure ZA BB
# 1:  1       1  1  3
# 2:  2       1  2  3
# 3:  3       1  3  3
# 4:  4       1  4  3
# 5:  5       1  5  3
# 6:  1       2  5  6
# 7:  2       2  4  6
# 8:  3       2  3  6
# 9:  4       2  2  6
#10:  5       2  1  6

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...