Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

statistics - Reordering groups of rows in R

I'm running some user studies, and each participant is given the parts in a different order. The data looks a little like this (I'm currently importing it all into a data table):

Question 3
1        6
2        9

Question 1
1        2
3        5

Question 2
2        5
1        2

I now have multiple CSV files, each with the 'question' rows in a different order. Can I search for the 'question' rows, group each one with all the lines between it and the next 'question' row, and put the groups in order? There will be several lines between them, and we don't know exactly how many. The result should look like this:


Question 1
1        2
3        5 

Question 2
2        5
1        2

Question 3
1        6
2        9



1 Answer


Parsing the file is the first step; ordering from there is straightforward with base R's order() or with dplyr::arrange().

txt <- readLines("quux.txt")
txt
# [1] "Question 3" "1        6" "2        9" "Question 1" "1        2" "3        5" "Question 2" "2        5" "1        2"

lst_of_frames <- lapply(
  split(txt, cumsum(grepl("^Question", txt))),
  function(z) {
    out <- read.table(header = FALSE, text = z[-1])
    cbind(question = z[1], out)
  })
lst_of_frames
# $`1`
#     question V1 V2
# 1 Question 3  1  6
# 2 Question 3  2  9
# $`2`
#     question V1 V2
# 1 Question 1  1  2
# 2 Question 1  3  5
# $`3`
#     question V1 V2
# 1 Question 2  2  5
# 2 Question 2  1  2
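The grouping passed to split() relies on cumsum(grepl(...)): grepl() is TRUE on each "Question" line, and the running sum goes up by one at each of those lines, so a header and the data lines after it share one group id. A small illustration, using the question's example lines written inline instead of read from quux.txt:

```r
# The question's example lines, built inline instead of read from a file
txt <- c("Question 3", "1        6", "2        9",
         "Question 1", "1        2", "3        5",
         "Question 2", "2        5", "1        2")

# TRUE on each header line, FALSE on the data lines that follow it
is_header <- grepl("^Question", txt)

# The running count of headers seen so far serves as the group id
group_id <- cumsum(is_header)
group_id
# [1] 1 1 1 2 2 2 3 3 3

# split() returns one character vector per group, header first
split(txt, group_id)[["2"]]
# [1] "Question 1" "1        2" "3        5"
```

Any lines before the first "Question" row would get group id 0 and form a block of their own, so drop them first (e.g. txt[cumsum(grepl("^Question", txt)) > 0]) if a file can start with other lines. Blank lines between blocks, as in the question's layout, end up at the end of the preceding block, and read.table() skips them by default (blank.lines.skip = TRUE).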

We now have a list of frames, one per question. If you want them combined, several options exist:

results <- do.call(rbind, lst_of_frames)
results
#       question V1 V2
# 1.1 Question 3  1  6
# 1.2 Question 3  2  9
# 2.1 Question 1  1  2
# 2.2 Question 1  3  5
# 3.1 Question 2  2  5
# 3.2 Question 2  1  2
dplyr::bind_rows(lst_of_frames)        # similar results
data.table::rbindlist(lst_of_frames)   # similar results

I'll use the result of the first (base R) option and then order it with:

results[order(results$question, results$V1),]
#       question V1 V2
# 2.1 Question 1  1  2
# 2.2 Question 1  3  5
# 3.2 Question 2  1  2
# 3.1 Question 2  2  5
# 1.1 Question 3  1  6
# 1.2 Question 3  2  9
dplyr::arrange(results, question, V1)  # similar results
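Two caveats about ordering on the question column directly: it sorts as text, so a "Question 10" would land between "Question 1" and "Question 2"; and adding V1 as a sort key reorders rows within a question, whereas the desired output in the question keeps each block's rows as they were (Question 2 stays "2 5" then "1 2"). A minimal sketch, assuming every header is "Question" followed by an integer: extract that number and order on it alone. The data frame below is made up for illustration (the answer's rows plus a hypothetical "Question 10" block).

```r
# Made-up data in the same shape as `results` above, plus a hypothetical
# "Question 10" block to show why sorting the text is not enough
results <- data.frame(
  question = c("Question 3", "Question 3", "Question 10",
               "Question 1", "Question 1", "Question 2", "Question 2"),
  V1 = c(1L, 2L, 4L, 1L, 3L, 2L, 1L),
  V2 = c(6L, 9L, 7L, 2L, 5L, 5L, 2L),
  stringsAsFactors = FALSE
)

# Extract the integer after "Question"; assumes every header has that form
qnum <- as.integer(sub("^Question\\s*", "", results$question))

# Order on the number alone; order() is stable, so rows within a question
# keep their original order (Question 2 stays "2 5" then "1 2")
by_number <- results[order(qnum), ]
by_number
```

To keep the secondary sort on V1 from the answer, pass it as a tie-breaker: results[order(qnum, results$V1), ].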

Note: this is sensitive to the number of columns within each question. Suppose different questions have different numbers of columns:

Question 3
1        6
2        9

Question 1
1        2       10
3        5       11

Question 2
2        5
1        2

Then you have two options.

  1. Keep it wide. The simple base R do.call(rbind, ...) no longer works:

    do.call(rbind, lst_of_frames)
    # Error in rbind(deparse.level, ...) : 
    #   numbers of columns of arguments do not match
    

    But the other two work, filling the missing column with NA (data.table::rbindlist needs fill = TRUE for this):

    dplyr::bind_rows(lst_of_frames)
    #     question V1 V2 V3
    # 1 Question 3  1  6 NA
    # 2 Question 3  2  9 NA
    # 3 Question 1  1  2 10
    # 4 Question 1  3  5 11
    # 5 Question 2  2  5 NA
    # 6 Question 2  1  2 NA
    data.table::rbindlist(lst_of_frames, fill = TRUE)  # similar results
    
  2. Pivot to long. (This is a "wide-vs-long" data discussion.)

    dplyr::bind_rows(lapply(lst_of_frames, function(z) tidyr::pivot_longer(z, -question)))
    # # A tibble: 14 x 3
    #    question   name  value
    #    <chr>      <chr> <int>
    #  1 Question 3 V1        1
    #  2 Question 3 V2        6
    #  3 Question 3 V1        2
    #  4 Question 3 V2        9
    #  5 Question 1 V1        1
    #  6 Question 1 V2        2
    #  7 Question 1 V3       10
    #  8 Question 1 V1        3
    #  9 Question 1 V2        5
    # 10 Question 1 V3       11
    # 11 Question 2 V1        2
    # 12 Question 2 V2        5
    # 13 Question 2 V1        1
    # 14 Question 2 V2        2
    
    # similar results
    library(data.table)
    rbindlist(lapply(lst_of_frames, function(z) melt(as.data.table(z), id.vars = "question")))
    

    The long format has several advantages elsewhere (e.g., plotting with ggplot2, tidy data management, and easy summarization).
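The question mentions several CSV files, while the answer parses a single file. A minimal sketch, assuming every file has the same layout as the example: the hypothetical helper parse_question_file() wraps the same split/read.table steps, adds a file column so rows can be traced back to their source, and the per-file results are stacked and ordered by question number. The two files (p1.txt, p2.txt) are made up and written to a temporary directory only so the sketch runs on its own. The example data is whitespace-separated; if the files really are comma-separated, pass sep = "," through to read.table().

```r
# Hypothetical helper (not part of the answer above): parse one file into a
# data frame with `file` and `question` columns, using the same split trick
parse_question_file <- function(path, sep = "") {
  txt <- readLines(path)
  txt <- txt[nzchar(trimws(txt))]                  # drop blank lines
  txt <- txt[cumsum(grepl("^Question", txt)) > 0]  # drop lines before the first header
  blocks <- split(txt, cumsum(grepl("^Question", txt)))
  frames <- lapply(blocks, function(z) {
    # assumes every header is followed by at least one data line
    out <- read.table(header = FALSE, text = z[-1], sep = sep)
    cbind(file = basename(path), question = z[1], out)
  })
  do.call(rbind, frames)
}

# Two small made-up files in a temporary directory; with real data you would
# list your own folder, e.g. list.files(<folder>, pattern = "\\.csv$", full.names = TRUE)
tmp_dir <- tempfile("questions")
dir.create(tmp_dir)
writeLines(c("Question 2", "2        5", "", "Question 1", "1        2"),
           file.path(tmp_dir, "p1.txt"))
writeLines(c("Question 1", "3        5", "", "Question 2", "1        2"),
           file.path(tmp_dir, "p2.txt"))

files <- list.files(tmp_dir, full.names = TRUE)
all_results <- do.call(rbind, lapply(files, parse_question_file))

# Order by question number, then by file
qnum <- as.integer(sub("^Question\\s*", "", all_results$question))
all_results <- all_results[order(qnum, all_results$file), ]
rownames(all_results) <- NULL
all_results
```

If the files differ in their column counts, replace the outer do.call(rbind, ...) with dplyr::bind_rows() or data.table::rbindlist(fill = TRUE), as in option 1 above.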


