statistics - Reordering groups of rows in R

asked Jan 27, 2021 in Technique by 深蓝 (71.8m points)

I'm running some user studies in which each participant is given the parts in a different order. The data looks roughly like this. I currently import everything into a data table:

Question 3
1        6
2        9

Question 1
1        2
3        5

Question 2
2        5
1        2

I now have multiple CSV files in which the 'Question' rows appear in different orders. Can I search for the 'Question' rows, group each one with everything up to the next 'Question' row, and put the groups in order? There will be several lines in between, and we don't know exactly how many. The result should look like this:

Question 1
1        2
3        5 

Question 2
2        5
1        2

Question 3
1        6
2        9

1 Answer

深蓝 · Answer 1 · 2021-01-27T03:53:03+0000

Parsing the file is your first step; ordering from there is fairly direct with order or dplyr::arrange.

txt <- readLines("quux.txt")
txt
# [1] "Question 3" "1        6" "2        9" "Question 1" "1        2" "3        5" "Question 2" "2        5" "1        2"

lst_of_frames <- lapply(
  split(txt, cumsum(grepl("^Question", txt))),
  function(z) {
    out <- read.table(header = FALSE, text = z[-1])
    cbind(question = z[1], out)
  })
lst_of_frames
# $`1`
#     question V1 V2
# 1 Question 3  1  6
# 2 Question 3  2  9
# $`2`
#     question V1 V2
# 1 Question 1  1  2
# 2 Question 1  3  5
# $`3`
#     question V1 V2
# 1 Question 2  2  5
# 2 Question 2  1  2
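The grouping step is the part that handles "we don't know how many lines are between": cumsum over the header matches gives every line the index of the most recent "Question" line. A minimal sketch of just that step, with the same lines inlined so it runs on its own (the names is_header, group_id, and groups are introduced here for illustration):

```r
# Same lines as in quux.txt above, inlined so the snippet is self-contained.
txt <- c("Question 3", "1        6", "2        9",
         "Question 1", "1        2", "3        5",
         "Question 2", "2        5", "1        2")

is_header <- grepl("^Question", txt)  # TRUE on each "Question" line
group_id  <- cumsum(is_header)        # increments at every header line
group_id
# [1] 1 1 1 2 2 2 3 3 3

groups <- split(txt, group_id)        # one character vector per question
groups[["2"]]
# [1] "Question 1" "1        2" "3        5"
```

Because the group id only changes on a header line, a question can be followed by any number of data lines.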

We now have a list of data frames. If you want them combined, there are several options:

results <- do.call(rbind, lst_of_frames)
results
#       question V1 V2
# 1.1 Question 3  1  6
# 1.2 Question 3  2  9
# 2.1 Question 1  1  2
# 2.2 Question 1  3  5
# 3.1 Question 2  2  5
# 3.2 Question 2  1  2
dplyr::bind_rows(lst_of_frames)        # similar results
data.table::rbindlist(lst_of_frames)   # similar results

I'll use the results from the first option, and then order with:

results[order(results$question, results$V1),]
#       question V1 V2
# 2.1 Question 1  1  2
# 2.2 Question 1  3  5
# 3.2 Question 2  1  2
# 3.1 Question 2  2  5
# 1.1 Question 3  1  6
# 1.2 Question 3  2  9
dplyr::arrange(results, question, V1)  # similar results
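Two caveats on the ordering above. Sorting on the question text is alphabetical, so "Question 10" would land before "Question 2"; and ordering on V1 as well rearranges the rows inside each question, whereas the desired output in the question keeps them as they were in the file. A possible variant, assuming every header has the form "Question N" (qnum and sorted are names introduced here, not part of the answer above):

```r
# Rebuild `results` inline (same values as above) so this runs on its own.
results <- data.frame(
  question = rep(c("Question 3", "Question 1", "Question 2"), each = 2),
  V1 = c(1L, 2L, 1L, 3L, 2L, 1L),
  V2 = c(6L, 9L, 2L, 5L, 5L, 2L)
)

# Pull the number out of "Question N" so the sort is numeric, not alphabetical.
qnum <- as.integer(sub("^Question\\s+", "", results$question))

# Order on qnum alone: order() leaves ties in their original order,
# so rows within each question keep the order they had in the file.
sorted <- results[order(qnum), ]
sorted
#     question V1 V2
# 3 Question 1  1  2
# 4 Question 1  3  5
# 5 Question 2  2  5
# 6 Question 2  1  2
# 1 Question 3  1  6
# 2 Question 3  2  9
```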

Note: this is sensitive to the number of columns within each question. If the questions have different numbers of columns ...

Question 3
1        6
2        9

Question 1
1        2       10
3        5       11

Question 2
2        5
1        2

Then you have some options.

Keep it wide. The simple base R do.call(rbind,...) no longer works as easily:

do.call(rbind, lst_of_frames)
# Error in rbind(deparse.level, ...) : 
#   numbers of columns of arguments do not match

But the others work fine:

dplyr::bind_rows(lst_of_frames)
#     question V1 V2 V3
# 1 Question 3  1  6 NA
# 2 Question 3  2  9 NA
# 3 Question 1  1  2 10
# 4 Question 1  3  5 11
# 5 Question 2  2  5 NA
# 6 Question 2  1  2 NA
data.table::rbindlist(lst_of_frames, fill = TRUE)  # similar results
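If you would rather stay in base R, one workaround is to pad each frame with NA columns before calling rbind. This is only a sketch: frames below is a small stand-in for lst_of_frames, and all_cols, padded, and wide are names made up for the example.

```r
# Two small frames with different column counts, standing in for lst_of_frames.
frames <- list(
  data.frame(question = "Question 3", V1 = 1:2, V2 = c(6L, 9L)),
  data.frame(question = "Question 1", V1 = c(1L, 3L), V2 = c(2L, 5L), V3 = 10:11)
)

# Union of all column names, in order of first appearance.
all_cols <- unique(unlist(lapply(frames, names)))

# Add each missing column as NA, then put the columns in a common order.
padded <- lapply(frames, function(df) {
  for (nm in setdiff(all_cols, names(df))) df[[nm]] <- NA
  df[all_cols]
})

# Now every frame has the same columns, so rbind works again.
wide <- do.call(rbind, padded)
wide   # V3 is NA for the rows that came from the two-column question
```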

Pivot to long. (This is a "wide-vs-long" data discussion.)

dplyr::bind_rows(lapply(lst_of_frames, function(z) tidyr::pivot_longer(z, -question)))
# # A tibble: 14 x 3
#    question   name  value
#    <chr>      <chr> <int>
#  1 Question 3 V1        1
#  2 Question 3 V2        6
#  3 Question 3 V1        2
#  4 Question 3 V2        9
#  5 Question 1 V1        1
#  6 Question 1 V2        2
#  7 Question 1 V3       10
#  8 Question 1 V1        3
#  9 Question 1 V2        5
# 10 Question 1 V3       11
# 11 Question 2 V1        2
# 12 Question 2 V2        5
# 13 Question 2 V1        1
# 14 Question 2 V2        2

# similar results
library(data.table)
rbindlist(lapply(lst_of_frames, function(z) melt(as.data.table(z), id = "question")))

This method has several advantages in other realms (e.g., ggplot2, tidy data management, easy summarization, etc.).
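Since the question mentions several files, the parsing step can be wrapped in a function and applied to each one. The sketch below is one way to do that, under some assumptions: parse_questions is a hypothetical helper, the two temporary files are made-up stand-ins for the real data, every file starts with a "Question" line, and every question has at least one data row.

```r
# Hypothetical helper wrapping the parsing step shown earlier.
parse_questions <- function(path) {
  txt <- readLines(path)
  txt <- txt[nzchar(trimws(txt))]                 # drop blank separator lines
  pieces <- split(txt, cumsum(grepl("^Question", txt)))
  frames <- lapply(pieces, function(z) {
    out <- read.table(header = FALSE, text = z[-1])
    cbind(question = z[1], out)
  })
  do.call(rbind, frames)
}

# Two throwaway files stand in for the real per-participant files.
dir <- tempfile("study")
dir.create(dir)
writeLines(c("Question 2", "2 5", "1 2", "", "Question 1", "1 2", "3 5"),
           file.path(dir, "p1.txt"))
writeLines(c("Question 1", "4 4", "", "Question 2", "7 8"),
           file.path(dir, "p2.txt"))

# Parse each file, tag its rows with the file name, and stack everything.
files <- list.files(dir, full.names = TRUE)
all_rows <- do.call(rbind, lapply(files, function(f) {
  cbind(file = basename(f), parse_questions(f))
}))

# Sort by file, then by question; rows within a question keep their file order.
all_rows <- all_rows[order(all_rows$file, all_rows$question), ]
rownames(all_rows) <- NULL
all_rows
```

The sort here is on the question text, so it is alphabetical; with ten or more questions you would want to sort on the extracted number instead.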
