Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - How to sort a matrix/data.frame by all columns

I have a matrix, e.g.:

a = rep(0:1, each=4)
b = rep(rep(0:1, each=2), 2)
c = rep(0:1, times=4)
mat = cbind(c,b,a)

I need to sort the rows of this matrix by all of its columns. I know how to do this by naming specific columns (i.e. a limited number of columns):

mat[order(mat[,"c"],mat[,"b"],mat[,"a"]),]
     c b a
[1,] 0 0 0
[2,] 0 0 1
[3,] 0 1 0
[4,] 0 1 1
[5,] 1 0 0
[6,] 1 0 1
[7,] 1 1 0
[8,] 1 1 1

However, I need a generic way of doing this without calling any column names, because I could have any number of columns. How can I sort by a large number of columns?


1 Answer


Here's a concise solution:

mat[do.call(order, as.data.frame(mat)), ]
##      c b a
## [1,] 0 0 0
## [2,] 0 0 1
## [3,] 0 1 0
## [4,] 0 1 1
## [5,] 1 0 0
## [6,] 1 0 1
## [7,] 1 1 0
## [8,] 1 1 1

The call to as.data.frame() converts the matrix to a data.frame in the intuitive way, i.e. each matrix column becomes a list component in the new data.frame. From that, you can effectively pass each matrix column to a single invocation of order() by passing the listified form of the matrix as the second argument of do.call().
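To make that concrete, here is a small check (a sketch using the question's data; the variable names `cols`, `explicit` and `generic` are only illustrative) that the do.call() form produces the same row order as spelling out every column by hand:

```r
# Rebuild the matrix from the question.
a <- rep(0:1, each = 4)
b <- rep(rep(0:1, each = 2), 2)
c <- rep(0:1, times = 4)
mat <- cbind(c, b, a)

# A data.frame is a list with one component per column.
cols <- as.data.frame(mat)
is.list(cols)

# Naming each column explicitly ...
explicit <- order(mat[, "c"], mat[, "b"], mat[, "a"])

# ... versus letting do.call() pass every list component as an argument.
generic <- do.call(order, cols)

identical(explicit, generic)
```

Both calls hand order() the same three vectors in the same left-to-right priority, which is why the results agree.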

This will work for any number of columns.
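If you want to reuse this, one possible wrapper is sketched below. The function name `sort_by_all_columns` and its `decreasing` argument are my own choices for illustration, not part of base R. The unname() call is a precaution I'm adding: it keeps a column that happens to be called, say, `decreasing` from being matched to one of order()'s own named arguments.

```r
# Hypothetical helper: sort the rows of a matrix or data.frame by all
# columns, with the leftmost column taking priority.
sort_by_all_columns <- function(x, decreasing = FALSE) {
  # One unnamed list component per column.
  cols <- unname(as.list(as.data.frame(x)))
  # Pass every column to a single order() call, plus the sort direction.
  idx <- do.call(order, c(cols, list(decreasing = decreasing)))
  # drop = FALSE keeps a one-column input from collapsing to a vector.
  x[idx, , drop = FALSE]
}

# Example with four columns and no duplicate rows.
m <- cbind(p = c(1, 0, 1, 0, 1),
           q = c(0, 1, 0, 1, 1),
           r = c(1, 1, 0, 0, 0),
           s = c(0, 1, 1, 0, 0))

sorted_m  <- sort_by_all_columns(m)
sorted_df <- sort_by_all_columns(as.data.frame(m))
sorted_desc <- sort_by_all_columns(m, decreasing = TRUE)
```

The same function accepts a matrix or a data.frame, since as.data.frame() is a no-op on the latter.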


It's not a dumb question. The reason that mat[order(as.data.frame(mat)),] does not work is that order() does not order data.frames by row.

Instead of returning a row order for the data.frame based on ordering the column vectors from left to right (which is what my solution does), it basically flattens the data.frame to a single big vector and orders that.

So, in fact, order(as.data.frame(mat)) is equivalent to order(mat), as a matrix is treated as a flat vector as well.

For your particular data, this returns 24 indexes, which could in principle be used to index the original matrix mat as a flat vector. In the expression mat[order(as.data.frame(mat)),], however, you are using them to index only the row dimension of mat, and some of the indexes are past the highest row index (8), so you get a "subscript out of bounds" error.
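You can see the mismatch directly with the matrix form, order(mat). The sketch below deliberately sticks to the matrix rather than the data.frame, because how order() handles a data.frame argument may differ between R versions; the names `idx` and `res` are just for illustration.

```r
a <- rep(0:1, each = 4)
b <- rep(rep(0:1, each = 2), 2)
c <- rep(0:1, times = 4)
mat <- cbind(c, b, a)

# order() sees the matrix as one flat vector of 8 * 3 values.
idx <- order(mat)
length(idx)   # number of indexes returned
nrow(mat)     # number of rows actually available

# Using those indexes as row subscripts fails; capture the message
# instead of stopping the script.
res <- tryCatch(mat[idx, ], error = function(e) conditionMessage(e))
res
```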

See ?do.call.

I don't think I can explain it better than the help page; take a look at the examples, play with them until you get how it works. Basically, you need to call it when the arguments you want to pass to a single invocation of a function are trapped inside a list.

You can't pass the list itself (because then you're not passing the intended arguments, you're passing a list containing the intended arguments), so there must be a primitive function that "unwraps" the arguments from the list for the function call.
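A tiny illustration of that difference, using max() only because it is simple (the variable names are illustrative):

```r
args <- list(3, 7, 5)

# Passing the list itself: max() receives one argument, a list, and fails.
direct <- tryCatch(max(args), error = function(e) e)
inherits(direct, "error")

# do.call() unwraps the list, so this is the same call as max(3, 7, 5).
unwrapped <- do.call(max, args)
unwrapped
```

The order() case works the same way: each list component (each column) becomes a separate argument of one order() call.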

This is a common primitive in programming languages where functions are first-class objects, notably (besides R's do.call()) JavaScript's Function.prototype.apply(), Python's apply() (deprecated in Python 2 and removed in Python 3), and Vim's call().

