Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
311 views
in Technique[技术] by (71.8m points)

r - Select the first and last row by group in a data frame

How can I select the first and last row for each unique id in the following dataframe?

tmp <- structure(list(id = c(15L, 15L, 15L, 15L, 21L, 21L, 22L, 22L, 
22L, 23L, 23L, 23L, 24L, 24L, 24L, 24L), d = c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), gr = c(2L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L), mm = c(3.4, 
4.9, 4.4, 5.5, 4, 3.8, 4, 4.9, 4.6, 2.7, 4, 3, 3, 2, 4, 2), area = c(1L, 
2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 2L, 3L)), .Names = c("id", 
"d", "gr", "mm", "area"), class = "data.frame", row.names = c(NA, 
-16L))
tmp
#>    id d gr  mm area
#> 1  15 1  2 3.4    1
#> 2  15 1  1 4.9    2
#> 3  15 1  1 4.4    1
#> 4  15 1  1 5.5    2
#> 5  21 1  1 4.0    2
#> 6  21 1  2 3.8    2
#> 7  22 1  1 4.0    2
#> 8  22 1  1 4.9    2
#> 9  22 1  2 4.6    2
#> 10 23 1  1 2.7    2
#> 11 23 1  1 4.0    2
#> 12 23 1  2 3.0    2
#> 13 24 1  1 3.0    2
#> 14 24 1  1 2.0    3
#> 15 24 1  1 4.0    2
#> 16 24 1  2 2.0    3
Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

A fast and short data.table solution :

tmp[, .SD[c(1,.N)], by=id]

where .SD represents each (S)ubset of (D)ata, .N is the number of rows in each group and tmp is a data.table; e.g. as provided by fread() by default or by converting a data.frame using setDT().

Note that if a group only contains one row, that row will appear twice in the output because that row is both the first and last row of that group. To avoid the repetition in that case, thanks to @Thell:

tmp[, .SD[unique(c(1,.N))], by=id]

Alternatively, the following makes the logic explicit for the .N==1 special case :

tmp[, if (.N==1) .SD else .SD[c(1,.N)], by=id]

You don't need .SD[1] in the first part of the if because in that case .N is 1 so .SD must be just one row anyway.

You can wrap j in {} and have a whole page of code inside {} if you like. Just as long as the last expression inside {} returns a list- like object to be stacked (such as a plain list, data.table or data.frame).

tmp[, { ...; if (.N==1) .SD else .SD[c(1,.N)] } , by=id]

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

2.1m questions

2.1m answers

60 comments

57.0k users

...