Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Subset data frame to include only levels of one factor that have values in both levels of another factor

I am working with a data frame that deals with numeric measurements. Some individuals have been measured several times, both as juveniles and adults. A reproducible example:

ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each=5)
size <- rnorm(10)

# e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
d <- data.frame(ID, age, size)

My goal is to subset that data frame, keeping only the IDs that appear at least once as a juvenile and at least once as an adult. I am not sure how to do that.

The resulting data frame would contain all measurements for individuals a1, a2 and a3, but would exclude a4, a5 and a6, as they were not measured at both stages.
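
One way to see which IDs qualify, before doing any subsetting, is to cross-tabulate ID against age. This is only a minimal sketch using base R; the names counts and both are introduced here for illustration and are not part of the original example.

```r
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)

# Rows are IDs, columns are age classes, cells are measurement counts
counts <- table(ID, age)

# Keep the IDs that have at least one measurement in every age class
# ("both" is a hypothetical helper name)
both <- rownames(counts)[rowSums(counts > 0) == ncol(counts)]
both
```

For the example data this lists a1, a2 and a3, the three individuals measured at both stages.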

A similar question was asked 7 months ago but never received an answer (Subset data frame to include only levels one factor that have values in both levels of another factor).

Thanks!


1 Answer

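With dplyr, you can group by ID and then filter on a per-group condition. Inside a grouped filter(), a condition that evaluates to a single TRUE or FALSE keeps or drops every row of that group, so all(c("juvenile", "adult") %in% age) keeps only the IDs whose age column contains both levels: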

library(dplyr)
d %>% group_by(ID) %>% filter(all(c("juvenile", "adult") %in% age))

# A tibble: 7 x 3
# Groups:   ID [3]
#      ID      age       size
#  <fctr>   <fctr>      <dbl>
#1     a1 juvenile -0.6947697
#2     a2 juvenile -0.3665272
#3     a3 juvenile  1.0293555
#4     a1 juvenile  0.2745224
#5     a2    adult  0.5299029
#6     a1    adult  2.2247802
#7     a3    adult -0.4717160
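
If dplyr is not available, the same subset can be sketched in base R by intersecting the IDs seen at each stage. This is an alternative under stated assumptions, not the answer's method: set.seed(1) is added only to make the sketch repeatable (so the size values will differ from the output above), and juv_ids, adult_ids, keep and result are hypothetical names.

```r
set.seed(1)  # assumption: seed chosen arbitrarily, not in the original example
ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
size <- rnorm(10)
d <- data.frame(ID, age, size)

# IDs measured at least once at each stage
juv_ids <- d$ID[d$age == "juvenile"]
adult_ids <- d$ID[d$age == "adult"]

# IDs present in both sets
keep <- intersect(juv_ids, adult_ids)

# All rows belonging to those IDs, in their original order
result <- d[d$ID %in% keep, ]
result
```

The intersect() approach is tied to exactly two stages; with more levels, the grouped filter above generalizes more directly.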
