Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
388 views
in Technique[技术] by (71.8m points)

r - dplyr filter with condition on multiple columns

Here's a dummy data :

father<- c(1, 1, 1, 1, 1)
mother<- c(1, 1, 1, NA, NA) 
children <- c(NA, NA, 2, 5, 2) 
cousins   <- c(NA, 5, 1, 1, 4) 


dataset <- data.frame(father, mother, children, cousins)  
dataset


father  mother  children cousins
1      1       NA      NA
1      1       NA       5
1      1        2       1
1     NA        5       1
1     NA        2       4

I want to filter this row :

  father  mother  children cousins
    1      1       NA      NA

I can do it with :

test <- dataset %>% 
filter(father==1 & mother==1) %>%
filter (is.na(children)) %>%
filter (is.na(cousins))
test  

My question : I have many columns like grand father, uncle1, uncle2, uncle3 and I want to avoid something like that:

  filter (is.na(children)) %>%
  filter (is.na(cousins)) %>%
  filter (is.na(uncle1)) %>%
  filter (is.na(uncle2)) %>%
  filter (is.na(uncle3)) 
  and so on...

How can I use dplyr to say filter all the column with na (except father==1 & mother==1)

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

A possible dplyr(0.5.0.9004 <= version < 1.0) solution is:

# > packageVersion('dplyr')
# [1] ‘0.5.0.9004’

dataset %>%
    filter(!is.na(father), !is.na(mother)) %>%
    filter_at(vars(-father, -mother), all_vars(is.na(.)))

Explanation:

  • vars(-father, -mother): select all columns except father and mother.
  • all_vars(is.na(.)): keep rows where is.na is TRUE for all the selected columns.

note: any_vars should be used instead of all_vars if rows where is.na is TRUE for any column are to be kept.


Update (2020-11-28)

As the _at functions and vars have been superseded by the use of across since dplyr 1.0, the following way (or similar) is recommended now:

dataset %>%
    filter(across(c(father, mother), ~ !is.na(.x))) %>%
    filter(across(c(-father, -mother), is.na))

See more example of across and how to rewrite previous code with the new approach here: Colomn-wise operatons or type vignette("colwise") in R after installing the latest version of dplyr.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...