Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
557 views
in Technique[技术] by (71.8m points)

dataframe - R dplyr subset with missing columns

I have the following code and would like to select columns into a new data.frame.

library(dplyr)
df = data.frame(
    Manhattan=c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0), 
    Brooklyn=c(0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0), 
    The_Bronx=c(1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0), 
    Staten_Island=c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0), 
    "2012"=c("P", "P", "P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), 
    "2013"=c("P", "P", "P", "P", "P", "P", "P", "P", "Q", "Q", "P", "P", "P", "P", "Q", "Q", "Q", "Q", "Q"), 
    "2014"=c("P", "P", "P", "Q", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "P", "Q", "P", "P", "P", "Q", "Q"), 
    "2015"=c("P", "P", "P", "P", "P", "Q", "Q", "Q", "P", "Q", "P", "P", "Q", "Q", "Q", "Q", "Q", "Q", "Q"), check.names=FALSE)
df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx"))

This throws the error:

Error in [.data.frame`(x, r, vars, drop = drop) : 
   undefined columns selected

Because the column "Queens" is missing from df. How can I can override the error, so that R proceeds to create df2 with columns "Manhattan" and "The_Bronx" only?

Very important: My real data have hundreds of columns, so it is not doable to manually remove columns like "Queens" from the command df2 <- subset(df, select = c("Manhattan", "Queens", "The_Bronx")) (unless there is a trick for that?). Is there a way to solve this? Thank you.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

In base R, you can use intersect to select only the names which are present.

cols <- c("Manhattan", "Queens", "The_Bronx")
subset(df, select = intersect(names(df), cols))

#   Manhattan The_Bronx
#1          1         1
#2          1         1
#3          0         0
#4          1         0
#5          1         0
#6          1         0
#7          1         0
#8          0         0
#...
#....

Or use any_of in dplyr :

library(dplyr)
df %>% select(tidyselect::any_of(cols))

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...