Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Subsetting from a Data Frame

I'm early in the process of learning R. Say I have a data frame with a column named "Gender". If I want to retrieve all rows where Gender is "female" there are at least two ways I can do this:

FemaleSmokers <- df[df$Gender=="female", , drop = FALSE]
FemaleSmokers <- subset(df, Gender=="female")

1) Is there a best practice on when to use one over the other? 2) In the first approach, why do I need to preface the column with the name of the data frame when R should know which data frame I am working with?


1 Answer


I hope this worked example helps.

df <- data.frame(Name   = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))
  Name Gender
1  mark   Male
2   joe   Male
3 cathy Female
4  zoya Female

A data frame (df) is subset with
df[row, column]
For example, df[1:2, 1:2] gives
 Name Gender
1 mark   Male
2  joe   Male
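As a small sketch of the df[row, column] form, the lines below rebuild the same df and select rows and columns by position and by name. The drop = FALSE argument is not part of the answer's example; it is included only because selecting a single column otherwise returns a plain vector instead of a data frame. The result names (first_two, names_vec, and so on) are made up for illustration.

```r
df <- data.frame(Name   = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))

first_two <- df[1:2, 1:2]                 # rows 1-2, columns 1-2
names_vec <- df[, "Name"]                 # a single column comes back as a vector
names_df  <- df[, "Name", drop = FALSE]   # same column, kept as a data frame
row_three <- df[3, ]                      # row 3, all columns (empty column slot)
```

Leaving the row or column slot empty means "all of them", which is why the trailing comma in df[3, ] matters.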

In your case, we are evaluating a condition on the data frame:
# both are valid
df[df$Gender == "Female", ]
df[df[, 2] == "Female", ]

which is nothing more than indexing df as

df[c(3, 4), ]
df[c(FALSE, FALSE, TRUE, TRUE), ]

because the condition itself evaluates to a logical vector:

df$Gender == "Female"
[1] FALSE FALSE  TRUE  TRUE

df[c(3, 4), ] selects rows 3 and 4 and all columns. So you are extracting a variable (a column) and passing the result of the comparison as the row index. To extract a specific column from a data frame, use $ on the data frame; see help("$") and help("[").
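The equivalence described above can be checked directly. This is a minimal sketch using the same df; which() is base R and turns the logical vector into row positions, and the variable names are chosen here for illustration only.

```r
df <- data.frame(Name   = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))

is_female <- df$Gender == "Female"   # logical vector, one value per row
rows      <- which(is_female)        # positions of the TRUE values

by_logical  <- df[is_female, ]       # index with the logical vector
by_position <- df[rows, ]            # index with the row positions
by_literal  <- df[c(3, 4), ]         # index with the positions written out

# Each comparison returns TRUE when the two results hold the same rows
same_1 <- isTRUE(all.equal(by_logical, by_position))
same_2 <- isTRUE(all.equal(by_logical, by_literal))
```

All three forms pick the same rows; the logical form is simply the one you get for free from a comparison.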

One more useful resource: http://www.ats.ucla.edu/stat/r/modules/subsetting.htm

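Coming back to your second question, why you have to preface the column with df when R should already know which data frame you are working with: the explanation is the one above. You need to extract the variable (column) in order to build the row index, that is, the rows where your condition evaluates to TRUE. Inside df[ ], column names are not treated as variables. The index is evaluated like any other R expression, so a bare Gender would be looked up in your workspace rather than in df.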
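A short sketch of that lookup rule, and of how it bears on the choice between [ and subset() from the question. It assumes no object named Gender exists in the workspace. The extra row ("sam" with a missing Gender) is made up purely to show how the two approaches treat NA.

```r
df <- data.frame(Name   = c("mark", "joe", "cathy", "zoya"),
                 Gender = c("Male", "Male", "Female", "Female"))

# A bare column name inside [ ] is looked up outside df, so this fails;
# tryCatch() captures the error message instead of stopping the script.
bare_name <- tryCatch(df[Gender == "Female", ],
                      error = function(e) conditionMessage(e))

# with() and subset() evaluate the condition inside df, so bare names work
via_with   <- df[with(df, Gender == "Female"), ]
via_subset <- subset(df, Gender == "Female")

# Hypothetical extra row with a missing Gender
df_na <- rbind(df, data.frame(Name = "sam", Gender = NA))

n_bracket <- nrow(df_na[df_na$Gender == "Female", ])  # NA in the index adds an all-NA row
n_subset  <- nrow(subset(df_na, Gender == "Female"))  # subset() drops rows where the condition is NA
```

The help page for subset() describes it as a convenience function intended for interactive use, so the explicit df$Gender form is the safer choice inside functions and scripts, as long as you handle NA yourself.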

There is good news, though: there is a package where things work the way you expect and columns are referred to as variables. It is data.table. Because columns can be used as variables inside DT[ ], the syntax for indexing, joining and other data manipulation is easier to read. It is an amazing package and easy to master.

library(data.table)
DT <- data.table(df)
    Name Gender
1:  mark   Male
2:   joe   Male
3: cathy Female
4:  zoya Female

DT[Gender == "Female"]
    Name Gender
1: cathy Female
2:  zoya Female

So yes, with data.table you do not need to repeat the table name; you just pass the column names. data.table is also generally faster and more memory-efficient than data.frame on large data. I hope this helps.

