dataframe - Faster way to subset on rows of a data frame in R?

1 Answer

深蓝 · Answer 1 · 2021-10-17T02:51:19+0000

Of the approaches benchmarked below, data.table is the fastest way to subset the rows of a large data frame.

set.seed(1)  # for reproducible example
# 1 million rows - big enough?
df <- data.frame(age=sample(1:65,1e6,replace=TRUE),x=rnorm(1e6),y=rpois(1e6,25))

library(microbenchmark)
microbenchmark(result<-df[which(df$age>5),],
               result<-subset(df, age>5), 
               result<-df[df$age>5,],
               times=10)
# Unit: milliseconds
#                               expr       min        lq    median       uq      max neval
#  result <- df[which(df$age > 5), ]  77.01055  80.62678  81.43786 133.7753 145.4756    10
#      result <- subset(df, age > 5) 190.89829 193.04221 197.49973 203.7571 263.7738    10
#         result <- df[df$age > 5, ] 169.85649 171.02084 176.47480 185.9394 191.2803    10

library(data.table)
DT <- as.data.table(df)     # data.table
microbenchmark(DT[age > 5],times=10)
# Unit: milliseconds
#         expr      min       lq  median       uq      max neval
#  DT[age > 5] 29.49726 29.93907 30.1813 30.67168 32.81204    10

So in this simple case, comparing median times, data.table is about 2.7 times as fast as which(...), and about 6.5 times as fast as subset(...).
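Before picking an approach on speed alone, it may be worth checking that all four return the same rows. The sketch below is not part of the original benchmark: it uses a smaller data set, and `same_rows()` and `small` are hypothetical names introduced only for illustration. It also shows the one case where the approaches are known to differ, a missing value in the filtered column.

```r
library(data.table)

set.seed(1)
# Smaller than the benchmark data so the check runs quickly
df <- data.frame(age = sample(1:65, 1e4, replace = TRUE),
                 x = rnorm(1e4), y = rpois(1e4, 25))
DT <- as.data.table(df)

r_which   <- df[which(df$age > 5), ]
r_subset  <- subset(df, age > 5)
r_logical <- df[df$age > 5, ]
r_dt      <- DT[age > 5]

# same_rows() is a hypothetical helper: it compares row counts and column
# values and ignores row names (data.table does not keep the original ones)
same_rows <- function(a, b) {
  nrow(a) == nrow(b) && all(a$age == b$age) &&
    all(a$x == b$x) && all(a$y == b$y)
}
stopifnot(same_rows(r_which, r_logical),
          same_rows(r_subset, r_logical),
          same_rows(r_dt, r_logical))

# With a missing value in age, the approaches no longer agree
small <- data.frame(age = c(3, NA, 10))
n_logical <- nrow(small[small$age > 5, , drop = FALSE])         # NA index yields an all-NA row
n_which   <- nrow(small[which(small$age > 5), , drop = FALSE])  # which() drops the NA
n_subset  <- nrow(subset(small, age > 5))                       # subset() drops the NA
n_dt      <- nrow(as.data.table(small)[age > 5])                # data.table treats NA as FALSE
```

In the benchmark data `age` has no missing values, so the timings compare like with like. On data with NAs, plain logical indexing returns extra all-NA rows, which is one reason to prefer `which(...)`, `subset(...)`, or data.table there.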
