I always thought that the lm function was extremely fast in R, but as this example suggests, the closed-form solution computed with the solve function is much faster.
data<-data.frame(y=rnorm(1000),x1=rnorm(1000),x2=rnorm(1000))
X = cbind(1,data$x1,data$x2)
library(microbenchmark)
microbenchmark(
solve(t(X) %*% X, t(X) %*% data$y),
lm(y ~ .,data=data))
Can someone explain whether this toy example is a poor benchmark, or whether lm is actually slow?
EDIT: As suggested by Dirk Eddelbuettel, the comparison is unfair because lm has to resolve the formula first, so it is better to compare against lm.fit, which takes the design matrix directly and doesn't need to resolve a formula:
microbenchmark(
solve(t(X) %*% X, t(X) %*% data$y),
lm.fit(X,data$y))
Unit: microseconds
                               expr     min      lq     mean   median       uq     max neval cld
 solve(t(X) %*% X, t(X) %*% data$y)  99.083 108.754 125.1398 118.0305 131.2545 236.060   100   a
                       lm.fit(X, y) 125.136 136.978 151.4656 143.4915 156.7155 262.114   100   b
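As a sanity check on the comparison, here is a minimal sketch confirming that the three approaches estimate the same coefficients, so the timings compare equivalent results. It assumes the same simulated data as above; the seed and the names b_solve, b_lmfit and b_lm are my own additions for illustration. The design matrix is built with model.matrix, which is roughly the formula-handling step that lm performs before handing off to lm.fit.

```r
set.seed(1)  # assumption: seed added only so the check is reproducible
data <- data.frame(y = rnorm(1000), x1 = rnorm(1000), x2 = rnorm(1000))

# Design matrix (intercept, x1, x2) built from the formula,
# as lm() does internally before calling lm.fit()
X <- model.matrix(y ~ ., data = data)

# Normal equations: solve (X'X) b = X'y
b_solve <- drop(solve(crossprod(X), crossprod(X, data$y)))

# lm.fit() works on the design matrix directly, using a QR decomposition
b_lmfit <- lm.fit(X, data$y)$coefficients

# Full formula interface
b_lm <- coef(lm(y ~ ., data = data))

# Both comparisons should return TRUE (up to numerical tolerance)
all.equal(b_solve, b_lmfit, check.attributes = FALSE)
all.equal(unname(b_lmfit), unname(b_lm))
```

Note that lm.fit returns more than the coefficients (residuals, fitted values, the QR decomposition, and so on), so some of the remaining gap in the benchmark is extra work rather than a slower solver.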