Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Changing behaviour of stats::lag when loading dplyr package

I am having trouble with the stats::lag function when using the dplyr package. Specifically, I get different results from the lag function before and after loading dplyr.

For example, here is a sample time series. If I calculate the lag with k = -1, the lagged series starts in 1971.

data <- ts(1:10, start = 1970, frequency = 1)
lag1 <- stats::lag(data, k = -1)
start(lag1)[1]

## [1] 1971
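For reference, stats::lag on a ts object leaves the values alone and only moves the time base (the tsp attribute) by -k/frequency, which is why k = -1 shifts the start from 1970 to 1971. The sketch below contrasts that with a vector-style lag that shifts the values and pads with NA, the style of lag dplyr provides. shift_values is a hypothetical helper written for illustration; it is not a dplyr function.

```r
data <- ts(1:10, start = 1970, frequency = 1)

# stats::lag on a ts keeps the values and moves the time base by -k/frequency
lagged <- stats::lag(data, k = -1)
tsp(data)    # 1970 1979 1
tsp(lagged)  # 1971 1980 1

# Hypothetical vector-style lag: positions stay fixed, values move down by n
# and the vacated slots are filled with NA (similar in spirit to dplyr::lag)
shift_values <- function(x, n = 1L) {
  stopifnot(n >= 0L, n <= length(x))
  c(rep(NA, n), x[seq_len(length(x) - n)])
}

shift_values(as.numeric(data), 1L)  # NA 1 2 3 4 5 6 7 8 9
```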

Now, if I load dplyr, the same call yields a lagged series starting in 1970.

library(dplyr)

## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

lag2 <- stats::lag(data, k = -1)
start(lag2)[1]

## [1] 1970

start(lag1)[1] == start(lag2)[1]

## [1] FALSE

Given the masking messages printed when dplyr is attached, my guess is that this has to do with environments. However, detaching dplyr doesn't seem to help.

detach("package:dplyr", unload = TRUE, character.only = TRUE)
lag3 <- stats::lag(data, k = -1)
start(lag3)[1]

## [1] 1970

start(lag1)[1] == start(lag3)[1]

## [1] FALSE

Any suggestions are greatly appreciated. My only solution so far is to restart the R session between calculating lag1 and lag2.
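One way to avoid depending on whichever lag.default happens to be registered is to shift the time base yourself, since for a regular ts that is all stats::lag does. This is a minimal sketch assuming a regular ts input; shift_ts is a hypothetical helper name, and it reproduces the time-base shift rather than calling stats::lag, so S3 dispatch is never involved.

```r
# Hypothetical helper: lag a regular time series by moving its time base,
# without going through S3 dispatch on lag()
shift_ts <- function(x, k = 1L) {
  stopifnot(is.ts(x), k == round(k))
  p <- tsp(x)                       # c(start, end, frequency)
  shift <- k / p[3L]                # k observations, expressed in time units
  tsp(x) <- c(p[1L] - shift, p[2L] - shift, p[3L])
  x
}

data <- ts(1:10, start = 1970, frequency = 1)
start(shift_ts(data, k = -1))[1]    # 1971, regardless of any registered lag.default
```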

Here's my session:

##  setting  value                       
##  version  R version 3.2.0 (2015-04-16)
##  system   i386, mingw32               
##  ui       RTerm                       
##  language (EN)                        
##  collate  English_Canada.1252         
##  tz       America/New_York            
## 
##  package    * version  date       source        
##  assertthat   0.1      2013-12-06 CRAN (R 3.2.0)
##  bitops       1.0-6    2013-08-17 CRAN (R 3.2.0)
##  DBI          0.3.1    2014-09-24 CRAN (R 3.2.0)
##  devtools     1.8.0    2015-05-09 CRAN (R 3.2.0)
##  digest       0.6.8    2014-12-31 CRAN (R 3.2.0)
##  dplyr        0.4.1    2015-01-14 CRAN (R 3.2.0)
##  evaluate     0.7      2015-04-21 CRAN (R 3.2.0)
##  formatR      1.2      2015-04-21 CRAN (R 3.2.0)
##  git2r        0.10.1   2015-05-07 CRAN (R 3.2.0)
##  htmltools    0.2.6    2014-09-08 CRAN (R 3.2.0)
##  httr       * 0.6.1    2015-01-01 CRAN (R 3.2.0)
##  knitr        1.10.5   2015-05-06 CRAN (R 3.2.0)
##  magrittr     1.5      2014-11-22 CRAN (R 3.2.0)
##  memoise      0.2.1    2014-04-22 CRAN (R 3.2.0)
##  Rcpp         0.11.6   2015-05-01 CRAN (R 3.2.0)
##  RCurl        1.95-4.6 2015-04-24 CRAN (R 3.2.0)
##  rmarkdown    0.6.1    2015-05-07 CRAN (R 3.2.0)
##  rversions    1.0.0    2015-04-22 CRAN (R 3.2.0)
##  stringi      0.4-1    2014-12-14 CRAN (R 3.2.0)
##  stringr      1.0.0    2015-04-30 CRAN (R 3.2.0)
##  XML          3.98-1.1 2013-06-20 CRAN (R 3.2.0)
##  yaml         2.1.13   2014-06-12 CRAN (R 3.2.0)

I've also tried unloadNamespace, as suggested by @BondedDust:

unloadNamespace("dplyr")  
lag4 <- stats::lag(data, k = -1)  

## Warning: namespace 'dplyr' is not available and has been replaced  
## by .GlobalEnv when processing object 'sep'  

start(lag4)[1]  

## [1] 1970  

start(lag1)[1] == start(lag4)[1]  

## [1] FALSE

He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back…

1 Answer

The dplyr package is effectively overwriting 'lag'. More precisely, dplyr (0.4.1) does not define a function called lag at all; it registers an S3 method lag.default. So there are two copies of lag.default, one in 'stats' and one in 'dplyr', and when the stats::lag generic dispatches, the 'dplyr' copy is found first. Because lag.default is not exported from stats, you have to use the ::: operator to force the stats version:

> lag2 <- stats::lag.default(data, k = -1)
Error: 'lag.default' is not an exported object from 'namespace:stats'

> lag2 <- stats:::lag.default(data, k = -1)
> stats::start(lag2)[1]
[1] 1971
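To see this mechanism without dplyr 0.4.1 installed, you can register a stand-in lag.default yourself with registerS3method(). Because the lag generic is defined in stats, the registration lands in the stats S3 methods table and replaces the entry that dispatch uses, which is presumably how dplyr's copy ends up being found first. This is a sketch for a throwaway session: fake_lag_default is a hypothetical stand-in, not dplyr's code, and the last step restores the stats method by fetching it directly from the stats namespace.

```r
data <- ts(1:10, start = 1970, frequency = 1)
start(stats::lag(data, k = -1))[1]   # 1971: stats' own lag.default is used

# Hypothetical stand-in for a package-supplied lag.default: ignores k and
# returns the series unchanged
fake_lag_default <- function(x, ...) x

# lag() is defined in stats, so this overwrites the "lag.default" entry in the
# stats S3 methods table (the table a package's S3method() directives write to)
registerS3method("lag", "default", fake_lag_default)
start(stats::lag(data, k = -1))[1]   # 1970: dispatch now finds the stand-in

# Undo it: the original function still lives in the stats namespace itself
registerS3method("lag", "default", get("lag.default", envir = asNamespace("stats")))
start(stats::lag(data, k = -1))[1]   # 1971 again
```

The restore step is also a possible alternative to restarting R after unloading a package that replaced the method, though it is only a sketch and has not been tested against dplyr 0.4.1 itself.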

dplyr:::lag.default does not use the time-series-specific functions. I'm not able to explain why unloadNamespace fails to remove the function's definition, but it is still there:

> unloadNamespace("dplyr")
> getAnywhere(lag.default)
2 differing objects matching ‘lag.default’ were found
in the following places
  registered S3 method for lag from namespace dplyr
  namespace:stats
Use [] to view one of them
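Since unloading the namespace apparently leaves the registration in place, a quick check is to ask which copy of lag.default dispatch will actually use. This is a minimal sketch using utils::getS3method(), which consults the S3 registration table; which_lag_default is a hypothetical helper name.

```r
# Hypothetical helper: report where the lag.default used by dispatch lives
which_lag_default <- function() {
  m <- utils::getS3method("lag", "default")
  list(
    # "stats" normally; e.g. "dplyr" if dplyr 0.4.1 has registered its copy
    home = environmentName(environment(m)),
    # TRUE only if the registered method is the function from namespace:stats
    is_stats_original = identical(m, get("lag.default", envir = asNamespace("stats")))
  )
}

which_lag_default()   # fresh session: home = "stats", is_stats_original = TRUE
```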

Further weirdness: after unloading the dplyr namespace, I see this:

> environment(getAnywhere(lag.default)[1])
<environment: namespace:dplyr>
> environment(getAnywhere(lag.default)[2])
<environment: namespace:dplyr>
> environment(getAnywhere(lag.default)[3])
<environment: namespace:stats>

(After restarting R and loading dplyr again, I see the same apparent double entry.)

There's also something weird about the help page for dplyr::lag:

> help(lag,pac=dplyr)
No documentation for ‘lag’ in specified packages and libraries:
you could try ‘??lag’
> help(`lag`,pac=`dplyr`)
No documentation for ‘lag’ in specified packages and libraries:
you could try ‘??lag’
> help(`lag.default`,pac=`dplyr`)  # This finally succeeds!

Looking at GitHub (after confirming that I had the latest version of dplyr on CRAN), I see that this came up as an issue in the R CMD check process: https://github.com/hadley/dplyr/commit/f8a46e030b7b899900f2091f41071619d0a46288 . Apparently lag.default will not be overwritten in future versions; instead, dplyr's lag will mask the stats version. I wonder what happens to lag.zoo and lag.zooreg. Perhaps dplyr will also announce the overwriting or masking when the package is loaded?
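If a later dplyr exports its own lag, the problem becomes ordinary masking of the name lag, which a namespace-qualified call avoids. A small sketch using utils::find() to see which attached packages provide lag; the expected results in the comments assume a fresh session and, for the second case, a dplyr version that exports lag.

```r
# Which attached packages define `lag`? The first entry is the one an
# unqualified lag() call will use.
utils::find("lag")
# Fresh session: "package:stats"
# With a dplyr that exports lag() attached: "package:dplyr" "package:stats"

# A namespace-qualified call bypasses masking of the name `lag`; it does not
# help if lag.default itself has been replaced, as with dplyr 0.4.1
data <- ts(1:10, start = 1970, frequency = 1)
start(stats::lag(data, k = -1))[1]
```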

