Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


get(x) does not work in R data.table when x is also a column in the data table

I noticed that get(x) does not work in an R data.table when x is also a column in the same data.table. See the code snippet below. This is hard to avoid completely when writing an R function that takes a data.table as input. Is this a bug in the R data.table package? Thanks!

library(data.table)

dt = data.table(x=1:3, y=2:4)

var = 'y'
x = 'y'

dt[, 3*get(var)]      # [1]  6  9 12
dt[, 3*get(x)]        # Error in get(x): invalid first argument

Fight the dragon too long and you become the dragon; gaze into the abyss too long and the abyss gazes back…

1 Answer

In general, when there is a naming conflict between columns and variables, columns take precedence. Since v1.10.2 (31 Jan 2017) of data.table, the preferred way to make clear that a name is not a column name is the .. prefix [1]:

When j is a symbol prefixed with .. it will be looked up in calling scope and its value taken to be column names or numbers. When you see the .. prefix think one-level-up, like the directory .. in all operating systems means the parent directory. In future the .. prefix could be made to work on all symbols appearing anywhere inside DT[...]. ...

Our main focus here which we believe .. achieves is to resolve the more common ambiguity when var is in calling scope and var is a column name too. Further, we have not forgotten that in the past we recommended prefixing the variable in calling scope with .. yourself. If you did that and ..var exists in calling scope, that still works, provided neither var exists in calling scope nor ..var exists as a column name. Please now remove the .. prefix on ..var in calling scope to tidy this up. In future data.table will start to warn/error on such usage.
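To illustrate the j-symbol case this NEWS item describes, here is a minimal sketch, assuming data.table 1.10.2 or later. The extra column named cols is hypothetical and exists only to create a name clash; dt[, ..cols] should still select the columns whose names are stored in the calling-scope variable cols, not the column called cols:

```r
library(data.table)

# Hypothetical table that deliberately contains a column called `cols`
dt <- data.table(x = 1:3, y = 2:4, cols = 7:9)

# Variable in calling scope holding the names of the columns we want
cols <- c("x", "y")

# `..cols` means "look up cols one level up", i.e. in calling scope,
# so this selects columns x and y rather than the column named cols
res <- dt[, ..cols]
print(res)
```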

In your case, you can use get(..x) to force the name x to be resolved in calling scope rather than within the data.table environment:

library(data.table)

dt = data.table(x=1:3, y=2:4)

var = 'y'
x = 'y'

dt[, 3*get(var)]      # [1]  6  9 12
dt[, 3*get(x)]        # Error in get(x): invalid first argument
dt[, 3*get(..x)]      # [1]  6  9 12
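The question's concern about functions that take a data.table as input can be handled the same way, because "calling scope" is then the function's own frame. A minimal sketch, assuming a data.table version in which .. works inside j expressions (as the get(..x) line above relies on); triple_col() is a hypothetical helper whose argument is deliberately named x so that it clashes with column x:

```r
library(data.table)

# Hypothetical helper: multiply the column named by `x` by 3.
# The argument is deliberately called `x` to clash with column x.
triple_col <- function(DT, x) {
  # `..x` is looked up in calling scope, which here is this function's
  # frame, so it yields the string passed in (e.g. "y"); get() then
  # finds the column with that name inside the data.table.
  DT[, 3 * get(..x)]
}

dt <- data.table(x = 1:3, y = 2:4)
out <- triple_col(dt, "y")
print(out)
```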

The .. prefix is still somewhat experimental and thus has limited documentation, but it is mentioned briefly on the help page for data.table:

By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. In case of overlapping variables names inside dataset and in parent scope you can use double dot prefix ..cols to explicitly refer to cols variable parent scope and not from your dataset.

This is less a bug than an unfortunate but natural consequence of with = TRUE, which lets columns be used as variables inside the data.table's evaluation environment. Alternatively, you could avoid the issue in a more base R way by using the pos or envir argument of get().
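One possible reading of the base R route, offered as a sketch rather than a definitive recipe: the argument passed to get() is itself evaluated inside the data.table, so adding envir to that same get() does not stop x from resolving to the column. Instead, an inner get() with an explicit envir can fetch the variable x from the function's own environment, and the outer get() then resolves the resulting string as a column. triple_col_base() and fn_env are hypothetical names, and fn_env must not itself clash with a column name:

```r
library(data.table)

# Hypothetical helper using only base R scoping tools inside j
triple_col_base <- function(DT, x) {
  # Capture this function's own environment, where the argument x lives
  fn_env <- environment()
  # Inner get(): fetch the *variable* x from fn_env, bypassing column x.
  # Outer get(): look up the resulting name (e.g. "y") among the columns.
  DT[, 3 * get(get("x", envir = fn_env))]
}

dt <- data.table(x = 1:3, y = 2:4)
out_base <- triple_col_base(dt, "y")
print(out_base)
```

This needs an extra helper variable that could itself collide with a column, which is presumably part of why the .. prefix is the preferred approach.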



...