Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

R - How to convert a factor to integer\numeric without loss of information?

When I convert a factor to a numeric or integer, I get the underlying level codes, not the values as numbers.


f <- factor(sample(runif(5), 20, replace = TRUE))
##  [1] 0.0248644019011408 0.0248644019011408 0.179684827337041 
##  [4] 0.0284090070053935 0.363644931698218  0.363644931698218 
##  [7] 0.179684827337041  0.249704354675487  0.249704354675487 
## [10] 0.0248644019011408 0.249704354675487  0.0284090070053935
## [13] 0.179684827337041  0.0248644019011408 0.179684827337041 
## [16] 0.363644931698218  0.249704354675487  0.363644931698218 
## [19] 0.179684827337041  0.0284090070053935
## 5 Levels: 0.0248644019011408 0.0284090070053935 ... 0.363644931698218

as.numeric(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

as.integer(f)
##  [1] 1 1 3 2 5 5 3 4 4 1 4 2 3 1 3 5 4 5 3 2

I have to resort to paste to get the real values:


as.numeric(paste(f))
##  [1] 0.02486440 0.02486440 0.17968483 0.02840901 0.36364493 0.36364493
##  [7] 0.17968483 0.24970435 0.24970435 0.02486440 0.24970435 0.02840901
## [13] 0.17968483 0.02486440 0.17968483 0.36364493 0.24970435 0.36364493
## [19] 0.17968483 0.02840901

Is there a better way to convert a factor to numeric?


Asked by Adam SO; translated from SO


1 Answer


See the Warning section of ?factor:


In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion.
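As a small illustration of the implicit coercion the warning mentions, here is a sketch using cbind(), which drops the factor class and keeps only the internal codes. The factor f below is made-up example data, not the one from the question.

```r
# Hypothetical data: numbers stored as factor labels.
f <- factor(c("10", "20", "10"))

# cbind() discards the factor class, so the matrix holds the level codes.
m <- cbind(f)
as.vector(m)   # 1 2 1, not 10 20 10
```

Nothing warns you here, which is why this kind of silent coercion is easy to miss.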


To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
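A minimal sketch of the recommended idiom on a small, made-up factor (the values are assumptions chosen so the result is easy to check by eye):

```r
# Hypothetical data: numbers stored as factor labels.
f <- factor(c("10.5", "3.2", "10.5", "7"))

levels(f)       # "10.5" "3.2" "7" (sorted as strings, not as numbers)
as.integer(f)   # level codes: 1 2 1 3

# Convert the levels once, then index the result by the factor's codes.
vals <- as.numeric(levels(f))[f]
vals            # 10.5 3.2 10.5 7.0
```

The "approximately" in the documentation is there because the levels are stored as character strings, so a number that went through factor() may have lost some digits on the way in.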


The FAQ on R has similar advice.



Why is as.numeric(levels(f))[f] more efficient than as.numeric(as.character(f))?


as.numeric(as.character(f)) is effectively as.numeric(levels(f)[f]), so you are performing the conversion to numeric on length(f) values, rather than on nlevels(f) values.
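To make the counting argument concrete, here is a small sketch on a made-up factor with six elements and two levels:

```r
# Hypothetical data: 6 elements, only 2 distinct values.
f <- factor(c("2.5", "0.1", "2.5", "2.5", "0.1", "2.5"))

# as.character() on a factor looks each element up in the level labels.
identical(as.character(f), levels(f)[f])   # TRUE

length(f)    # 6: strings parsed by as.numeric(as.character(f))
nlevels(f)   # 2: strings parsed by as.numeric(levels(f))[f]

# Both routes give the same numbers; only the amount of parsing differs.
identical(as.numeric(as.character(f)), as.numeric(levels(f))[f])   # TRUE
```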


The speed difference will be most apparent for long vectors with few levels.


If the values are mostly unique, there won't be much difference in speed.
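If you want to check this on your own machine, a rough sketch with base R's system.time() could look like the following. The vectors few and many are made-up test data, and the timings will vary by machine, so no output is shown.

```r
set.seed(1)

# Long vector, few levels: the case where the difference should show most.
few <- factor(sample(c("0.5", "1.5", "2.5"), 1e6, replace = TRUE))

# Mostly unique values: nlevels() is close to length(), so little to gain.
many <- factor(runif(1e5))

system.time(as.numeric(levels(few))[few])
system.time(as.numeric(as.character(few)))

system.time(as.numeric(levels(many))[many])
system.time(as.numeric(as.character(many)))
```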


However you do the conversion, this operation is unlikely to be the bottleneck in your code, so don't worry too much about it.



Some timings


library(microbenchmark)
microbenchmark(
  as.numeric(levels(f))[f],
  as.numeric(levels(f)[f]),
  as.numeric(as.character(f)),
  paste0(x),  # note: x is not defined anywhere in this post; presumably f was meant
  paste(x),
  times = 1e5
)
## Unit: microseconds
##                         expr   min    lq      mean median     uq      max neval
##     as.numeric(levels(f))[f] 3.982 5.120  6.088624  5.405  5.974 1981.418 1e+05
##     as.numeric(levels(f)[f]) 5.973 7.111  8.352032  7.396  8.250 4256.380 1e+05
##  as.numeric(as.character(f)) 6.827 8.249  9.628264  8.534  9.671 1983.694 1e+05
##                    paste0(x) 7.964 9.387 11.026351  9.956 10.810 2911.257 1e+05
##                     paste(x) 7.965 9.387 11.127308  9.956 11.093 2419.458 1e+05
