Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share

r - Avoid copying the whole vector when replacing an element (a[1] <- 2)

When replacing an element in a vector such as

a <- 1:1000000
a[1] <- 2

R copies the whole vector, replaces the element in the new vector, and then re-associates the variable name with the copy. Is there any way to override or prevent this, so that it behaves more like a C array?

Thanks


1 Answer


The tracemem function (R needs to be compiled with memory profiling enabled to support it) reports when an object is copied. Here's what you do:

> a <- 1:1000000; tracemem(a)
[1] "<0x7f791b39e010>"
> a[1] = 2
tracemem[0x7f791b39e010 -> 0x7f791a9d4010]: 

and indeed there's a copy. But this is because you're coercing a from an integer vector (1:1000000 creates a sequence of integers) to a numeric (double) vector, because 2 is a numeric value and R coerces to a common type. If instead you update your integer vector with an integer value, or a numeric vector with a numeric value, there is no copying:

> a <- 1:1000000; tracemem(a)
[1] "<0x7f791a4ef010>"
> a[1] = 2L
> a = c(1, 2, 3); tracemem(a)
[1] "<0x5180470>"
> a[1] = 2
>
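
The type change behind the first copy can also be seen without tracemem. The following is a minimal sketch, not part of the original answer, that uses typeof to show the coercion; the variable names b and d are only illustrative.

```r
a <- 1:1000000
typeof(a)        # "integer"
a[1] <- 2L       # integer replacement: the type is unchanged
typeof(a)        # "integer"

b <- 1:1000000
b[1] <- 2        # double replacement: the whole vector is coerced to double
typeof(b)        # "double"

# Converting once, up front, keeps later numeric updates type-stable.
d <- as.numeric(1:1000000)
d[1] <- 2
typeof(d)        # "double"
```

If a vector will eventually hold non-integer values, converting it once at creation is usually simpler than relying on the L suffix everywhere.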

A little further insight comes from understanding, at a superficial level, how R's memory management works. Each allocation has a NAMED level associated with it. NAMED=0 or 1 indicates that at most 1 symbol refers to it; it is therefore safe to modify in place. NAMED=2 means that there are, or have been, at least 2 symbols pointing to the same location, and that any attempt to update the value requires a duplication to preserve R's illusion of 'copy on change'. The following reveals some of the internal structure of a, including that it is of type INTSXP (integer) with NAM(1) (NAMED level 1) and that it's being TRaced. Hence updating (with an integer!) does not require a copy.

> a = 1:10; tracemem(a); .Internal(inspect(a))
[1] "<0x5170818>"
@5170818 13 INTSXP g0c4 [NAM(1),TR] (len=10, tl=0) 1,2,3,4,5,...
> a[1] = 2L
> 

On the other hand, here two symbols refer to the same location in memory, hence NAMED is 2 and a copy is required:

> a = b = 1:10; tracemem(a); .Internal(inspect(a))
[1] "<0x576d1a0>"
@576d1a0 13 INTSXP g0c4 [NAM(2),TR] (len=10, tl=0) 1,2,3,4,5,...
> a[1] = 2L
tracemem[0x576d1a0 -> 0x576d148]: 
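
The reason a copy is required here can be illustrated from the user's side. This is a small sketch under the assumption that the session may or may not support tracemem, so that part is guarded with capabilities("profmem"); the variable x is only illustrative.

```r
# Two symbols bound to one vector: modifying through `a` must not change `b`.
a <- b <- 1:10
a[1] <- 2L
a[1]   # 2
b[1]   # 1, because the shared data was duplicated before the write

# A vector with a single binding can be traced to watch for copies.
# tracemem() only works when R was built with memory profiling,
# so the call is guarded.
if (capabilities("profmem")) {
  x <- c(1, 2, 3)
  tracemem(x)
  x[1] <- 10   # same type (double): typically no tracemem message
  untracemem(x)
}
```

Some front ends keep their own references to objects in the workspace, so extra tracemem messages may appear there even when a plain R session reports none.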

It is difficult to reason about NAMED, so there is a certain futility to playing these kinds of games.

inspect returns other information. Each R type is represented internally as an 'SEXP' (S-expression) type. These are enumerated, and type number 13 is an integer SEXP -- hence 13 INTSXP. Check out .Internal(inspect(...)) for a numeric vector, character vector, or even a function: .Internal(inspect(function() {})).
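
The same type information is available through the documented typeof function, which avoids relying on .Internal. A short sketch; the SEXP names in the comments are the corresponding internal types, and the vector name types is only illustrative.

```r
types <- c(
  int  = typeof(1:10),            # "integer"   (INTSXP)
  dbl  = typeof(c(1, 2, 3)),      # "double"    (REALSXP)
  chr  = typeof(c("a", "b")),     # "character" (STRSXP)
  fun  = typeof(function() {})    # "closure"   (CLOSXP)
)
print(types)
```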

R manages memory by periodically running a 'garbage collector' that checks whether memory is still referenced; if it is not, it is reclaimed for reuse. The garbage collector is 'generational', which means that recently allocated memory is checked for reclamation more frequently than older memory. This is because, empirically, variables tend to have a short half-life (e.g., the duration of a function call), so recently allocated memory is more likely to be available for reclamation than memory that has been in use for a longer time. The g0c4 and similar annotations provide information about the generation the SEXP belongs to.
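
Collection normally happens automatically, but it can be triggered by hand to observe it. A minimal sketch using the base gc function; the variable names x and usage are only illustrative.

```r
# Allocate a large vector, then drop the only reference to it.
x <- numeric(1e6)   # about 8 MB of doubles
rm(x)               # the memory is now unreferenced, though not necessarily reclaimed yet

# gc() forces a collection and returns a usage summary as a matrix with
# one row for Ncells (cons cells) and one for Vcells (vector storage).
usage <- gc()
rownames(usage)     # "Ncells" "Vcells"
```

Calling gc() by hand is rarely needed in practice; it is mainly useful for getting a memory report like this one.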

The TR represents a 'bit' set in the SEXP to indicate that the variable is being traced; it was set when we said tracemem(a).

Some of these topics are discussed in the documentation of R's internal implementation, available via RShowDoc("R-ints"), and in the C header file Rinternals.h.

