statistics - What is the difference between replicating n times and generating n directly when sampling in R?

Question


asked Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I am asked to "simulate x as an independent identically distributed (iid) normal variable with mean=0, std=1.5 with sample length 500"

I am doing the sampling in the following two ways:

set.seed(8402)
X <- rnorm(500, 0, 1.5)
head(X)

and I got

-1.8297969 -0.1862884 1.4219400 -1.0841421 -1.5276701 1.6159368

However, if I do

X <- replicate(500, rnorm(1,0,1.5))
head(X)

and I got

-0.04032755 0.92002552 -2.28001943 -1.36840869 1.49820718 0.06205003

My question: what is the right way to generate iid normal variables, and what is the difference between these two approaches?

Many thanks!


1 Answer

深蓝 · Answer 1 · 2021-10-23T21:43:09+0000

R internals

Internally, R relies on the C function double rnorm(double mean, double sd) declared in <Rmath.h>, which generates one random number per call. When you call its R wrapper rnorm(n, mean, sd), that C function is called n times in a loop.

This is the same as calling the R-level function with n = 1 and repeating it n times with replicate.
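As a rough illustration, the hypothetical helper rnorm_loop below is an R-level analogue of that C loop (it is not R's actual implementation): it fills a vector one draw at a time. Assuming the default generator settings (see RNGkind()), it should reproduce rnorm(n, mean, sd) exactly when started from the same seed.

```r
# Hypothetical helper: an R-level analogue of the loop inside rnorm(n, mean, sd).
# Each iteration draws a single value, as the C-level rnorm(mean, sd) does.
rnorm_loop <- function(n, mean = 0, sd = 1) {
  x <- numeric(n)
  for (i in seq_len(n)) {
    x[i] <- rnorm(1, mean, sd)
  }
  x
}

set.seed(8402)
vectorised <- rnorm(500, 0, 1.5)

set.seed(8402)
looped <- rnorm_loop(500, 0, 1.5)

# Same seed, same sequence of single draws, so the vectors match exactly
print(identical(vectorised, looped))
```

The loop is only there to make the "one number per call" behaviour visible; in practice rnorm(n, ...) is the idiomatic and faster choice.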

The first method is much faster (the difference becomes noticeable when n is large), because the whole loop runs at C level. replicate, however, is a wrapper around sapply, so it is not truly vectorized: it calls the R-level rnorm once per draw (see "Is the "*apply" family really not vectorized?").
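A small sketch of both points, assuming the default RNG settings. Base R defines replicate(n, expr) as sapply over integer(n), so with the same seed the two should give identical draws. The choice n = 1e5 is arbitrary, and the timings depend on the machine, so they are printed rather than checked.

```r
n <- 1e5

set.seed(1)
via_replicate <- replicate(n, rnorm(1, 0, 1.5))

# replicate() is sapply() over integer(n) with the expression wrapped in a function
set.seed(1)
via_sapply <- sapply(integer(n), function(i) rnorm(1, 0, 1.5))

print(identical(via_replicate, via_sapply))

# Rough timing comparison: one C-level loop versus n R-level calls
t_vectorised <- system.time(rnorm(n, 0, 1.5))
t_replicate  <- system.time(replicate(n, rnorm(1, 0, 1.5)))
print(t_vectorised)
print(t_replicate)
```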

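In addition, if you set the same random seed before each method, you get exactly the same random numbers. In the question, set.seed(8402) is not called again before replicate, so the second call continues from the generator state left by the first call, which is why the two outputs differ.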
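Applied to the code in the question, a minimal sketch: resetting the seed before the replicate call should make the two results identical, while skipping the reset continues the stream and gives different numbers.

```r
set.seed(8402)
X1 <- rnorm(500, 0, 1.5)

# No reset: replicate() continues from where rnorm(500, ...) left the generator
X3 <- replicate(500, rnorm(1, 0, 1.5))

# Reset the seed so replicate() starts from the same state as rnorm() did
set.seed(8402)
X2 <- replicate(500, rnorm(1, 0, 1.5))

print(identical(X1, X2))  # same seed before each call
print(identical(X1, X3))  # seed not reset
head(X2)                  # should match head(X1)
```

Either method is therefore a valid way to draw an iid normal sample; the only practical difference is speed.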

A more illustrative experiment

In a comment I said that the random seed is set only once, on entry. The following example illustrates this. There is no need for a large n; n = 4 is sufficient.

First, let's set seed at 0, while generating 4 standard normal samples:

set.seed(0); rnorm(4, 0, 1)
## we get
[1]  1.2629543 -0.3262334  1.3297993  1.2724293

Note that in this case, all 4 numbers are obtained from the entry seed 0.

Now, let's do this:

set.seed(0)
rnorm(2, 0, 1)
## we get
[1]  1.2629543 -0.3262334
## do not reset the seed; continue from the current generator state
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.329799 1.272429

See? The two replicate draws are exactly the 3rd and 4th numbers produced by rnorm(4, 0, 1) above.
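The same experiment written as a self-checking sketch (default RNG settings assumed): two vectorized draws followed by two replicate draws, without resetting, should equal a single rnorm(4, 0, 1) from seed 0.

```r
set.seed(0)
first_two <- rnorm(2, 0, 1)
next_two  <- replicate(2, rnorm(1, 0, 1))  # no reset: continues the stream

set.seed(0)
all_four <- rnorm(4, 0, 1)

# The split draws line up with the single call, element for element
print(identical(c(first_two, next_two), all_four))
print(all_four)
```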

But if we reset the seed in the middle, for example back to 0:

set.seed(0)
rnorm(2, 0, 1)
## we get
[1]  1.2629543 -0.3262334
## reset seed
set.seed(0)
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.2629543 -0.3262334

This is what I mean by "entry": set.seed fixes the starting state once, and every subsequent draw advances that state.
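The entry state can also be inspected directly. R keeps the generator state in .Random.seed in the global environment (documented in ?Random). This sketch saves the state right after set.seed(0), shows that drawing numbers changes it, and then restores it by hand, which should have the same effect as calling set.seed(0) again.

```r
set.seed(0)
# Snapshot of the generator state at "entry"
state_at_entry <- get(".Random.seed", envir = globalenv())

a <- rnorm(2, 0, 1)
state_after <- get(".Random.seed", envir = globalenv())
print(identical(state_at_entry, state_after))  # drawing advanced the state

# Put the saved state back, equivalent here to calling set.seed(0) again
assign(".Random.seed", state_at_entry, envir = globalenv())
b <- replicate(2, rnorm(1, 0, 1))
print(identical(a, b))  # same entry state, same draws
```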
