Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
218 views
in Technique[技术] by (71.8m points)

r - How to simulate the sampling distribution?

I'm trying to gain a deeper understanding of the sampling distribution, and I've been working through some simulations to that end. For this exercise, the distribution I'm working with is a log-normal distribution with mean=0.1 and sigma=0.17. My code is below:

n_sims <- 1000

mu <- rep(NA, n_sims)
lo95 <- rep(NA, n_sims)
hi95 <- rep(NA, n_sims)

data <- rlnorm(1000, 0.1, 0.17)

for (i in 1:n_sims){
  sim <- sample(data, 1000)
  mu[i] <- mean(sim)
  lo95[i] <- mean(sim) - 2*sd(sim)
  hi95[i] <- mean(sim) + 2*sd(sim)
}

xs <- seq(1,n_sims,1)

plot(xs, mu, pch=16, ylim = c(min(lo95)-0.05, max(hi95)+0.05))
segments(xs, lo95, xs, hi95, lwd = 0.5, col = "gray")

sum((lo95 <= 1.1) & (hi95 >= 1.1))

I'm expecting 95% of the samples to contain the true value of the distribution (1.1 on the transformed scale), but the last line of code reveals that all of the 1000 samples contain the true mean? My understanding is that only 95% of these simulations should contain the correct mean. Is there something I'm not understanding?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

The bug is located here: sample(data, 1000).
The default for the sample function is "replace=FALSE" thus every iteration is using the same exact samples. To properly bootstrap your analysis you need to sample with replacement: sim <- sample(data, 1000, replace=TRUE).

Also to calculate the confidence limits of your estimated mean, I believe you want to use mu +/- 2*sd/sqrt(n), where n is the number of samples.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...