I'm trying to gain a deeper understanding of the sampling distribution, and I've been working through some simulations to that end. For this exercise, the distribution I'm working with is a log-normal distribution with mean=0.1 and sigma=0.17. My code is below:
n_sims <- 1000
mu <- rep(NA, n_sims)
lo95 <- rep(NA, n_sims)
hi95 <- rep(NA, n_sims)
data <- rlnorm(1000, 0.1, 0.17)
for (i in 1:n_sims){
sim <- sample(data, 1000)
mu[i] <- mean(sim)
lo95[i] <- mean(sim) - 2*sd(sim)
hi95[i] <- mean(sim) + 2*sd(sim)
}
xs <- seq(1,n_sims,1)
plot(xs, mu, pch=16, ylim = c(min(lo95)-0.05, max(hi95)+0.05))
segments(xs, lo95, xs, hi95, lwd = 0.5, col = "gray")
sum((lo95 <= 1.1) & (hi95 >= 1.1))
I'm expecting 95% of the samples to contain the true value of the distribution (1.1 on the transformed scale), but the last line of code reveals that all of the 1000 samples contain the true mean? My understanding is that only 95% of these simulations should contain the correct mean. Is there something I'm not understanding?
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…