"simulate" does not include variability in parameter estimation
Hello, All: ????? The default "simulate" method for lm and glm seems to ignore the sampling variance of the parameter estimates;? see the trivial lm and glm examples below.? Both these examples estimate a mean with formula = x~1.? In both cases, the variance of the estimated mean is 1. ??? ??????? * In the lm example with x0 = c(-1, 1), var(x0) = 2, and var(unlist(simulate(lm(x0~1), 10000, 1))) is 2.0064.? Shouldn't it be 3 = var(mean(x0)) + var(x0) = (2/2) + 2? ??? ??????? * In the glm example with x1=1, var(unlist(simulate(glm(x1~1, poisson), 10000, 1))) = 1.006. Shouldn't it be 2 = var(glm estimate of the mean) + var(simulated Poisson distribution) = 1 + 1? ????? I'm asking, because I've recently written "simulate" methods for objects of class stats::glm and BMA::bic.glm, where my primary interest was simulating the predicted mean with "newdata".? I'm doing this, so I can get Monte Carlo prediction intervals.? My current code for "simulate.glm" and "simulate.bic.glm" are available in the development version of the "Ecfun" package on GitHub: https://github.com/sbgraves237/Ecfun ????? Comparing my new code with "stats:::simulate.lm" raises the following questions in my mind regarding "simulate" of a fit object: ??? ??????? 1.? Shouldn't "simulate" start by simulating the random variability in the estimated parameters?? I need that for my current application.? If a generic "simulate" function should NOT include this, what should we call something that does include this?? And how does the current stats:::simulate.lm behavior fit with this? ???? ?????? 2.? Shouldn't "simulate" of a model fit include an option for "newdata"?? I need that for my application. ??? ??????? 3.? By comparing with "predict.glm", I felt I needed an argument 'type = c("link", "response")'.? "predict.glm" has an argument 'type = c("link", "response", "terms")'.? I didn't need "terms", so I didn't take the time to code that.? However, a general "simulate" function should probably include that. ??? ??????? 4.? My application involves assumed Poisson counts.? I need to simulate those as well.? If I combined those with "simulate.glm", what would I call them?? I can't use the word "response", because that's already used with a different meaning. Might "observations" be the appropriate term? ????? What do you think? ????? Thanks, ????? Spencer Graves > x0 <- c(-1, 1) > var(x0) [1] 2 > fit0 <- lm(x0~1) > vcov(fit0) ??????????? (Intercept) (Intercept)?????????? 1 > sim0 <- simulate(fit0, 10000, 1) > var(unlist(sim0)) [1] 2.006408 > x1 <- 1 > fit1 <- glm(x1~1, poisson) > coef(fit1) ?(Intercept) 4.676016e-11 > exp(coef(fit1)) (Intercept) ????????? 1 > vcov(fit1) ??????????? (Intercept) (Intercept)?? 0.9999903 > sim1 <- simulate(fit1, 10000, 1) > var(unlist(sim1)) [1] 1.00617 > sessionInfo() R version 3.6.2 (2019-12-12) Platform: x86_64-apple-darwin15.6.0 (64-bit) Running under: macOS Catalina 10.15.2 Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] stats???? graphics? grDevices utils???? datasets? methods base loaded via a namespace (and not attached): [1] compiler_3.6.2 tools_3.6.2