Bootstrap Methods for Confidence Intervals -- glmnet
Lorenzo:

This is a complicated and subtle question that I believe is mostly about statistical methodology, not R. I would suggest that you post your query to stats.stackexchange.com rather than here in order to determine *what* you should do. Then, if necessary, you can come back here to ask about *how* to do it in R (with code from your failed attempts, etc.).

Better yet, you might wish to have this discussion with a local expert, if you can find one.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Thu, May 12, 2016 at 7:49 AM, Lorenzo Isella <lorenzo.isella at gmail.com> wrote:
Dear All,

Please have a look at the code at the end of the email. It is just an example of regression based on glmnet with some artificial data. My question is how I can evaluate the uncertainty of the prediction yhat.

It looks like there are some reasons for not providing a standard error estimate, see e.g.
http://stackoverflow.com/questions/12937331/how-to-get-statistical-summary-information-from-glmnet-model
and
https://www.reddit.com/r/statistics/comments/1vg8k0/standard_errors_in_glmnet/

However, from what I read in this thesis
https://air.unimi.it/retrieve/handle/2434/153099/133417/phd_unimi_R07738.pdf
(see sections 3.2 and 3.3) and in the quoted papers
http://www.stat.cmu.edu/~fienberg/Statistics36-756/Efron1979.pdf
and
http://www.ams.org/journals/proc/2010-138-12/S0002-9939-2010-10474-4/S0002-9939-2010-10474-4.pdf
there are some bootstrap methods that are quite general and applicable well beyond the case of glmnet.

Is there anything already implemented to help me out? Is anybody aware of this?
Cheers

Lorenzo

#########################################################################

library(glmnet)

# Generate data
set.seed(19875)  # Set seed for reproducibility
n <- 1000        # Number of observations
p <- 5000        # Number of predictors included in model
real_p <- 15     # Number of true predictors
x <- matrix(rnorm(n*p), nrow=n, ncol=p)
y <- apply(x[,1:real_p], 1, sum) + rnorm(n)

# Split data into train (2/3) and test (1/3) sets
train_rows <- sample(1:n, .66*n)
x.train <- x[train_rows, ]
x.test <- x[-train_rows, ]
y.train <- y[train_rows]
y.test <- y[-train_rows]

fit.elnet <- glmnet(x.train, y.train, family="gaussian", alpha=.5)
yhat <- predict(fit.elnet, s=fit.elnet$lambda, newx=x.test)
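For illustration only, here is a minimal sketch of a pairs (case-resampling) bootstrap for the predictions, in the spirit of the general bootstrap methods mentioned in the question. The helper names boot_predict and fit_elnet are hypothetical and not part of glmnet. The choice to re-select lambda with cv.glmnet inside every replicate is an assumption of this sketch, not something prescribed in the thread. Percentile intervals obtained this way describe the variability of the penalized fit and are not guaranteed to have nominal coverage, which is part of the methodological subtlety raised in the reply.

```r
# Pairs (case-resampling) bootstrap for predictions at new points.
# fit_fun(x, y) must return a function that maps newx to numeric predictions.
boot_predict <- function(x, y, newx, fit_fun, B = 200, level = 0.95) {
  n <- nrow(x)
  preds <- matrix(NA_real_, nrow = nrow(newx), ncol = B)
  for (b in seq_len(B)) {
    idx <- sample.int(n, n, replace = TRUE)      # resample rows with replacement
    predict_b <- fit_fun(x[idx, , drop = FALSE], y[idx])
    preds[, b] <- predict_b(newx)                # predictions from replicate b
  }
  a <- 1 - level
  ci <- t(apply(preds, 1, quantile, probs = c(a / 2, 1 - a / 2)))
  data.frame(se = apply(preds, 1, sd),           # bootstrap standard error
             lower = ci[, 1], upper = ci[, 2])   # percentile interval
}

# Hypothetical wrapper: elastic net with lambda re-chosen by
# cross-validation in each bootstrap replicate (an assumption of this sketch).
fit_elnet <- function(x, y, alpha = 0.5) {
  cvfit <- glmnet::cv.glmnet(x, y, family = "gaussian", alpha = alpha)
  function(newx) as.numeric(predict(cvfit, newx = newx, s = "lambda.min"))
}

# Usage with the objects from the script above (slow for p = 5000;
# reduce B or p to try it out):
# boot <- boot_predict(x.train, y.train, x.test, fit_elnet, B = 100)
# head(boot)
```

Passing the fitting routine as an argument keeps the resampling loop independent of glmnet, so the same loop can be checked against a plain least-squares fit before running the much slower elastic net version.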
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.