bootstrapping in regression
7 messages · Thomas Mang, Chuck Cleland, Greg Snow +3 more
On 1/29/2009 11:43 AM, Thomas Mang wrote:
Hi,

Apologies if my questions sound somewhat 'stupid' to the trained and experienced statisticians among you. I am also not sure whether I have used all terms correctly; if not, corrections are welcome.

I have asked myself the following question regarding bootstrapping in regression: Say that, for whatever reason, one does not want to take the p-values for regression coefficients from the established test-statistic distributions (the t-distribution for individual coefficients, F-values for whole-model comparisons), but instead wants to apply a more robust approach by bootstrapping.

In the simple linear regression case, one possibility is to randomly rearrange the pairing of the X and Y values, estimate the model, and take the beta1 coefficient. Do this many, many times and so derive the null distribution for beta1. Finally, compare beta1 for the observed data against this null distribution.

What I now wonder is what the situation looks like in the multiple regression case. Assume there are two predictors, X1 and X2. Is it then possible to do the same, but rearranging only the values of one predictor (the one of interest) at a time? Say I again want to test beta1. Is it then valid to randomly rearrange the X1 data many times (keeping Y and X2 as observed), fit the model, take the beta1 coefficient, and finally compare the beta1 of the observed data against the distribution of these beta1s? For X2, do the same: randomly rearrange X2 each time while keeping Y and X1 as observed, and so on. Is this valid?

Second, if this is valid for the 'normal', fixed-effects-only regression, is it also valid to derive null distributions for the regression coefficients of the fixed effects in a mixed model this way? Or does the quite different parameter estimation procedure forbid this approach (forbid in the sense of producing bogus results)?

Thanks,
Thomas
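The simple-regression procedure described in the question (re-pair X and Y at random, refit, collect beta1, compare the observed beta1 with the collected values) might be sketched as follows. This is only an illustration in Python; the thread itself contains no code. The function names `slope` and `permutation_pvalue`, the toy data, and the two-sided comparison on the absolute slope are assumptions made for this example.

```python
import random

def slope(x, y):
    """Least-squares slope (beta1) of y on x in simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for beta1 under the null of no association."""
    rng = random.Random(seed)
    observed = slope(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the X/Y pairing; x stays in place
        if abs(slope(x, y_perm)) >= abs(observed):
            hits += 1
    # Count the observed arrangement too, so the p-value is never exactly 0.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: y depends on x with slope 0.5 plus Gaussian noise.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(40)]
y = [0.5 * a + data_rng.gauss(0, 1) for a in x]

b1, p = permutation_pvalue(x, y)
print(f"observed beta1 = {b1:.3f}, permutation p-value = {p:.4f}")
```

Shuffling only `y` is equivalent to shuffling only `x` here, because with a single predictor all that matters is which X value ends up paired with which Y value.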
Have a look at the following document by John Fox: http://cran.r-project.org/doc/contrib/Fox-Companion/appendix-bootstrapping.pdf
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Chuck Cleland, Ph.D. NDRI, Inc. (www.ndri.org) 71 West 23rd Street, 8th floor New York, NY 10010 tel: (212) 845-4495 (Tu, Th) tel: (732) 512-0171 (M, W, F) fax: (917) 438-0894
What you are describing is actually a permutation test rather than a bootstrap (related concepts but with a subtle but important difference). The way to do a permutation test with multiple x's is to fit the reduced model (use all x's other than x1 if you want to test x1) on the original data and store the fitted values and the residuals. Permute the residuals (randomize their order) and add them back to the fitted values and fit the full model (including x1 this time) to the permuted data set. Do this a bunch of times and it will give you the sampling distribution for the slope on x1 (or whatever your set of interest is) when the null hypothesis that it is 0 given the other variables in the model is true. Permuting just x1 only works if x1 is orthogonal to all the other predictors, otherwise the permuting destroys the relationship with the other predictors and does not do the test you want. Bootstrapping depends on sampling with replacement, not permuting, and is used more for confidence intervals than for tests (the reference by John Fox given to you in another reply can help if that is the approach you want to take). Hope this helps,
Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111
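The permutation scheme Greg Snow describes in his reply (fit the reduced model without x1, permute its residuals, add them back to the fitted values, refit the full model) could look roughly like the sketch below. It is written in Python for illustration only; the same steps could be carried out with `lm()` in R. The helper names `ols` and `reduced_model_permutation_test`, the toy data, and the use of the raw coefficient as the test statistic are assumptions of this example, not something taken from the thread.

```python
import random

def ols(X, y):
    """Least-squares coefficients for design matrix X (list of rows),
    solved from the normal equations by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def reduced_model_permutation_test(x1, x2, y, n_perm=1000, seed=0):
    """Permutation p-value for the x1 coefficient, given x2 in the model."""
    rng = random.Random(seed)
    full = [[1.0, a, b] for a, b in zip(x1, x2)]   # intercept, x1, x2
    reduced = [[1.0, b] for b in x2]               # intercept, x2 only
    observed = ols(full, y)[1]
    b_red = ols(reduced, y)
    fitted = [sum(c * v for c, v in zip(b_red, row)) for row in reduced]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(resid)                         # permute reduced-model residuals
        y_star = [f + e for f, e in zip(fitted, resid)]  # new response values
        if abs(ols(full, y_star)[1]) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: x1 is correlated with x2, but y depends on x2 only,
# so the null hypothesis for x1 is true by construction.
data_rng = random.Random(42)
x2 = [data_rng.gauss(0, 1) for _ in range(50)]
x1 = [0.7 * b + data_rng.gauss(0, 1) for b in x2]
y = [1.0 + 2.0 * b + data_rng.gauss(0, 1) for b in x2]

b1, p = reduced_model_permutation_test(x1, x2, y)
print(f"observed beta1 = {b1:.3f}, permutation p-value = {p:.4f}")
```

Because x1 and x2 are never re-paired, the correlation between the predictors is the same in every permuted data set, which is the property that is lost when x1 alone is shuffled. Some authors prefer a studentized statistic (the t-value) over the raw coefficient; the raw slope is used here only because that is what the reply mentions.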
Greg Snow wrote:
What you are describing is actually a permutation test rather than a bootstrap (related concepts but with a subtle but important difference). The way to do a permutation test with multiple x's is to fit the reduced model (use all x's other than x1 if you want to test x1) on the original data and store the fitted values and the residuals. Permute the residuals (randomize their order) and add them back to the fitted values and fit the full model (including x1 this time) to the permuted data set. Do this a bunch of times and it will give you the sampling distribution for the slope on x1 (or whatever your set of interest is) when the null hypothesis that it is 0 given the other variables in the model is true.
Hi,

Thanks to you and Tom for the correction regarding bootstrapping vs. permutation, and to Chuck for the cool link. Yes, of course, what I described is a permutation.

I have a question here: I am not sure I understand your 'fit the full model ... to the permuted data set'. Am I correct to suppose that once the residuals of the reduced-model fit have been permuted and added back to the fitted values, the values obtained this way (fitted + permuted residuals) constitute the new y-values to which the full model is fitted? Is that correct?

Do you know if this procedure is also valid for a mixed-effects model?

Thanks a lot,
Thomas
Permuting just x1 only works if x1 is orthogonal to all the other predictors, otherwise the permuting destroys the relationship with the other predictors and does not do the test you want. Bootstrapping depends on sampling with replacement, not permuting, and is used more for confidence intervals than for tests (the reference by John Fox given to you in another reply can help if that is the approach you want to take). Hope this helps,
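For contrast with the permutation approach, the remark that bootstrapping relies on sampling with replacement and is used more for confidence intervals can be illustrated with a small case-resampling sketch. It is in Python for illustration only. The percentile interval, the function names `slope` and `bootstrap_slope_ci`, and the toy data are assumptions of this example; the Fox appendix referenced earlier in the thread discusses bootstrap variants in more depth.

```python
import random

def slope(x, y):
    """Least-squares slope of y on x in simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx

def bootstrap_slope_ci(x, y, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the slope,
    resampling (x, y) cases with replacement."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases WITH replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: slope is undefined
        ys = [y[i] for i in idx]
        slopes.append(slope(xs, ys))
    slopes.sort()
    m = len(slopes)
    alpha = 1.0 - level
    lo = slopes[max(0, int(alpha / 2 * m))]
    hi = slopes[min(m - 1, int((1 - alpha / 2) * m))]
    return lo, hi

# Hypothetical toy data: y depends on x with slope 0.5 plus Gaussian noise.
data_rng = random.Random(7)
x = [data_rng.gauss(0, 1) for _ in range(40)]
y = [0.5 * a + data_rng.gauss(0, 1) for a in x]

lo, hi = bootstrap_slope_ci(x, y)
print(f"95% percentile bootstrap CI for the slope: [{lo:.3f}, {hi:.3f}]")
```

Each resample keeps every x paired with its own y, so the x/y relationship is preserved and the spread of the resampled slopes estimates sampling variability. A permutation test instead deliberately breaks the relationship to generate a null distribution.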
Hi Thomas, Thomas Mang wrote:
I have a question here: I am not sure I understand your 'fit the full model ... to the permuted data set'. Am I correct to suppose that once the residuals of the reduced-model fit have been permuted and added back to the fitted values, the values obtained this way (fitted + permuted residuals) constitute the new y-values to which the full model is fitted? Is that correct?
It is. Look at section 2.2, "Permutation of Residuals under the Reduced Model" here: Anderson, M. J. & Legendre, P. An empirical comparison of permutation methods for tests of partial regression coefficients in a linear model. Journal of Statistical Computation and Simulation, 1999, 62, 271-303
Do you know if this procedure is also valid for a mixed-effects model?
That's a good question... if you find out anything about this, please let me know. HTH, Stephan
1 day later
On Fri, 30 Jan 2009, Stephan Kolassa wrote:
Thomas Mang wrote:
Do you know if this procedure is also valid for a mixed-effects model ?
That's a good question... if you find out anything about this, please let me know.
There are various kinds of residuals in mixed-effects models, but mostly they are not what you want.
What you need is the type of residuals used in the section on significance tests in Beran and Srivastava:
@article{beran1985bta,
title={{Bootstrap Tests and Confidence Regions for Functions of a Covariance Matrix}},
author={Beran, R. and Srivastava, M.S.},
journal={The Annals of Statistics},
volume={13},
number={1},
pages={95--115},
year={1985}
}
HTH,
Chuck
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901
On Jan 31, 2009, at 1:27 PM, Charles C. Berry wrote:
There are various kinds of residuals in mixed-effects models, but mostly they are not what you want.
What you need is the type of residuals used in the section on significance tests in Beran and Srivastava:
@article{beran1985bta,
title={{Bootstrap Tests and Confidence Regions for Functions of a Covariance Matrix}},
author={Beran, R. and Srivastava, M.S.},
journal={The Annals of Statistics},
volume={13},
number={1},
pages={95--115},
year={1985}
}
Thank you, Chuck. A search brings up a link to an open access version through Project Euclid: <http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdf_1&handle=euclid.aos/1176346579>
David Winsemius