bootstrapping coefficient p-values under null hypothesis in case-resampling multiple linear mixed-effects regression
If your null hypothesis is that variable X has a coefficient of zero in the model, wouldn't sampling under the null hypothesis amount to case-resampling every variable except X, and then resampling X independently from its own set of values? It would appear wiser to just do case resampling and construct a confidence interval for the coefficient from the bootstrap distribution. I avoid statistical testing as completely as possible.
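A minimal sketch of that suggestion in base R, on simulated data with plain lm(): resample rows with replacement, refit, and take a percentile interval for the coefficient. (With an lme4 model you would resample whole levels of the grouping factor rather than individual rows, to preserve within-group dependence.)

```r
set.seed(1)
# Simulated illustration: y depends on x1 (true slope 2) but not on x2
n   <- 100
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n)

# Case resampling: draw n rows with replacement, refit, keep the x1 slope
R <- 1999
boot_coef <- replicate(R, {
  idx <- sample(n, n, replace = TRUE)
  coef(lm(y ~ x1 + x2, data = dat[idx, ]))["x1"]
})

# Percentile bootstrap confidence interval for the x1 coefficient
ci <- quantile(boot_coef, c(0.025, 0.975))
```

If the 95% percentile interval excludes zero, that serves in place of a test at the 5% level, with no need to resample under the null.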
On 1/27/2018 5:19 PM, Aleksander Główka wrote:
Dear mixed-effects community,
I am fitting a multiple linear mixed-effects regression model in lme4.
The residual fit is only near-linear, enough to make me unwilling to
assume residual homoscedasticity. One way to fit the regression without
explicitly making this assumption is to use case-resampling regression
(Davison & Hinkley 1997), an application of the bootstrap (Efron &
Tibshirani 1993).
In case-resampling regression, rather than assuming a normal
distribution for the T-statistic, we estimate the distribution of T
empirically. We mimic sampling from the original population by
treating the original sample as if it were the population: for each
bootstrap sample of size n we randomly select n values with
replacement from the original sample and refit the regression to obtain
bootstrap estimates, repeating this procedure R times.
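Concretely, the loop above looks like this in base R (a sketch with simulated data and lm() standing in for lmer(); with a mixed model one would resample whole levels of the grouping factor so that within-group dependence is preserved):

```r
set.seed(42)
# Simulated illustration: one predictor with true slope 0.5
n   <- 80
dat <- data.frame(x = rnorm(n))
dat$y <- 0.5 * dat$x + rnorm(n)

# R bootstrap replicates: resample cases, refit, record the T-statistic
R <- 999
t_star <- replicate(R, {
  idx <- sample(n, n, replace = TRUE)
  fit <- summary(lm(y ~ x, data = dat[idx, ]))
  coef(fit)["x", "t value"]
})
# t_star now holds the empirical bootstrap distribution of T
```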
Having applied this procedure, I am trying to calculate empirical
p-values for my regression coefficients. As in parametric regression,
I want to conduct the two-tailed hypothesis test of significance for
slope with test statistic T under the null hypothesis H0: β1 = 0. Since
we are treating the original sample as the population, our T = t is the
observed value from the original sample. For β̂{0,1,…,p} we calculate
the p-value as follows:
(1) p = min( Σ 1{T* ≥ t} / R , Σ 1{T* ≤ t} / R )
Davison and Hinkley take t = β̂1,
so that, in practice,
(2) p = min( (Σ 1{β̂*1 ≥ β̂1} + 1) / (R + 1) , (Σ 1{β̂*1 ≤ β̂1} + 1) / (R + 1) )
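Formula (2) is a one-liner in R once the bootstrap estimates are in hand. In this sketch the vectors are simulated stand-ins for the bootstrap output, deliberately centred at the observed estimate as case resampling would produce:

```r
set.seed(7)
# Simulated stand-ins: bootstrap slopes centred at the observed estimate
R         <- 999
beta_hat  <- 2                                    # estimate from the original sample
beta_star <- rnorm(R, mean = beta_hat, sd = 0.5)  # bootstrap estimates of beta_1

# Formula (2): smaller of the two one-sided tail proportions, with the
# Davison & Hinkley +1 correction
p <- min((sum(beta_star >= beta_hat) + 1) / (R + 1),
         (sum(beta_star <= beta_hat) + 1) / (R + 1))
```

Because beta_star is centred near beta_hat rather than near zero, this p lands near 0.5 regardless of the true coefficient, which is exactly the problem described next.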
The major problem here is that the bootstrap samples were not drawn
under the null hypothesis, so in (1) and (2) we are evaluating the
alternative hypothesis rather than the null. Efron & Tibshirani (1993)
indeed caution that all hypothesis testing must be performed by
sampling under the null. This is relatively simple for, say, testing
the difference between two means, where the null is H0: μ1 = μ2, which
requires a simple transformation of the data prior to sampling.
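For the two-mean case that transformation is just a re-centring: shift each group onto the pooled mean so that H0 holds in the "population" being resampled. A base-R sketch with simulated data:

```r
set.seed(3)
# Simulated groups whose true means differ by 1
a <- rnorm(30, mean = 1)
b <- rnorm(30, mean = 0)

# Impose H0: mu1 = mu2 by centring both groups on the pooled mean
m  <- mean(c(a, b))
a0 <- a - mean(a) + m
b0 <- b - mean(b) + m

# Observed statistic from the untransformed data
t_obs <- t.test(a, b)$statistic

# Resample from the null-transformed data and recompute the statistic
R <- 999
t_null <- replicate(R, {
  t.test(sample(a0, replace = TRUE),
         sample(b0, replace = TRUE))$statistic
})
p <- (sum(abs(t_null) >= abs(t_obs)) + 1) / (R + 1)
```

(Efron & Tibshirani's version also studentises the statistic; the plain re-centring above is the simplest form of sampling under the null.)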
So my question here is: how do I perform significance testing under
the null hypothesis in case-resampling regression? As far as I could
see, neither Davison & Hinkley (1997) nor Efron & Tibshirani (1993)
seem to mention how to sample under the null. Is there some adjustment
that I can introduce before (to the data) or after case-resampling (to
the least-squares formula) in a way that is easily implementable in R
and lme4? Any ideas and/or algorithms would be greatly appreciated.
N.B. With all due respect, please don't advise me to fit a GLM instead
or to talk directly with Rob Tibshirani.
Thank you,
Aleksander Glowka
PhD Candidate in Linguistics
Stanford University
Works cited:
Davison, A. C. and D. V. Hinkley (1997). Bootstrap Methods and their
Applications. Cambridge, England: Cambridge University Press.
Efron, B. and Tibshirani, R. J. (1993). An Introduction to the
Bootstrap. New York: Chapman & Hall.
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Robert A LaBudde, BS, MS, PhD, ChDipl ACAFS
President, Least Cost Formulations, Ltd
URL: lcfltd.com
824 Timberlake Dr, Virginia Beach, VA 23464
Tel: 757-467-0954   Fax: 757-467-2947