binomial fixed-effect p-values by simulation
I don't quite understand your question about the t-distribution -- do you mean naively using the t-statistic to test the null hypothesis? A few issues here:

(1) this is a Wald test, hence not generally as accurate as model comparison;

(2) assuming the scale parameter is fixed at 1, I *think* technically one should just do a Z-test rather than a t-test here -- but I'm not sure. I have a vague memory of a reference saying that Z- is actually better than t- in this case (besides just being theoretically justified);

(3) if you do quasilikelihood, then you should do the t-test, but then you have to decide on appropriate degrees of freedom (oh joy).

Arguably the null-model simulations could give quite a bit of insight into the _real_ shape of the distribution of (est)/(s.e.) under the null hypothesis -- i.e., is it t? With how many df?

Ben
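A minimal sketch of that null-distribution check, assuming a fitted null model m0 and alternative m1, plus working simulate()/refit() methods for these fits (as in the glmersim.R mentioned below); get.t.stat() is a hypothetical helper, not an existing lme4 function:

```r
library(lme4)

## hypothetical helper: Wald statistic (est / s.e.) for one fixed effect
get.t.stat <- function(fit, term = 2) {
  fixef(fit)[term] / sqrt(diag(vcov(fit)))[term]
}

## null distribution of est/s.e. by simulating from m0 and refitting m1
## (commented out because m0/m1 are not defined here and this is slow):
## t.null <- replicate(500, get.t.stat(refit(m1, simulate(m0)[[1]])))
## qqnorm(t.null); abline(0, 1)
## a straight 1:1 line suggests the Z reference is adequate;
## heavier tails would point toward a t with few df
```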
Daniel Ezra Johnson wrote:
Yes, it would be very helpful if you'd send your simulation function. I'm encouraged to know that I'm on the right track with this, at least in theory.

For empirical p-values, I can't see most people being satisfied with, say, 100 replications and the level of error that would bring. So the slowness is a problem.

When would someone use the distribution of t-statistics, versus the way I did it, comparing fixed-effect sizes between the simulations and the data? Or are those two essentially the same thing, statistically?

Thanks,
D

On Sat, Aug 23, 2008 at 12:40 AM, Ben Bolker <bolker at ufl.edu> wrote:
I just wrote a "simulate" method for glmer objects, to extend the
version that's already in lme4 for lmer objects. Given a nested pair
of models m0, m1, the simulation distribution of t-statistics would
be
replicate(1000, get.t.statistic(refit(m1, simulate(m0))))
See simulate.lme in nlme, which also has a plot method
that displays the nominal and empirical p-values for REML
and ML estimates.
The bad news is that it's essentially doing the same thing you
discuss below -- and therefore it's not any faster (I was able
to do about 10 replicates/minute ...)
If you're interested let me know and I can send my
"glmersim.R"
Ben Bolker
Daniel Ezra Johnson wrote:
Consider the common situation of modeling multiple responses from
multiple subjects, where we want to know whether a fixed-effect
between-subjects variable -- for example, gender -- has a significant
effect.
In my field (quantitative sociolinguistics) the response is usually
binomial, and for some 30 years the usual procedure has been to carry
out logistic regression, setting the subject variable aside, and
assessing fixed effect significance with a likelihood-ratio test. When
between-subject variation is high, this produces exceptionally
anti-conservative p-values (see old.p.value below).
I am writing an article and some wrapper functions designed to
introduce the use of mixed models to this audience. With subject
included as a random effect, the p-values for fixed between-subject
effects are much more conservative, even using the likelihood-ratio
test (see lrt.p.value below). In other words, this is an improvement.
But as has been discussed many times on this list, likelihood-ratio
tests are still anti-conservative for fixed effect significance. I
believe the extent of anti-conservatism depends on the number of
observations relative to the number of groups, the balance of the
data, and probably other factors too.
I have seen the MCMC method discussed, but I think it is not yet
available for binomial data with lmer() -- is that correct? In
addition, that method, which operates in the parameter space, seems to
test a slightly different null hypothesis than the usual frequentist
one, though I may be misunderstanding this.
Would it be appropriate for me to carry out a simulation of the type
given below, to assess fixed effect significance?
The code starts by generating a dataset which happens to have a
potentially significant gender effect, according to the LRT. (In fact
the data is randomly generated and has no "real" gender effect, but
assume it was real data and we did not know that.)
A null model (no fixed effect) and an alternative model (fixed gender
effect) are fit to the data by lmer(), using a random effect for
subject in both cases.
It then takes the two parameters from the null model -- intercept and
subject s.d. -- and randomly generates new data. The simulated data is
fit with the alternative-style model and the proportion of times the
fixed effect magnitude exceeds that from the 'observed' data is the
p-value.
Does this procedure make sense? It would generalize to linear models
easily. Making it work for unbalanced data and when there are other
fixed effects in the model would not be that hard either, I think.
To my mind, this is a direct implementation of the frequentist p-value
concept via simulation. Am I missing something important?
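One way the generalization might look, as a sketch: instead of hand-rolling the data generation, simulate response vectors from the fitted null model and refit the alternative model to each. This assumes simulate() and refit() methods are available for these fits (at the time, only via something like the glmersim.R mentioned above); pboot.p.value is a hypothetical wrapper, not an existing function:

```r
library(lme4)

## hypothetical wrapper: parametric-bootstrap p-value for one fixed effect
pboot.p.value <- function(m0, m1, term, nsim = 1000) {
  observed <- abs(fixef(m1)[term])
  sims <- simulate(m0, nsim = nsim)   # nsim simulated response vectors
  stats <- vapply(seq_len(nsim),
                  function(i) abs(fixef(refit(m1, sims[[i]]))[term]),
                  numeric(1))
  ## proportion of null-simulated effects at least as large as observed
  mean(stats >= observed)
}

## usage, given the null/fixed fits from the code below:
## pboot.p.value(null, fixed, term = "genderM", nsim = 1000)
```

Because it simulates from the fitted null model, the same wrapper would handle unbalanced designs and additional fixed effects without any changes to the data-generation code.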
Of course, especially with binomial data this simulation is quite slow
(i.e. compared to the unimplemented MCMC method). The code below, with
1000 simulations of a 2000-observation dataset, took 15 minutes to run
on a MacBook Pro. (The result was a p-value of 0.027, compared to
0.015 from the LRT, 0.009 from the Wald statistic... and 1.1e-56 from
the fixed-effects model!)
It's important to me to be able to give p-values that are
accurate/conservative and that I understand. So if anyone can give me
guidance on whether this simulation procedure makes sense, it would be
much appreciated.
And if there's a faster way to do something like this for binomial
data, please let me know.
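One possible speed-up (an untested suggestion, not something from the original posting): the replicates are independent, so they parallelize trivially, e.g. with mclapply from the parallel package. sim.one() here just factors one simulate-and-fit step out of the loop below; null.intercept and null.subject.sd are the null-model estimates computed in that code:

```r
library(parallel)  # ships with recent versions of R
library(lme4)

## one simulate-and-refit replicate, returning the fitted gender effect
sim.one <- function(null.intercept, null.subject.sd) {
  d <- data.frame(subject = rep(letters[1:20], each = 100),
                  gender = rep(c("M", "F"), each = 1000),
                  response = rbinom(2000, 1,
                    rep(plogis(rnorm(20, null.intercept, null.subject.sd)),
                        each = 100)))
  fit <- glmer(response ~ gender + (1 | subject),
               data = d, family = binomial)
  fixef(fit)[2]
}

## fan the 1000 refits out across cores (commented out: slow):
## gender.simulated <- unlist(mclapply(1:1000,
##   function(i) sim.one(null.intercept, null.subject.sd), mc.cores = 4))
```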
Thanks,
Daniel
library(lme4)

set.seed(10)

## 20 subjects x 100 observations each; gender is confounded with
## subject (10 subjects per gender) and has no "real" effect --
## all between-subject variation comes from rnorm(20, 0, 3)
observed <- data.frame(subject = rep(letters[1:20], each = 100),
                       gender = rep(c("M", "F"), each = 1000),
                       response = rbinom(2000, 1,
                         rep(plogis(rnorm(20, 0, 3)), each = 100)))

## the traditional procedure: logistic regression ignoring subject
old <- glm(response ~ gender, family = binomial, data = observed)
old.p.value <- pchisq(old$null.deviance - old$deviance,
                      df = 1, lower.tail = FALSE)
# 1.083e-56

## mixed models with a random intercept for subject
## (glmer() is the current lme4 interface for GLMMs; very old
## versions of lme4 took a family argument to lmer() instead)
null <- glmer(response ~ (1 | subject), data = observed,
              family = binomial)
fixed <- glmer(response ~ gender + (1 | subject), data = observed,
               family = binomial)

wald.p.value <- coef(summary(fixed))["genderM", "Pr(>|z|)"]  # 0.00894
lrt.p.value <- anova(null, fixed)[2, "Pr(>Chisq)"]           # 0.0152

## the two parameters of the null model, used to generate new data
null.intercept <- fixef(null)                                # -0.0914
null.subject.sd <- attr(VarCorr(null)$subject, "stddev")     # 2.276
gender.observed <- fixef(fixed)[2]                           # -2.319

gender.simulated <- numeric(1000)
for (i in 1:1000) {
  ## simulate from the null model: subject effects drawn around the
  ## null intercept with the estimated between-subject s.d.
  simulated <- data.frame(subject = rep(letters[1:20], each = 100),
                          gender = rep(c("M", "F"), each = 1000),
                          response = rbinom(2000, 1,
                            rep(plogis(rnorm(20, null.intercept,
                                             null.subject.sd)),
                                each = 100)))
  fixed.sim <- glmer(response ~ gender + (1 | subject),
                     data = simulated, family = binomial)
  gender.simulated[i] <- fixef(fixed.sim)[2]
}
# minutes later...
simulation.p.value <- mean(abs(gender.simulated) >= abs(gender.observed))
# 0.027
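On the replication-count worry raised upthread: an empirical p-value is just a binomial proportion, so its Monte Carlo error can be gauged directly with an exact confidence interval (base R, no mixed-model machinery needed):

```r
## precision of an empirical p-value as a function of replicate count:
## 3 "hits" out of 100 vs. 27 out of 1000 (roughly the p ~ 0.027 here)
binom.test(3, 100)$conf.int    # exact 95% CI -- quite wide at 100 reps
binom.test(27, 1000)$conf.int  # much tighter at 1000 reps
```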
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
[Attachment: glmersim.R -- https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20080822/925dc8fe/attachment.pl]