lmer models-confusing results - more information!
Dear Gwyneth,

Since you're not getting any answers, I'll give it a go, at the risk of being wrong. The likelihood for a non-Gaussian GLMM cannot be obtained in closed form and needs to be approximated. Often the approximation is good, but in some cases it can be bad, particularly with binary data when the incidence is extreme (low/high) and/or there is little replication within factor levels. In extreme cases the parameter estimates +/- 2 SEs do not even include the "true" values.

From your fixed-effect summary it appears that reproductive successes within some factor levels are all zero. If this is the case, this may well be what is causing the problem, and treating year as a random effect may help. MCMC solutions are probably more robust for these types of data because they use approximations which become more exact the longer you run the analysis.

With regard to an earlier email, over-dispersed binary data does not occur, because the mean determines the variance completely. This does not mean that the probability of success is constant (after conditioning on the model); it just means that any heterogeneity cannot be observed and therefore cannot be estimated. In short, you don't need to worry about it.

Cheers,
Jarrod
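[Editorial aside: one way to check Jarrod's suspicion is to cross-tabulate the outcome by factor level. A minimal sketch, using variable names from Gwyneth's post below (assumed to be columns of her `hornbill` data frame):

```r
## Cross-tabulate outcome by year; a year whose "1" column is zero has
## no successes at all -- the quasi-complete-separation pattern Jarrod
## suspects is behind the huge, unstable year estimates.
with(hornbill, table(yr, br.su))
```

Any factor level with all observations in one outcome column will produce extreme coefficient estimates with very large standard errors.]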
On 3 Dec 2009, at 06:33, Gwyneth Wilson wrote:
I have been running lmer models in R, looking at what affects reproductive success in Ground Hornbills (a South African bird). My response variable is breeding success, which is binomial (0/1), and my random effect is group ID. My explanatory variables include rainfall, vegetation, group size, year, nests, and proportion of open woodland.

I have run numerous models successfully, but I am confused about the outputs. When I run my first model with all my variables (all additive), I get a low AIC value with only a few of the variables being significant. When I take out the variables that are not significant, my AIC becomes higher but I have more significant variables! When I keep taking out the non-significant variables, I am left with a model that has nests, open woodland, and group size as extremely significant BUT the AIC is high! Why is my AIC value increasing when I have fewer variables that are all significant and seem to best explain my data? Do I look only at the AIC when choosing the 'best' model, or only at the p-values, or both? The model with the lowest AIC at the moment has the most variables, and most of them are not significant.

Please help. Any suggestions would be great!! Here is some more information and some of my outputs. The first model has all my variables included, and I get a low AIC with only grp.sz and wood being significant:

model1 <- lmer(br.su ~ factor(art.n) + factor(yr) + grp.sz + rain + veg + wood + (1 | grp.id),
               data = hornbill, family = binomial)
summary(model1)
Generalized linear mixed model fit by the Laplace approximation
Formula: br.su ~ factor(art.n) + factor(yr) + grp.sz + rain + veg +
wood + (1 | grp.id)
Data: hornbill
AIC BIC logLik deviance
138.5 182.3 -55.26 110.5
Random effects:
Groups Name Variance Std.Dev.
grp.id (Intercept) 1.2913 1.1364
Number of obs: 169, groups: grp.id, 23
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.930736 3.672337 -1.070 0.2845
factor(art.n)1 1.462829 0.903328 1.619 0.1054
factor(yr)2002 -2.592315 1.764551 -1.469 0.1418
factor(yr)2003 -3.169365 1.759981 -1.801 0.0717 .
factor(yr)2004 0.702210 1.341524 0.523 0.6007
factor(yr)2005 -2.264257 1.722130 -1.315 0.1886
factor(yr)2006 2.129728 1.270996 1.676 0.0938 .
factor(yr)2007 -0.579961 1.390345 -0.417 0.6766
factor(yr)2008 -1.062933 1.640774 -0.648 0.5171
grp.sz 1.882616 0.368317 5.111 3.2e-07 ***
rain -0.005896 0.003561 -1.656 0.0977 .
veg -1.993443 1.948738 -1.023 0.3063
wood 6.832543 3.050573 2.240 0.0251 *
Then I carry on and remove variables I think are not having an influence on breeding success, like year, vegetation and rain, and I get this:
model3 <- lmer(br.su ~ factor(art.n) + grp.sz + wood + (1 | grp.id),
               data = hornbill, family = binomial)
summary(model3)
Generalized linear mixed model fit by the Laplace approximation
Formula: br.su ~ factor(art.n) + grp.sz + wood + (1 | grp.id)
Data: hornbill
AIC BIC logLik deviance
143.8 159.4 -66.88 133.8
Random effects:
Groups Name Variance Std.Dev.
grp.id (Intercept) 0.75607 0.86953
Number of obs: 169, groups: grp.id, 23
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.6619 1.3528 -6.403 1.52e-10 ***
factor(art.n)1 1.5337 0.6420 2.389 0.0169 *
grp.sz 1.6631 0.2968 5.604 2.09e-08 ***
wood 3.2177 1.5793 2.037 0.0416 *
So all the variables are significant but the AIC value is higher! I thought that with fewer variables, all of them significant (which means they are influencing breeding success), the AIC should be lower, so why is it higher in this model??
Do I look only at the AIC values and ignore the p-values, or only at the p-values??
Thanks!!
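[Editorial aside: the reported AICs are not a function of the p-values; they follow directly from AIC = -2*logLik + 2k, where k counts estimated parameters (fixed effects plus variance components). A quick arithmetic check against the two outputs quoted above:

```r
## model1: 13 fixed effects (intercept, art.n, 7 year contrasts, grp.sz,
## rain, veg, wood) + 1 random-effect variance = 14 parameters
-2 * (-55.26) + 2 * 14  # 138.52, matching the reported 138.5

## model3: 4 fixed effects + 1 random-effect variance = 5 parameters
-2 * (-66.88) + 2 * 5   # 143.76, matching the reported 143.8
```

Dropping terms lowered the penalty 2k by 18, but worsened -2*logLik by about 23, so the AIC rose: the dropped terms were collectively carrying real fit even though none was individually significant.]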
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
This happens a lot when you have correlated variables: although I don't agree with absolutely everything it says, Zuur et al. 2009 is a good start for looking at this. When you have correlated variables, it's easy for them collectively to explain a lot of the pattern but individually not to explain much.

Zuur, A. F., E. N. Ieno, and C. S. Elphick. 2009. A protocol for data exploration to avoid common statistical problems. Methods in Ecology and Evolution. doi: 10.1111/j.2041-210X.2009.00001.x.

In general, you should *either* (1) fit all sensible models and model-average the results (if you are interested in prediction), or (2) use the full model to evaluate p-values, test hypotheses, etc. (provided you have _already_ removed correlated variables). In general (although Murtaugh 2009 provides a counterexample of sorts), you should **not** select a model and then (afterwards) evaluate the significance of the parameters in the model ...

Murtaugh, P. A. 2009. Performance of several variable-selection methods applied to real ecological data. Ecology Letters 12:1061-1068. doi: 10.1111/j.1461-0248.2009.01361.x.
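[Editorial aside: the correlation check Ben recommends can be done before any model is fitted. A minimal sketch, assuming the continuous predictors named in Gwyneth's post are numeric columns of `hornbill`:

```r
## Pairwise correlations among candidate predictors; strong values
## (say |r| > 0.7) mark variables likely to trade significance back
## and forth as terms are dropped from the model.
round(cor(hornbill[, c("grp.sz", "rain", "veg", "wood")],
          use = "pairwise.complete.obs"), 2)
```

Correlated predictors split the explained variation between them, so each can look non-significant in the full model while the set as a whole keeps the AIC low.]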
Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bolker at ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
On Thu, 2009-12-03 at 14:43 -0500, Ben Bolker wrote:
Is this in the context of non-nested models? Otherwise, a very common scenario is to test interaction terms first and then remove them from the model if not significant (i.e., in order to test the significance of the main effects).
Manuel Morales wrote:
Is this in the context of non-nested models? Otherwise, a very common scenario is to test interaction terms first and then remove from the model if not significant (i.e., to test the significance of main effects).
Yes. I think removing interactions is technically violating the "don't select models and then test them" rule, but it also seems reasonable to remove a _small_ number of non-significant interactions on the grounds of interpretability. (I believe Pinheiro and Bates do this to some extent in the examples in PB2000.)

cheers
Ben
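[Editorial aside: the nested comparison Manuel describes can be sketched as a likelihood-ratio test. A hypothetical example using two of Gwyneth's predictors and current glmer syntax (the interaction itself is invented for illustration):

```r
## Fit with and without a single interaction term, then compare the
## nested models by likelihood-ratio test.
m.full <- glmer(br.su ~ grp.sz * wood + (1 | grp.id),
                data = hornbill, family = binomial)
m.main <- update(m.full, . ~ . - grp.sz:wood)
anova(m.main, m.full)  # LRT on 1 df for the interaction
```

If the interaction is clearly non-significant, the main-effects model is the one whose coefficients are straightforward to interpret.]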
Below are two examples of the use of mcmcsamp() with output
from lmer(). The first set of results are believable.
For the second set of results, both variance estimates
(for 'site' and for 'Residual' or scale) lie outside of the
credible intervals that are obtained. Assuming I have
used the functions correctly, it seems surprising that
mcmcsamp() would 'fail' on the second example, which is
balanced and where both variances seem well away from
zero. These results are consistent over repeated
simulations.
I'd be interested to hear from list members who make regular use of
mcmcsamp(), as well as maybe from Douglas. Is there any advance on
the current routines in the development branch?
Questions:
(1) Are instances of less obvious failure common? How does one check?
(2) Is there a more direct way to get the credible intervals?
(3) What insight is available on why the second example fails?
John Maindonald.
## EXAMPLE 1:
library(DAAG)
science1.lmer <- lmer(like ~ sex + PrivPub + (1 | school:class),
data = science)
science1.lmer
...
Random effects:
 Groups       Name        Variance Std.Dev.
 school:class (Intercept) 0.321    0.566
 Residual                 3.052    1.747
Number of obs: 1383, groups: school:class, 66
...
science1.mcmc <- mcmcsamp(science1.lmer , n=1000)
z <- VarCorr(science1.mcmc, type="varcov")
colnames(z) <- c("school:class", "Residual")
## The following are believable!
t(apply(z, 2, function(x) quantile(x, prob = c(.025, .975))))
               2.5%  97.5%
school:class 0.1442 0.4334
Residual     2.8601 3.3427

## EXAMPLE 2:
ant111b.lmer <- lmer(harvwt ~ 1 + (1 | site), data = ant111b)
ant111b.lmer
...
Random effects:
 Groups   Name        Variance Std.Dev.
 site     (Intercept) 2.368    1.54
 Residual             0.578    0.76
Number of obs: 32, groups: site, 8
...
ant111b.mcmc <- mcmcsamp(ant111b.lmer, n=1000)
z <- VarCorr(ant111b.mcmc, type="varcov")
colnames(z) <- c("site", "Residual")
t(apply(z,2, function(x)quantile(x, prob=c(.025, .975))))
           2.5% 97.5%
site     0.2376 1.878
Residual 0.6370 2.488

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
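[Editorial aside: one independent cross-check on an interval like the second one is a parametric bootstrap of the variance component. A rough sketch using current lme4 (`simulate` and `refit` on the fitted model; not part of the original post):

```r
## Refit the model 200 times to responses simulated from the fitted
## model, collecting the 'site' variance estimate each time.
sitevar <- replicate(200, {
  ysim <- simulate(ant111b.lmer)[[1]]
  unname(unlist(VarCorr(refit(ant111b.lmer, ysim))))
})
quantile(sitevar, c(0.025, 0.975))  # compare with the mcmcsamp interval
```

If the bootstrap interval comfortably covers the point estimate while the mcmcsamp interval does not, that points at the sampler rather than the data.]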
Hi All,

There are 8 groups in the second example versus 66 in the first. Are we seeing a consistency/central limit problem? That is, mcmcsamp() works when the posterior is plausibly multivariate normal but not otherwise, and in this particular example the information content of the data is too low? How's that for a guess?

Jake
http://jakebowers.org
1 day later
Until reasonably recently, it was possible with glmer to associate one random term with each observation. With the current version of glmer(), I find that this is not allowed. In the case I was using it for, this let me fit a random between-observations variance that is additive on the scale of the linear predictor, rather than the dispersion-estimate fudge, which estimates a constant multiplier for the theoretical variance. I am wondering what the reason may be for disallowing this; does it unduly complicate code somewhere or other?
I am using lme4_0.999375-32; Matrix_0.999375-32
Here is what I had been doing:
library(DAAG)
moths$transect <- 1:41 # Each row is from a different transect
moths$habitat <- relevel(moths$habitat, ref="Lowerside")
(A.glmer <- glmer(A~habitat+log(meters)+(1|transect),
family=poisson, data=moths))
Generalized linear mixed model fit by the Laplace approximation
Formula: A ~ habitat + log(meters) + (1 | transect)
Data: moths
AIC BIC logLik deviance
95 112 -37.5 75
Random effects:
Groups Name Variance Std.Dev.
transect (Intercept) 0.234 0.483
Number of obs: 41, groups: transect, 41
...
Thanks
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
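[Editorial aside: in current lme4 the same observation-level random effect can be specified explicitly. A sketch (the `obs` factor, with one level per row, is introduced here for illustration; it plays exactly the role John's `transect` did):

```r
## An observation-level random effect: one factor level per row, so its
## variance is additive on the linear-predictor scale and absorbs
## overdispersion in the Poisson counts.
moths$obs <- factor(seq_len(nrow(moths)))
A.glmer2 <- glmer(A ~ habitat + log(meters) + (1 | obs),
                  family = poisson, data = moths)
```

This is the standard observation-level random effect (OLRE) construction for overdispersed count data.]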
I agree that mcmcsamp in current versions of the lme4 package provides suspect results. I still have problems formulating an MCMC update for the variance components when there is a possibility of the value getting close to zero.

In the development branch of the lme4 package I have added profiling of the deviance with respect to the parameters as a method of determining the precision of the parameter estimates. This package is called lme4a, in the R packages tab of http://lme4.r-forge.r-project.org/ Unfortunately it looks like the Windows binary is not available at R-forge (and the error message looks peculiar because it is an error from an R declarations file - it may be that I don't have the correct sequence of include files).

I usually look at profiles of the change in the deviance by taking the signed square root transformation. Results for your models are enclosed. In these cases there aren't problems with the variance of the random effects approaching zero. In the case of the Dyestuff data we do get such behavior, and the splom plot shows some rough edges as a result.
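[Editorial aside: the profiling Bates describes is exposed directly in current lme4. A sketch applied to John's second example (not the original enclosed code):

```r
## Profile the deviance of the fitted model, then extract profile-based
## confidence intervals and plot the signed square-root deviance
## profiles Bates refers to.
pr <- profile(ant111b.lmer)
confint(pr)          # intervals for the variance parameters and fixed effects
lattice::xyplot(pr)  # signed-root deviance profiles
```

The profile intervals are asymmetric where the deviance is, which is exactly the behavior a symmetric Wald or poorly mixing MCMC summary can miss.]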
This looks promising, thanks for your efforts. Fortunately the Mac OS X version runs fine under R-devel (2.11.0 Under development).

John Maindonald