lmer models-confusing results - more information!

10 messages · Gwyneth Wilson, Jarrod Hadfield, Ben Bolker +4 more

#
Dear Gwyneth,

Since you're not getting any answers - I'll give it a go, at the risk
of being wrong.

The likelihood for a non-Gaussian GLMM cannot be obtained in closed
form and needs to be approximated. Often the approximation is good,
but in some cases it can be bad, particularly with binary data when
the incidence is extreme (low/high) and/or there is little replication
within factor levels. In extreme cases the parameter estimates +/-
2*SE's do not even include the "true" values.

From your fixed effect summary it appears that reproductive successes
within some factor levels are all zero. If this is the case, it may
well be what is causing the problem, and treating year as a random
effect may help. MCMC solutions are probably more robust for these
types of data because they use approximations which become more exact
the longer you run the analysis.
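[Editorial sketch, not part of the original message: the "all-zero factor level" problem Jarrod describes is easy to reproduce with simulated data. The data frame and variable names below are invented for illustration.]

```r
## Illustrative simulation (invented data): one year with no successes at all
set.seed(1)
d <- data.frame(year = factor(rep(2005:2007, each = 20)))
d$succ <- ifelse(d$year == "2005", 0L, rbinom(nrow(d), 1, 0.4))
fit <- glm(succ ~ year, family = binomial, data = d)
summary(fit)
## The intercept (year 2005) is driven towards -Inf: expect a very large
## negative estimate with an enormous standard error, so estimate +/- 2*SE
## is useless for that level -- the complete-separation symptom.
```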

With regard to an earlier email, over-dispersed binary data cannot
occur, because the mean determines the variance completely. This does
not mean that the probability of success is constant (after
conditioning on the model); it just means that any heterogeneity
cannot be observed and therefore cannot be estimated. In short, you
don't need to worry about it.
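[A small simulation makes this point concrete (a sketch added in editing, not from the thread): even when the success probabilities vary strongly between observations, the observed variance of binary outcomes still matches pbar*(1 - pbar), so the heterogeneity leaves no footprint.]

```r
set.seed(1)
n    <- 1e5
p_i  <- plogis(rnorm(n, 0, 2))   # strongly heterogeneous success probabilities
y    <- rbinom(n, 1, p_i)        # binary outcomes
pbar <- mean(y)
c(observed = var(y), bernoulli = pbar * (1 - pbar))
## The two variances agree (up to simulation error): the between-observation
## heterogeneity is invisible, hence inestimable, from binary data alone.
```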

Cheers,

Jarrod
On 3 Dec 2009, at 06:33, Gwyneth Wilson wrote:

#
Gwyneth Wilson wrote:
This happens a lot when you have correlated variables: although I
don't agree with absolutely everything it says, Zuur et al 2009 is a
good start for looking at this. When you have correlated variables, it's
easy for them collectively to explain a lot of the pattern but
individually not to explain much.

Zuur, A. F., E. N. Ieno, and C. S. Elphick. 2009. A protocol for data
exploration to avoid common statistical problems. Methods in Ecology and
Evolution. doi: 10.1111/j.2041-210X.2009.00001.x.

  In general, you should *either* (1) fit all sensible models and
model-average the results (if you are interested in prediction) or (2)
use the full model to evaluate p-values, test hypotheses, etc. (provided
you have _already_ removed correlated variables).  In general (although
Murtaugh 2009 provides a counterexample of sorts), you should **not**
select a model and then (afterwards) evaluate the significance of the
parameters in the model ...

Murtaugh, P. A. 2009. Performance of several variable-selection methods
applied to real ecological data. Ecology Letters 12:1061-1068. doi:
10.1111/j.1461-0248.2009.01361.x.
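[One concrete, and hypothetical, way to carry out option (1) -- the package, data frame, and variable names below are editorial assumptions, not from the thread -- is the MuMIn package, which automates fitting all sub-models and model-averaging:]

```r
## Hedged sketch, assuming a data frame `dat` with response y and
## predictors x1..x3; MuMIn is one of several packages that do this.
library(MuMIn)
full <- lm(y ~ x1 + x2 + x3, data = dat, na.action = na.fail)
avg  <- model.avg(dredge(full))   # average over the full candidate set
summary(avg)                      # model-averaged coefficients for prediction
## For option (2), skip the dredging and read tests from summary(full).
```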

#
On Thu, 2009-12-03 at 14:43 -0500, Ben Bolker wrote:
Is this in the context of non-nested models? Otherwise, a very common
scenario is to test interaction terms first and then remove from the
model if not significant (i.e., to test the significance of main
effects).

#
Manuel Morales wrote:
Yes.  I think removing interactions is technically violating the
"don't select models and then test them" rule, but it also seems
reasonable to remove a _small_ number of non-significant interactions on
the grounds of interpretability.  (I believe Pinheiro and Bates do this
to some extent in the example in PB2000).

  cheers
   Ben
#
Below are two examples of the use of mcmcsamp() with output
from lmer().  The first set of results is believable.
For the second set of results, both variance estimates
(for 'site' and for 'Residual' or scale) lie outside of the
credible intervals that are obtained.  Assuming I have
used the functions correctly, it seems surprising that
mcmcsamp() would 'fail' on the second example, which is
balanced and where both variances seem well away from
zero.  These results are consistent over repeated
simulations.

I'd be interested to hear from list members who make regular use of
mcmcsamp(), as well as maybe from Douglas. Is there any advance on
the current routines in the development branch? 

Questions:

(1) Are instances of less obvious failure common? How does one check?
(2) Is there a more direct way to get the credible intervals?
(3) What insight is available on why the second example fails?

John Maindonald.


## EXAMPLE 1:

library(DAAG)
science1.lmer <- lmer(like ~ sex + PrivPub + (1 | school:class),
                     data = science)
Random effects:
Groups       Name        Variance Std.Dev.
school:class (Intercept) 0.321    0.566   
Residual                 3.052    1.747   
Number of obs: 1383, groups: school:class, 66
science1.mcmc <- mcmcsamp(science1.lmer , n=1000)
z <- VarCorr(science1.mcmc, type="varcov")
colnames(z) <- c("school:class", "Residual")
2.5%  97.5%
school:class 0.1442 0.4334
Residual     2.8601 3.3427


## EXAMPLE 2:
ant111b.lmer <- lmer(harvwt ~ 1 + (1 | site), data=ant111b)
...
Random effects:
Groups   Name        Variance Std.Dev.
site     (Intercept) 2.368    1.54    
Residual             0.578    0.76
Number of obs: 32, groups: site, 8
ant111b.mcmc <- mcmcsamp(ant111b.lmer, n=1000)
z <- VarCorr(ant111b.mcmc, type="varcov")
colnames(z) <- c("site", "Residual")
2.5% 97.5%
site     0.2376 1.878
Residual 0.6370 2.488
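[The call that produced the 2.5%/97.5% tables above is not shown in the message; one plausible reconstruction (an editorial assumption, not necessarily John's actual code) is a column-wise quantile summary of the sampled variance components:]

```r
## z: matrix of MCMC draws of the variance components, one column per term
t(apply(z, 2, quantile, probs = c(0.025, 0.975)))
## Alternatively, coda::HPDinterval(coda::as.mcmc(z)) gives highest
## posterior density intervals rather than equal-tailed quantiles.
```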


John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
#
Hi All, There are 8 groups in the second example versus 66 in the
first. Are we seeing a consistency/central-limit problem? That is,
mcmcsamp() works when the posterior is plausibly multivariate normal
but not otherwise, and in this particular example the information
content of the data is too low? How's that for a guess? Jake

http://jakebowers.org


On Dec 5, 2009, at 2:21 AM, John Maindonald
<john.maindonald at anu.edu.au> wrote:

#
Until fairly recently it was possible, with glmer, to associate
one random term with each observation.  With the current version
of glmer(), I find that this is not allowed.  In the case I was
using it for, this let me fit a random between-observations
variance that is additive on the scale of the linear predictor,
rather than, as with the dispersion-estimate fudge, a constant
multiplier for the theoretical variance.  I am wondering what the
reason may be for disallowing this; does it unduly complicate the
code somewhere?

I am using lme4_0.999375-32;    Matrix_0.999375-32

Here is what I had been doing:

library(DAAG)
moths$transect <- 1:41  # Each row is from a different transect
moths$habitat <- relevel(moths$habitat, ref="Lowerside")
(A.glmer <- glmer(A ~ habitat + log(meters) + (1 | transect),
                  family = poisson, data = moths))


Generalized linear mixed model fit by the Laplace approximation 
Formula: A ~ habitat + log(meters) + (1 | transect) 
  Data: moths 
AIC BIC logLik deviance
 95 112  -37.5       75
Random effects:
Groups   Name        Variance Std.Dev.
transect (Intercept) 0.234    0.483   
Number of obs: 41, groups: transect, 41

...

Thanks

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm


#
I agree that mcmcsamp in current versions of the lme4 package provides
suspect results.  I still have problems formulating an MCMC update for
the variance components when there is a possibility of the value
getting close to zero.

In the development branch of the lme4 package I have added profiling
of the deviance with respect to the parameters as a method of
determining the precision of the parameter estimates.  This package is
called lme4a, in the R packages tab of
http://lme4.r-forge.r-project.org/.  Unfortunately it looks like the
Windows binary is not available at R-Forge (and the error message
looks peculiar because it is an error from an R declarations file; it
may be that I don't have the correct sequence of include files).

I usually look at profiles of the change in the deviance by taking the
signed square root transformation.  Results for your models are
enclosed.  In these cases there aren't problems with the variance of
the random effects approaching zero.  In the case of the Dyestuff data
we do get such behavior and the splom plot shows some rough edges as a
result.
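[For readers following along with a released lme4 rather than the lme4a development branch: the profiling Douglas describes is exposed through profile() and confint(). A sketch for John's second example (the lme4a interface at the time may have differed):]

```r
library(lme4)
library(DAAG)
fm <- lmer(harvwt ~ 1 + (1 | site), data = ant111b)
pr <- profile(fm)
confint(pr)          # profile-based intervals for .sig01, .sigma, (Intercept)
lattice::xyplot(pr)  # signed-square-root ("zeta") profile traces
lattice::splom(pr)   # pairwise profile panels, as in the attached Rplots.pdf
```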



On Sat, Dec 5, 2009 at 2:21 AM, John Maindonald
<john.maindonald at anu.edu.au> wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Rplots.pdf
Type: application/pdf
Size: 175874 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-sig-mixed-models/attachments/20091207/e62db89c/attachment.pdf>
#
This looks promising, thanks for your efforts.  Fortunately the Mac OS X
version runs fine under R-devel (2.11.0, under development).

John Maindonald             email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473    fax  : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
http://www.maths.anu.edu.au/~johnm
On 08/12/2009, at 12:41 AM, Douglas Bates wrote: