
Possible bug in lmer nested analysis with factors

8 messages · Doran, Harold, Sundar Dorai-Raj, Douglas Bates +1 more

#
I think you might have confused lme code with lmer code. Why do you have
c/d in the random portion?

I think what you want is

m <- lmer(a ~ b + (1|c) + (1|d))

which gives the following using your data:


Linear mixed-effects model fit by REML
Formula: a ~ b + (1 | c) + (1 | d) 
      AIC      BIC    logLik MLdeviance REMLdeviance
 108.0239 115.9415 -49.01193   94.57296     98.02386
Random effects:
 Groups   Name        Variance   Std.Dev.  
 d        (Intercept) 4.2877e-10 2.0707e-05
 c        (Intercept) 4.2877e-10 2.0707e-05
 Residual             8.5754e-01 9.2603e-01
# of obs: 36, groups: d, 3; c, 2

Fixed effects:
            Estimate Std. Error DF t value Pr(>|t|)  
(Intercept)  0.82928    0.40834 34  2.0309  0.05015 .
b           -0.37563    0.18903 34 -1.9872  0.05500 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Yan Wong
Sent: Friday, September 16, 2005 11:58 AM
To: R-help
Subject: [R] Possible bug in lmer nested analysis with factors

Hello,

Is this a bug in the lmer routine?

> library(lme4)
> ### test case based on rats data from Crawley
> a<-rnorm(36);b<-rep(1:3,each=12);c<-rep(1:2,each=6,3);d<-rep(1:3,each=2,6)
>
> ### mixed model works when c & d are numeric, lmer assumes they are factors
> m <- lmer(a ~ b + (1|c/d))
>
> ### but bails out when they are actually specified as factors
> c<-factor(c); d<-factor(d)
> m <- lmer(a ~ b + (1|c/d))

Error in lmer(a ~ b + (1 | c/d)) : entry 0 in matrix[0,0] has row
2147483647 and column 2147483647
In addition: Warning message:
/ not meaningful for factors in: Ops.factor(c, d)


Cheers

Yan

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
#
My guess is he wants this:

c1 <- factor(c)
d1 <- factor(d)
m <- lmer(a ~ b + (1|c1:d1)+(1|c1))

which assumes d1 is nested within c1.

Take a look at Section 3 in the "MlmSoftRev" vignette:

library(mlmRev)
vignette("MlmSoftRev")

HTH,

--sundar
Doran, Harold wrote:
#
On 16 Sep 2005, at 17:12, Doran, Harold wrote:

Apologies. I obviously have done something of the sort. I assumed  
that the 'random' assignment in lme could just be incorporated into  
an lmer call by placing it in brackets and removing the ~, so that

lme(a ~ b, random= ~ 1|c/d)

would be equivalent to

lmer(a ~ b + (1|c/d))

Is there a good guide somewhere to lmer calling conventions? I  
obviously don't understand them. As you can see, I would like to nest  
d within c (and actually, c is nested in b too).

Perhaps it would be better with some explanation of the Crawley data.  
There are 3 fixed drug treatments ('b') given to 2 rats (6 rats in  
all: 'c'), and 3 samples ('d') are taken from each of the rat's  
livers, with some response variable recorded for each sample ('a':  
here just allocated a Normal distribution for testing purposes). I.e.  
c and d are random effects, with d %in% c and c %in% b.
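For concreteness, the design just described can be set up in R like this (a small sketch of my own using the thread's variable names; the set.seed call is mine, not from the thread):

```r
## Design from the thread: 3 drug treatments (b), 2 rats per treatment
## (6 rats in all: c), 3 liver samples per rat (d), 2 readings per sample.
set.seed(1)                         # reproducibility only; not in the original
a <- rnorm(36)                      # dummy response
b <- factor(rep(1:3, each = 12))    # treatment
c <- factor(rep(1:6, each = 6))     # rat, uniquely labelled across treatments
d <- factor(rep(1:3, each = 2, 6))  # sample within rat
table(b, c)                         # each rat falls under exactly one treatment
```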

He analyses it via
aov(a ~ b + Error(b/c/d))

I'm wondering what the lme and lmer equivalents are. I've been told  
that a reasonable form of analysis using lme is

a<-rnorm(36);b<-rep(1:3,each=12);d<-rep(1:3,each=2,6)
c <- rep(1:6, each=6) #use unique labels for each rat ## I got this  
wrong in my previous example
model1 <- lme(a ~ b, random= ~ 1|c/d)

Which gives what looks to be a reasonable output (but I'm new to all  
this mixed modelling stuff). How would I code the same thing using lmer?
I'm not sure this is what I wanted to do. But thanks for all the
help.

Yan
1 day later
#
On 16 Sep 2005, at 17:21, Sundar Dorai-Raj wrote:

Ah, that vignette is extremely useful: it deserves to be more widely  
known.
I mainly intended this reply to be a thank you to yourself and Harold.

Unfortunately, there is (as always), one last thing that is still  
puzzling me:
the df for fixed factors seems to vary between what I currently  
understand to
be equivalent calls to lme and lmer, e.g:

-------

a<-rnorm(36);
b<-factor(rep(1:3,each=12))
c<-factor(rep(1:2,each=6,3))
d<-factor(rep(1:3,each=2,6))
c <- evalq(b:c)[,drop=T] #make unique factor levels
d <- evalq(c:d)[,drop=T] #make unique factor levels

summary(lme(a ~ b, random=~1|c/d))
#  output includes estimates for fixed effects such as
#                    Value Std.Error DF    t-value p-value
#  (Intercept)  0.06908901 0.3318330 18  0.2082041  0.8374
#  b2           0.13279084 0.4692828  3  0.2829655  0.7956
#  b3          -0.26146698 0.4692828  3 -0.5571630  0.6163

# I understand the above lme model to be equivalent to
summary(lmer(a ~ b + (1|c) + (1|c:d)))
#but this gives fixed effects estimates with differing DF:
#               Estimate Std. Error DF t value Pr(>|t|)
#  (Intercept)  0.069089   0.331724 33  0.2083   0.8363
#  b2           0.132791   0.469128 33  0.2831   0.7789
#  b3          -0.261467   0.469128 33 -0.5573   0.5811

Again, many thanks for your help: even more so if you or anyone
else can answer this last little niggle of mine.

Yan
#
On 9/18/05, Yan Wong <h.y.wong at leeds.ac.uk> wrote:
I'm coming to the discussion late and would also like to thank Sundar
and Harold for their explanations.

You are correct that good documentation of the capabilities of lmer
does not currently exist. lmer is still under active development and
documentation is spread in several places.  The vignette in the mlmRev
package explores some of the capabilities of lmer.  Also see the
examples in that package.

You are correct that the denominator degrees of freedom associated
with terms in the fixed effects is different between lme and lmer. 
Neither of them is "right" because there is no correct answer but the
values from lmer are definitely more wrong than the values from lme.
The problem is that lmer allows a wider range of models than does lme.
 In lme the grouping factors can only be nested.  You can fake crossed
grouping factors but you do need to fake them.  Lmer allows nested or
crossed or partially crossed grouping factors.  Most of the subtlety
in the design of lmer is to handle the case of partially crossed
grouping factors in large data sets (think of value-added models that
are applied to longitudinal data on students crossed with teachers in
schools within school districts within states ...).  Some arguments on
degrees of freedom can be made for nested grouping factors but the
question of testing fixed effects terms for models with partially
crossed grouping factors is difficult.  I am aware of the 1997
Biometrics paper by Kenward and Roger but I find it difficult to
translate their formulae into something I can evaluate.  Their
representation of a linear mixed model is as a generalized least
squares problem whereas lmer uses a penalized least squares
representation.  These are equivalent but sometimes the translations
back and forth are difficult to write down.

This area could be a very fruitful research area for people with
strong mathematical and implementation skills.  I have a partially
completed writeup on the details of the lmer representation and
implementation (the description of the linear mixed model is done -
I'm still working on the description of the generalized linear mixed
model and the nonlinear mixed model) which I could forward to anyone
interested in such a challenge.  There are two or three approaches
that could be used and I can provide some references.  An extensive
comparison of the properties of these approaches across a range of
real problems would be a valuable contribution to the literature but
it would involve a lot of work in implementation.

There are already some facilities for lmer models such as mcmcsamp and
simulate which can be used for evaluating the posterior distribution
of a single coefficient or for a parametric bootstrap of the reference
distribution of a quantity like the likelihood ratio statistic for a
hypothesis test.
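The parametric-bootstrap idea described above can be sketched as follows. This is my illustration, not code from the thread; it assumes the lme4 package with simulate() and refit() (in recent lme4 releases mcmcsamp has been removed, but simulate() remains):

```r
## Hedged sketch: parametric bootstrap of the likelihood-ratio statistic
## for the fixed effect b. Data construction follows the thread.
library(lme4)
set.seed(1)
a <- rnorm(36)
b <- factor(rep(1:3, each = 12))
c <- factor(rep(1:6, each = 6))     # uniquely labelled rats
d <- factor(rep(1:3, each = 2, 6))

null <- lmer(a ~ 1 + (1 | c) + (1 | c:d), REML = FALSE)
alt  <- lmer(a ~ b + (1 | c) + (1 | c:d), REML = FALSE)
obs  <- as.numeric(2 * (logLik(alt) - logLik(null)))   # observed LRT statistic

## Simulate responses under the null fit, refit both models each time,
## and collect the reference distribution of the LRT statistic.
stat <- replicate(200, {
  ysim <- simulate(null)[[1]]
  as.numeric(2 * (logLik(refit(alt, ysim)) - logLik(refit(null, ysim))))
})
mean(stat >= obs)                   # bootstrap p-value for the test of b
```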
#
I have a couple of other comments.  You can write the nested grouping
factors as Sundar did without having to explicitly evaluate the
interaction term and drop unused levels.  The lmer function, like most
modeling functions in R, uses the optional argument drop.unused.levels
= TRUE when creating the model frame.
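That behaviour is easy to see in base R (my illustration, not from the thread): the `:` operator keeps all potential level combinations, and dropping the unused ones leaves only the observed combinations, which is what the model frame does automatically.

```r
## b:c carries all 3 x 6 = 18 potential level combinations, but with
## uniquely labelled rats only 6 combinations actually occur in the data.
b <- factor(rep(1:3, each = 12))   # treatment
c <- factor(rep(1:6, each = 6))    # rat, uniquely labelled
nlevels(b:c)                       # 18 potential levels
nlevels(droplevels(b:c))           # 6 observed levels, as the model frame keeps
```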

John Maindonald has already suggested the use of 

 (1|b/c) => (1|b:c) + (1|b)

as "syntactic sugar" for the lmer formula and it is a reasonable
request.  Unfortunately, implementing this is not high on my priority
list at present. (We just made a massive change in the sparse matrix
implementation in the Matrix package and shaking the bugs out of that
will take a while.)

In any case the general recommendation for nested grouping factors is
first to ensure that they are stored as factors and then to create the
sequence of interaction terms.
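Putting that recommendation together for the rats example might look like this (a sketch of my own, assuming the lme4 package):

```r
## Nested grouping factors, following the recommendation above:
## store groupings as factors, then spell out the interaction terms.
library(lme4)
set.seed(1)
a <- rnorm(36)
b <- factor(rep(1:3, each = 12))    # treatment (fixed)
c <- factor(rep(1:2, each = 6, 3))  # rat, labels reused across treatments
d <- factor(rep(1:3, each = 2, 6))  # sample, labels reused across rats
## d nested in c, c nested in b; unused level combinations are dropped
## automatically when the model frame is built.
m <- lmer(a ~ b + (1 | b:c) + (1 | b:c:d))
summary(m)
```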
1 day later
#
On 18 Sep 2005, at 16:04, Douglas Bates wrote:

Yes. Thanks for this, and indeed for the development of the package.
I'm currently trying to do GLMMs (binary response), so I thought that
I should learn mixed modelling using a library with these capabilities.
Would it not be possible to recognise when the model is fully nested,
and make this a special case? I was imagining using lmer as a
replacement for lme, so finding that they differ in this way came
as some surprise. When learning to use a new, relatively undescribed
routine, I usually try to see if I can reproduce known results. This
is where I was coming unstuck when trying to reproduce lme results
using lmer.

I suspect that many people (I know of one other in my group) will use
lmer as a drop-in replacement for lme specifically for its GLMM
capabilities rather than for its partial nesting. I realise, however,
that this might not be your priority.
That's not me, I'm afraid. I am only just working through Chapter 1
of your (excellent) "mixed effects models in S" book.
This, again, is beyond me at the moment. But I do hope that someone
else can respond to the call, especially for "textbook" as well as
more complex examples of lmer usage.

Best wishes

Yan Wong
#
On 18 Sep 2005, at 16:27, Douglas Bates wrote:

In other words, the use of "b:c" in a model formula, where both b and c
are factors, results in an internal call to evalq(b:c)[,drop=T] (or
equivalent), which is treated as a factor in a temporary model data
frame? I know little of the internals of R - that is new to me,
but does make sense for factors.

Thus I could use |a:b and |a:b:c as random terms in lme or lmer, even
though 'a' is a fixed, unnested factor.

Notation like this in the model formula does indeed aid clarity. By the
way, I noticed that in your mlmRev vignette you recommended this as good
practice for lea:school (page 3), but then omitted to do it on page 4.
This is, indeed, the behaviour I was expecting.
All your efforts in these areas are, I'm sure, much appreciated. I'm
certainly very interested in learning to use lmer, and welcome all the
improvements that are being made.
As a brief aside, I know that some people assume that lme treats random
effects as factors even if they are of a numeric type. It might be worth
doing a check in lmer (and even lme) that random effects are factors,
producing a warning if not. Again, this is a non-vital suggestion, and I
don't wish to take up any more of your time!

Thanks

Yan