Possible bug in lmer nested analysis with factors
On 9/18/05, Yan Wong <h.y.wong at leeds.ac.uk> wrote:
On 16 Sep 2005, at 17:21, Sundar Dorai-Raj wrote:
My guess is he wants this:
c1 <- factor(c)
d1 <- factor(d)
m <- lmer(a ~ b + (1|c1:d1)+(1|c1))
which assumes d1 is nested within c1.
Take a look at Section 3 in the "MlmSoftRev" vignette:
library(mlmRev)
vignette("MlmSoftRev")
Ah, that vignette is extremely useful: it deserves to be more widely known. I mainly intended this reply to be a thank you to yourself and Harold. Unfortunately, there is (as always), one last thing that is still puzzling me: the df for fixed factors seems to vary between what I currently understand to be equivalent calls to lme and lmer, e.g: ------- a<-rnorm(36); b<-factor(rep(1:3,each=12)) c<-factor(rep(1:2,each=6,3)) d<-factor(rep(1:3,each=2,6)) c <- evalq(b:c)[,drop=T] #make unique factor levels d <- evalq(c:d)[,drop=T] #make unique factor levels summary(lme(a ~ b, random=~1|c/d)) # output includes estimates for fixed effects such as # Value Std.Error DF t-value p-value # (Intercept) 0.06908901 0.3318330 18 0.2082041 0.8374 # b2 0.13279084 0.4692828 3 0.2829655 0.7956 # b3 -0.26146698 0.4692828 3 -0.5571630 0.6163 # I understand the above lme model to be equivalent to summary(lmer(a ~ b + (1|c) +(1|c:d)) #but this gives fixed effects estimates with differing DF: # Estimate Std. Error DF t value Pr(>|t|) # (Intercept) 0.069089 0.331724 33 0.2083 0.8363 # b2 0.132791 0.469128 33 0.2831 0.7789 # b3 -0.261467 0.469128 33 -0.5573 0.5811 Again, many thanks for your help: even more so if you or anyone else can answer this last little niggle of mine.
I'm coming to the discussion late and would also like to thank Sundar and Harold for their explanations. You are correct that good documentation of the capabilities of lmer does not currently exist. lmer is still under active development and documentation is spread in several places. The vignette in the mlmRev package explores some of the capabilities of lmer. Also see the examples in that package. You are correct that the denominator degrees of freedom associated with terms in the fixed effects is different between lme and lmer. Neither of them is "right" because there is no correct answer but the values from lmer are definitely more wrong than the values from lme. The problem is that lmer allows a wider range of models than does lme. In lme the grouping factors can only be nested. You can fake crossed grouping factors but you do need to fake them. Lmer allows nested or crossed or partially crossed grouping factors. Most of the subtlety in the design of lmer is to handle the case of partially crossed grouping factors in large data sets (think of value-added models that are applied to longitudinal data on students crossed with teachers in schools within school districts within states ...). Some arguments on degrees of freedom can be made for nested grouping factors but the question of testing fixed effects terms for models with partially crossed grouping factors is difficult. I am aware of the 1997 Biometrics paper by Kenward and Roger but I find it difficult to translate their formulae into something I can evaluate. Their representation of a linear mixed model is as a generalized least squares problem whereas lmer uses a penalized least squares representation. These are equivalent but sometimes the translations back and forth are difficult to write down. This area could be a very fruitful research area for people with strong mathematical and implementation skills. I have a partially completed writeup on the details of the lmer representation and implementation (the description of the linear mixed model is done - I'm still working on the description of the generalized linear mixed model and the nonlinear mixed model) which I could forward to anyone interested in such a challenge. There are two or three approaches that could be used and I can provide some references. An extensive comparison of the properties of these approaches across a range of real problems would be a valuable contribution to the literature but it would involve a lot of work in implementation. There are already some facilities for lmer models such as mcmcsamp and simulate which can be used for evaluating the posterior distribution of a single coefficient or for a parametric bootstrap of the reference distribution of a quantity like the likelihood ratio statistic for a hypothesis test.