Nested Mixed Models in lme4 - R-SIG-mixed-models

Thu, Nov 8, 2007 9:40 AM

On 11/8/07, Marco Chiarandini <marco at imada.sdu.dk> wrote:

Did you mean to write (size, dens, type) there?

Also, by "factor" do you mean that you regard all of these variables
as categorical?  If so, you should check the form of the size variable
in the data frame.  It is being stored as a numeric variable, not as a
factor.  If you want to interpret this variable as a categorical
factor you should convert it to a factor or, as seems likely in this
case, an ordered factor.  (See ?factor and ?ordered)
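
As a minimal sketch of that conversion (the data frame `toy` and its size values are made up for illustration; they are not the Case3 data), a numeric column can be turned into a factor or an ordered factor like this:

```r
# Hypothetical toy data; the real Case3 data frame is not reproduced here.
toy <- data.frame(size = c(200, 800, 400, 200, 800, 400))

str(toy$size)   # stored as a numeric variable

# Unordered categorical factor
toy$size_f <- factor(toy$size)

# Ordered factor with levels sorted numerically: 200 < 400 < 800
toy$size_o <- factor(toy$size,
                     levels = sort(unique(toy$size)),
                     ordered = TRUE)

levels(toy$size_o)
is.ordered(toy$size_o)
```

Note that an ordered factor is given polynomial contrasts by default in a model formula, so the coefficient names will differ from those of an unordered factor.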

Nesting is handled automatically and appropriately in lmer as long as
the levels of the inst factor are distinct, that is, as long as each
distinct instance has its own label.  That appears likely in this case:
I see that the first label contains G and 200, and I assume that these
parts of the name correspond to the levels of the type and size
variables.  If so, then you simply need to use inst as the grouping
factor for the random effects.

The only time that nesting must be explicitly stated is when the
levels of the variable(s) at the inner level(s) are incomplete, that
is, when a label does not identify a unit by itself.  I call this
"implicit nesting".  Suppose I choose 20 different plants from an
experimental plot, extract several seeds from each plant, and then
perform multiple analyses on each seed.  I could label the plants
"A", "B", "C", ..., "T" and the seeds "a", "b", "c", ... for each
plant.  This is implicit nesting in that I only have, say, 12 seed
labels but there may be one or two hundred seeds.  To specify a
particular seed I must not only specify its seed label but also the
plant from which it came.  If I just specify "seed" in a model formula
I will get an inappropriate model fit.  I need to somehow specify seed
within plant.

In my view this is not a characteristic of the experiment - it's just
a dumb way of labeling the seeds.  If you use labels like "Aa", "Ab",
..., as I suspect you have done for your "inst" factor, then the
problem goes away.
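
A small sketch of the labeling issue, using made-up data with only 3 plants and 4 seed labels (the names `d`, `plant`, `seed` and `seed_id` are illustrative only):

```r
# Hypothetical labels: 3 plants, seed labels "a".."d" reused within each plant.
d <- expand.grid(seed  = c("a", "b", "c", "d"),
                 plant = c("A", "B", "C"),
                 stringsAsFactors = TRUE)

nlevels(d$seed)      # 4 labels, although there are 12 distinct seeds

# Build a label that is unique across plants: "Aa", "Ab", ...
d$seed_id <- factor(paste0(d$plant, d$seed))
nlevels(d$seed_id)   # 12, one level per actual seed

# In an lmer formula the two specifications below describe the same nesting:
#   (1 | plant) + (1 | seed_id)   # unique labels, nesting is automatic
#   (1 | plant/seed)              # reused labels, nesting stated explicitly
```

The second form expands to (1 | plant) + (1 | plant:seed), which is why it recovers the same grouping as the unique labels.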

I think that specification corresponds to the model that you describe
above, although I am not quite sure what the distinction between a
treatment factor and a group factor is.

It appears that the random effect for the inst factor may be
unnecessary.  You may want to check what the log-likelihood for a
model with only fixed-effects is.

I can't really comment on that without knowing how you specified the
model in SAS and what analysis of variance results from SAS you are
comparing.  This analysis of variance table, like most such tables in
R, is the decomposition of the variation in the response according to
the terms in the order they were given in the formula.  As Bill
Venables describes in his famous (and, regrettably, unpublished) paper
"Exegeses on linear models" (just search for the title in a search
engine), this is the only decomposition that makes sense.  That does
not deter many people, including the authors of SAS, from creating
other decompositions that may on the surface appear to make sense but
do not withstand careful scrutiny.

It appears that you may have a completely balanced experiment here
(I'm guessing that it is a computer experiment) in which case the
decomposition is invariant to reordering of the terms in the model.
In general, if type, size and dens are blocking factors or
environmental factors (that is, they represent a known source of
variability and you wish to control for these factors in examining
your experimental factors) then they should be entered first in the
formula.
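
The invariance in the balanced case can be checked on a small example.  This is only a sketch with simulated data and an ordinary lm fit (the factors `a` and `b` are hypothetical, not the variables in your experiment):

```r
set.seed(1)

# Hypothetical balanced design: every combination of a and b occurs 4 times.
d <- expand.grid(a = factor(1:3), b = factor(1:2), rep = 1:4)
d$y <- rnorm(nrow(d)) + as.integer(d$a) + 2 * as.integer(d$b)

# Sequential analysis of variance with the terms in both orders
tab_ab <- anova(lm(y ~ a + b, data = d))
tab_ba <- anova(lm(y ~ b + a, data = d))

ss_ab <- setNames(tab_ab[["Sum Sq"]], rownames(tab_ab))
ss_ba <- setNames(tab_ba[["Sum Sq"]], rownames(tab_ba))

# In a balanced design the sums of squares do not depend on the order
all.equal(ss_ab[c("a", "b")], ss_ba[c("a", "b")])
```

With unbalanced data the two tables would generally differ, which is when the order of the terms starts to matter.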

Ah, that's a long story.  One can calculate F-ratios for fixed-effects
terms in a linear mixed model but they don't have an F distribution
except in certain balanced cases.  Determining a p-value for a
fixed-effects term in a mixed model fit to unbalanced data is not
trivial.  At one time I did list p-values in such a table, but they
were rather coarse approximations that erred in the wrong direction.
Some users, quite reasonably, objected that these could be dangerously
misleading, so my current solution is not to return a p-value at all.

You really, really don't want to try that.  The expression on the left
hand side of a random-effects term is treated as a linear model
formula from which a model matrix is evaluated.  The model matrix for
inst has 90 columns so each level of type is being modeled as having
90, possibly correlated, random effects associated with it.  The same
for size and dens.  90 correlated random effects requires estimation
of 90 variance parameters and 4005 covariance parameters.  The general
rule is that a factor on the left hand side of the '|' should have a
very small number of levels whereas a factor on the right hand side
should have a large number of levels.
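
The parameter count above is just the number of distinct entries in a 90 by 90 symmetric variance-covariance matrix, which can be verified with a line or two of arithmetic (the variable names are for illustration only):

```r
q <- 90                  # columns in the model matrix for inst
n_var <- q               # one variance per random effect
n_cov <- choose(q, 2)    # one covariance per pair: q * (q - 1) / 2

c(variances = n_var, covariances = n_cov, total = n_var + n_cov)
# variances 90, covariances 4005, total 4095
```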

Marco Chiarandini

Fri, Nov 9, 2007 8:41 AM

Dear Prof. Bates,

Yes, thank you very much! All your corrections are
appropriate. inst should have been type, and all
variables should have been categorical. My mistake.
Also, as you correctly pointed out, the data are
from a computer experiment and perfectly balanced,
and by group factors I meant blocking factors.

Your very clear explanation solved my concerns 
about the nesting! Thanks!

I have also redone the comparison with SAS, and the
results now agree.
The main reason for the earlier discrepancy was that
I needed a quite different formula:

lmer(err~initial*neighborhood + initial*k + 
initial*type + initial*size + initial*dens + 
neighborhood*k + neighborhood*type + 
neighborhood*size + neighborhood*dens + k*type + 
k*size + k*dens + type*size + type*dens + 
size*dens + initial*neighborhood*k + 
(1|inst),data=Case3)

It is also true that we were using lsmeans in SAS,
which you discourage.

What remains for me is to understand how to obtain
the results in a cell-means format like those in
SAS. This seems to be a problem in lm as well, so I
probably need to study more closely how things work
to find the way. Trying something like:

fmm1 <- lmer(err ~ -1 + ordered(size) + dens + type +
             (k + initial + neighborhood)^3 + (1 | inst), data = Case3)

does not seem to help much.
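
For the plain lm case, one possible approach (only a sketch on simulated data, and not necessarily what SAS reports as lsmeans) is to combine the factors of interest into a single factor and drop the intercept, so that each coefficient is the mean of one cell.  The data frame `d` and the level names below are hypothetical:

```r
set.seed(2)

# Hypothetical balanced two-factor data, not the Case3 data.
d <- expand.grid(k       = factor(c("k1", "k2")),
                 initial = factor(c("i1", "i2", "i3")),
                 rep     = 1:5)
d$err <- rnorm(nrow(d), mean = 10)

# One factor whose levels are the k-by-initial cells
d$cell <- interaction(d$k, d$initial)

# Without an intercept, each coefficient is the mean of one cell
fit <- lm(err ~ 0 + cell, data = d)
coef(fit)

# The same numbers computed directly from the data
tapply(d$err, d$cell, mean)
```

Whether the same device carries over cleanly to the mixed model with the other terms in the formula is something I have not verified.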

I left all the analysis I did, code + results, 
(SAS and R) at:

http://www.imada.sdu.dk/~marco/Mixed/


Thank you very much for the help!

Best regards,

Marco

Marco Chiarandini
http://www.imada.sdu.dk/~marco
Department of Mathematics and Computer Science,
University of Southern Denmark
Email: marco at imada.sdu.dk
Phone: +45 6550 4031
Fax: +45 6593 2691