unbalanced data in nested lmer model

Hello,

As Andrew and others have already explained:

> There are more than 500 cases

Fine, this might give you reasonable estimates of how your y is affected
by your fixed-effect covariates (x1, x2, ...).

> (...) on 8 farms in 6 regions.

And, from your previous post:

> For 2 of 8 regions there is only 1 farm, the other regions have 2 farms.

Thus there is no way to estimate a difference between region and farm
effects for 2 regions, and very, very limited power for the other 6 (just
2 farms per region). To make things worse, your data are also quite
unbalanced:

> unbalance of case numbers in cells? Or would it be no problem if cell sizes
> vary between 0 and 53?

which I think means that for some farms you have only one record? Anyway, to
recap: probably OK data for understanding y ~ x1 + x2 etc., insufficient data
otherwise (you should invest in getting data for more farms within regions,
not more data for the farms you have already sampled).
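
A quick way to see how thin the design is would be to tabulate records per farm within region. What follows is only a minimal sketch in base R: the data frame `dat`, its columns `region` and `farm`, and all the counts are made up for illustration and are not the real data.

```r
# Hypothetical data: one row per record, with its region and farm.
# Names and counts are invented for illustration only.
dat <- data.frame(
  region = c("A", "A", "A", "B", "B", "C", "C", "C", "C"),
  farm   = c("a1", "a1", "a2", "b1", "b1", "c1", "c2", "c2", "c2")
)

# Records per farm within each region; zeros mark region/farm
# combinations that do not occur.
cells <- with(dat, table(region, farm))
print(cells)

# Number of distinct farms sampled in each region.
farms_per_region <- rowSums(cells > 0)
print(farms_per_region)

# Regions with a single farm: here region and farm effects are
# completely confounded.
single_farm_regions <- names(farms_per_region)[farms_per_region == 1]
print(single_farm_regions)
```

On the real data, the same table would show directly which regions have only one farm and how unequal the cell sizes are.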

> Moreover I don't understand your argument that fitting random effects with
> less than 5 levels was dodgy, as often examples in the books have 3
> samples from one beach, or 3 laboratory workers within one laboratory.
> These are less than 5 levels, are they not?

These are usually toy datasets to exemplify how the approach works; I do not
think they claim that the resulting variance estimates are very
reliable (I think you can find more realistic examples in the Zuur et al.
mixed effects book, if I remember well). Plus, "levels" refers to the number
of beaches or the number of labs etc. from which the variance is estimated.
If there are fewer than, say, 5, it appears that you might be better off
fitting the factor as a fixed effect and not trying to decompose the
variance into between labs and within labs etc. Anyway, just my 2 cents,
and I hope I explained this correctly...
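
To make the fixed-effect suggestion concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the design (6 regions, 2 of them with a single farm), the variable names, and the effect sizes. Treating region as fixed and farm as random is only one possible reading of the advice above, not a recommendation for the real data.

```r
library(lme4)

set.seed(1)
# Made-up design: 6 regions, 2 with a single farm, the rest with 2 farms.
design <- data.frame(
  region = factor(c("r1", "r2", "r3", "r3", "r4", "r4", "r5", "r5", "r6", "r6")),
  farm   = factor(paste0("f", 1:10))
)
# 20 simulated records per farm.
dat <- design[rep(seq_len(nrow(design)), each = 20), ]

# Simulated farm effects, one covariate, and a response.
farm_eff <- rnorm(nlevels(dat$farm), sd = 1)
dat$x1 <- rnorm(nrow(dat))
dat$y  <- 2 + 0.5 * dat$x1 + farm_eff[as.integer(dat$farm)] + rnorm(nrow(dat))

# Region enters as a fixed effect; farm is the only random effect.
fit_fixed_region <- lmer(y ~ x1 + region + (1 | farm), data = dat)
print(fixef(fit_fixed_region))
print(VarCorr(fit_fixed_region))
```

With this parameterisation no between-region variance is estimated at all; the region coefficients are simply contrasts against the first region.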

See also the wiki page set up by Ben Bolker:
http://glmm.wikidot.com/faq

e.g. you might be interested in this entry therein:

Zero or very small random effects variance estimates;
(...)
Very small variance estimates, or very large correlation estimates, often
indicate unidentifiability/lack of data (either due to exact
identifiability [e.g. designs that are not replicated at an important level]
or weak identifiability [designs that would be workable with more data of the
same type]).
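
One way to check a fitted model for this symptom is to look at the estimated variance components and ask lme4 whether the fit is singular. A minimal sketch, assuming a reasonably recent lme4 (one that provides `isSingular()`); the data are simulated with only 3 groups and no true group effect, purely for illustration.

```r
library(lme4)

set.seed(2)
# Made-up data: only 3 groups and no true between-group variance.
dat <- data.frame(
  g = factor(rep(c("a", "b", "c"), each = 15)),
  y = rnorm(45)
)
fit <- lmer(y ~ 1 + (1 | g), data = dat)

# Estimated variances and standard deviations (group and residual).
vc <- as.data.frame(VarCorr(fit))
print(vc[, c("grp", "vcov", "sdcor")])

# TRUE when a variance is estimated at (numerically) zero or a
# correlation at +/-1, i.e. the estimate sits on the boundary.
print(isSingular(fit))
```

A singular fit or a near-zero variance here would not show that the groups do not differ, only that the data cannot tell.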

HTH

Cheers,

Luca

----- Original Message ----- 
From: "Jana Bürger" <jana.buerger at uni-rostock.de>
To: "Andrew Dolman" <andydolman at gmail.com>
Cc: <r-sig-mixed-models at r-project.org>
Sent: Monday, March 29, 2010 10:17 AM
Subject: Re: [R-sig-ME] unbalanced data in nested lmer model
Dear Andrew and other list members,
As I described in an earlier post
(https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q1/003503.html),
my data are actually hierarchical down to the level of fields within farms.

There are more than 500 cases on 8 farms in 6 regions.
Would you not think that gives enough power to distinguish within-region
variability from between-region variability?

Moreover I don't understand your argument that fitting random effects with
less than 5 levels was dodgy, as often examples in the books have 3
samples from one beach, or 3 laboratory workers within one laboratory.
These are less than 5 levels, are they not?

Regards, Jana

Andrew Dolman wrote:
Dear Jana,

 > An anova(lm1, lm2), with
 > lm1 <- lmer(y ~ x1 + x2 + ... + (1|region) + (1|region:farm)) and
 > lm2 <- lmer(y ~ x1 + x2 + ... + (1|farm)), said the models did not differ
 > significantly and AIC was about the same. So I know there is no
 > additional explanatory power in including the region term.

 > Yet, I would like to keep the region effect in the model to separate
 > and compare the effect size of region vs. farm. Is it valid to do so even
 > if some of the regions are only represented by one farm?

I don't think you have the data to ask questions about differences 
between regions as distinct from differences between farms. Look at it 
this way. If you were just doing a normal comparison between regions and 
you only looked at 1 or 2 farms per region, would you have the 
statistical power to say that differences were due to region rather than 
farm? Answer = No.

Similarly, are the differences between the farms because they are in 
different regions or just normal variation between farms? Well you only 
have 2 farms per region so it's hard to tell. Maybe you just have enough 
data if pairs of farms within regions are always very similar and 
differences between regions large.

Also, fitting random effects with fewer than 5 levels is dodgy, and you
only have 2 levels of farm per region, sometimes 1.

Perhaps you could look at it this way.

compare

m1 <- lmer (y~(1|region))
m2 <- lmer (y~(1|farm))

If m2 is better then there is variation between farms within regions, if 
there's no difference then region accounts for most of the variation. BUT 
you've not got much power to detect farm effects within regions, so a 
null result is not strong evidence for the absence of farm variation 
within regions.

Andy.
andydolman at gmail.com

-- 
Jana Bürger

Universität Rostock
Agrar- und Umweltwissenschaftliche Fakultät
FG Phytomedizin
Satower Straße 48
18059 Rostock

Tel. 0381-498 31 71
Fax. 0381-498 31 62

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
