unbalanced data in nested lmer model
Hello, as Andrew and others already explained:
There is more than 500 cases
Fine, this might give you reasonable estimates about how your y is affected by your fixed effects covariates (x1,x2,...)
(...) on 8 farms in 6 regions.
#and, from your previous post
For 2 of 8 regions there is only 1 farm, the other regions have 2 farms.
Thus there is no way to separate region effects from farm effects for 2 regions, and very, very limited power for the other 6 (just 2 farms per region). To make things worse, your data are also quite unbalanced:
unbalance of case numbers in cells? Or would it be no problem if cell sizes vary between 0 and 53?
which I think means that for some farms you have only one record? Anyway, to recap: the data are probably OK for understanding y~x1+x2 etc., but insufficient otherwise (you should invest in getting data for more farms within regions, not more data for the farms you have already sampled).
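A quick way to see this kind of problem is to tabulate the design before fitting anything. The following is only a sketch on made-up data: the column names `region`, `farm` and `y`, the layout of 8 regions (2 of them with a single farm) and all record counts are assumptions for illustration, not Jana's actual data.

```r
# Hypothetical layout: 8 regions, 2 of them with a single farm, the
# other 6 with two farms each (14 farms in total). Names and record
# counts are invented for illustration only.
set.seed(1)
farms_per_region <- c(1, 1, 2, 2, 2, 2, 2, 2)
design <- data.frame(
  region = rep(paste0("R", 1:8), times = farms_per_region),
  farm   = paste0("F", 1:sum(farms_per_region))
)
# Deliberately unbalanced number of records per farm
design$n <- sample(1:53, nrow(design), replace = TRUE)

# Expand to one row per record and attach a dummy response
dat <- design[rep(seq_len(nrow(design)), design$n), c("region", "farm")]
dat$y <- rnorm(nrow(dat))

# Records per farm: very small counts flag farms that contribute little
records_per_farm <- table(dat$farm)

# Farms per region: a region with 1 farm cannot separate region from farm
farms_in_region <- tapply(dat$farm, dat$region,
                          function(f) length(unique(f)))

print(sort(records_per_farm))
print(farms_in_region)
```

Regions showing a 1 in the second table carry no information for telling a region effect apart from a farm effect, however many records that single farm has.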
Moreover I don't understand your argument that fitting random effects with less than 5 levels was dodgy, as often examples in the books have 3 samples from one beach, or 3 laboratory workers within one laboratory. These are less than 5 levels, are they not?
These are usually toy datasets to exemplify how the approach works; I do not think they claim that the resulting variance estimates are very reliable (I think you can find more realistic examples in the Zuur et al. mixed effects book, if I remember well). Plus, "level" refers to the number of beaches or the number of labs etc., and that number drives how reliable the resulting variance estimate is: with fewer than, say, 5 levels it appears that you might be better off fitting the factor as a fixed effect and not trying to decompose the variance into between labs and within labs etc.

Anyway, just my 2 cents, and I hope I explained this correctly...

See also the wiki page set up by Ben Bolker: http://glmm.wikidot.com/faq
e.g. you might be interested in this entry therein:

Zero or very small random effects variance estimates; (...) Very small variance estimates, or very large correlation estimates, often indicates unidentifiability/lack of data (either due to exact identifiability [e.g. designs that are not replicated at an important level] or weak identifiable (designs that would be workable with more data of the same type))

HTH
Cheers,
Luca

----- Original Message -----
From: "Jana Bürger" <jana.buerger at uni-rostock.de>
To: "Andrew Dolman" <andydolman at gmail.com>
Cc: <r-sig-mixed-models at r-project.org>
Sent: Monday, March 29, 2010 10:17 AM
Subject: Re: [R-sig-ME] unbalanced data in nested lmer model
Dear Andrew and other list members,

As I described in an earlier post (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2010q1/003503.html) my data is actually hierarchical down to the level of fields within farms. There is more than 500 cases on 8 farms in 6 regions. Would you not think that gives enough power to distinguish within region variability vs. between regions?

Moreover I don't understand your argument that fitting random effects with less than 5 levels was dodgy, as often examples in the books have 3 samples from one beach, or 3 laboratory workers within one laboratory. These are less than 5 levels, are they not?

Regards,
Jana

Andrew Dolman wrote:
Dear Jana,
> lm1<-lmer(y~x1+x2+...+(1|region)+(1|region:farm));
> lm2<-lmer(y~x1+x2+...+(1|farm))
> An anova(lm1, lm2) said models did not differ significantly and AIC was about the same. So I know there is no additional explanatory power including the region term.
> Yet, I would like to keep the region effect in the model to separate and compare the effect size of region vs. farm. Is it valid to do so even if some of the regions are only represented by one farm?

I don't think you have the data to ask questions about differences between regions as distinct from differences between farms.

Look at it this way. If you were just doing a normal comparison between regions and you only looked at 1 or 2 farms per region, would you have the statistical power to say that differences were due to region rather than farm? Answer = No. Similarly, are the differences between the farms because they are in different regions or just normal variation between farms? Well, you only have 2 farms per region so it's hard to tell. Maybe you just have enough data if pairs of farms within regions are always very similar and differences between regions large.

Also, fitting random effects with fewer than 5 levels is dodgy, and you only have 2 levels of farm per region, sometimes 1.

Perhaps you could look at it this way. Compare

m1 <- lmer(y ~ (1|region))
m2 <- lmer(y ~ (1|farm))

If m2 is better then there is variation between farms within regions; if there's no difference then region accounts for most of the variation. BUT you've not got much power to detect farm effects within regions, so a null result is not strong evidence for the absence of farm variation within regions.

Andy.
andydolman at gmail.com
--
Jana Bürger
Universität Rostock
Agrar- und Umweltwissenschaftliche Fakultät
FG Phytomedizin
Satower Straße 48
18059 Rostock
Tel. 0381-498 31 71
Fax. 0381-498 31 62