How to correctly specify a mixed model
You should get exactly the same answer whichever way you do it (try it!). The only thing you lose by aggregating is the estimate of the variance among observations within areas (which you might not care about anyway). The advantage is a simpler model, which is easier to do inference on and harder to screw up. This is the idea of Murtaugh's 2007 paper in Ecology, "Simplicity and Complexity in Ecological Data Analysis".

The only reasons *not* to aggregate would be:

- you're interested in the within-area variance;
- you're doing a GLMM (count/binary responses can't always be aggregated as simply as Normal responses);
- you have individual-level covariates that vary within areas;
- you have unbalanced data (this can often be handled by assigning non-equal weights).

A sample size of 7 is indeed somewhat low for a regression with 2 inputs, but whether you aggregate or not won't make a difference.
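If you want to "try it", here is a minimal sketch of the comparison. It assumes a balanced design (12 individuals in each of 7 areas) and uses simulated data; every number is made up for illustration, and the aggregated model uses the column name Temperature rather than Temp. The lmer() fit only runs if lme4 is installed.

```r
## Simulated, balanced stand-in for the real data (all values are invented)
set.seed(101)
n_area <- 7    # number of areas
n_per  <- 12   # individuals per area (balanced by assumption)

## One temperature and one rain value per area
area_dat <- data.frame(
  Area        = factor(LETTERS[1:n_area]),
  Temperature = c(15, 25, 18, 22, 30, 12, 27),
  Rain        = c(300, 500, 450, 350, 600, 250, 400)
)

## Expand to one row per individual and simulate growth with an
## area-level random effect plus individual-level noise
Data <- area_dat[rep(seq_len(n_area), each = n_per), ]
area_eff <- rnorm(n_area, sd = 3)
Data$Growth <- 5 + 0.4 * Data$Temperature + 0.01 * Data$Rain +
  area_eff[as.integer(Data$Area)] + rnorm(nrow(Data), sd = 3)

## Aggregated analysis: mean growth per area, then ordinary regression
AveragedData <- aggregate(Growth ~ Area + Temperature + Rain,
                          data = Data, FUN = mean)
names(AveragedData)[names(AveragedData) == "Growth"] <- "AveragedGrowth"
agg_fit <- lm(AveragedGrowth ~ Temperature + Rain, data = AveragedData)
print(coef(summary(agg_fit)))

## Mixed-model analysis on the individual-level data, for comparison
if (requireNamespace("lme4", quietly = TRUE)) {
  mixed_fit <- lme4::lmer(Growth ~ Temperature + Rain + (1 | Area),
                          data = Data)
  print(coef(summary(mixed_fit)))
}
```

With balanced data, the estimates and standard errors in the two coefficient tables should agree, provided the among-area variance is not estimated as zero (a singular fit). With unbalanced data the two analyses will generally differ somewhat unless the aggregated fit is weighted.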
On 16-02-22 05:39 PM, christos mammides wrote:
Dear all,

I have a possibly naïve question on how to correctly specify a mixed model. I would appreciate any help you can provide.

Let's say I have data on plant growth from several individuals from 7 different areas (n=96), and I want to test the effect of two climatic variables (temperature and rain) on growth. For each of the 7 areas I have one measurement for temperature and one for rain. For example, the first few lines of my data look like this:

Individual  Growth  Temperature  Rain  Area
1           10      15           300   A
2           12      15           300   A
3           20      15           300   A
4           16      25           500   B
5           29      25           500   B
6           10      25           500   B
...         ...     ...          ...   ...

Would the following model be appropriate (in terms of the way the random effect is specified)?

    Model <- lmer(Growth~Temperature+Rain+(1|Area), data=Data)

It was suggested to me that since I only have one measurement for each climatic variable per area it's probably better to take the average of the plant growth for each area and run a simple regression model such as this:

    Model <- lm(AveragedGrowth~Temp+Rain, data=AveragedData)

Am I right to think that in doing that I am losing information, by averaging my plant growth data, and I am also reducing my sample size (n=7) to a point that it would be too difficult to run a regression?

Hope my question makes sense.

Thank you in advance,
Christos
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models