Dear R users,

I have a question regarding how lmer (either lme4 or lmerTest) handles the degrees of freedom and the calculation of the standard error for repeated observations. I have a dataset in which I have multiple observations for 57 different idyears in two different regions (range).

    head(database)
         idyear      range overlapok cut05 c0620 regen water roads  elevaju
    36 GJ502006 charlevoix         1     0     0     0     0     0 223.8889
    37 GJ502006 charlevoix         1     0   100     0     0     0 220.5829
    38 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
    39 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
    40 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
    41 GJ502006 charlevoix         1     0   100     0     0     0 219.0555

Here is my lmer formula, in which I nested idyear in range as random effects:

    fidint5 <- lmer(overlapok ~ natdist + cut05 + c0620 + regen +
                    (1|range/idyear), data=database)
    summary(fidint5)

The summary identifies the right number of groups (57) for 2 ranges. However, the df column shows that the error is computed on between 2404 and 2418 df, which gives really high t values and therefore extremely small p values.

    Random effects:
     Groups       Name        Variance Std.Dev.
     idyear:range (Intercept) 0.16211  0.4026
     range        (Intercept) 0.00000  0.0000
     Residual                 0.01709  0.1307
    Number of obs: 2429, groups:  idyear:range, 33; range, 2

    Fixed effects:
                  Estimate Std. Error           df t value             Pr(>|t|)
    (Intercept)  0.6691140  0.0704710   32.1000000   9.495      0.0000000000765 ***
    natdist     -0.0023431  0.0015088 2404.1000000  -1.553                0.121
    cut05        0.0092084  0.0005473 2407.9000000  16.824 < 0.0000000000000002 ***
    c0620        0.0041097  0.0004459 2418.0000000   9.217 < 0.0000000000000002 ***
    regen       -0.0089785  0.0003203 2407.3000000 -28.027 < 0.0000000000000002 ***

Are the groups specified in the random term considered in this result? Is the way I specified the random effects incorrect, or is this the way the lmer function is designed? I am really only beginning to use mixed models and would really appreciate any help on this.

Thanks a lot for your time and wisdom,
Alexandre Lafontaine
lmer and standard error
2 messages · Alexandre Lafontaine, Ben Bolker
2 days later
Alexandre Lafontaine <a_lafontaine at ...> writes:
Dear R users,
I have a question regarding how lmer (either lme4 or lmerTest) handles the degrees of freedom and the calculation of the standard error for repeated observations. I have a dataset in which I have multiple observations for 57 different idyears in two different regions (range).
Your formatting got mangled; it's best to try to send to the mailing list using the simplest format you have available (plain text, monospace font). (I'm further mangling it because I'm posting via Gmane, which doesn't like lines > 80 characters)
    head(database)
         idyear      range overlapok cut05 c0620 regen water roads  elevaju
    36 GJ502006 charlevoix         1     0     0     0     0     0 223.8889
    37 GJ502006 charlevoix         1     0   100     0     0     0 220.5829
    38 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
    39 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
    40 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
    41 GJ502006 charlevoix         1     0   100     0     0     0 219.0555
Here is my lmer formula, in which I nested idyear in range as random effects:
    fidint5 <- lmer(overlapok ~ natdist + cut05 + c0620 + regen +
                    (1|range/idyear), data=database)
    summary(fidint5)

The summary identifies the right number of groups (57) for 2 ranges. However, the df column shows that the error is computed on between 2404 and 2418 df, which gives really high t values and therefore extremely small p values.

    Random effects:
     Groups       Name        Variance Std.Dev.
     idyear:range (Intercept) 0.16211  0.4026
     range        (Intercept) 0.00000  0.0000
     Residual                 0.01709  0.1307
    Number of obs: 2429, groups:  idyear:range, 33; range, 2

It doesn't make sense to use range as a random effect, since there are only two levels (note that its variance is estimated as exactly zero above). It is most practical to treat it as fixed instead.

[snip]
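As a rough sketch of what treating range as a fixed effect could look like: the data below are simulated stand-ins, not the poster's `database`, and the second range name (`other_range`) and all effect sizes are hypothetical. It also assumes idyear labels are unique across ranges, so that `(1 | idyear)` defines the same groups as `idyear:range`.

```r
library(lme4)

# Simulated stand-in for the poster's data (all values hypothetical):
# 33 idyears split across two ranges, covariates varying within idyear.
set.seed(42)
n_id  <- 33
n_obs <- 30
n     <- n_id * n_obs
id_range <- rep(c("charlevoix", "other_range"), length.out = n_id)

database <- data.frame(
  idyear  = factor(rep(seq_len(n_id), each = n_obs)),
  range   = factor(rep(id_range, each = n_obs)),
  natdist = runif(n, 0, 50),
  cut05   = sample(c(0, 100), n, replace = TRUE),
  c0620   = sample(c(0, 100), n, replace = TRUE),
  regen   = sample(c(0, 100), n, replace = TRUE)
)

# Response: one random intercept per idyear plus residual noise.
id_effect <- rep(rnorm(n_id, sd = 0.4), each = n_obs)
database$overlapok <- 0.7 + id_effect +
  0.001 * database$cut05 - 0.001 * database$regen +
  rnorm(n, sd = 0.13)

# range moves from the random part to the fixed part;
# idyear stays as the only grouping factor.
fit_fixed_range <- lmer(
  overlapok ~ natdist + cut05 + c0620 + regen + range + (1 | idyear),
  data = database
)
print(summary(fit_fixed_range))
```

With range in the fixed part, the summary reports a single coefficient for the contrast between the two ranges rather than a variance component estimated from two levels.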
Are the groups specified in the random term considered in this result? Is the way I specified the random effects incorrect or is this the way lmer function is designed? I am really only beginning to use mixed models and would really appreciate any help on this.
The plain old lme4 package gives no df, leaving it to you to work it out for yourself. lmerTest uses Satterthwaite approximations, which are generally pretty good but might have failed you here. You could try the pbkrtest and/or afex packages to get Kenward-Roger approximations, which are slower but more reliable (if they're very close to the Satterthwaite results, you could fall back on the Satterthwaite approximation for practical use rather than slowing yourself down all the time).

If your covariates (natdist + cut05 + c0620 + regen) vary within years, then this is more or less a randomized-block design, in which case the df given will be about right.

By treating 'range' as a fixed effect, you won't be able to make inferences beyond the two ranges considered, but practically speaking you wouldn't be able to extrapolate to other ranges if you had only measured two in the first place ...

Please try not to post in HTML ...
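One way the Satterthwaite versus Kenward-Roger comparison mentioned above might be run is through the `ddf` argument of lmerTest's `summary()` method, which relies on pbkrtest being installed for the Kenward-Roger case. This is only a sketch on simulated data: the data frame `sim`, the range name `other_range`, and the effect sizes are hypothetical.

```r
library(lmerTest)  # its lmer() wraps lme4::lmer() and adds df and p-values

# Simulated stand-in data (hypothetical): 33 idyears in two ranges,
# with one covariate (cut05) that varies within idyear.
set.seed(1)
n_id  <- 33
n_obs <- 20
n     <- n_id * n_obs
sim <- data.frame(
  idyear = factor(rep(seq_len(n_id), each = n_obs)),
  range  = factor(rep(rep(c("charlevoix", "other_range"), length.out = n_id),
                      each = n_obs)),
  cut05  = sample(c(0, 100), n, replace = TRUE)
)
sim$overlapok <- 0.7 + rep(rnorm(n_id, sd = 0.4), each = n_obs) +
  0.001 * sim$cut05 + rnorm(n, sd = 0.13)

fit <- lmer(overlapok ~ cut05 + range + (1 | idyear), data = sim)

# Denominator df under each approximation, side by side.
df_satt <- coef(summary(fit))[, "df"]                         # Satterthwaite (default)
df_kr   <- coef(summary(fit, ddf = "Kenward-Roger"))[, "df"]  # needs pbkrtest
print(round(cbind(Satterthwaite = df_satt, KenwardRoger = df_kr), 1))
```

In a layout like this I would expect the covariate that varies within idyear to get a df close to the number of observations, and the intercept and range terms to get a df close to the number of idyears, which matches the pattern in the original output (about 32 for the intercept, about 2400 for the covariates).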