lmer and standard error - R-SIG-mixed-models

Sat, Mar 28, 2015 8:14 AM

Dear R users,
I have a question regarding how lmer (either lme4 or lmerTest) handles the degrees of freedom and the calculation of the standard error for repeated observations.
I have a dataset in which I have multiple observations for 57 different idyears in two different regions (range).
head(database)
     idyear      range overlapok cut05 c0620 regen water roads  elevaju
36 GJ502006 charlevoix         1     0     0     0     0     0 223.8889
37 GJ502006 charlevoix         1     0   100     0     0     0 220.5829
38 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
39 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
40 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
41 GJ502006 charlevoix         1     0   100     0     0     0 219.0555
Here is my lmer formula, in which I nested idyear within range as random effects:
fidint5 <- lmer(overlapok ~ natdist + cut05 + c0620 + regen + (1|range/idyear), data=database)
summary(fidint5)
The summary identifies the correct number of groups (57) for the 2 ranges. However, the df column shows that the standard error is computed on between 2404 and 2418 df, which returns really high t values and therefore extremely small p values.

Random effects:
 Groups       Name        Variance Std.Dev.
 idyear:range (Intercept) 0.16211  0.4026
 range        (Intercept) 0.00000  0.0000
 Residual                 0.01709  0.1307
Number of obs: 2429, groups:  idyear:range, 33; range, 2

Fixed effects:
               Estimate   Std. Error           df t value
(Intercept)   0.6691140    0.0704710   32.1000000   9.495
natdist      -0.0023431    0.0015088 2404.1000000  -1.553
cut05         0.0092084    0.0005473 2407.9000000  16.824
c0620         0.0041097    0.0004459 2418.0000000   9.217
regen        -0.0089785    0.0003203 2407.3000000 -28.027
                        Pr(>|t|)
(Intercept)      0.0000000000765 ***
natdist                    0.121
cut05       < 0.0000000000000002 ***
c0620       < 0.0000000000000002 ***
regen       < 0.0000000000000002 ***
Are the groups specified in the random term taken into account in this result? Did I specify the random effects incorrectly, or is this how the lmer function is designed? I am really only beginning to use mixed models and would really appreciate any help on this.
Thanks a lot for your time and wisdom, 
Alexandre Lafontaine

Ben Bolker

Mon, Mar 30, 2015 1:44 PM

Alexandre Lafontaine <a_lafontaine at ...> writes:

Your formatting got mangled; it's best to try to send to the
mailing list using the simplest format you have available (plain
text, monospace font).  (I'm further mangling it because I'm posting
via Gmane, which doesn't like lines > 80 characters)

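head(database)
     idyear      range overlapok cut05 c0620 regen water roads  elevaju
36 GJ502006 charlevoix         1     0     0     0     0     0 223.8889
37 GJ502006 charlevoix         1     0   100     0     0     0 220.5829
38 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
39 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
40 GJ502006 charlevoix         1     0   100     0     0     0 219.4110
41 GJ502006 charlevoix         1     0   100     0     0     0 219.0555

fidint5 <- lmer(overlapok ~ natdist + cut05 + c0620 + regen +
    (1|range/idyear), data=database)

summary(fidint5)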

The summary identifies the correct number of groups (57) for the 2 ranges.
However, the df column shows that the standard error is
computed on between 2404 and 2418 df, which returns really high t values
and therefore extremely small p values.
 
Random effects:
 Groups       Name        Variance Std.Dev.
 idyear:range (Intercept) 0.16211  0.4026
 range        (Intercept) 0.00000  0.0000
 Residual                 0.01709  0.1307
Number of obs: 2429, groups:  idyear:range, 33; range, 2

   It doesn't make sense to use range as a random effect, since
there are only two levels.  Most practical to treat it as fixed
instead.
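A minimal sketch of that refit, assuming the column names from the post. Because 'database' itself isn't available, the data below are simulated: the group count, the second range level 'other', and all covariate values and effect sizes are made up for illustration. Only idyear stays random; as long as each idyear code occurs in a single range, (1 | idyear) describes the same grouping as (1 | range:idyear).

```r
library(lme4)

set.seed(1)
# Hypothetical stand-in for 'database' (the real data are not available):
# 33 idyear groups, 20 observations each, every idyear in exactly one range.
n_groups <- 33
n_per    <- 20
g        <- rep(seq_len(n_groups), each = n_per)
# 'other' is a made-up second range level; only 'charlevoix' appears in the post
group_range <- rep(c("charlevoix", "other"), length.out = n_groups)

sim <- data.frame(
  idyear  = factor(sprintf("id%02d", g)),
  range   = factor(group_range[g]),
  natdist = runif(n_groups * n_per, 0, 100),
  cut05   = runif(n_groups * n_per, 0, 100),
  c0620   = runif(n_groups * n_per, 0, 100),
  regen   = runif(n_groups * n_per, 0, 100)
)
# Made-up effects: random idyear intercepts plus residual noise
sim$overlapok <- 0.7 + 0.01 * sim$cut05 - 0.01 * sim$regen +
  rnorm(n_groups, sd = 0.4)[g] + rnorm(nrow(sim), sd = 0.13)

# 'range' moves to the fixed part; only idyear stays random.
fit_fixed <- lmer(overlapok ~ natdist + cut05 + c0620 + regen + range +
                    (1 | idyear), data = sim)
summary(fit_fixed)
```

The range effect then shows up as a single fixed coefficient (rangeother, relative to charlevoix) instead of a variance component estimated from only two levels.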

[snip]

The plain old lme4 package gives no df, leaving it to you to 
work it out for yourself.
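Since plain lme4 reports only t values, one way to work it out yourself is to choose a df and convert by hand: the two-sided p-value is 2 * pt(-|t|, df). A base-R sketch using the natdist row from the lmerTest output in the original post; it gives about 0.12, matching the reported 0.121 up to rounding of the printed t value. The df = 31 value (33 idyear groups minus 2) is only a hypothetical comparison, not a recommended choice.

```r
# Two-sided p-value for a t statistic with a given (possibly fractional) df
p_from_t <- function(t, df) 2 * pt(-abs(t), df = df)

# t value for natdist, copied from the lmerTest summary in the post
t_natdist <- -1.553

# df that lmerTest (Satterthwaite) reported for natdist
p_satt <- p_from_t(t_natdist, df = 2404.1)

# Hypothetical alternative df (33 idyear groups minus 2), for comparison only
p_small <- p_from_t(t_natdist, df = 31)

print(c(satterthwaite = p_satt, hypothetical_df31 = p_small))

# cut05's t value is so large that even df = 31 leaves the p-value tiny
print(p_from_t(16.824, df = 31))
```

The smaller df makes the natdist p-value somewhat larger, but terms with t values like those of cut05, c0620 or regen stay highly significant either way.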

  lmerTest uses Satterthwaite approximations, which are generally
pretty good but might have failed you here.  You could try the
pbkrtest and/or afex packages to get Kenward-Roger approximations,
which are slower but more reliable (if they're very close to
the Satterthwaite results you could fall back on the Satterthwaite
approx for practical use rather than slowing yourself down all
the time).
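A sketch of comparing the two approximations on the same fit, assuming lmerTest's summary() method with its ddf argument (as in lmerTest 3.x); the "Kenward-Roger" option additionally needs pbkrtest installed. The data are simulated (hypothetical group count, sizes and effects) and kept small because Kenward-Roger gets slow as the number of observations grows.

```r
library(lmerTest)  # loads lme4; lmer() now returns an object with df methods

set.seed(2)
# Hypothetical simulated data: 33 groups x 15 observations
n_groups <- 33
n_per    <- 15
g        <- rep(seq_len(n_groups), each = n_per)
sim <- data.frame(
  idyear = factor(g),
  cut05  = runif(n_groups * n_per, 0, 100),
  regen  = runif(n_groups * n_per, 0, 100)
)
sim$overlapok <- 0.7 + 0.01 * sim$cut05 - 0.01 * sim$regen +
  rnorm(n_groups, sd = 0.4)[g] + rnorm(nrow(sim), sd = 0.13)

fit <- lmer(overlapok ~ cut05 + regen + (1 | idyear), data = sim)

# Same model, two df approximations; compare the df columns side by side
df_satt <- coef(summary(fit, ddf = "Satterthwaite"))[, "df"]
df_kr   <- coef(summary(fit, ddf = "Kenward-Roger"))[, "df"]
print(cbind(Satterthwaite = df_satt, KenwardRoger = df_kr))
```

If the two columns agree closely on a representative model, the faster Satterthwaite version is usually good enough for routine use.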

  If your covariates (natdist + cut05 + c0620 + regen) 
vary within years, then this is more or less a randomized-block
design, in which case the df given will be about right.
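To see why the df depend on where a covariate varies, here is a simulated sketch (hypothetical data; 'within' and 'between' are made-up variable names): one covariate changes from observation to observation inside each idyear, the other is constant within each idyear. With lmerTest's default Satterthwaite df, the within-group covariate should get df close to the number of observations, while the group-level one should get df roughly equal to the number of groups minus 2.

```r
library(lmerTest)  # default summary() uses Satterthwaite df

set.seed(3)
n_groups <- 33
n_per    <- 20
g        <- rep(seq_len(n_groups), each = n_per)
sim <- data.frame(
  idyear  = factor(g),
  # varies among observations inside each idyear
  within  = runif(n_groups * n_per, 0, 100),
  # hypothetical covariate that is constant within each idyear
  between = runif(n_groups, 0, 100)[g]
)
sim$overlapok <- 0.7 + 0.01 * sim$within + 0.005 * sim$between +
  rnorm(n_groups, sd = 0.4)[g] + rnorm(nrow(sim), sd = 0.13)

fit <- lmer(overlapok ~ within + between + (1 | idyear), data = sim)
dfs <- coef(summary(fit))[, "df"]
print(coef(summary(fit))[, c("Estimate", "df")])
```

So large df for natdist, cut05, c0620 and regen are expected if those covariates vary within idyears; only group-level predictors would be limited to df on the order of the number of groups.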

  By treating 'range' as a fixed effect, you won't be able to
make inferences beyond the two ranges considered -- but practically
speaking you wouldn't be able to extrapolate to other ranges if
you had only measured two in the first place ...


  Please try not to post in HTML ...