lmer: problem in crossed random effect model with verydifferent variances
On Wed, Jun 17, 2009 at 3:54 PM, Michael Li<wuolong at gmail.com> wrote:
On Wed, Jun 17, 2009 at 4:02 PM, Douglas Bates<bates at stat.wisc.edu> wrote:
On Wed, Jun 17, 2009 at 2:48 PM, Doran, Harold<HDoran at air.org> wrote:
Michael
I'll take a quick stab at this, but there is really no way to know what the issue is given that there is no real description of your data. First, SAS and lmer use different algorithms for generating parameter estimates, so it's no surprise that the world does not line up exactly between the two. However, the estimates should be similar.
Lmer used REML by default. What did you use in SAS, ML or REML
The default is in SAS is also REML. So REML was used for both lmer and PROC MIXED.
What are the values of the REML criterion (or the deviance, if you used ML estimates) at convergence? ?If they are very close then it is just a matter of the convergence criterion.
The REML criteria in fact were also close at convergence (SAS gives 448.45 while lmer says 448.5).
Consider that the relative variance for the analyst in the SAS fit is less than 0.01. When you estimate a variance relative to the residual variance as 0.01 it means "essentially zero" You should read the result from lmer as an estimate of zero. ?For some reason the optimization software doesn't like to converge on the boundary and often ends up at very small but non-zero values.
Certainly in practice, the difference between 1500 and 10^-12 does not really matter when the error variance is 200,000. ?Still, 1500 seems to be more 'palatable'. Anyway, it was just surprising to see the difference even though I understand that lmer uses difference algorithm and did not expect the answers to be exactly the same. By the way, at log scale, SAS just gives 0 for analyst variance while lmer says 10^-15. I am attaching the code and data at the end of the email in case anyone wants to try it out. Thanks for the clarification. Michael R: lmer (y ~ (1 | day) + (1 | analyst), data = tmp) SAS: proc mixed data = tmp; ? ? ? class day analyst; ? ? ? model y = / s; ? ? ? random day analyst; run; tmp: ? day analyst ? ?y 1 ? ?1 ? ? ? 1 5482 2 ? ?1 ? ? ? 1 3285 3 ? ?1 ? ? ? 1 4266 4 ? ?1 ? ? ? 1 3818 5 ? ?1 ? ? ? 1 4159 6 ? ?2 ? ? ? 1 3007 7 ? ?2 ? ? ? 1 3349 8 ? ?2 ? ? ? 1 3178 9 ? ?2 ? ? ? 1 3093 10 ? 2 ? ? ? 1 3242 11 ? 3 ? ? ? 1 2495 12 ? 3 ? ? ? 1 2687 13 ? 3 ? ? ? 1 2858 14 ? 3 ? ? ? 1 2090 15 ? 3 ? ? ? 1 2218 16 ? 1 ? ? ? 2 3882 17 ? 1 ? ? ? 2 4522 18 ? 1 ? ? ? 2 3647 19 ? 1 ? ? ? 2 3690 20 ? 1 ? ? ? 2 3754 21 ? 2 ? ? ? 2 3050 22 ? 2 ? ? ? 2 2858 23 ? 2 ? ? ? 2 3178 24 ? 2 ? ? ? 2 2901 25 ? 2 ? ? ? 2 2410 26 ? 3 ? ? ? 2 1855 27 ? 3 ? ? ? 2 3157 28 ? 3 ? ? ? 2 2111 29 ? 3 ? ? ? 2 2431 30 ? 3 ? ? ? 2 3114
Thanks for sending the data. That clears things up a bit. You only have two analysts. It is unrealistic to expect to estimate a variance from two groups. Just as a sanity check you could fit a model with fixed effects for day and analyst
str(dat)
'data.frame': 30 obs. of 3 variables: $ day : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 2 2 2 2 2 ... $ analyst: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ... $ y : int 5482 3285 4266 3818 4159 3007 3349 3178 3093 3242 ...
summary(aov(y ~ day * analyst, dat))
Df Sum Sq Mean Sq F value Pr(>F) day 2 12410291 6205146 27.8892 5.494e-07 analyst 1 237096 237096 1.0656 0.3122 day:analyst 2 219345 109672 0.4929 0.6169 Residuals 24 5339828 222493
summary(aov(y ~ day + analyst, dat))
Df Sum Sq Mean Sq F value Pr(>F) day 2 12410291 6205146 29.0212 2.378e-07 analyst 1 237096 237096 1.1089 0.302 Residuals 26 5559173 213814 You can see that the effect of analyst is not significant, either in the model that allows for interaction or in the additive model.