Calculation of random effects for factors in R
On Tue, Jul 28, 2015 at 9:01 AM, Sudell, Maria [mesudell]
<M.E.Sudell at liverpool.ac.uk> wrote:
Hello, I have a question concerning exactly how random effects for a factor are calculated in R. I have tried to find an answer on various R-related websites and in textbooks but cannot find a definitive explanation. As an example, if you had a longitudinal dataset and wanted to include an individual-specific random effect for a smoking factor (say 3 levels: current, ex, never), how would the random effects be calculated in R? (I understand how to code this in R; I am aiming to understand the mechanics of how the function gets to the random effects.)
I'm putting together class notes on this, but they are not quite ready, or else I would give them to you. The Pinheiro & Bates book (2000) is the classic statement on this. There is also a newer article that the lme4 team prepared for JSS that will answer this for you. Both are technically demanding. I have found easier-to-read treatments in the Gelman & Hill 2007 book and in Ben Bolker's book Ecological Models and Data in R.

The approach is penalized maximum likelihood, in which the random effects are conceptualized as coefficients on a random effects design matrix. I did not realize how difficult this was to explain until I tried with some students. Once you have banged your head on a few of these books for a while, get the 2006 book by Simon Wood on generalized additive models. On the way to GAMs, he gives about the most beautiful explanation of how these models are estimated that you will ever find. It is technically challenging, but I've never seen the structure laid out so clearly.
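The penalized-likelihood idea can be written compactly. What follows is a sketch in notation assumed for this note (y, X, \beta, Z, b, \Sigma_\theta and \sigma^2 are my symbols, not taken from any one of the books above): Z is the random effects design matrix, and for a factor its columns are the 0/1 indicators for the levels, repeated for each grouping unit.

```latex
% Linear mixed model (assumed notation)
y = X\beta + Zb + \varepsilon, \qquad
b \sim \mathcal{N}(0, \Sigma_\theta), \qquad
\varepsilon \sim \mathcal{N}(0, \sigma^2 I)

% For fixed variance parameters \theta and \sigma^2, the fixed effects
% and the predicted random effects jointly minimize a penalized
% residual sum of squares:
(\hat\beta, \hat b) = \arg\min_{\beta,\, b} \;
  \lVert y - X\beta - Zb \rVert^2
  + \sigma^2 \, b^\top \Sigma_\theta^{-1} b
```

The second term is the penalty: it pulls the elements of b toward zero, more strongly when the random effect variance is small relative to \sigma^2. The variance parameters themselves are estimated by maximizing a likelihood (or REML) criterion; they are not computed afterwards from the values of \hat b.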
My understanding so far is that indicator variables for each level of the factor would be included (in this case three 0/1 indicator variables, one each for current, ex, and never). Coefficients for the indicator variables would then be found (so for each individual in the dataset, we would end up with a coefficient for one of the indicator variables, assuming that individuals can't be in more than one group). These random coefficients (one per individual, as each individual would fall into only one smoking status) would then have their mean and variance calculated, in order to report the distribution of the random effect. Is this correct?
Not exactly. The estimate of the variance of the random effect is a parameter estimate, and so far as I can tell, it is never linked to, or even compared against, the estimates of the individual case random effects. That's an interesting question, though. Until you asked, I had not thought much about it. I've never run ranef() to get the individual random effect estimates and calculated their variance.

Theoretically, we know the estimated random effects are a blend of the estimate you would get if you treated each subgroup in isolation and the estimate you would get if you pooled all of the data. The sample size within each group determines how much weight is placed on the subgroup-specific estimate. Since the estimates of the b's are shrunken in that way, their variance won't necessarily coincide with the variance number at the top of the lmer output.

Anyway, I've been reading the papers by Doug Bates, and now the larger lme4 team, and they explain all this thoroughly.
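The blending described above can be sketched for the simplest case, a random intercept with no other predictors. This is a minimal illustration in Python rather than R, written only to spell out the arithmetic; it is not how lmer computes anything internally. The function name shrunken_effects, the data, and the variance values tau2 and sigma2 are all made up, and the variances are treated as known rather than estimated.

```python
# Sketch only: random-intercept model y_ij = mu + b_j + e_ij with
# b_j ~ N(0, tau2) and e_ij ~ N(0, sigma2). The variances are treated as
# known here; in a real fit they are estimated parameters.

def shrunken_effects(groups, tau2, sigma2):
    """Return mu_hat and a (raw, weight, shrunken) tuple for each group."""
    means = [sum(g) / len(g) for g in groups]
    # The variance of a group mean is tau2 + sigma2 / n_j, so the pooled
    # estimate of mu weights each group mean by the inverse of that variance.
    precisions = [1.0 / (tau2 + sigma2 / len(g)) for g in groups]
    mu_hat = sum(p * m for p, m in zip(precisions, means)) / sum(precisions)

    effects = []
    for g, m in zip(groups, means):
        n = len(g)
        # Weight on the group's own deviation; it approaches 1 as n grows.
        weight = n * tau2 / (n * tau2 + sigma2)
        raw = m - mu_hat                  # group treated in isolation
        effects.append((raw, weight, weight * raw))
    return mu_hat, effects


# Hypothetical data: three groups with 2, 8 and 1 observations.
groups = [[12.0, 14.0], [8.0, 9.0, 10.0, 9.0, 8.5, 9.5, 9.0, 9.0], [15.0]]
tau2, sigma2 = 4.0, 4.0
mu_hat, effects = shrunken_effects(groups, tau2, sigma2)

for g, (raw, weight, shrunk) in zip(groups, effects):
    print(f"n={len(g)}  raw={raw:.3f}  weight={weight:.3f}  shrunken={shrunk:.3f}")
```

Every shrunken value is closer to zero than the corresponding raw deviation, and the group with a single observation is pulled in the most. That is one reason the variance of the values returned by ranef() need not match the variance reported in the model summary.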
Apologies for such a simple question. Any help or explanation (or point to relevant paper or textbook) of how random effects are calculated for factors in R would be greatly appreciated.
Many thanks
Maria
_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Paul E. Johnson
Professor, Political Science, 1541 Lilac Lane, Room 504, University of Kansas, http://pj.freefaculty.org
Director, Center for Research Methods, University of Kansas, http://crmda.ku.edu