Dear all,

I am fitting a mixed model with a binomial distribution to a very large data set (around 400,000 observations) with a binary outcome and very few events (around 4%). Some, but not all, respondents were measured repeatedly over the years, which is why a mixed model is used. The model can be written as:

mod <- glmer(response ~ AGE + SEX + ... + YEAR + (1 | respondentID),
             family = binomial, data = dat)

The distribution of the estimated random effects (respondentID) from the model output is clearly non-normal: a large proportion of the values are close to zero and very few are large (around 10). Is the GLMM still valid in this case? If not, what kind of alternative model could I try? Any suggestions are welcome.

A related problem arises when I calculate the explained variance from the model:

# variance of the fixed-effect linear predictor
VarF <- var(as.vector(fixef(mod) %*% t(mod@pp$X)))
# proportion of latent-scale variance explained by the fixed effects
VarF / (VarF + VarCorr(mod)$respondentID[1] + (pi^2) / 3)

The variance of the fixed effects (VarF) is only 1.6, while the variance of the random effect (VarCorr(mod)$respondentID[1]) is 149. Because of the non-normal distribution, the random-effect variance is very large compared with the fixed-effect variance. Does this imply that the model performs badly? Or should I compute the conditional R squared instead?

To summarize, my questions are:
1) How does a non-normal random effect influence the estimation of the fixed effects and their explained variance (R squared)? If the influence is large, how can it be addressed?
2) More generally, how should one interpret a model in which a large share of the variation comes from the random effects?

Thanks.
Regards,
Chun
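The explained-variance formula in the post is a marginal R squared on the latent (logit) scale, where pi^2/3 is the residual variance of the standard logistic distribution; the conditional version adds the random-intercept variance to the numerator. As a sketch, plugging in the two variance components reported in the post (1.6 and 149) shows both quantities side by side. This only reproduces the arithmetic; it does not refit the model or address whether 149 is a trustworthy estimate.

```r
# Variance components as reported in the post (logit-link binomial GLMM)
var_fixed  <- 1.6        # VarF: variance of the fixed-effect linear predictor
var_random <- 149        # random-intercept variance for respondentID
var_resid  <- pi^2 / 3   # residual variance of the standard logistic distribution

total <- var_fixed + var_random + var_resid

# Marginal R^2: share of latent-scale variance explained by the fixed effects alone
r2_marginal <- var_fixed / total
# Conditional R^2: share explained by the fixed and random effects together
r2_conditional <- (var_fixed + var_random) / total

round(c(marginal = r2_marginal, conditional = r2_conditional), 3)
```

With these numbers the marginal R squared is roughly 0.01 and the conditional R squared roughly 0.98, so nearly all of the latent-scale variance is attributed to respondentID. A large gap like this mostly restates the size of the random-intercept variance rather than giving an independent verdict on model quality.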
Re: Non-normal random effect in glmm
2 messages · Chen Chun, Thierry Onkelinx
In reply to Chen Chun <talischen at hotmail.com>, 2016-10-12 16:30 GMT+02:00

Dear Chun,

Have a look at the subjects with high random intercepts. They are likely subjects with all positive outcomes. The high random intercepts are the result of complete separation.

I don't bother with calculating the proportion of variance explained for generalised linear models. Something like R² is a nice and simple property of _linear_ models, due to the Gaussian distribution, where the mean and variance are independent. It is very hard for distributions where the mean and variance are linked.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
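One way to follow the suggestion to look at subjects with high random intercepts is to tabulate, per respondent, the number of observations and events, and flag respondents whose outcomes are all 1 (or all 0). Taken on its own, such a respondent's intercept has no finite maximum-likelihood value; only the random-effect distribution keeps it finite. The small data frame below is a hypothetical stand-in for `dat`, reusing the column names from the question (`respondentID`, `response`); this is a base-R sketch of the bookkeeping, not a formal separation test.

```r
# Hypothetical toy data standing in for `dat`: one row per observation, 0/1 response
dat <- data.frame(
  respondentID = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  response     = c(1, 1, 1, 0, 1, 0, 0, 0, 1)
)

# Per-respondent number of observations and number of events
n_obs    <- tapply(dat$response, dat$respondentID, length)
n_events <- tapply(dat$response, dat$respondentID, sum)

subject_summary <- data.frame(
  respondentID = names(n_obs),
  n_obs        = as.vector(n_obs),
  n_events     = as.vector(n_events),
  stringsAsFactors = FALSE
)

# Respondents whose outcomes are all 1 or all 0 are completely separated
subject_summary$all_positive <- subject_summary$n_events == subject_summary$n_obs
subject_summary$all_negative <- subject_summary$n_events == 0

# With a fitted lme4 model `mod`, the estimated intercepts could be attached
# (assumes the IDs match the row names of the ranef() data frame):
# subject_summary$intercept <-
#   ranef(mod)$respondentID[subject_summary$respondentID, "(Intercept)"]

subject_summary
```

Sorting the real table by the attached intercepts would show whether the handful of values around 10 coincide with the `all_positive` respondents.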
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models