LMM diagnostics: conditional residuals correlated highly with fitted values
Hi Thierry,
Thank you for your reply and sorry for the HTML thing. Below is my
summary(model) output.
Y, Drink, and Age are continuous variables.
Gender is a factor with levels F and M.
Family_ID is a factor.
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: Y ~ Drink * Gender + Age + (1 | Family_ID)
   Data: data

     AIC      BIC   logLik deviance df.resid
  1046.4   1074.0   -516.2   1032.4      372

Scaled residuals:
     Min       1Q   Median       3Q      Max
-2.67228 -0.56085 -0.02968  0.66166  2.91452

Random effects:
 Groups    Name        Variance Std.Dev.
 Family_ID (Intercept) 0.3550   0.5958
 Residual              0.6162   0.7850
Number of obs: 379, groups:  Family_ID, 189

Fixed effects:
               Estimate Std. Error t value
(Intercept)     1.10309    0.43921   2.511
Drink           0.16425    0.08031   2.045
Gender.M       -0.19364    0.10874  -1.781
Age            -0.03377    0.01489  -2.268
Drink:Gender.M -0.13647    0.10681  -1.278

Correlation of Fixed Effects:
         (Intr) Drnk   Gndr.M Age
Drink    -0.098
Gender.M -0.040 -0.249
Age      -0.985  0.158 -0.054
Drnk:G.M  0.042 -0.737 -0.021 -0.085

Thank you very much,
Cherry
On Wed, Oct 7, 2015 at 5:14 AM, Thierry Onkelinx
<thierry.onkelinx at inbo.be> wrote:
Dear Cherry,

Please don't post in HTML. Have a look at the posting guide.

You'll need to provide more information. What is the class of each
variable (continuous, count, presence/absence, factor, ...)? What is the
output of summary(model)?

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-10-06 17:15 GMT+02:00 Yizhou Ma <maxxx848 at umn.edu>:
Dear LMM experts:
I am pretty new to using LMM and I have found the following situation
bewildering as I was trying to do diagnostics with my fitted model: my
conditional residuals correlated highly with the fitted values.
I have a dataset with multiple families, each with 1-4 siblings. I am
trying to regress Y on explanatory variables (EVs) including Drink,
Gender, and Age, with a random intercept for family. This is the model I
used:
model <- lmer(Y ~ Drink * Gender + Age + (1 | Family_ID), data = data, REML = FALSE)
After fitting the model, I used
plot(model)
to see the relationship between conditional residuals and fitted values. I
expected them to be uncorrelated and I expected to see homoscedasticity.
Yet to my surprise there is a high correlation (~0.5) between the
residuals and the fitted values (see here <http://imgur.com/pPsG4aR>). I
know from GLM that this usually suggests nonlinear relationships between
the EVs and the DV.
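
The check described above can be tried on data where the model is known to be correctly specified, which gives a baseline for comparison. The following is only a minimal sketch, assuming the lme4 package is installed; the data are simulated, and every name and number in the data-generating step (sim, fam_eff, the coefficients, the standard deviations) is an illustrative assumption, not the poster's data.

```r
library(lme4)

set.seed(1)

# Hypothetical design: 189 families with 1-4 siblings each (sizes are random).
n_fam    <- 189
fam_size <- sample(1:4, n_fam, replace = TRUE)
fam      <- rep(seq_len(n_fam), times = fam_size)  # family index per sibling
n        <- length(fam)

sim <- data.frame(
  Family_ID = factor(fam),
  Drink     = rnorm(n),
  Age       = rnorm(n, mean = 29, sd = 3),
  Gender    = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Assumed truth: linear effects, a family random intercept, Gaussian noise.
fam_eff <- rnorm(n_fam, sd = 0.6)
sim$Y <- 1 + 0.15 * sim$Drink - 0.03 * sim$Age +
  fam_eff[fam] + rnorm(n, sd = 0.8)

# Same model formula as in the post, fitted by maximum likelihood.
sim_model <- lmer(Y ~ Drink * Gender + Age + (1 | Family_ID),
                  data = sim, REML = FALSE)

# Conditional residuals vs. conditional fitted values (what plot(model) shows).
r_cond <- cor(resid(sim_model), fitted(sim_model))

# Marginal version: fixed effects only, random effects left out (re.form = NA).
fit_marg <- predict(sim_model, re.form = NA)
r_marg   <- cor(sim$Y - fit_marg, fit_marg)

print(c(conditional = r_cond, marginal = r_marg))
```

Comparing the two printed correlations with the one from the real model indicates whether the observed pattern is unusual for this kind of design or is also present when nothing is wrong with the model.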
I read some online posts (post1
<http://stats.stackexchange.com/questions/43566/strange-pattern-in-residual-plot-from-mixed-effect-model>
post2
<http://stats.stackexchange.com/questions/168179/correlation-between-standardized-residuals-and-fitted-values-in-a-linear-mixed-e/168210#168210>)
that suggest this can result from a poor model fit. So I tried a few
different models, including: 1) log-transforming Drink, which is originally
positively skewed; 2) adding random slopes for Drink, Age, etc. None of
these changes has led to a substantial difference in the correlation
between the residuals and the fitted values.
Some other info:
1) my overall model fit is not poor, as indicated by the correlation
between fitted values & Y. It is around 0.8;
2) most variables in my model have a normal, or at least symmetrical,
distribution;
3) conditional residuals are normally distributed, as shown in QQ plots;
4) conditional residuals are not correlated with any fixed effects, such
as Drink or Age.
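
The checks listed above could be computed along the following lines. This is a rough sketch, assuming lme4 is installed; it uses the sleepstudy data bundled with lme4 as a stand-in (Reaction, Days, and Subject take the roles of Y, a continuous covariate, and Family_ID), so the numbers it produces say nothing about the poster's model.

```r
library(lme4)

# Stand-in model: random intercept per Subject, fitted by maximum likelihood.
fit <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy, REML = FALSE)
res <- resid(fit)  # conditional residuals

# 1) Overall fit: correlation between fitted values and the response.
r_fit <- cor(fitted(fit), sleepstudy$Reaction)

# 3) Normality of conditional residuals: correlation between theoretical and
#    sample quantiles as a numeric summary. Interactively, qqnorm(res) followed
#    by qqline(res) draws the usual QQ plot.
qq   <- qqnorm(res, plot.it = FALSE)
r_qq <- cor(qq$x, qq$y)

# 4) Conditional residuals vs. a covariate in the fixed part of the model.
r_days <- cor(res, sleepstudy$Days)

# Group sizes: how many groups have each number of observations.
group_sizes <- table(table(sleepstudy$Subject))

print(c(fit = r_fit, qq = r_qq, covariate = r_days))
print(group_sizes)
```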
I have two guesses as to what is going on:
1) maybe the fact that each family is a different size actually violates
assumptions of the model?
2) or maybe there is something wrong with estimation of the random effect
(family intercept)?
I'd really appreciate your insights as to what is going on here and whether
there are any problems with my model.
Thank you very much,
Cherry
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models