Hi Malcolm & Henrik & all,
Thanks for the help! I've tried some of these before, but tried even more
now. I think I'm unfortunately still stuck on this, and I think a part of
the reason is that dropping the random intercepts seems weird but I think
it is necessary and helpful with a categorical fixed/random interaction.
I know that removing the random intercept seems weird, but my understanding
is that eliminating the intercept when modeling a categorical random slope
helps interpretation, but doesn't change the model substantively. The
variance of the random slope parameter for "white" changes and the
correlation flips to -1.000 when you add in the intercept, but the model is
actually equivalent. That is, without the intercept in the model, the
random white=0 effect actually is the random intercept for therapists. See
model fit below.
As I understand it, by removing the intercept when using a categorical
random/fixed interaction effect, you get a more interpretable value, namely
variance in the differences between the reference group and the target
group attributable to therapists. I believe that the variance shown when
the intercept is included is too large, since it actually describes all
variance associated with the categorical variable as deviations from an
"average" client who doesn't exist.
When estimating the model suggested by Henrik [(1|primary_ther) +
(0+white|primary_ther)], two things seem odd to me. First, the intercept is
forced to be uncorrelated with the slopes - this is not theoretically
necessary. The other is that there are three random effects being estimated
now, which is too many (with a two level categorical effect, only one
parameter is necessary). Unless I'm wrong about something else, I don't
think I can treat this the way I would a continuous variable. I think this
is redundant and See below again.
So I suppose I'm still stuck with the original question, since I don't
think these solutions hold with a categorical effect. Am I just missing
something else, or maybe the suggestions ought to hold for categorical
effects and it's something else causing difficulties?
Thanks a lot,
Andrew
Here's the model adding back in the random intercepts but including the
white random effect:
print(fm1_ml_int <- lmer(DI ~ first_di + factor(white) +
(factor(white)|primary_ther), rem3post, REML=F), corr=F)
Linear mixed model fit by maximum likelihood
Formula: DI ~ first_di + factor(white) + (factor(white) | primary_ther)
Data: rem3post
AIC BIC logLik deviance REMLdev
4980 5020 -2483 4966 4983
Random effects:
Groups Name Variance Std.Dev. Corr
primary_ther (Intercept) 0.041708 0.20423
factor(white)1 0.018995 0.13782 -1.000
Residual 0.509959 0.71411
Number of obs: 2263, groups: primary_ther, 192
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.45138 0.06007 7.514
first_di 0.48009 0.02360 20.338
factor(white)1 0.01140 0.03298 0.346
Here's the difference test of interest, they appear to be equivalent models:
anova(fm1_ml, fm1_ml_int)
Data: rem3post
Models:
fm1_ml: DI ~ first_di + factor(white) + (0 + factor(white) | primary_ther)
fm1_ml_int: DI ~ first_di + factor(white) + (factor(white) | primary_ther)
Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)
fm1_ml 7 4979.9 5019.9 -2482.9
fm1_ml_int 7 4979.9 5019.9 -2482.9 0 0 1
And here's a version of the model with uncorrelated intercept and slopes
(the bigger model has a singular convergence, as anticipated in these
slides:
http://lme4.r-forge.r-project.org/slides/2011-03-16-Amsterdam/2Longitudinal.pdf
):
Formula: DI ~ 1 + (1 | primary_ther) + (0 + factor(white) | primary_ther)
Data: rem3post_2
AIC BIC logLik deviance REMLdev
5027 5062 -2508 5015 5022
Random effects:
Groups Name Variance Std.Dev. Corr
primary_ther (Intercept) 1.7240e-06 0.001313
primary_ther factor(white)0 4.6543e-02 0.215738
factor(white)1 6.9851e-03 0.083577 0.752
Residual 5.1870e-01 0.720212
Number of obs: 2263, groups: primary_ther, 192
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.48466 0.01791 82.91
On Tue, May 21, 2013 at 1:30 PM, Malcolm Fairbrother <
M.Fairbrother at bristol.ac.uk> wrote:
Dear Andrew,
What if you drop the "0 +" bit? So:
lmer(DI ~ first_di + factor(white) + (factor(white) | primary_ther),
rem3post, REML=F)
or
lmer(DI ~ first_di + white + (white | primary_ther), rem3post, REML=F)
Including "0 +" means you're not estimating a random intercept for
"primary_ther", which it sounds like you need/want. Instead, you're
getting two random slopes, which I guess are perfectly correlated
because they're two sides of the same coin (the coin being your binary
dummy variable "white").
If that doesn't solve the problem, it might help for you to post the
results of "str(rem3post)" (i.e., your dataset) as well.
Cheers,
Malcolm
Date: Tue, 21 May 2013 11:41:04 -0400
From: Andrew McAleavey <andrew.mcaleavey at gmail.com>
To: r-sig-mixed-models at r-project.org
Subject: [R-sig-ME] categorical random effects correlation in lme4
Hi,
I'm currently investigating a question of relative effectiveness of
therapists, and the particular question is whether some therapists are
differentially effective with white versus racial/ethnic minority clients
(this is coded as a binary variable called "white" in this data). We have
conceptualized this as a cross-level random effect, so the model has one
random effect for therapist intercept and one effect for the difference
effectiveness between their white and nonwhite clients.
I am relatively new to lme4, but I think I have specified the model
correctly (the fixed effects represent client pretreatment severity and
nonsignificant fixed effect of binary race; they don't seem to impact the
estimation problem). Here's the model of interest:
print(fm1_ml <- lmer(DI ~ first_di + white + (0 +
factor(white)|primary_ther), rem3post, REML=F), corr=F)
The problem is that the two random effects are appearing to correlate at
= 1.000. I think this is an estimation problem, and probably indicates
the random variables aren't accounting for all that much variance. I'm
dubious of interpreting this model, therefore. However, when comparing it
to the random intercepts only model using the LRT, there is a significant
difference, suggesting that even though the explained variance is (very)
small, it may be worth including:
Data: rem3post
Models:
fm1_a_ml: DI ~ first_di + factor(white) + (1 | primary_ther)
fm1_ml: DI ~ first_di + factor(white) + (0 + factor(white) |
Df AIC BIC logLik Chisq Chi Df
Pr(>Chisq)
fm1_a_ml 5 4982.7 5011.3 -2486.3
fm1_ml 7 4979.9 5019.9 -2482.9 6.7871 2 0.03359 *
My question is basically this: How should I interpret these results?
are significant differences between therapists in terms of their relative
effectiveness with white vs. nonwhite clients, but they're just small? Or
is even this not justified? Would it be safer to say that there are
no estimable differences? Am I missing something else?
Thanks a lot,
Andrew McAleavey
Here's the model of interest output:
print(fm1_ml <- lmer(DI ~ first_di + factor(white) + (0 +
factor(white)|primary_ther), rem3post, REML=F), corr=F)
Linear mixed model fit by maximum likelihood
Formula: DI ~ first_di + factor(white) + (0 + factor(white) |
Data: rem3post
AIC BIC logLik deviance REMLdev
4980 5020 -2483 4966 4983
Random effects:
Groups Name Variance Std.Dev. Corr
primary_ther factor(white) 0 0.0417050 0.204218
factor(white)1 0.0044086 0.066397 1.000
Residual 0.5099596 0.714115
Number of obs: 2263, groups: primary_ther, 192
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.45138 0.06007 7.514
first_di 0.48009 0.02360 20.338
factor(white)1 0.01140 0.03298 0.346
--
Andrew McAleavey, M.S.
Department of Psychology
The Pennsylvania State University
346 Moore Building
University Park, PA 16802
aam239 at psu.edu