Message-ID: <0E4F6A32-CB86-4C49-AEFE-A0BEE6E16CED@health.ucsd.edu>
Date: 2020-04-21T17:25:33Z
From: Vaida, Florin
Subject: Clarification about the glmer model for Bernoulli variables
In-Reply-To: <2AD9CAD5-FABC-4A13-B684-6D0908CFBF21@health.ucsd.edu>
Hi Emmanuel,
So I can prove that the within-subject correlation is positive for GLME logistic regression with random intercepts, assuming all observations have the same mean!
Let Yj | u ~ Bernoulli( mu(u) ), independent conditionally on u, with logit( mu(u) ) = beta + u and u ~ Normal(0, tau^2).
Using the law of total covariance, you get
Cov(Y1, Y2) = E( Cov(Y1, Y2 | u) ) + Cov( E(Y1|u), E(Y2|u) ) = 0 + Cov( mu(u), mu(u) ) = Var( mu(u) ) >= 0, with equality only if tau^2 = 0.
This proof does not extend if you let Yj have different means, i.e., replace beta by beta_j.
It also does not apply to more general random effects structures, e.g. random intercepts and slopes.
Note however that for the *linear* model with random intercepts and slopes, the correlation is not guaranteed positive.
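A quick numerical check of that variance in R (the values beta = 1 and tau = 1 are arbitrary, for illustration only):

beta <- 1; tau <- 1                              # arbitrary illustrative values
mu <- function(u) plogis(beta + u)               # E(Yj | u), inverse-logit link
m1 <- integrate(function(u) mu(u)   * dnorm(u, 0, tau), -Inf, Inf)$value
m2 <- integrate(function(u) mu(u)^2 * dnorm(u, 0, tau), -Inf, Inf)$value
m2 - m1^2                                        # Var( mu(u) ) = Cov(Y1, Y2) > 0
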
Florin
> On Apr 20, 2020, at 2:05 PM, Vaida, Florin <fvaida at health.ucsd.edu> wrote:
>
> Hi Emmanuel,
>
> That's a good question. My guess is that the correlation is non-negative in general, but I wasn't able to prove it theoretically even in the simplest case where Y1, Y2 ~ Bernoulli( mu(u) ) with logit( mu(u) ) = u, independent conditionally on u, and u ~ Normal(0, 1). I am curious whether someone has a solution.
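> A small Monte Carlo sketch (seed and sample size arbitrary) does suggest it is positive in that case:
>
> set.seed(1)
> u  <- rnorm(1e5)                   # one random intercept per subject, N(0, 1)
> p  <- plogis(u)                    # conditional success probability mu(u)
> y1 <- rbinom(1e5, 1, p)            # Y1 | u ~ Bernoulli( mu(u) )
> y2 <- rbinom(1e5, 1, p)            # Y2 | u, independent of Y1 given u
> cor(y1, y2)                        # empirical correlation comes out positive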
> We can't go too far down this route in this forum, since Doug wants to keep it applied.
>
> Florin
>
>> On Apr 20, 2020, at 12:32 PM, Emmanuel Curis <emmanuel.curis at parisdescartes.fr> wrote:
>>
>> Hi Florin,
>>
>> Thanks for the answer, the clarification about pi(i,j), and the reference.
>>
>> One last question, which I forgot in my previous message: is the resulting
>> correlation also always positive, as in the linear case? Or can some
>> negative correlation appear, depending on the values of pi(i,j) and
>> pi(i,j')?
>>
>> Best regards,
>>
>> On Mon, Apr 20, 2020 at 03:27:39PM +0000, Vaida, Florin wrote:
>> > Hi Emmanuel,
>> >
>> > Your reasoning is correct.
>> >
>> > As a quibble, outside a simple repeated-measures experimental setup, pi(i,j) *does* depend on j.
>> > For example, if observations are collected over time, there is generally a time effect; if they are repeated measures under different experimental conditions, pi(i,j) will depend on the condition j, etc.
>> >
>> > There is almost certainly no closed-form solution for the covariance under the logit link.
>> > I am not sure about the probit (my guess is not).
>> > There will be some Laplace approximations available, a la Breslow and Clayton 1993.
>> >
>> > I'd be curious whether these formulas/approximations have been developed somewhere - I'd be surprised if they hadn't.
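>> >
>> > In the meantime the integrals are easy to evaluate numerically; here is a
>> > minimal sketch for the marginal mean E( f(beta + z) ) under the logit link,
>> > using Gauss-Hermite quadrature (an adaptive version of the same rule is what
>> > glmer's nAGQ argument uses). gauss.quad() is from the statmod package, and
>> > the values of beta and tau are arbitrary:
>> >
>> > library(statmod)
>> > beta <- 0.5; tau <- 1.5
>> > gh <- gauss.quad(20, kind = "hermite")    # 20-node Gauss-Hermite rule
>> > z  <- sqrt(2) * tau * gh$nodes            # rescale nodes for N(0, tau^2)
>> > w  <- gh$weights / sqrt(pi)               # rescaled weights sum to 1
>> > sum(w * plogis(beta + z))                 # marginal P(Y = 1), no closed form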
>> >
>> > Florin
>> >
>> > > On Apr 20, 2020, at 12:48 AM, Emmanuel Curis <emmanuel.curis at parisdescartes.fr> wrote:
>> > >
>> > > Hello everyone,
>> > >
>> > > I hope you're all doing well in these difficult times.
>> > >
>> > > I tried to understand in detail the exact model used when fitting glmer
>> > > to a Bernoulli outcome, by comparison with the linear mixed-effects
>> > > model, and especially how it introduces correlations between
>> > > observations of a given group. I think I finally got it, but could
>> > > you check that what I write below is correct and that I'm not missing
>> > > something?
>> > >
>> > > I use a very simple case with only a single random effect and no
>> > > fixed effects, because I guess that adding fixed effects or other
>> > > random effects does not change the idea, it "just" makes the formulas
>> > > more complex. I write i for the random-effect level, say "patient",
>> > > and j for the observation within this patient.
>> > >
>> > > In the linear model, we have Y(i,j) = beta0 + Z(i) + epsilon(i,j), with
>> > > Z(i) and epsilon(i,j) random variables having probability densities,
>> > > independent of each other, and each iid.
>> > >
>> > > Hence, cov( Y(i,j), Y(i,j') ) = Var( Z(i) ): the model introduces a
>> > > positive correlation between observations of the same patient.
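>> > >
>> > > A quick simulation check of this identity (beta0 and the variances are
>> > > arbitrary):
>> > >
>> > > set.seed(1)
>> > > z  <- rnorm(1e5, sd = 2)            # Z(i), one per patient, Var = 4
>> > > y1 <- 1 + z + rnorm(1e5)            # Y(i,1) = beta0 + Z(i) + epsilon(i,1)
>> > > y2 <- 1 + z + rnorm(1e5)            # Y(i,2), new independent epsilon
>> > > cov(y1, y2)                         # close to Var( Z(i) ) = 4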
>> > >
>> > > In the Bernoulli model, Y(i,j) ~ B( pi(i,j) ) and pi(i,j) = f( Z(i) ),
>> > > f being the inverse link function, typically the inverse of the
>> > > logit. So we have
>> > >
>> > > cov( Y(i,j), Y(i,j') ) = E( Y(i,j) Y(i,j') ) - E( Y(i,j) ) E( Y(i,j') )
>> > >   = Pr( Y(i,j) = 1 and Y(i,j') = 1 ) - pi(i,j) * pi(i,j')
>> > >
>> > > Since in this simple model pi(i,j) does not depend on j, pi(i,j) = pi(i,j').
>> > >
>> > > Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
>> > >   integral(R) Pr( Y(i,j) = 1 and Y(i,j') = 1 | Z(i) = z ) p( Z(i) = z ) dz
>> > >
>> > > Then we assume that, conditionally on Z(i), the Y(i,j) are independent;
>> > > is this right? This is the analogue of "the epsilon(i,j) are
>> > > independent"? I assume this hypothesis is also used when computing the
>> > > likelihood? If not, what is the model for the joint probability?
>> > >
>> > > In that case,
>> > >
>> > > Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
>> > >   integral(R) f(z) f(z) p( Z(i) = z ) dz
>> > >
>> > > and since pi(i,j) = integral(R) f(z) p( Z(i) = z ) dz, we have
>> > >
>> > > cov( Y(i,j), Y(i,j') ) =
>> > >   integral(R) f(z)^2 p( Z(i) = z ) dz -
>> > >   ( integral(R) f(z) p( Z(i) = z ) dz )^2
>> > >
>> > > which in general has no reason to be zero, hence the two observations
>> > > are correlated. Is this correct?
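>> > >
>> > > A numerical check of this formula, taking f as the inverse logit and
>> > > Z(i) ~ N(0, tau^2) (tau arbitrary), against a simulated covariance:
>> > >
>> > > tau <- 1.5
>> > > m1 <- integrate(function(z) plogis(z)   * dnorm(z, 0, tau), -Inf, Inf)$value
>> > > m2 <- integrate(function(z) plogis(z)^2 * dnorm(z, 0, tau), -Inf, Inf)$value
>> > > m2 - m1^2                           # theoretical cov, strictly positive
>> > > set.seed(42)
>> > > z  <- rnorm(1e5, 0, tau)            # one Z(i) per patient
>> > > y1 <- rbinom(1e5, 1, plogis(z))     # Y(i,j)
>> > > y2 <- rbinom(1e5, 1, plogis(z))     # Y(i,j'), independent given Z(i)
>> > > cov(y1, y2)                         # approximately equal to the integral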
>> > >
>> > > Is there any way to get a closed form for the covariance, for the usual
>> > > f (say, logit or probit) and Z distributions (say, Gaussian)?
>> > >
>> > > Thanks a lot for reading, and for your answers,
>> > >
>> > > --
>> > > Emmanuel CURIS
>> > > emmanuel.curis at parisdescartes.fr
>> > >
>> > > Page WWW: http://emmanuel.curis.online.fr/index.html
>> > >
>> >
>>
>> --
>> Emmanuel CURIS
>> emmanuel.curis at parisdescartes.fr
>>
>> Page WWW: http://emmanuel.curis.online.fr/index.html
>