Message-ID: <20200422081329.GB9808@info124.pharmacie.univ-paris5.fr>
Date: 2020-04-22T08:13:29Z
From: Emmanuel Curis
Subject: Clarification about the glmer model for Bernoulli variables
In-Reply-To: <E8150DEF-961A-447D-875B-F38E78E8C228@health.ucsd.edu>
Thanks for the demonstration, the explanations, and the time
spent on this! That clarifies the interpretation of the models a
lot...
Best regards,
On Tue, Apr 21, 2020 at 05:33:13PM +0000, Vaida, Florin wrote:
> I forgot to mention that this proof works for any link function, any conditional distribution for the response, and any random effects distribution.
> In other words, it's not restricted to logit link, nor to binomial data, nor to normal random effects:
>
> If the Y_j are iid conditional on u, then Cov(Y_j, Y_k) >= 0, with equality only under certain restrictive conditions.
>
>
> > On Apr 21, 2020, at 10:25 AM, Vaida, Florin <fvaida at health.ucsd.edu> wrote:
> >
> > Hi Emmanuel,
> >
> > So I can prove positive within-subject correlation for GLME logistic regression with random intercepts, assuming all observations have the same mean!
> >
> > Let Yj ~ Bernoulli(mu), logit(mu) = beta + u, u ~ Normal(0, tau^2).
> > Using the conditional covariance formula you get
> > Cov(Y1, Y2) = E( Cov(Y1, Y2 | u) ) + Cov( E(Y1|u), E(Y2|u) ) = 0 + Cov( mu(u), mu(u) ) = Var( mu(u) ) >= 0, with equality only if tau^2 = 0.
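The identity above is easy to check numerically. The sketch below is plain Python (not R/glmer code), with arbitrary illustrative values beta = 0.5, tau = 1: it simulates pairs Y1, Y2 ~ Bernoulli(mu(u)) with logit(mu) = beta + u, u ~ Normal(0, tau^2), and compares the empirical Cov(Y1, Y2) with the empirical Var(mu(u)).

```python
import math
import random

random.seed(1)
beta, tau = 0.5, 1.0   # arbitrary illustrative values
n = 200_000

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

y1, y2, mus = [], [], []
for _ in range(n):
    u = random.gauss(0.0, tau)        # random intercept
    mu = expit(beta + u)              # conditional success probability
    mus.append(mu)
    # two conditionally independent Bernoulli draws per "subject"
    y1.append(1 if random.random() < mu else 0)
    y2.append(1 if random.random() < mu else 0)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

cov_y = cov(y1, y2)      # empirical Cov(Y1, Y2)
var_mu = cov(mus, mus)   # empirical Var(mu(u))
print(cov_y, var_mu)     # both positive; the two agree up to Monte Carlo error
```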
> >
> > This proof does not extend if you let Yj have different means, i.e., replace beta by beta_j.
> > It also does not apply to more general random effects structures, e.g. random intercepts and slopes.
> > Note however that for the *linear* model with random intercepts and slopes, the correlation is not guaranteed positive.
> >
> > Florin
> >
> >
> >> On Apr 20, 2020, at 2:05 PM, Vaida, Florin <fvaida at health.ucsd.edu> wrote:
> >>
> >> Hi Emmanuel,
> >>
> >> That's a good question. My guess is that the correlation is non-negative generally, but I wasn't able to prove it theoretically even in the simplest case when Y1, Y2 ~ Bernoulli(u) independent conditionally on u, and u ~ Normal(0, 1). I am curious if someone has a solution.
> >> We can't go too far down this route in this forum, since Doug wants to keep it applied.
> >>
> >> Florin
> >>
> >>> On Apr 20, 2020, at 12:32 PM, Emmanuel Curis <emmanuel.curis at parisdescartes.fr> wrote:
> >>>
> >>> Hi Florin,
> >>>
> >>> Thanks for the answer, the clarification about p(i,j), and the reference.
> >>>
> >>> One last question, which I forgot in my message: is the obtained
> >>> correlation also always positive, as in the linear case? Or may some
> >>> negative correlation appear, depending on the values of pi(i,j) and
> >>> pi(i,j')?
> >>>
> >>> Best regards,
> >>>
> >>> On Mon, Apr 20, 2020 at 03:27:39PM +0000, Vaida, Florin wrote:
> >>> > Hi Emmanuel,
> >>> >
> >>> > Your reasoning is correct.
> >>> >
> >>> > As a quibble, outside a simple repeated measures experiment setup, p(i,j) *does* depend on j.
> >>> > For example, if observations are collected over time, generally there is a time effect; if they are repeated measures with different experimental conditions, p(i,j) will depend on the condition j, etc.
> >>> >
> >>> > There is almost certainly no closed-form solution for the covariance under logit.
> >>> > I am not sure about the probit (my guess is not).
> >>> > There will be some Laplace approximations available, a la Breslow and Clayton 1993.
> >>> >
> >>> > I'd be curious if these formulas/approximations were developed somewhere - I'd be surprised if they weren't.
> >>> >
> >>> > Florin
> >>> >
> >>> >
> >>> > > On Apr 20, 2020, at 12:48 AM, Emmanuel Curis <emmanuel.curis at parisdescartes.fr> wrote:
> >>> > >
> >>> > > Hello everyone,
> >>> > >
> >>> > > I hope you're all doing well in these difficult times.
> >>> > >
> >>> > > I tried to understand in detail the exact model used when calling glmer
> >>> > > for a Bernoulli experiment, by comparison with the linear mixed
> >>> > > effects model, and especially how it introduces correlations between
> >>> > > observations of a given group. I think I finally got it, but could
> >>> > > you check that what I write below is correct and that I'm not missing
> >>> > > something?
> >>> > >
> >>> > > I use a very simple case with only a single random effect, and no
> >>> > > fixed effects, because I guess that adding fixed effects or other
> >>> > > random effects does not change the idea, it "just" makes formulas more
> >>> > > complex. I denote by i the random effect level, let's say "patient", and
> >>> > > by j the observation for this patient.
> >>> > >
> >>> > > In the linear model, we have Y(i,j) = beta0 + Z(i) + epsilon(i,j) with
> >>> > > Z(i) and epsilon(i,j) random variables having probability densities,
> >>> > > independent of each other, and each iid.
> >>> > >
> >>> > > Hence, cov( Y(i,j), Y(i,j') ) = Var( Z(i) ): the model introduces a
> >>> > > positive correlation between observations of the same patient.
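The linear case can be checked with a quick simulation. The sketch below is plain Python rather than R, with arbitrary illustrative values beta0 = 1, tau = 2, sigma = 1: it draws pairs of observations sharing a random intercept Z and verifies that their empirical covariance approaches Var(Z) = tau^2.

```python
import random

random.seed(42)
b0, tau, sigma = 1.0, 2.0, 1.0   # arbitrary illustrative values
n = 200_000

# two observations per "patient": Y(i,j) = b0 + Z(i) + eps(i,j)
ya, yb = [], []
for _ in range(n):
    z = random.gauss(0.0, tau)                 # shared random intercept
    ya.append(b0 + z + random.gauss(0.0, sigma))
    yb.append(b0 + z + random.gauss(0.0, sigma))

ma, mb = sum(ya) / n, sum(yb) / n
cov_ab = sum((a - ma) * (b - mb) for a, b in zip(ya, yb)) / n
print(cov_ab)   # close to Var(Z) = tau^2 = 4
```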
> >>> > >
> >>> > >
> >>> > >
> >>> > > In the Bernoulli model, Y(i,j) ~ B( pi(i,j) ) and pi(i,j) = f( Z(i) ),
> >>> > > f being the inverse link function, typically the inverse of the
> >>> > > logit. So we have
> >>> > >
> >>> > > cov( Y(i,j), Y(i,j') ) = E( Y(i,j) Y(i,j') ) - E( Y(i,j) ) E( Y(i,j') )
> >>> > > = Pr( Y(i,j) = 1 and Y(i,j') = 1 ) - pi(i,j) * pi(i,j')
> >>> > >
> >>> > > Since in practice pi(i,j) does not depend on j, pi(i,j) = pi(i,j').
> >>> > >
> >>> > > Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
> >>> > > integral(R) Pr( Y(i,j) = 1 and Y(i,j') = 1 | Z(i) = z ) p( Z(i) = z ) dz
> >>> > >
> >>> > > Then, we assume that conditionally on Z(i), the Y(i,j) are independent, is
> >>> > > this right? This is the equivalent of "the epsilon(i,j) are
> >>> > > independent"? I assume this hypothesis is also used for computing the
> >>> > > likelihood? If not, what is the model for the joint probability?
> >>> > >
> >>> > > In that case,
> >>> > >
> >>> > > Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
> >>> > > integral(R) f(z) f(z) p( Z(i) = z ) dz
> >>> > >
> >>> > > and since pi(i,j) = integral(R) f(z) p( Z(i) = z ) dz we have
> >>> > >
> >>> > > cov( Y(i,j), Y(i,j') ) =
> >>> > > integral(R) f^2(z) p( Z(i) = z ) dz -
> >>> > > ( integral(R) f(z) p( Z(i) = z ) dz )^2
> >>> > >
> >>> > > which in general has no reason to be zero, hence the two observations
> >>> > > are correlated. Is this correct?
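Although the two integrals have no known closed form under the logit link, they are easy to evaluate numerically. The sketch below is plain Python (not R), assuming Z ~ Normal(0, tau^2) with the arbitrary value tau = 1 and f = inverse logit; simple grid quadrature over a wide interval stands in for a closed form.

```python
import math

tau = 1.0   # arbitrary illustrative value

def expit(x):
    """Inverse logit: f(z)."""
    return 1.0 / (1.0 + math.exp(-x))

def phi(z):
    """Density of Z ~ Normal(0, tau^2)."""
    return math.exp(-z * z / (2 * tau * tau)) / (tau * math.sqrt(2 * math.pi))

# simple quadrature on a wide grid; the Gaussian tails are negligible beyond +/-10*tau
lo, hi, n = -10.0, 10.0, 4000
h = (hi - lo) / n
zs = [lo + k * h for k in range(n + 1)]

pi_ij = h * sum(expit(z) * phi(z) for z in zs)        # E[ f(Z) ]   = pi(i,j)
m2    = h * sum(expit(z) ** 2 * phi(z) for z in zs)   # E[ f(Z)^2 ]
cov   = m2 - pi_ij ** 2                               # cov( Y(i,j), Y(i,j') )
print(pi_ij, cov)   # pi_ij is 0.5 by symmetry; cov is strictly positive
```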
> >>> > >
> >>> > > Is there any way to get a closed form of the covariance, for the usual f
> >>> > > (let's say, logit or probit) and Z distribution (let's say, Gaussian)?
> >>> > >
> >>> > > Thanks a lot for reading, and for your answers,
> >>> > >
> >>> > > --
> >>> > > Emmanuel CURIS
> >>> > > emmanuel.curis at parisdescartes.fr
> >>> > >
> >>> > > Page WWW: http://emmanuel.curis.online.fr/index.html
> >>> > >
> >>> > > _______________________________________________
> >>> > > R-sig-mixed-models at r-project.org mailing list
> >>> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>> >
> >>>
> >>
> >
>
--
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Page WWW: http://emmanuel.curis.online.fr/index.html