Message-ID: <20200422081329.GB9808@info124.pharmacie.univ-paris5.fr>
Date: 2020-04-22T08:13:29Z
From: Emmanuel Curis
Subject: Clarification about the glmer model for Bernoulli variables
In-Reply-To: <E8150DEF-961A-447D-875B-F38E78E8C228@health.ucsd.edu>
Thanks for the demonstration, the explanations, and the time
spent on this! That clarifies the interpretation of the models a
lot...
Best regards,
On Tue, Apr 21, 2020 at 05:33:13PM +0000, Vaida, Florin wrote:
> I forgot to mention that this proof works for any link function, any conditional distribution for the response, and any random effects distribution.
> In other words, it's not restricted to logit link, nor to binomial data, nor to normal random effects:
>
> If the Y_j are iid conditional on u, then Cov(Y_j, Y_k) >= 0, with equality only under certain restrictive conditions.
>
>
> > On Apr 21, 2020, at 10:25 AM, Vaida, Florin <fvaida at health.ucsd.edu> wrote:
> >
> > Hi Emmanuel,
> >
> > So I can prove positive within-subject correlation for GLME logistic regression with random intercepts, assuming all observations have the same mean!
> >
> > Let Yj ~ Bernoulli(mu), logit(mu) = beta + u, u ~ Normal(0, tau^2).
> > Using the conditional covariance formula you get
> > Cov(Y1, Y2) = E( Cov(Y1, Y2 | u) ) + Cov( E(Y1|u), E(Y2|u) ) = 0 + Cov( mu(u), mu(u) ) = Var( mu(u) ) >= 0, with equality only if tau^2 = 0.
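The identity above is easy to check numerically. The sketch below is plain Python (not R/glmer code), with arbitrary illustrative values beta = 0.5, tau = 1: it simulates pairs Y1, Y2 ~ Bernoulli(mu(u)) with logit(mu) = beta + u, u ~ Normal(0, tau^2), and compares the empirical Cov(Y1, Y2) with the empirical Var(mu(u)).

```python
import math
import random

random.seed(1)
beta, tau = 0.5, 1.0   # arbitrary illustrative values
n = 200_000

def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

y1, y2, mus = [], [], []
for _ in range(n):
    u = random.gauss(0.0, tau)        # random intercept
    mu = expit(beta + u)              # conditional success probability
    mus.append(mu)
    # two conditionally independent Bernoulli draws per "subject"
    y1.append(1 if random.random() < mu else 0)
    y2.append(1 if random.random() < mu else 0)

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

cov_y = cov(y1, y2)      # empirical Cov(Y1, Y2)
var_mu = cov(mus, mus)   # empirical Var(mu(u))
print(cov_y, var_mu)     # both positive; the two agree up to Monte Carlo error
```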
> >
> > This proof does not extend if you let Yj have different means, i.e., replace beta by beta_j.
> > It also does not apply to more general random effects structures, e.g. random intercepts and slopes.
> > Note however that for the *linear* model with random intercepts and slopes, the correlation is not guaranteed positive.
> >
> > Florin
> >
> >
> >> On Apr 20, 2020, at 2:05 PM, Vaida, Florin <fvaida at health.ucsd.edu> wrote:
> >>
> >> Hi Emmanuel,
> >>
> >> That's a good question. My guess is that the correlation is non-negative generally, but I wasn't able to prove it theoretically even in the simplest case when Y1, Y2 ~ Bernoulli(u) independent conditionally on u, and u ~ Normal(0, 1). I am curious if someone has a solution.
> >> We can't go too far down this route in this forum, since Doug wants to keep it applied.
> >>
> >> Florin
> >>
> >>> On Apr 20, 2020, at 12:32 PM, Emmanuel Curis <emmanuel.curis at parisdescartes.fr> wrote:
> >>>
> >>> Hi Florin,
> >>>
> >>> Thanks for the answer, the clarification about p(i,j), and the reference.
> >>>
> >>> One last question, which I forgot in my message: is the obtained
> >>> correlation also always positive, as in the linear case? Or may some
> >>> negative correlation appear, depending on the values of pi(i,j) and
> >>> pi(i,j')?
> >>>
> >>> Best regards,
> >>>
> >>> On Mon, Apr 20, 2020 at 03:27:39PM +0000, Vaida, Florin wrote:
> >>> > Hi Emmanuel,
> >>> >
> >>> > Your reasoning is correct.
> >>> >
> >>> > As a quibble, outside a simple repeated measures experiment setup, p(i,j) *does* depend on j.
> >>> > For example, if observations are collected over time, generally there is a time effect; if they are repeated measures with different experimental conditions, p(i,j) will depend on the condition j, etc.
> >>> >
> >>> > There is almost certainly no closed-form solution for the covariance under logit.
> >>> > I am not sure about the probit (my guess is not).
> >>> > There will be some Laplace approximations available, a la Breslow and Clayton 1993.
> >>> >
> >>> > I'd be curious if these formulas/approximations were developed somewhere - I'd be surprised if they weren't.
> >>> >
> >>> > Florin
> >>> >
> >>> >
> >>> > > On Apr 20, 2020, at 12:48 AM, Emmanuel Curis <emmanuel.curis at parisdescartes.fr> wrote:
> >>> > >
> >>> > > Hello everyone,
> >>> > >
> >>> > > I hope you're all doing well in these difficult times.
> >>> > >
> >>> > > I tried to understand in detail the exact model used when calling glmer
> >>> > > for a Bernoulli experiment, by comparison with the linear mixed
> >>> > > effects model, and especially how it introduces correlations between
> >>> > > observations of a given group. I think I finally got it, but could
> >>> > > you check that what I write below is correct and that I'm not missing
> >>> > > something?
> >>> > >
> >>> > > I use a very simple case with only a single random effect, and no
> >>> > > fixed effects, because I guess that adding fixed effects or other
> >>> > > random effects does not change the idea, it "just" makes formulas more
> >>> > > complex. I denote by i the random effect level, let's say "patient", and
> >>> > > by j the observation for this patient.
> >>> > >
> >>> > > In the linear model, we have Y(i,j) = beta0 + Z(i) + epsilon(i,j) with
> >>> > > Z(i) and epsilon(i,j) random variables having probability densities,
> >>> > > independent of each other, and each iid.
> >>> > >
> >>> > > Hence, cov( Y(i,j), Y(i,j') ) = Var( Z(i) ): the model introduces a
> >>> > > positive correlation between observations of the same patient.
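The linear case can be checked with a quick simulation. The sketch below is plain Python rather than R, with arbitrary illustrative values beta0 = 1, tau = 2, sigma = 1: it draws pairs of observations sharing a random intercept Z and verifies that their empirical covariance approaches Var(Z) = tau^2.

```python
import random

random.seed(42)
b0, tau, sigma = 1.0, 2.0, 1.0   # arbitrary illustrative values
n = 200_000

# two observations per "patient": Y(i,j) = b0 + Z(i) + eps(i,j)
ya, yb = [], []
for _ in range(n):
    z = random.gauss(0.0, tau)                 # shared random intercept
    ya.append(b0 + z + random.gauss(0.0, sigma))
    yb.append(b0 + z + random.gauss(0.0, sigma))

ma, mb = sum(ya) / n, sum(yb) / n
cov_ab = sum((a - ma) * (b - mb) for a, b in zip(ya, yb)) / n
print(cov_ab)   # close to Var(Z) = tau^2 = 4
```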
> >>> > >
> >>> > >
> >>> > >
> >>> > > In the Bernoulli model, Y(i,j) ~ B( pi(i,j) ) and pi(i,j) = f( Z(i) ),
> >>> > > f being the inverse link function, typically the inverse of the
> >>> > > logit. So we have
> >>> > >
> >>> > > cov( Y(i,j), Y(i,j') ) = E( Y(i,j) Y(i,j') ) - E( Y(i,j) ) E( Y(i,j') )
> >>> > > = Pr( Y(i,j) = 1 and Y(i,j') = 1 ) - pi(i,j) * pi(i,j')
> >>> > >
> >>> > > Since in practice pi(i,j) does not depend on j, pi(i,j) = pi(i,j').
> >>> > >
> >>> > > Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
> >>> > > integral(R) Pr( Y(i,j) = 1 and Y(i,j') = 1 | Z(i) = z ) p( Z(i) = z ) dz
> >>> > >
> >>> > > Then, we assume that conditionally on Z(i), the Y(i,j) are independent, is
> >>> > > this right? This is the equivalent of "the epsilon(i,j) are
> >>> > > independent"? I assume this hypothesis is also used for computing the
> >>> > > likelihood? If not, what is the model for the joint probability?
> >>> > >
> >>> > > In that case,
> >>> > >
> >>> > > Pr( Y(i,j) = 1 and Y(i,j') = 1 ) =
> >>> > > integral(R) f(z) f(z) p( Z(i) = z ) dz
> >>> > >
> >>> > > and since pi(i,j) = integral(R) f(z) p( Z(i) = z ) dz we have
> >>> > >
> >>> > > cov( Y(i,j), Y(i,j') ) =
> >>> > > integral(R) f^2(z) p( Z(i) = z ) dz -
> >>> > > ( integral(R) f(z) p( Z(i) = z ) dz )^2
> >>> > >
> >>> > > which in general has no reason to be zero, hence the two observations
> >>> > > are correlated. Is this correct?
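Although the two integrals have no known closed form under the logit link, they are easy to evaluate numerically. The sketch below is plain Python (not R), assuming Z ~ Normal(0, tau^2) with the arbitrary value tau = 1 and f = inverse logit; simple grid quadrature over a wide interval stands in for a closed form.

```python
import math

tau = 1.0   # arbitrary illustrative value

def expit(x):
    """Inverse logit: f(z)."""
    return 1.0 / (1.0 + math.exp(-x))

def phi(z):
    """Density of Z ~ Normal(0, tau^2)."""
    return math.exp(-z * z / (2 * tau * tau)) / (tau * math.sqrt(2 * math.pi))

# simple quadrature on a wide grid; the Gaussian tails are negligible beyond +/-10*tau
lo, hi, n = -10.0, 10.0, 4000
h = (hi - lo) / n
zs = [lo + k * h for k in range(n + 1)]

pi_ij = h * sum(expit(z) * phi(z) for z in zs)        # E[ f(Z) ]   = pi(i,j)
m2    = h * sum(expit(z) ** 2 * phi(z) for z in zs)   # E[ f(Z)^2 ]
cov   = m2 - pi_ij ** 2                               # cov( Y(i,j), Y(i,j') )
print(pi_ij, cov)   # pi_ij is 0.5 by symmetry; cov is strictly positive
```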
> >>> > >
> >>> > > Is there any way to get a closed form of the covariance, for the usual f
> >>> > > (let's say, logit or probit) and Z distribution (let's say, Gaussian)?
> >>> > >
> >>> > > Thanks a lot for reading, and for your answers,
> >>> > >
> >>> > > --
> >>> > > Emmanuel CURIS
> >>> > > emmanuel.curis at parisdescartes.fr
> >>> > >
> >>> > > Page WWW: http://emmanuel.curis.online.fr/index.html
> >>> > >
> >>> > > _______________________________________________
> >>> > > R-sig-mixed-models at r-project.org mailing list
> >>> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> >>> >
> >>>
> >>
> >
>
--
Emmanuel CURIS
emmanuel.curis at parisdescartes.fr
Page WWW: http://emmanuel.curis.online.fr/index.html