Mixed-model-binary logistic model with dependence between individual repeated measures
On Fri, 7 Jan 2011, Martin Maechler wrote:
Ben Bolker <bbolker at gmail.com>
on Fri, 07 Jan 2011 11:49:31 -0500 writes:
> On 11-01-07 11:35 AM, Anna Ekman wrote:
>> Ben Bolker, thank you for your suggestions.
>>
>> Yes, it is surprising that I in SAS and STATA have to assume
>> independence between the measurements within an individual.
> It's fundamentally a bit hard to specify correlation among individuals
> in a non-normal model. One option is to go completely to the marginal
> specification (which you said you don't want to do); probably the most
> sensible statistical formulation is
>
> (fixed effects) eta0 = X*beta
> (random effects) eta1 ~ MVN(mu=X*beta,Sigma=(something sensible such
>                  as AR(1) within individuals))
> y ~ Bernoulli(eta1)
Interesting... {I've been "taught" in the past that correlation
specification for non-normal, i.e. GLME models,
would not make sense / be possible,
something you do not seem to support ...
}
Does the above mean {slight changes}
(fixed effects) eta0 = X*beta
(random effects) eta1 ~ MVN(0, Sigma=(something sensible such
as AR(1) within individuals))
(Y | X,eta1) ~ Bernoulli( logit^{-1}(eta0 + eta1) )
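
To make the three-line specification above concrete, here is a minimal simulation sketch in Python (standard library only). It is an illustration under stated assumptions, not anyone's fitted model: the Bernoulli probability is taken to be the inverse logit of eta0 + eta1, Sigma is taken to be a stationary AR(1) covariance sigma^2 * rho^|s-t| within each individual, and the single covariate (scaled time) and all function and parameter names (`simulate`, `ar1_effects`, `beta0`, `beta1`) are hypothetical.

```python
import math
import random


def inv_logit(x):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ar1_effects(n_times, rho, sigma, rng):
    """Draw one individual's eta1 ~ MVN(0, Sigma), Sigma[s][t] = sigma^2 * rho^|s-t|.

    Uses the stationary AR(1) recursion, so no matrix factorisation is needed.
    """
    effects = [rng.gauss(0.0, sigma)]
    innovation_sd = sigma * math.sqrt(1.0 - rho ** 2)
    for _ in range(n_times - 1):
        effects.append(rho * effects[-1] + rng.gauss(0.0, innovation_sd))
    return effects


def simulate(n_subjects, n_times, beta0, beta1, rho, sigma, seed=1):
    """Simulate binary repeated measures, one row of 0/1 outcomes per individual."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        eta1 = ar1_effects(n_times, rho, sigma, rng)
        row = []
        for t in range(n_times):
            x = t / max(n_times - 1, 1)      # hypothetical covariate: scaled time
            eta0 = beta0 + beta1 * x         # fixed-effects part, X*beta
            p = inv_logit(eta0 + eta1[t])    # conditional success probability
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


if __name__ == "__main__":
    for row in simulate(n_subjects=5, n_times=6, beta0=-0.5, beta1=1.0,
                        rho=0.8, sigma=1.5):
        print(row)
```

With rho near 1 the outcomes within a row tend to agree with their neighbours more often than with rho = 0, which is the within-individual dependence the thread is about. Simulating from the model is of course much easier than fitting it.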
With the probit link, such dichotomous and ordinal variable mixed models have a long history in genetics and psychometrics. In the latter case, factor analysis and path analysis of tetrachoric/polychoric correlations are completely equivalent to the probit-normal model, although GLS/WLS was often used for computational reasons. We used to do all this in LISREL.

For the case of varying numbers of observations per individual (and other irregular data types), you can use the "multiple groups" approach, where you specify a covariance matrix of the right size for each pattern of data, and constrain the correlations to be equal across the different groups. Since the main interest is in the correlations between latent variables, all hypotheses and estimates are usually framed at that "level" of the model.

In the genetic situation, for example, we might estimate the heritability of a dichotomous trait based on family data under a polygenic model as being twice the sibling tetrachoric correlation. Model criticism is done by comparing predicted risk to different degrees of relatives of an affected individual, or set of affected relatives. Practically, this was used for genetic counselling etc.

In the current era of genome-wide association studies, a key question is the "missing heritability", i.e. the amount of familial aggregation of diseases unexplained by gene variants with detectable effect: the case-control studies have N=30000. Some of the arguments hinge on what kind of link function is used in the theoretical model.

Sorry, I couldn't resist ;)
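
The "predicted risk to relatives of an affected individual" mentioned above can be sketched numerically under a simple liability-threshold (probit-normal) model. This is only an illustrative sketch with assumptions of my own: liabilities of the affected individual and the relative are taken to be standard bivariate normal with correlation `liability_corr`, a person is affected when liability exceeds the threshold implied by the population prevalence, and the function name, the prevalence of 0.1, and the correlation values are all hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def recurrence_risk(prevalence, liability_corr, steps=4000, width=10.0):
    """P(relative affected | index case affected) under a liability-threshold model.

    Integrates, by the trapezoid rule, over the index case's liability x > t:
        (1/K) * integral of phi(x) * P(relative's liability > t | x) dx
    where t is the threshold giving population prevalence K.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    if not -1.0 < liability_corr < 1.0:
        raise ValueError("liability_corr must be strictly between -1 and 1")

    t = STD_NORMAL.inv_cdf(1.0 - prevalence)
    resid_sd = (1.0 - liability_corr ** 2) ** 0.5

    def integrand(x):
        # Relative's liability given x is N(liability_corr * x, resid_sd^2).
        p_relative = 1.0 - STD_NORMAL.cdf((t - liability_corr * x) / resid_sd)
        return STD_NORMAL.pdf(x) * p_relative

    h = width / steps
    total = 0.5 * (integrand(t) + integrand(t + width))
    for i in range(1, steps):
        total += integrand(t + i * h)
    return total * h / prevalence


if __name__ == "__main__":
    K = 0.1  # hypothetical population prevalence
    for label, r in [("unrelated", 0.0), ("more distant", 0.15), ("closer", 0.3)]:
        print(f"{label:>12}: r = {r:.2f}, risk = {recurrence_risk(K, r):.3f}")
```

Comparing such predicted risks across classes of relatives with the observed recurrence risks is one way to check whether a single set of latent correlations is plausible; with zero correlation the risk reduces to the population prevalence.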
| David Duffy (MBBS PhD)
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217  fax: -0101
| Epidemiology Unit, Queensland Institute of Medical Research
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A