
In simple terms, how is the estimated variance of higher-level effects calculated?

On Mon, 16 Jul 2012, Jeremy Koster wrote:

            
[...]
I think conceptualizing it as a latent variable model helps.  Since the 
latent variables are unobserved, we make inferences about their 
distribution based upon the distribution of the manifest variables and our 
assumptions about the nature of the latent variable distribution.

Different assumed latent variable distributions (e.g. beta, normal, 
mixtures) and different link functions (e.g. logit, probit, log, identity) 
will change not only your variance estimates but also your interpretation.
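
To make the dependence on the link concrete, here is a sketch for the 
simplest case, a random-intercept threshold model. The symbols (y*, u, e, 
sigma_u^2, rho_latent) are notation assumed for illustration and do not 
come from the original question.

```latex
\begin{align*}
y^*_{ij} &= x_{ij}^\top \beta + u_j + e_{ij}, \qquad
y_{ij} = \mathbf{1}\left[ y^*_{ij} > 0 \right], \qquad
u_j \sim N(0, \sigma_u^2) \\
\text{probit:}\quad e_{ij} &\sim N(0, 1)
  \;\Rightarrow\; \rho_{\text{latent}} = \frac{\sigma_u^2}{\sigma_u^2 + 1} \\
\text{logit:}\quad e_{ij} &\sim \text{Logistic}(0, 1)
  \;\Rightarrow\; \rho_{\text{latent}} = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
\end{align*}
```

Because the link fixes the residual variance (1 for probit, pi^2/3 for 
logit), sigma_u^2 is only identified relative to that scale, so the same 
data give numerically different variance estimates under the two links.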

One useful exercise might be to simulate binary data from a threshold 
model, and demonstrate how the variances of the (known) latent variables 
are recovered (in a probit-normal model), and how the tetrachoric 
correlation, Pearson correlation and odds ratio for a 2x2 table vary with 
the marginal probabilities and the strength of association.
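
A minimal sketch of that exercise in Python (standard library only), 
assuming a standard bivariate normal latent pair dichotomized at fixed 
thresholds. The function names, the grid of correlations and prevalences, 
and the simple trapezoid/bisection estimator of the tetrachoric 
correlation are all illustrative choices, not a reference implementation.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for thresholds and CDFs


def simulate_table(rho, p1, p2, n, rng):
    """Draw n latent bivariate-normal pairs with correlation rho and
    dichotomize them so the binary margins have prevalence p1 and p2.
    Returns counts[y1][y2] for the resulting 2x2 table."""
    t1 = STD_NORMAL.inv_cdf(1.0 - p1)
    t2 = STD_NORMAL.inv_cdf(1.0 - p2)
    s = math.sqrt(1.0 - rho * rho)
    counts = [[0, 0], [0, 0]]
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        counts[int(z1 > t1)][int(z2 > t2)] += 1
    return counts


def both_positive_prob(rho, t1, t2, steps=1000):
    """P(Z1 > t1, Z2 > t2) for a standard bivariate normal with
    correlation rho, integrating over Z1 with the trapezoid rule."""
    s = math.sqrt(1.0 - rho * rho)
    upper = max(t1, 0.0) + 8.0  # mass beyond this point is negligible
    h = (upper - t1) / steps
    total = 0.0
    for i in range(steps + 1):
        z = t1 + i * h
        f = STD_NORMAL.pdf(z) * (1.0 - STD_NORMAL.cdf((t2 - rho * z) / s))
        total += f if 0 < i < steps else 0.5 * f
    return total * h


def summarize(counts):
    """Return (Pearson/phi correlation, odds ratio, tetrachoric estimate)."""
    a, b = counts[1][1], counts[1][0]
    c, d = counts[0][1], counts[0][0]
    n = a + b + c + d
    p1, p2, p11 = (a + b) / n, (a + c) / n, a / n
    pearson = (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    # Tetrachoric: find the latent rho whose implied P(both = 1) matches
    # the observed cell proportion, given thresholds from the margins.
    t1 = STD_NORMAL.inv_cdf(1.0 - p1)
    t2 = STD_NORMAL.inv_cdf(1.0 - p2)
    lo, hi = -0.999, 0.999
    for _ in range(40):  # P(both = 1) is increasing in rho, so bisect
        mid = 0.5 * (lo + hi)
        if both_positive_prob(mid, t1, t2) < p11:
            lo = mid
        else:
            hi = mid
    return pearson, odds_ratio, 0.5 * (lo + hi)


if __name__ == "__main__":
    rng = random.Random(2012)
    for rho in (0.3, 0.6):
        for prev in (0.5, 0.1):
            table = simulate_table(rho, prev, prev, 20000, rng)
            pearson, odds_ratio, tetra = summarize(table)
            print(f"rho={rho} prev={prev}: pearson={pearson:.3f} "
                  f"OR={odds_ratio:.2f} tetrachoric={tetra:.3f}")
```

Running it for a fixed latent correlation at several prevalences lets you 
see which of the three summaries tracks the latent correlation and which 
shift with the margins.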

You might also compare different models for this "classic"
boric acid teratogenicity dataset:

http://genepi.qimr.edu.au/staff/davidD/Sib-pair/Documents/Using_Sib-pair/Scripts/boricex.in

A final example might be to look at the commonly used approach of fitting 
an LMM to binary data coded as 1s and 0s (going back to Cochran 1943), 
and whether the results are deceptive or not.  In analysis of Genome Wide 
Association Scan data for a binary phenotype Y, we test the (fixed) effect 
of each measured polymorphism X (usually scored as 0,1,2) against Y, but 
we need to adjust for confounding due to unobserved relatedness of 
individuals in the study. The latter is estimated as an NxN empirical 
kinship matrix (the average pairwise correlation over M polymorphisms 
between N study participants, with M=2000000 to 5000000, and N = 1000 to 
100000).  When Y is continuous, an LMM is a very attractive approach...
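
For concreteness, a toy sketch of the empirical kinship matrix described 
above: standardize each polymorphism across people, then average the 
products of standardized genotypes over polymorphisms. The function names 
and the simulated data are hypothetical, and standardizing by the sample 
mean and standard deviation is an assumption made here; GWAS software 
often scales by allele frequency instead. Real data sizes would need a 
matrix library rather than nested lists.

```python
import math
import random


def simulate_genotypes(n_people, n_snps, rng):
    """Hypothetical toy data: unrelated people, genotypes coded 0/1/2
    as the number of copies of an allele with frequency p per SNP."""
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_snps)]
    return [
        [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        for _ in range(n_people)
    ]


def empirical_kinship(genotypes):
    """N x N matrix of average products of standardized genotypes,
    i.e. the average pairwise correlation over SNPs."""
    n = len(genotypes)
    m = len(genotypes[0])
    k = [[0.0] * n for _ in range(n)]
    used = 0
    for s in range(m):
        col = [row[s] for row in genotypes]
        mean = sum(col) / n
        var = sum((g - mean) ** 2 for g in col) / n
        if var == 0.0:
            continue  # monomorphic SNP carries no information
        sd = math.sqrt(var)
        z = [(g - mean) / sd for g in col]
        for i in range(n):
            for j in range(i, n):
                k[i][j] += z[i] * z[j]
        used += 1
    if used == 0:
        raise ValueError("no polymorphic SNPs")
    for i in range(n):
        for j in range(i, n):
            k[i][j] /= used
            k[j][i] = k[i][j]  # fill the lower triangle by symmetry
    return k


if __name__ == "__main__":
    rng = random.Random(7)
    geno = simulate_genotypes(6, 500, rng)
    geno[1] = list(geno[0])  # make person 1 a duplicate of person 0
    for row in empirical_kinship(geno):
        print(" ".join(f"{v:6.2f}" for v in row))
```

The duplicated pair should stand out with an off-diagonal entry as large 
as the diagonal, while entries between unrelated people stay near zero.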