Skip to content

adding a kinship matrix to a GLMM

4 messages · mhorton at uchicago.edu, David Duffy, Jarrod Hadfield

#
Hello,

I'd like to test the mixed-model approach taken in EMMA (-X):
http://www.nature.com/ng/journal/v42/n4/abs/ng.548.html

in a GLMM framework to analyze non-normal data (e.g. count data; sometimes 
there are lots of zeros). I'm trying this with MASS' glmmPQL, which seems to 
allow the user to provide an "an optional correlation structure".

There are a few classes that extend corStruct, but if I try corSymm, things seem 
to work with:

K <- read.table("kinship.txt",...); # an n x n 'identity by state' matrix
cs.K <- corSymm(K[lower.tri(K)],fixed=T);
cs.K <- Initialize(cs.K,data=data.init);
test <- glmmPQL( response ~ snps_i + offset(log(offset)),
          random=~1|subject,correlation=cs.K, family="quasipoisson" );

# data.init is a 1-column matrix of unique subject ids.
# there are between 1-4 biological replicates per subject

I'm writing to ask is this the best model formulation?
Also, if I add a co-factor 'block', which has 6 levels (subjects do not occur in all 
blocks), I get a lot of NA errors:
test <- glmmPQL( response ~ snps_i + as.factor(block)+ offset(log(offset)), 
          random=~1|subject,correlation=cs.K, family="quasipoisson" );
iteration 1
iteration 2
iteration 3
Error in na.fail.default(list(subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L,  : 
  missing values in object

Again, is there a better model formula?  Is corSymm even doing what I think it is 
doing here? 

Thanks for your time and any comments!
Matt
1 day later
#
On Tue, 24 Jan 2012, mhorton at uchicago.edu wrote:

            
I think so.  That error might be arising from incomplete data, as it says 
(do debug(lme) to see where, as this is what glmmPQL calls).  PQL might 
not be giving brilliant estimates of the random effects, but I presume you 
are most interested in snps_1.  Here is one result from a simulation of 
mine for a binomial trait, using R GLMM packages, augmented by your 
PQL

head(x)
   ped id fa mo sex trait locus
1   1  1 NA NA   m  <NA>   1/2
2   1  2 NA NA   f  <NA>   1/2
3   1  3  1  2   f     n   1/2
4   1  4  1  2   f     y   1/1
5   1  5  1  2   f     n   1/1
6   2  6 NA NA   f  <NA>   1/2


library(pedigreemm)
ped <- pedigree(x$fa, x$mo, x$id)
pedigreemm(trait=="y" ~ (1|id), pedigree=list(id=ped), family=binomial(), 
data=x)
pedigreemm(trait=="y" ~ (1|id), pedigree=list(id=ped), 
family=binomial(link=probit), data=x)

library(AnimalINLA)
ped2 <- x[,2:4]
ped2[is.na(ped2)] <- 0
Ainv <- compute.Ainverse(ped2)
pheno <- data.frame(id=x$id, trait=(x$trait=="y"), Individual=x$id)
pheno <- pheno[complete.cases(pheno$trait),]
m1 <- animal.inla("trait", genetic="id", Ainverse=Ainv, data=pheno, 
type.data = "binomial")
summary(m1)

library(MASS)
library(kinship)
K <- kinship(x$id, x$fa, x$mo)
observed <- !is.na(x$trait)
K <- K[observed, observed]
cs.K <- corSymm(2*K[lower.tri(K)],fixed=T)
id <- as.matrix(as.factor(x$id[observed]))
colnames(id) <- "id"
cs.K <- Initialize(cs.K,data=id)
pql1 <- glmmPQL(trait ~ 1, random=~1|id,
                 correlation=cs.K,
                 data=x[observed,], family="binomial
summary(pql1)

library(MCMCglmm)
pheno <- data.frame(animal=x$id, sire=x$fa, dam=x$mo, y=(x$trait=="y"))
g1 <- MCMCglmm(y~1, random=~animal, pedigree=pheno[,1:3],
                family="categorical", data=pheno, verbose=FALSE)
summary(g1)
g2 <- MCMCglmm(y~1, random=~animal, pedigree=pheno[,1:3],
                family="ordinal", data=pheno, verbose=FALSE)
summary(g2)

-----------------------------
logistic-normal
                  RE SD
pedigreemm       0.847
AnimalINLA       0.671
glmmPQL          1.185
MCMCglmm         1.389
#
Thanks David. I'll see if I can fit a kinship matrix using the other packages you show. 
It would also be great to setup the same test case to see how quickly any of this will 
actually run. I can't do that immediately, but I will try to post the results next week.

---- Original message ----
#
Hi,

Just a quick note on the MCMCglmm syntax: it is best to fix the  
residual variance a priori with binary data since it cannot be  
estimated. In order to make the variance comparable with the other  
methods you also need to rescale by 1/(1+a) for probit models or  
1/(1+c2*a) for logit models where "a" is the residual varaiance (I  
usually fix at one) and c2 is:

((16 * sqrt(3))/(15 * pi))^2

Cheers,

Jarrod


Quoting mhorton at uchicago.edu on Thu, 26 Jan 2012 10:10:45 -0600 (CST):