Hello, I'd like to test the mixed-model approach taken in EMMA (-X): http://www.nature.com/ng/journal/v42/n4/abs/ng.548.html in a GLMM framework to analyze non-normal data (e.g. count data; sometimes there are lots of zeros). I'm trying this with MASS' glmmPQL, which seems to allow the user to provide an "an optional correlation structure". There are a few classes that extend corStruct, but if I try corSymm, things seem to work with: K <- read.table("kinship.txt",...); # an n x n 'identity by state' matrix cs.K <- corSymm(K[lower.tri(K)],fixed=T); cs.K <- Initialize(cs.K,data=data.init); test <- glmmPQL( response ~ snps_i + offset(log(offset)), random=~1|subject,correlation=cs.K, family="quasipoisson" ); # data.init is a 1-column matrix of unique subject ids. # there are between 1-4 biological replicates per subject I'm writing to ask is this the best model formulation? Also, if I add a co-factor 'block', which has 6 levels (subjects do not occur in all blocks), I get a lot of NA errors: test <- glmmPQL( response ~ snps_i + as.factor(block)+ offset(log(offset)), random=~1|subject,correlation=cs.K, family="quasipoisson" ); iteration 1 iteration 2 iteration 3 Error in na.fail.default(list(subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, : missing values in object Again, is there a better model formula? Is corSymm even doing what I think it is doing here? Thanks for your time and any comments! Matt
adding a kinship matrix to a GLMM
4 messages · mhorton at uchicago.edu, David Duffy, Jarrod Hadfield
1 day later
On Tue, 24 Jan 2012, mhorton at uchicago.edu wrote:
I'm trying this with MASS' glmmPQL, which seems to
allow the user to provide an "an optional correlation structure".
There are a few classes that extend corStruct, but if I try corSymm, things seem
to work with:
I'm writing to ask is this the best model formulation?
Also, if I add a co-factor 'block', which has 6 levels (subjects do not occur in all
blocks), I get a lot of NA errors:
test <- glmmPQL( response ~ snps_i + as.factor(block)+ offset(log(offset)),
random=~1|subject,correlation=cs.K, family="quasipoisson" );
iteration 1
iteration 2
iteration 3
Error in na.fail.default(list(subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, :
missing values in object
Again, is there a better model formula? Is corSymm even doing what I think it is
doing here?
I think so. That error might be arising from incomplete data, as it says
(do debug(lme) to see where, as this is what glmmPQL calls). PQL might
not be giving brilliant estimates of the random effects, but I presume you
are most interested in snps_1. Here is one result from a simulation of
mine for a binomial trait, using R GLMM packages, augmented by your
PQL
head(x)
ped id fa mo sex trait locus
1 1 1 NA NA m <NA> 1/2
2 1 2 NA NA f <NA> 1/2
3 1 3 1 2 f n 1/2
4 1 4 1 2 f y 1/1
5 1 5 1 2 f n 1/1
6 2 6 NA NA f <NA> 1/2
library(pedigreemm)
ped <- pedigree(x$fa, x$mo, x$id)
pedigreemm(trait=="y" ~ (1|id), pedigree=list(id=ped), family=binomial(),
data=x)
pedigreemm(trait=="y" ~ (1|id), pedigree=list(id=ped),
family=binomial(link=probit), data=x)
library(AnimalINLA)
ped2 <- x[,2:4]
ped2[is.na(ped2)] <- 0
Ainv <- compute.Ainverse(ped2)
pheno <- data.frame(id=x$id, trait=(x$trait=="y"), Individual=x$id)
pheno <- pheno[complete.cases(pheno$trait),]
m1 <- animal.inla("trait", genetic="id", Ainverse=Ainv, data=pheno,
type.data = "binomial")
summary(m1)
library(MASS)
library(kinship)
K <- kinship(x$id, x$fa, x$mo)
observed <- !is.na(x$trait)
K <- K[observed, observed]
cs.K <- corSymm(2*K[lower.tri(K)],fixed=T)
id <- as.matrix(as.factor(x$id[observed]))
colnames(id) <- "id"
cs.K <- Initialize(cs.K,data=id)
pql1 <- glmmPQL(trait ~ 1, random=~1|id,
correlation=cs.K,
data=x[observed,], family="binomial
summary(pql1)
library(MCMCglmm)
pheno <- data.frame(animal=x$id, sire=x$fa, dam=x$mo, y=(x$trait=="y"))
g1 <- MCMCglmm(y~1, random=~animal, pedigree=pheno[,1:3],
family="categorical", data=pheno, verbose=FALSE)
summary(g1)
g2 <- MCMCglmm(y~1, random=~animal, pedigree=pheno[,1:3],
family="ordinal", data=pheno, verbose=FALSE)
summary(g2)
-----------------------------
logistic-normal
RE SD
pedigreemm 0.847
AnimalINLA 0.671
glmmPQL 1.185
MCMCglmm 1.389
| David Duffy (MBBS PhD) ,-_|\ | email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / * | Epidemiology Unit, Queensland Institute of Medical Research \_,-._/ | 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
Thanks David. I'll see if I can fit a kinship matrix using the other packages you show. It would also be great to setup the same test case to see how quickly any of this will actually run. I can't do that immediately, but I will try to post the results next week. ---- Original message ----
Date: Thu, 26 Jan 2012 10:33:21 +1000 (EST) From: "David Duffy" <David.Duffy at qimr.edu.au> Subject: Re: [R-sig-ME] adding a kinship matrix to a GLMM To: <mhorton at uchicago.edu> Cc: <r-sig-mixed-models at r-project.org> On Tue, 24 Jan 2012, mhorton at uchicago.edu wrote:
I'm trying this with MASS' glmmPQL, which seems to
allow the user to provide an "an optional correlation structure".
There are a few classes that extend corStruct, but if I try corSymm, things seem
to work with:
I'm writing to ask is this the best model formulation?
Also, if I add a co-factor 'block', which has 6 levels (subjects do not occur in all
blocks), I get a lot of NA errors:
test <- glmmPQL( response ~ snps_i + as.factor(block)+ offset(log(offset)),
random=~1|subject,correlation=cs.K, family="quasipoisson" );
iteration 1
iteration 2
iteration 3
Error in na.fail.default(list(subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, :
missing values in object
Again, is there a better model formula? Is corSymm even doing what I think it is
doing here?
I think so. That error might be arising from incomplete data, as it says
(do debug(lme) to see where, as this is what glmmPQL calls). PQL might
not be giving brilliant estimates of the random effects, but I presume you
are most interested in snps_1. Here is one result from a simulation of
mine for a binomial trait, using R GLMM packages, augmented by your
PQL
head(x)
ped id fa mo sex trait locus
1 1 1 NA NA m <NA> 1/2
2 1 2 NA NA f <NA> 1/2
3 1 3 1 2 f n 1/2
4 1 4 1 2 f y 1/1
5 1 5 1 2 f n 1/1
6 2 6 NA NA f <NA> 1/2
library(pedigreemm)
ped <- pedigree(x$fa, x$mo, x$id)
pedigreemm(trait=="y" ~ (1|id), pedigree=list(id=ped), family=binomial(),
data=x)
pedigreemm(trait=="y" ~ (1|id), pedigree=list(id=ped),
family=binomial(link=probit), data=x)
library(AnimalINLA)
ped2 <- x[,2:4]
ped2[is.na(ped2)] <- 0
Ainv <- compute.Ainverse(ped2)
pheno <- data.frame(id=x$id, trait=(x$trait=="y"), Individual=x$id)
pheno <- pheno[complete.cases(pheno$trait),]
m1 <- animal.inla("trait", genetic="id", Ainverse=Ainv, data=pheno,
type.data = "binomial")
summary(m1)
library(MASS)
library(kinship)
K <- kinship(x$id, x$fa, x$mo)
observed <- !is.na(x$trait)
K <- K[observed, observed]
cs.K <- corSymm(2*K[lower.tri(K)],fixed=T)
id <- as.matrix(as.factor(x$id[observed]))
colnames(id) <- "id"
cs.K <- Initialize(cs.K,data=id)
pql1 <- glmmPQL(trait ~ 1, random=~1|id,
correlation=cs.K,
data=x[observed,], family="binomial
summary(pql1)
library(MCMCglmm)
pheno <- data.frame(animal=x$id, sire=x$fa, dam=x$mo, y=(x$trait=="y"))
g1 <- MCMCglmm(y~1, random=~animal, pedigree=pheno[,1:3],
family="categorical", data=pheno, verbose=FALSE)
summary(g1)
g2 <- MCMCglmm(y~1, random=~animal, pedigree=pheno[,1:3],
family="ordinal", data=pheno, verbose=FALSE)
summary(g2)
-----------------------------
logistic-normal
RE SD
pedigreemm 0.847
AnimalINLA 0.671
glmmPQL 1.185
MCMCglmm 1.389
--
| David Duffy (MBBS PhD) ,-_|\
| email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / *
| Epidemiology Unit, Queensland Institute of Medical Research \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
Hi, Just a quick note on the MCMCglmm syntax: it is best to fix the residual variance a priori with binary data since it cannot be estimated. In order to make the variance comparable with the other methods you also need to rescale by 1/(1+a) for probit models or 1/(1+c2*a) for logit models where "a" is the residual varaiance (I usually fix at one) and c2 is: ((16 * sqrt(3))/(15 * pi))^2 Cheers, Jarrod Quoting mhorton at uchicago.edu on Thu, 26 Jan 2012 10:10:45 -0600 (CST):
Thanks David. I'll see if I can fit a kinship matrix using the other packages you show. It would also be great to setup the same test case to see how quickly any of this will actually run. I can't do that immediately, but I will try to post the results next week. ---- Original message ----
Date: Thu, 26 Jan 2012 10:33:21 +1000 (EST) From: "David Duffy" <David.Duffy at qimr.edu.au> Subject: Re: [R-sig-ME] adding a kinship matrix to a GLMM To: <mhorton at uchicago.edu> Cc: <r-sig-mixed-models at r-project.org> On Tue, 24 Jan 2012, mhorton at uchicago.edu wrote:
I'm trying this with MASS' glmmPQL, which seems to
allow the user to provide an "an optional correlation structure".
There are a few classes that extend corStruct, but if I try
corSymm, things seem
to work with:
I'm writing to ask is this the best model formulation?
Also, if I add a co-factor 'block', which has 6 levels (subjects
do not occur in all
blocks), I get a lot of NA errors:
test <- glmmPQL( response ~ snps_i + as.factor(block)+ offset(log(offset)),
random=~1|subject,correlation=cs.K, family="quasipoisson" );
iteration 1
iteration 2
iteration 3
Error in na.fail.default(list(subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, :
missing values in object
Again, is there a better model formula? Is corSymm even doing
what I think it is
doing here?
I think so. That error might be arising from incomplete data, as it says
(do debug(lme) to see where, as this is what glmmPQL calls). PQL might
not be giving brilliant estimates of the random effects, but I presume you
are most interested in snps_1. Here is one result from a simulation of
mine for a binomial trait, using R GLMM packages, augmented by your
PQL
head(x)
ped id fa mo sex trait locus
1 1 1 NA NA m <NA> 1/2
2 1 2 NA NA f <NA> 1/2
3 1 3 1 2 f n 1/2
4 1 4 1 2 f y 1/1
5 1 5 1 2 f n 1/1
6 2 6 NA NA f <NA> 1/2
library(pedigreemm)
ped <- pedigree(x$fa, x$mo, x$id)
pedigreemm(trait=="y" ~ (1|id), pedigree=list(id=ped), family=binomial(),
data=x)
pedigreemm(trait=="y" ~ (1|id), pedigree=list(id=ped),
family=binomial(link=probit), data=x)
library(AnimalINLA)
ped2 <- x[,2:4]
ped2[is.na(ped2)] <- 0
Ainv <- compute.Ainverse(ped2)
pheno <- data.frame(id=x$id, trait=(x$trait=="y"), Individual=x$id)
pheno <- pheno[complete.cases(pheno$trait),]
m1 <- animal.inla("trait", genetic="id", Ainverse=Ainv, data=pheno,
type.data = "binomial")
summary(m1)
library(MASS)
library(kinship)
K <- kinship(x$id, x$fa, x$mo)
observed <- !is.na(x$trait)
K <- K[observed, observed]
cs.K <- corSymm(2*K[lower.tri(K)],fixed=T)
id <- as.matrix(as.factor(x$id[observed]))
colnames(id) <- "id"
cs.K <- Initialize(cs.K,data=id)
pql1 <- glmmPQL(trait ~ 1, random=~1|id,
correlation=cs.K,
data=x[observed,], family="binomial
summary(pql1)
library(MCMCglmm)
pheno <- data.frame(animal=x$id, sire=x$fa, dam=x$mo, y=(x$trait=="y"))
g1 <- MCMCglmm(y~1, random=~animal, pedigree=pheno[,1:3],
family="categorical", data=pheno, verbose=FALSE)
summary(g1)
g2 <- MCMCglmm(y~1, random=~animal, pedigree=pheno[,1:3],
family="ordinal", data=pheno, verbose=FALSE)
summary(g2)
-----------------------------
logistic-normal
RE SD
pedigreemm 0.847
AnimalINLA 0.671
glmmPQL 1.185
MCMCglmm 1.389
--
| David Duffy (MBBS PhD) ,-_|\
| email: davidD at qimr.edu.au ph: INT+61+7+3362-0217 fax: -0101 / *
| Epidemiology Unit, Queensland Institute of Medical Research \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia GPG 4D0B994A v
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.