Interaction variance from mcmcglmm
Thanks Paul, I have tried with the suggested expanded model, using simulated data where I have added Y-source * Background interactions. See below. Is it correct then to say that the variance estimate of "BG:Y.Source" is variance in lifespan explained by the interaction between the genetic background (autosomes and X source) and the source of the Y? Then the phenotypic variance would be the sum of all components (2.910), and the interaction BG*Y.Source would explain 100*(0.902/2.910) = 30.98% of the variance in lifespan?
colMeans(NestChain_Mod2$VCV)
BG Y.Source Y.Source:Y BG:Y.Source BG:Y.Source:Y Vial units 1.0029972054 0.4037536287 0.0008707007 0.9017016510 0.0002926541 0.0013600115 0.5993455117 I also have a follow up on the nesting, I may need some explanation of the syntax in the model you proposed (what are : and / doing?), but I read that as having Y nested within Y.Source, which is then nested within the background (BG). I understand that Y should be nested within Y.Source (because each Y tested can only be from one source population), however, given that all Y chromosomes are tested in all backgrounds (autosome and X source), wouldn't it be inappropriate to nest within BG? i.e. Y's A1-A34 are all tested in CN, DH, and ZW, Y's B1-B34 are also all tested in CN, DH, and ZW... Thanks, Rob Here's the script I used: ############ prior3 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000), G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000), G3 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000), G4 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000), G5 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000), G6 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)), R = list(V = 1, nu=0.002)) # Y source * BG variance dummy$Age2 = ifelse(dummy$Y.Source=="A"&dummy$BG=="CN", dummy$Age+15, dummy$Age) dummy$Age2 = ifelse(dummy$Y.Source=="B"&dummy$BG=="CN", abs(dummy$Age-15), dummy$Age2) dummy$Age2 = ifelse(dummy$Y.Source=="C"&dummy$BG=="DH", dummy$Age+15, dummy$Age2) dummy$Age2 = ifelse(dummy$Y.Source=="C"&dummy$BG=="ZW", abs(dummy$Age-15), dummy$Age2) dummy$Age2 = ifelse(dummy$Y.Source=="A"&dummy$BG=="DH", abs(dummy$Age-15), dummy$Age2) dummy$Age2 = ifelse(dummy$Y.Source=="B"&dummy$BG=="ZW", dummy$Age+15, dummy$Age2) boxplot(dummy$Age2 ~ dummy$Y.Source*dummy$BG) dummy$Score2 = (dummy$Age2 - mean(dummy$Age2)) / sd(dummy$Age2) # Chain NestChain_Mod2 = MCMCglmm(Score2 ~1 + Block, random = ~BG + Y.Source + Y.Source:Y + BG:Y.Source + BG:Y.Source:Y + Vial, rcov = ~units, nitt = 5000, thin = 5, burnin = 1000, prior = prior3, family = "gaussian", start = list(QUASI = FALSE), data = dummy) file = paste(filepath, "NestChain_mod2.rda", sep = "") save(NestChain_Mod2, file = file) NestChain_mod2 = load(paste(filepath, "NestChain_mod2.rda", sep= "")) # Variance components colMeans(NestChain_Mod2$VCV) colMeans(NestChain_Mod2$VCV)[1]/sum(colMeans(NestChain_Mod2$VCV))*100 colMeans(NestChain_Mod2$VCV)[2]/sum(colMeans(NestChain_Mod2$VCV))*100 colMeans(NestChain_Mod2$VCV)[3]/sum(colMeans(NestChain_Mod2$VCV))*100 colMeans(NestChain_Mod2$VCV)[4]/sum(colMeans(NestChain_Mod2$VCV))*100 colMeans(NestChain_Mod2$VCV)[5]/sum(colMeans(NestChain_Mod2$VCV))*100 colMeans(NestChain_Mod2$VCV)[6]/sum(colMeans(NestChain_Mod2$VCV))*100 colMeans(NestChain_Mod2$VCV)[7]/sum(colMeans(NestChain_Mod2$VCV))*100 --------------------------- --- Robert Griffin PhD griffinevo.wordpress.com
On Tue, Nov 3, 2015 at 6:29 PM, paul debes <paul.debes at utu.fi> wrote:
Dear Rob, Unfortunately, I cannot help with the MCMCglmm prior specifications and hope someone else will do this, but maybe with the model. I think that fitting two separate models might lead to different results than one common model when you have a significant but ignored covariance between effects across populations when estimating only the within population variance. As the Y effects are nested within Y.Source effects (they are, right?), one initial full model to estimate within and across population variance, and including the interaction with the genetic background effects, could be: fixed = Score ~ 1 + Block, random = ~ BG + Y.Source/Y + BG:Y.Source/Y + Vial, expanded: random = ~ BG + Y.Source + Y.Source:Y + BG:Y.Source + BG:Y.Source:Y + Vial, Best, Paul On Tue, 03 Nov 2015 17:01:32 +0200, Robert Griffin < robgriffin247 at hotmail.com> wrote: Dear list members,
I recently published work using mcmcglmm to test for Y linked genetic
variance, finding variance within a population. This data was collected by
measuring phenotypic data from randomly sampled Y chromosomes from within
a
population, tested in a single genetic background, using lifespan as the
response, and Y chromosome as a random effect.
I have been trying to think of how I would improve such a study and
concluded that a more complete study would be done using
- Y chromosomes from multiple source populations (to test for among
population variance)
- multiple genetic backgrounds (to distinguish between [direct] genetic
effects, and epistatic effects)...
This got me thinking about the analysis of such data. I would need to
estimate how much of the Y-linked variance within and among populations is
genetic and epistatic. This has lead me to two main questions where I
could
use some guidance:
1) can mcmcglmm be used to estimate the interaction (epistatic) variance?
- I think the answer is yes but I have as yet been unable to find clear
advice on this, how can I add interaction variance term in to mcmcglmm?
- I can add the variance explained by the genetic background by adding
this
as a random effect too, but (how) can I find the interaction between two
random explanatory variables?
2) could I do it all from a single model or would I need multiple models?
- If I built a model where the trait (Lifespan) is the response variable,
with random effects of Y chromosome, Y chromosome source population,
genetic background, and the interactions between Y/Y-source and genetic
background, would I be able to get estimates of the direct genetic and
epistatic variances within and among the source populations?
- I could build two models, one with Y chromosome and genetic background
as
random effects (including the interaction), and the other with
Y-chromosome
source population and genetic background as random effects (including the
interaction). These two models would separately estimate the within and
among population variances. I think this would be the correct approach.
However, I'm unconvinced that the latter would give the desired division
in
to direct and epistatic Y-linked variation.
Thanks,
Rob
*Please kindly note*, advice on prior specification when estimating
interaction effects would be also very highly appreciated, remembering
that
the variance components are likely very small. I have included a
demonstration script which makes some dummy data, and shows some of the
proposed models. The script includes a filepath which you can edit if you
wish to save and load chains, and switches at line 56 to turn the chains
on/off when running scripts.
##############
## R SCRIPT ##
##############
# Y-background interaction dummy data
rm(list=ls())
library("lme4")
library("MCMCglmm")
set.seed(24)
# Set filepath (CUSTOMISE THIS IF YOU WISH TO BE ABLE TO SAVE AND LOAD
CHAINS)
filepath = paste0("C:\\Users\\Rob\\Notes\\Y_epistasis\\")
#### Description of dummy data ####
# 20 Y chromosomes randomly sampled from 3 populations (A1:A20, B1:B20,
C1:C20, n = 60)
# Test in genetic backgrounds (BG) from three populations (Canton [CN],
Zimbabwe [ZW], and Dahomey [DH])
# Measure a trait (Lifepsan) in 90 individuals per combination of Y & BG
# Measured across three replicate blocks (BL1, BL2, BL3), with two vials
each per block
n1 = 20 # Number of Y's sampled per source population
n2 = 30 # Number of measurements per block/line
n3 = 3 # Number of blocks (BL1, BL2, BL3)
n4 = 3 # Number of genetic backgrounds (CN, ZW, DH)
n5 = 3 # Number of Y source populations (A, B, C)
n6 = 2 # Number of vials per block/Y/BG
# Construction of data
dummy = data.frame(
as.factor(rep(c(paste0("A",1:n1), paste0("B",1:n1), paste0("C",1:n1)),
each
= n2*n3, times = n4)),
as.factor(rep(c("A","B","C"), each = n1*n2*n3, times = n5)),
as.factor(rep(c("CN","ZW","DH"), each = n1*n2*n3*n4, times = 1)),
as.factor(rep(c("BL1","BL2","BL3"), each = n2)),
as.factor(rep(c("A","B"), each = n2 / n6)),
as.numeric(abs(round(rnorm(n1*n2*n3*n4*n5, 50, 15),0)))
)
colnames(dummy) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age")
dummy$Vial = paste(dummy$Y, dummy$Y.Source, dummy$BG, dummy$Block,
dummy$Vial, sep = "_")
head(dummy)
# Mean zero unit variance for Age (response variable)
dummy$Score = (dummy$Age - mean(dummy$Age)) / sd(dummy$Age)
# Subset DFs (Only one source population and one genetic background, as in
previous JEB paper)
dummy_A = data.frame(split(dummy , f = dummy$Y.Source)[1])
dummy_A_CN = data.frame(split(dummy_A , f = dummy_A$A.BG)[1])
colnames(dummy_A) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age",
"Score")
colnames(dummy_A_CN) = c("Y", "Y.Source", "BG", "Block", "Vial", "Age",
"Score")
# MCMC Priors
prior1 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
R = list(V = 1, nu=0.002))
prior2 = list(G = list( G1 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G2 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000),
G3 = list(V = 1, nu=1, alpha.mu=0, alpha.V=1000)),
R = list(V = 1, nu=0.002))
# MCMC Switches
testing = "y"
runChain_A_CN1 = "y"
runChain_A1 = "y"
runChain_1 = "y"
# MCMC Details
if(testing == "y"){
nitt = 10000
burn = 1000
thin = 10
}else{
nitt = 100000
burn = 10000
thin = 100
}
# MCMC Chain (as in JEB Paper: One Y source population, One genetic
background)
# Within-population Y-linked variation
if(runChain_A_CN1=="y"){
Chain_A_CN1 = MCMCglmm(Score ~1 + Block,
random = ~Y + Vial,
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burn,
prior = prior1,
family = "gaussian",
start = list(QUASI = FALSE),
data = dummy_A_CN)
file = paste(filepath, "Chain_A_CN1.rda", sep = "")
save(Chain_A_CN1, file = file)
}else{}
##############################
#### Proposed MCMC Chains ####
##############################
# Within-population Y-linked variation split in to direct and epistatic
if(runChain_A1=="y"){
Chain_A1 = MCMCglmm(Score ~1 + Block,
random = ~Y + BG + Vial,
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burn,
prior = prior2,
family = "gaussian",
start = list(QUASI = FALSE),
data = dummy_A)
file = paste(filepath, "Chain_A1.rda", sep = "")
save(Chain_A1, file = file)
}else{}
# Among-population Y-linked variation split in to direct and epistatic
if(runChain_1=="y"){
Chain_1 = MCMCglmm(Score ~1 + Block,
random = ~Y.Source + BG + Vial,
rcov = ~units,
nitt = nitt,
thin = thin,
burnin = burn,
prior = prior2,
family = "gaussian",
start = list(QUASI = FALSE),
data = dummy)
file = paste(filepath, "Chain_1.rda", sep = "")
save(Chain_1, file = file)
}else{}
Chain.A_CN1 = load(paste(filepath, "Chain_A_CN1.rda", sep= ""))
Chain.A1 = load(paste(filepath, "Chain_A1.rda", sep= ""))
Chain.1 = load(paste(filepath, "Chain_1.rda", sep= ""))
[[alternative HTML version deleted]]
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
-- Paul Debes DFG Research Fellow University of Turku Department of Biology It?inen Pitk?katu 4 20520 Turku Finland
_______________________________________________ R-sig-mixed-models at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models