binary trait correlation across environments (experimental trials) using MCMCglmm? - R-SIG-mixed-models

Wed, Jan 2, 2019 8:33 AM #

Dear list,
I am trying to estimate the genetic correlation between binary traits
(survival) across trial (experimental) exposures of oysters to low
salinity.  I have a 50 family half-sib design (with replicates) that have
been deployed in a randomized design. Estimates of heritability with
MCMCglmm were reasonable in each of the two trials (same families used but
different individuals) but I am interested in estimating the genetic
correlation for survival between the two trials.

Code and suggestions I have read on this and other online forums seem to
apply to genetic correlations (rG) estimated for two traits measured on the
SAME individuals at the same time (environment) e.g. tarsus length and
color of birds.  For example, code like this from a a  2017 R-Sig-ME post
<https://stat.ethz.ch/pipermail/r-sig-mixed-models/2017q1/025532.html>:

*Bin.Bin.corr <- MCMCglmm(cbind(bin1, bin2) ~ trait - 1, random =
**~us(trait):animal, **rcov = ~corg(trait):units, family =
c("threshold",
**"threshold"), pedigree = Ped, prior = Prior1rG, **data = Data, nitt
= 4050000, burnin = 50000, thin = 2000, **verbose =**FALSE)*


Note that  Jerrod suggested different priors and modeling the
interaction of sex and animal with: *us(sex):animal *but Im not sure
this approach would applies to a cross-environment correlation
estimate?


So, my question is, what is the best way to proceed to estimate a
cross environment genetic correlation of a binary trait (survival)? My
data look like this (head and tail of file) where "Trial" is the
relevant 'environment' variable (with conditions A and B):






  animal
  Family
  rep
  sire
  dam
  Surv
  Trial


  20160010411501
  1
  A
  20135319911008
  20142889911003
  1
    A


  20160010411502
  1
  A
  20135319911008
  20142889911003
  1
    A


  20160010411503
  1
  A
  20135319911008
  20142889911003
  1
    A


  20160010411504
  1
  A
  20135319911008
  20142889911003
  1
    A


  20160010411505
  1
  A
  20135319911008
  20142889911003
  1
    A


  20160010411506
  1
  A
  20135319911008
  20142889911003
  1
    A


  .....








  animal
  Family
  rep
  sire
  dam
  Surv
  Trial


  20160880411657
  88
  C
  20140019911002
  20135319911007
  0
    B


  20160880411658
  88
  C
  20140019911002
  20135319911007
  0
    B


  20160880411659
  88
  C
  20140019911002
  20135319911007
  0
    B


  20160880411660
  88
  C
  20140019911002
  20135319911007
  0
    B


  20160880411661
  88
  C
  20140019911002
  20135319911007
  0
    B


  20160880411662
  88
  C
  20140019911002
  20135319911007
  0
    B



Here are two possible approaches I have come up with, though I am
unsure of their validity.


1) With my binary phenotype (Surv) indexed by Trial (A or B), I could
estimate the interaction of animal:Trial (e.g. ~us(Trial):animal) as
suggested by Jerrod for that previous R-Sig-Me post, however, its not
clear to me

what to do next with the outcome of that term?  Running a model like this:


bi_model_trial <- MCMCglmm(Surv ~ 1, random = ~us(Trial):animal,
family = "threshold",prior = prior, pedigree = pedigree.t, data =
trials, nitt = 5e+04, burnin = 15000, thin = 10)


the following output is achieved (I know the eff. sample sizes are WAY
too low, but just trying to understand what to do with the output)

Iterations = 15001:49991
 Thinning interval  = 10
 Sample size  = 3500

 DIC: 5868.55

 G-structure:  ~us(Trial):animal

                     post.mean l-95% CI u-95% CI eff.samp
TrialA:TrialA.animal    0.3803   0.1330   0.6587    27.94
TrialB:TrialA.animal    0.1074  -0.1775   0.3795    77.64
TrialA:TrialB.animal    0.1074  -0.1775   0.3795    77.64
TrialB:TrialB.animal    1.6707   0.6172   2.9186    17.74

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units    0.4744   0.2055   0.8418    15.82

 Location effects: phen ~ 1

            post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)     3.625    2.657    4.557    3.169 <3e-04 ***
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

 Cutpoints:
                     post.mean l-95% CI u-95% CI eff.samp
cutpoint.traitphen.1     4.186    3.046    5.221    2.318


Im not sure how to interpret these interaction terms or how a genetic
correlation would be calculated from them? Is this even the right way
to setup the model for estimating genetic correlation between trials?


2) A Second approach would be to simply estimate the correlation of
family means (survival) between trials using a set of equations from
this   paper <https://onlinelibrary.wiley.com/doi/full/10.1111/j.1420-9101.2005.00997.x>.

It would seem most appropriate to use equation (5) and substitute Sire
variance in place of Family variance (half sib design), and thus
genetic correlation Rg = Variance.sire / (  Variance.sire +
Variance.Sire*Variance.Env).


The question is, how to get the appropriate interaction variance
components (Var. Sire*Env)   from the MCMCglmm output or how to code
the model appropriately? Perhaps another package is better suited for
this?


Any advice on my understanding of the problem (i.e. cross
environmental genetic correlation is different from simple bivariate
genetic correlation of traits within an environment/trial) and how to
accomplish the cross environmental genetic co

correlation ( approach 1 or 2 or something completely different),
would be most appreciated.


Thanks!


LVP




.

Jarrod Hadfield

Wed, Jan 2, 2019 12:17 PM #

Hi,

Your model bi_model_trial is the correct one. However you must fix the residual variance at one in the prior. You have estimated the genetic (co)variance matrix for the two trials, from which you can obtain the genetic correlation.  Alternatively, you could fit animal+animal:Trial which assumes the genetic variances in the two trials are the same and the correlation is positive. The correlation in this latter model is obtained as VAR(animal)/(VAR(animal)+VAR(animal:Trial)).  Also, there seems to be a problem with your Surv data as it has three levels rather than 2 and so a cutpoint is being estimated.

Cheers,

Jarrod

On 2 Jan 2019, at 16:33, Plough, Louis <lplough at umces.edu<mailto:lplough at umces.edu>> wrote:

bi_model_trial

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Plough, Louis

Wed, Jan 2, 2019 2:18 PM #

Thanks Jarrod for the prompt response!

I think the residual variance was set at 1.   My prior  looks like this:

prior <- list(R = list(V = 1, nu = 3), G = list(G1 = list(V = diag(2), nu =
2)))

Not sure why the difference for *nu* in R vs G (3 vs 2)...might have been a
typo. Is there guidance on setting the gamma parameter for this kind of
binary trait, cross environment correlation model with the 'threshold'
family? In the low iteration (toy) run I did for the R-Sig-ME post, the
correlation was a bit lower than I would expect based on simply phenotypic
correlation of family means, so I think I might need to tweak *nu *for
both?

*Can I also confirm that I am estimating the rG correctly with the
following code:*

corr.gen<-model_trial4$VCV[,'TrialA:TrialB.animal']/sqrt(model_trial4$VCV[,'TrialA:TrialA.animal']*model_trial4$VCV[,'TrialB:TrialB.animal'])

Any reason to use 'TrialA:TrialB.animal' vs 'TrialB:TrialA.animal' in the
numerator? Or are they basically equivalent? I wouldn't want add them,
would I?

Thanks for your help!!!

LVP

On Wed, Jan 2, 2019 at 3:17 PM HADFIELD Jarrod <j.hadfield at ed.ac.uk> wrote:

Jarrod Hadfield

Thu, Jan 3, 2019 1:24 AM #

Hi,

The prior is not fixed at one. Also, the priors you are using are quite informative. I would use

prior <- list(R = list(V = 1, fix=1), G = list(G1 = list(V = diag(2), nu = 2, alpha.mu=c(0,0), alpha.V=diag(2)*100)))

Your equation for the genetic correlation is correct.

You shouldn't expect the correlation in family means to equal the model based estimate of the genetic correlation for a number of reasons; most importantly they are on different scales (data versus latent) and the variances of the family means contains residual variation but the covariance doesn't so the correlation in family means is a (downwardly) biased estimator.

Cheers,

Jarrod

On 02/01/2019 22:18, Plough, Louis wrote:

Thanks Jarrod for the prompt response!

I think the residual variance was set at 1.   My prior  looks like this:

prior <- list(R = list(V = 1, nu = 3), G = list(G1 = list(V = diag(2), nu = 2)))

Not sure why the difference for nu in R vs G (3 vs 2)...might have been a typo. Is there guidance on setting the gamma parameter for this kind of binary trait, cross environment correlation model with the 'threshold' family? In the low iteration (toy) run I did for the R-Sig-ME post, the correlation was a bit lower than I would expect based on simply phenotypic correlation of family means, so I think I might need to tweak nu for both?

Can I also confirm that I am estimating the rG correctly with the following code:

corr.gen<-model_trial4$VCV[,'TrialA:TrialB.animal']/sqrt(model_trial4$VCV[,'TrialA:TrialA.animal']*model_trial4$VCV[,'TrialB:TrialB.animal'])

Any reason to use 'TrialA:TrialB.animal' vs 'TrialB:TrialA.animal' in the numerator? Or are they basically equivalent? I wouldn't want add them, would I?

Thanks for your help!!!

LVP

On Wed, Jan 2, 2019 at 3:17 PM HADFIELD Jarrod <j.hadfield at ed.ac.uk<mailto:j.hadfield at ed.ac.uk>> wrote:

Hi,

Your model bi_model_trial is the correct one. However you must fix the residual variance at one in the prior. You have estimated the genetic (co)variance matrix for the two trials, from which you can obtain the genetic correlation.  Alternatively, you could fit animal+animal:Trial which assumes the genetic variances in the two trials are the same and the correlation is positive. The correlation in this latter model is obtained as VAR(animal)/(VAR(animal)+VAR(animal:Trial)).  Also, there seems to be a problem with your Surv data as it has three levels rather than 2 and so a cutpoint is being estimated.

Cheers,

Jarrod

On 2 Jan 2019, at 16:33, Plough, Louis <lplough at umces.edu<mailto:lplough at umces.edu>> wrote:

bi_model_trial

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Plough, Louis

Sun, Jan 6, 2019 11:40 AM #

Hi Jarrod et al.
Thanks for these suggestions. The model coding the interaction as
~us(Trait):animal is working (running), but convergence appears to be
a way off even after 2 or 3 million iterations, especially for the
TrialB variance ( survival in the second environment).  Would another
family (ordinal?) be worth considering over the threshold family? Or
changing the prior somehow, e.g.  bumping up the alpha.V=diag(2)*1000?
 We do expect some positive correlation between the two trials...i.e.
should not be zero.

Coding the model the other way you suggested as animal+animal:Trait
gives me the following error:

Error in buildZ(rmodel.terms[r], data = data, nginverse = names(ginverse)) :

  interactions not permitted between standard random effects and those
associated with ginverse

I tried used a similar prior as for the above model, but changed the
number of structures... (not sure if that is the right prior here):

For either model, I have been studying your tutorials for how to tweak
the priors in a sensible way (for a binary trait), but I still
struggle with how the various supplementary parameters alter the prior
distribution or CDF... you had mentioned that my previous priors were
'very informative', but in which way? Towards zero or too high?

LVP

On Thu, Jan 3, 2019 at 4:24 AM HADFIELD Jarrod <j.hadfield at ed.ac.uk> wrote:

Hi,

The prior is not fixed at one. Also, the priors you are using are quite informative. I would use

prior <- list(R = list(V = 1, fix=1), G = list(G1 = list(V = diag(2), nu = 2, alpha.mu=c(0,0), alpha.V=diag(2)*100)))

Your equation for the genetic correlation is correct.

You shouldn't expect the correlation in family means to equal the model based estimate of the genetic correlation for a number of reasons; most importantly they are on different scales (data versus latent) and the variances of the family means contains residual variation but the covariance doesn't so the correlation in family means is a (downwardly) biased estimator.

Cheers,

Jarrod






On 02/01/2019 22:18, Plough, Louis wrote:

Thanks Jarrod for the prompt response!

I think the residual variance was set at 1.   My prior  looks like this:

prior <- list(R = list(V = 1, nu = 3), G = list(G1 = list(V = diag(2), nu = 2)))

Not sure why the difference for nu in R vs G (3 vs 2)...might have been a typo. Is there guidance on setting the gamma parameter for this kind of binary trait, cross environment correlation model with the 'threshold' family? In the low iteration (toy) run I did for the R-Sig-ME post, the correlation was a bit lower than I would expect based on simply phenotypic correlation of family means, so I think I might need to tweak nu for both?

Can I also confirm that I am estimating the rG correctly with the following code:

corr.gen<-model_trial4$VCV[,'TrialA:TrialB.animal']/sqrt(model_trial4$VCV[,'TrialA:TrialA.animal']*model_trial4$VCV[,'TrialB:TrialB.animal'])

Any reason to use 'TrialA:TrialB.animal' vs 'TrialB:TrialA.animal' in the numerator? Or are they basically equivalent? I wouldn't want add them, would I?

Thanks for your help!!!

LVP



On Wed, Jan 2, 2019 at 3:17 PM HADFIELD Jarrod <j.hadfield at ed.ac.uk> wrote:

Hi,

Your model bi_model_trial is the correct one. However you must fix the residual variance at one in the prior. You have estimated the genetic (co)variance matrix for the two trials, from which you can obtain the genetic correlation.  Alternatively, you could fit animal+animal:Trial which assumes the genetic variances in the two trials are the same and the correlation is positive. The correlation in this latter model is obtained as VAR(animal)/(VAR(animal)+VAR(animal:Trial)).  Also, there seems to be a problem with your Surv data as it has three levels rather than 2 and so a cutpoint is being estimated.

Cheers,

Jarrod






On 2 Jan 2019, at 16:33, Plough, Louis <lplough at umces.edu> wrote:

bi_model_trial


The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Jarrod Hadfield

Tue, Jan 8, 2019 3:15 AM #

Hi,

Is it possible to share the data - it is hard to troubleshoot otherwise?

Cheers,

Jarrod

On 06/01/2019 19:40, Plough, Louis wrote:

Hi Jarrod et al.
Thanks for these suggestions. The model coding the interaction as
~us(Trait):animal is working (running), but convergence appears to be
a way off even after 2 or 3 million iterations, especially for the
TrialB variance ( survival in the second environment).  Would another
family (ordinal?) be worth considering over the threshold family? Or
changing the prior somehow, e.g.  bumping up the alpha.V=diag(2)*1000?
  We do expect some positive correlation between the two trials...i.e.
should not be zero.

Coding the model the other way you suggested as animal+animal:Trait
gives me the following error:

bi_model_trial_pos <- MCMCglmm(phen ~ 1, random = ~animal+animal:Trial, family = "threshold",prior = prior_b, pedigree = pedigree.t, data = trials_up, nitt = 5e+05, burnin = 25000, thin = 20)

Error in buildZ(rmodel.terms[r], data = data, nginverse = names(ginverse)) :

   interactions not permitted between standard random effects and those
associated with ginverse

I tried used a similar prior as for the above model, but changed the
number of structures... (not sure if that is the right prior here):

prior_b <- list(R = list(V = 1, fix=1), G = list(G1 = list(V = 1, nu = 2),G2=list(V=1,alpha.mu=0, nu=2)))

For either model, I have been studying your tutorials for how to tweak
the priors in a sensible way (for a binary trait), but I still
struggle with how the various supplementary parameters alter the prior
distribution or CDF... you had mentioned that my previous priors were
'very informative', but in which way? Towards zero or too high?

LVP


On Thu, Jan 3, 2019 at 4:24 AM HADFIELD Jarrod <j.hadfield at ed.ac.uk> wrote:

Hi,

The prior is not fixed at one. Also, the priors you are using are quite informative. I would use

prior <- list(R = list(V = 1, fix=1), G = list(G1 = list(V = diag(2), nu = 2, alpha.mu=c(0,0), alpha.V=diag(2)*100)))

Your equation for the genetic correlation is correct.

You shouldn't expect the correlation in family means to equal the model based estimate of the genetic correlation for a number of reasons; most importantly they are on different scales (data versus latent) and the variances of the family means contains residual variation but the covariance doesn't so the correlation in family means is a (downwardly) biased estimator.

Cheers,

Jarrod






On 02/01/2019 22:18, Plough, Louis wrote:

Thanks Jarrod for the prompt response!

I think the residual variance was set at 1.   My prior  looks like this:

prior <- list(R = list(V = 1, nu = 3), G = list(G1 = list(V = diag(2), nu = 2)))

Not sure why the difference for nu in R vs G (3 vs 2)...might have been a typo. Is there guidance on setting the gamma parameter for this kind of binary trait, cross environment correlation model with the 'threshold' family? In the low iteration (toy) run I did for the R-Sig-ME post, the correlation was a bit lower than I would expect based on simply phenotypic correlation of family means, so I think I might need to tweak nu for both?

Can I also confirm that I am estimating the rG correctly with the following code:

corr.gen<-model_trial4$VCV[,'TrialA:TrialB.animal']/sqrt(model_trial4$VCV[,'TrialA:TrialA.animal']*model_trial4$VCV[,'TrialB:TrialB.animal'])

Any reason to use 'TrialA:TrialB.animal' vs 'TrialB:TrialA.animal' in the numerator? Or are they basically equivalent? I wouldn't want add them, would I?

Thanks for your help!!!

LVP



On Wed, Jan 2, 2019 at 3:17 PM HADFIELD Jarrod <j.hadfield at ed.ac.uk> wrote:

Hi,

Your model bi_model_trial is the correct one. However you must fix the residual variance at one in the prior. You have estimated the genetic (co)variance matrix for the two trials, from which you can obtain the genetic correlation.  Alternatively, you could fit animal+animal:Trial which assumes the genetic variances in the two trials are the same and the correlation is positive. The correlation in this latter model is obtained as VAR(animal)/(VAR(animal)+VAR(animal:Trial)).  Also, there seems to be a problem with your Surv data as it has three levels rather than 2 and so a cutpoint is being estimated.

Cheers,

Jarrod






On 2 Jan 2019, at 16:33, Plough, Louis <lplough at umces.edu> wrote:

bi_model_trial


The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.