Skip to content

Question regarding MCMCglmm model structure for correlated responses

6 messages · Jannik Vindeløv, Jarrod Hadfield

#
Dear List,

I need to compare two systems ("Bulk" and "DVS"), each consisting of a number of "Cultures" (e.g. Bulk: B1 - B5 and DVS: A - E). I have repeated measurements of each culture in each system (more from the Bulk system than the DVS system) with regards to pH and composition variables. The composition variables: Moi, Fat, NFS, are highly correlated because they need to add to 100%. pH may also be correlated to the composition variables.

Furthermore, I assume that the residual variance of all composition variables can be estimated together within each system (sigma comp). And I assume, that the residual variance will be the same for each culture within a system, but not between systems.  

I would like to answer questions such as:
1) Are the performance of the systems the same? (i.e. do Bulk and DVS systems have the same mean and culture variance within each response variable)?
2) Are the performance of the cultures within each system the same? (i.e. do they have the same mean, assuming the variance to be the same) 
3) Are the responses correlated and by how much?

I assume that I can use MCMCglmm to do the analysis, as it handles multiple responses and marginal contrasts (means and variance) are straight forward to calculate from the MCMC chains. But I?am unsure how to specify the fixed, random and residual covariance in the function interface (e.g. how do I tell the function to estimate the same variance for the composition variables?).

To make my request more specific, I add a synthetic data set that can be used for the analysis:

# Packages
library(dplyr)

# Design
df_sim <- data.frame(System = c(rep("Bulk", 75), rep("DVS", 25)), Culture = c(rep(c("B1", "B2", "B3", "B4", "B5"), each = 15), rep(c("A", "B", "C", "D", "E"), each = 5)))
df_sim$Culture <- ordered(df_sim$Culture, levels = c("B1", "B2", "B3", "B4", "B5", "A", "B", "C", "D", "E"))

# Responses
set.seed(1905) # start from the same place everytime
df_sim <- df_sim %>%
  mutate(pH = c(rep(rnorm(n = 5, mean = 5.3, sd = 0.1), each = 15), # Bulk 0.1 pHU higher than DVS on average
                rep(rnorm(n = 5, mean = 5.2, sd = 0.05), each = 5)), # DVS between cultures more consistent
         pH = c(rnorm(n = 75, mean = pH, sd = 0.1), # Bulk have larger within culture variation
                rnorm(n = 25, mean = pH, sd = 0.05)), # DVS have smaller within culture variation
         Moi = rep(rnorm(n = 5, mean = 40, sd = 0.1), each = 20), # all cultures, regardless of system, are on average identical
         Moi = rnorm(n = 100, mean = Moi, sd = 0.05), # and with same noise
         Fat = rep(rnorm(n = 5, mean = 25, sd = 0.01), each = 20), # all cultures are on average identical
         Fat = rnorm(n = 100, mean = Fat, sd = 0.05), # and with same noise
         NFS = 100 - Moi - Fat # all must sum to 100%
  )

My first attempt has been the following:

MCMCglmm(fixed = cbind(pH, Moi, Fat, NFS) ~ trait*Culture - 1,
                 family = rep("gaussian", 4),
                 rcov = ~us(System:trait):units,
                 data = df_sim,
                 nitt = 10000,
                 thin = 10,
                 burnin = 3000,
                 verbose = FALSE)

But this does not take into account the Moi, Fat, NFS used to estimate a common composition residual. 
Is there a way to specify this in the rcov formula? Am I using the right family for the residuals, when i know the are correlated 100 = Moi + Fat + NFS? Is MCMCglmm the right tool or am I asking to much of the package?

Any help would be appreciated.

Thanks

Jannik
#
Dear Jannik,

For the compositional variables it makes sense to drop one dimension, it 
doesn't matter which. For example if you drop NFS you can derive its 
variance, and covariance with the remaining two as:

VAR(NFS) = VAR(1-Moi-Fat) = VAR(Moi)+VAR(Fat)+2COV(Moi, Fat)

COV(Moi, NFS) = COV(Moi, 1-Moi-Fat) = -[VAR(Moi)+COV(Moi, Fat)]

COV(Fat, NFS) = COV(Fat, 1-Moi-Fat) = -[VAR(Fat)+COV(Moi, Fat)]

If you want separate residual covariance matrices for each system then use:

rcov=~us(at.level(System, "Bulk"):trait):units+us(at.level(System, "DVS"):trait):units

you will also want a System*trait term in the fixed effects to allow the mean responses to differ across the systems.

As I understand your design, no culture is found in both systems and so the correlation between a culture's performance in the two cultures is not estimable. The formula

random=~us(at.level(System, "Bulk"):trait):culture+us(at.level(System, "DVS"):trait):culture

lets the between culture covariances matrices differ between the systems.

random=~us(trait):culture

forces the between culture covariances matrices to be the same in the two systems. *If* cultures appeared in both systems this formula also assumes that the performance of a culture in the two systems is expected to be the same.

HOWEVER, with only 5 cultures per system, I would say you don't have enough information to allow system specific covariance matrices. With 10 cultures in total, even estimating a single between culture covariance matrices is pushing it I think.

Cheers,

Jarrod
On 18/07/2016 18:08, Jannik Vindel?v wrote:

  
    
#
Thank you so much Jarrod.

I was not aware of the at.level() trick :-)

Adding the trait*System to the fixed effects as suggested:

fixed = trait*Culture + trait*System

Makes some "fixed effect not estimable and have been removed ?. I assume because of the Strong Correlation between System and Culture? But I think it does not matter, as I can estimate the system performance as the contrast between "Bulk Cultures" and "DVS Cultures ??

On a different note, I would like to use the posteriors as priors for the next data set, do you have any advice on how to proceed? I was thinking of using a distribution fitting routine such as fitdistr() in the MASS package to fit an inverse gamma to the rcov chains and translate the parameters into the inverse wishart?

Thank you for your help...

Jannik

Den 19. jul. 2016 kl. 07.21 skrev Jarrod Hadfield <j.hadfield at ed.ac.uk>:

Dear Jannik,

For the compositional variables it makes sense to drop one dimension, it doesn't matter which. For example if you drop NFS you can derive its variance, and covariance with the remaining two as:

VAR(NFS) = VAR(1-Moi-Fat) = VAR(Moi)+VAR(Fat)+2COV(Moi, Fat)

COV(Moi, NFS) = COV(Moi, 1-Moi-Fat) = -[VAR(Moi)+COV(Moi, Fat)]

COV(Fat, NFS) = COV(Fat, 1-Moi-Fat) = -[VAR(Fat)+COV(Moi, Fat)]

If you want separate residual covariance matrices for each system then use:

rcov=~us(at.level(System, "Bulk"):trait):units+us(at.level(System, "DVS"):trait):units

you will also want a System*trait term in the fixed effects to allow the mean responses to differ across the systems.

As I understand your design, no culture is found in both systems and so the correlation between a culture's performance in the two cultures is not estimable. The formula

random=~us(at.level(System, "Bulk"):trait):culture+us(at.level(System, "DVS"):trait):culture

lets the between culture covariances matrices differ between the systems.

random=~us(trait):culture

forces the between culture covariances matrices to be the same in the two systems. *If* cultures appeared in both systems this formula also assumes that the performance of a culture in the two systems is expected to be the same.

HOWEVER, with only 5 cultures per system, I would say you don't have enough information to allow system specific covariance matrices. With 10 cultures in total, even estimating a single between culture covariance matrices is pushing it I think.

Cheers,

Jarrod
On 18/07/2016 18:08, Jannik Vindel?v wrote:

  
    
#
Dear Jarrod,

You have shown me how to estimate the variance and covariance of NFS to the other responses.
I would also like to do a posterior predictive check of this NFS, i.e. calculate the 95% confidence and prediction credible intervals for NFS and compare this to the actual measurements (per culture).

Is it possible to calculate posterior credible intervals if NFS is not included in the model e.g. from the MCMC chains? (MCMCglmm throws an error if NFS is included in th MCMCglmm call: "ill-conditioned G/R structure ?)

I was thinking, if for each culture, I could calculate NFS = 100 - Moi - Fat at each step in the posterior prediction chains, I would arrive at the posterior predictive chain for NFS... 

Is it possible for the predict method, as it is, to return the posterior prediction chains instead of the summaries (e.g. mean, lwr, upr) for a given culture? Or is it easy to do this by ? hand"?

Thank you for your support and thank you for making and maintaining such a great package?

/Jannik  

Den 19. jul. 2016 kl. 07.21 skrev Jarrod Hadfield <j.hadfield at ed.ac.uk>:

Dear Jannik,

For the compositional variables it makes sense to drop one dimension, it doesn't matter which. For example if you drop NFS you can derive its variance, and covariance with the remaining two as:

VAR(NFS) = VAR(1-Moi-Fat) = VAR(Moi)+VAR(Fat)+2COV(Moi, Fat)

COV(Moi, NFS) = COV(Moi, 1-Moi-Fat) = -[VAR(Moi)+COV(Moi, Fat)]

COV(Fat, NFS) = COV(Fat, 1-Moi-Fat) = -[VAR(Fat)+COV(Moi, Fat)]

If you want separate residual covariance matrices for each system then use:

rcov=~us(at.level(System, "Bulk"):trait):units+us(at.level(System, "DVS"):trait):units

you will also want a System*trait term in the fixed effects to allow the mean responses to differ across the systems.

As I understand your design, no culture is found in both systems and so the correlation between a culture's performance in the two cultures is not estimable. The formula

random=~us(at.level(System, "Bulk"):trait):culture+us(at.level(System, "DVS"):trait):culture

lets the between culture covariances matrices differ between the systems.

random=~us(trait):culture

forces the between culture covariances matrices to be the same in the two systems. *If* cultures appeared in both systems this formula also assumes that the performance of a culture in the two systems is expected to be the same.

HOWEVER, with only 5 cultures per system, I would say you don't have enough information to allow system specific covariance matrices. With 10 cultures in total, even estimating a single between culture covariance matrices is pushing it I think.

Cheers,

Jarrod
On 18/07/2016 18:08, Jannik Vindel?v wrote:

  
    
#
Hi,

The "fixed effect not estimable and have been removed" warning can be 
ignored. It willonly issue this when effects are confounded, not just 
strongly correlated. It is because you have fitted the trait intercepts 
twice. trait+trait:Culture + trait:System will probably get rid of the 
warning, but the model will be the same as before.

You could do as you say regrding the priors. You could also analyse both 
data-sets together if the model is not too complicated.

Cheers,

Jarrod
On 19/07/2016 22:48, Jannik Vindel?v wrote:

  
    
#
Hi,

You will have to do this by hand. If you want the posterior predictive 
distribution for the sample covariances (i.e. the covariances you 
oberserve in the data) then I would

1/ for MCMC iteration i generate the full 3x3 covariance matrix from the 
estimated 2x2 covariance matrix, using the rules that I showed you.

2/ draw n samples from this covariance matrix using mvrnorm where n is 
your sample size.

3/ calculate the covariance of the samples using cov and store

4/ go back to 1/ and repeat for i+1

You end up with a posterior predictive distribution for the sample 
covariance matrix (although technically you would have to multiply your 
empirical covariance matrix and the result of 3 by (n+1)/n to get the 
sample covariance matrix).

Cheers,

Jarrod
On 20/07/2016 06:57, Jannik Vindel?v wrote: