Skip to content

Including binary and Gaussian responses in one multivariate mixed model

4 messages · Yi-Hsiu Chen, Pierre de Villemereuil

#
Dear list members,

I am trying to fit a multivariate mixed model that includes a binary and a Gaussian response variables with MCMCglmm and met some problems I couldn?t figure out. As I couldn't find much discussion on these topics, I was wondering if I could have some help from the list.

This question is also posted on Cross Validated (sorry for cross-posting), so please visit this link for a more readable version:  https://stats.stackexchange.com/questions/472662/including-both-binary-and-gaussian-responses-in-one-multivariate-mixed-model-in <https://stats.stackexchange.com/questions/472662/including-both-binary-and-gaussian-responses-in-one-multivariate-mixed-model-in> . 

Here is my question: 

I would like to fit a multivariate mixed model to a dataset that contains: a binary Direction variable showing the direction the individuals moved toward, a continuous PropRangeChange_s variable showing the scaled proportion of the occupied range increased to old occupied range (OldRange), a continuous Dist variable showing the distance between new range centre and a key location, a categorical Type stating type of the individuals; categorical variables SiteID and Lat_band showing where the observations were recorded, in which SiteID is nested within Lat_band (more than one SiteID in one Lat_band level). The example data can be downloaded at this link: https://drive.google.com/drive/folders/16sjYpBSBnu3XCNcJDPpXcJQuo4UDj8cc?usp=sharing

What I am interested in is: how Direction and PropRangeChange_s change in response to OldRange, Dist, and Type ? I am also interested in whether the probability of moving toward certain direction is related to PropRangeChange_s.

I therefore tried fitting a multivariate mixed model with MCMCglmm with below code (also included in the shared folder):

library(MCMCglmm)
library(coda) 
# Read in data required
test.data<-read.csv(file="D:/Data/ExampleData.csv", header=TRUE, sep=",")

# iteration setting
niterations <- 50000 
nburnin <- 10000
nthin <- 50 

# set priors
prior.op1 <- list(R = list(R1 = list(V = diag(2), nu = 0.002)), 
                  G = list(G1 = list(V = diag(2), nu = 0.01, alpha.mu = rep(0, 2), alpha.V = diag(2) * 1000),
                           G2 = list(V = diag(2), nu = 0.01, alpha.mu = rep(0, 2), alpha.V = diag(2) * 1000)))

# Model test 1 - include Type as fixed effect
set.seed(1)
test.mmm.c1 <- MCMCglmm(cbind(factor(Direction), PropRangeChange_s) ~ 0 + trait + trait:(scale(OldRange)*scale(Dist)*Type),
                            random = ~us(trait):Lat_band + us(trait):Lat_band:SiteID , rcov = ~us(trait):units,
                            data = test.data, family = c("categorical", "gaussian"), prior = prior.op1, nitt = niterations, burnin = nburnin, thin = nthin, verbose = T, pr = T,pl=T) 

set.seed(2)
test.mmm.c2 <- MCMCglmm(cbind(factor(Direction), PropRangeChange_s) ~ 0 + trait + trait:(scale(OldRange)*scale(Dist)*Type),
                        random = ~us(trait):Lat_band + us(trait):Lat_band:SiteID , rcov = ~us(trait):units,
                        data = test.data, family = c("categorical", "gaussian"), prior = prior.op1, nitt = niterations, burnin = nburnin, thin = nthin, verbose = T, pr = T,pl=T) 

#extract fixed effects
test.mmm.FIX <- mcmc.list(test.mmm.c1$Sol,test.mmm.c2$Sol)

#extract random effects
test.mmm.VCV <- mcmc.list(test.mmm.c1$VCV,test.mmm.c2$VCV) 

# Check convergence
max(gelman.diag(test.mmm.FIX[, 1:48])$psrf)        #[Return] 1.027717

max(gelman.diag(test.mmm.VCV)$psrf)                # Error shown

# Check estimate for fixed effects
summary(test.mmm.FIX[, 1:48])                      # Very large estimate

# Check the corelations between two responses 
test.mmm.Lat_band<-mMULTIVCV[, 1:4]
test.mmm.SiteID<-mMULTIVCV[, 5:8]
test.mmm.units<-mMULTIVCV[, 9:12]

summary(test.mmm.Lat_band)                         # Very large estimate 
summary(test.mmm.units)  

test.mmm.Lat_band.summary<-summary(mcmc.list(posterior.cor(test.mmm.c1$VCV[, 1:4]), posterior.cor(test.mmm.c2$VCV[, 1:4])))
# Q: weird quantiles that ranges from -1 to 1, which is the entire parameter range for correlation
test.mmm.units.summary<-summary(mcmc.list(posterior.cor(test.mmm.c1$VCV[, 9:12]), posterior.cor(test.mmm.c2$VCV[, 9:12])))

The script run smoothly with no warnings showing up. However, when I checked the estimates, I found all the estimates for Direction-related coefficients are very large (such as, more than 10^4), which doesn't look correct. I therefore was wondering if I can have some help on following questions:

	? Is it statistically acceptable to include both binary and continuous variables as response variables?

	? When I check the fixed effects, for Direction-related coefficients, it only states "traitDirection.2" in the output, how could I know which Direction level it is for?

	? As the chains looks converging well for the fixed effects (Gelman?Rubin diagnostic = 1.027), what might be the causes for the error message ? Error in chol.default(W) : the leading minor of order 3 is not positive definite ? shown when checking the convergence for random effects and the seemly unlikely overly large coefficient estimates?

	? Is it possible to obtain marginal R^2 and conditional R^2 with MCMCglmm?

	? Can I extract the latent variable, the probability of moving upward, with function: predict(test.mmm.c1, marginal = NULL, posterior = "mean", type = "response") 


Sorry for the long post. This is my first multivariate mixed model as well as my first MCMCglmm model, hope I didn?t make a fundamental mistake. Any suggestions would be appreciated.


Best wishes
Yi-Hsiu
#
Hi,

First thing: I don't think your prior is right. You need to place the "categorical" trait in second (you might want to use "threshold" family instead by the way, unless you insist on having a logit link function) and then fix its residual variance by using "fix = 2". Also, even if you do this, I believe your current prior will impose a strong prior into covariances near 0. You can play a bit with the priors using the following script:
https://github.com/devillemereuil/prior-MCMCglmm/

As for the other questions
1. Yes.
2. "2" should relate to a level in your factor when you call "factor(Direction)", MCMCglmm is simply following R formula rules here I believe. You can name your levels using the "levels" argument of the factor() function.
3. I believe the source of these errors is the fact that the residual variance needs to be fixed for the binary trait as I mentioned above.
4. Yes. I don't know of any package performing this out-of-the-box but I never looked. Computing the "fixed-effects variance" as in Nakagawa & Schielzeth can be performed like this:

compute_varpred <- function(beta, design_matrix) {
  var(as.vector(design_matrix %*% beta)) 
}
vf <- apply(test.mmm$Sol, 1, compute_varpred, design_matrix = test.mmm$X)

5. I believe the latent variables are stored directly in test.mmm$Liab if "pl = TRUE" was set.

Hope this helps,
Pierre

Le mercredi 17 juin 2020, 19:12:08 CEST Yi-Hsiu Chen a ?crit :
#
Dear Pierre,

Thank you very much for the reply. The suggestions are helpful, these unlikely, very large estimates vanish after the categorical trait was placed in second and the prior for the residual variance was fixed. Unfortunately the error messages ? Error in cholesterols.default (W): the leading minor of order 3 is not positive definite? were still shown when I checked the convergence for random effects with german.diag(), I wonder if it indicates that more iterations are required.     

Before I proceed for a longer run, I was wondering would you (or any other list member) mind further explaining for me why my priors are strong?
#
Hi,

Yes, sorry, I wasn't thinking straight when I mentioned the strong prior on the covariance, I was thinking about something else.

I have no idea what is generating this error in the gelman.diag() function however. Sorry.

Cheers,
Pierre.

Le jeudi 18 juin 2020, 13:11:18 CEST Yi-Hsiu Chen a ?crit :