Hello,
On 01/06/2013, at 02:10 AM, ckellogg wrote:
> Hello, I am using the cca function in Vegan to examine the relationship between microbial community structure and a (large) suite of environmental variables. My constraining/environmental data matrix has a lot of holes in it, so I have been exploring the na.action argument. This is the command I am currently using:
>
>     toolik250.cca <- cca(toolikotus250.ra ~ julianday + logwindspd_max_1dayprior + lograin_3dayprior + sqrtwindspd_1dayprior + windspd_3dayprior + days_since_thaw + days_since_iceout + days_btw_iceoutandthaw + toolikepitemp_24h + logtemp + conductivity + pH, toolikenv.s, na.action=na.omit)
>
> The CCA seems to run just fine, but when I attempt the post hoc tests such as anova.cca (anova(toolik250.cca, by='terms', perm=999)), I get an error message:
>
>     Error in anova.ccabyterm(object, step = step, ...) : number of rows has changed: remove missing values?
>
> What exactly is occurring here to cause this error? I suspect it must be related to the fact that the environmental data matrix has a lot of missing data. I don't quite understand why it states that the number of rows has changed... changed from what?
The number of rows has changed from term to term. That is, you have different numbers of missing values in each term (= explanatory variable), and when rows with missing values are removed for the current model, the accepted observations change from term to term. I admit the error message is not the most obvious one. I must see where it comes from, and how to make it more informative. However, it does give a hint to "remove missing values", doesn't it?

If you want to have a term-wise test with missing values in terms, you must refit your model for complete.cases. Use argument 'subset' to select a subset with no missing values. Currently I don't know any nice short cut to do this with the current model, but the following may work (untested), although it is not nice:

    keep <- rep(TRUE, nrow(toolikenv.s))
    keep[toolik250.cca$na.action] <- FALSE
    m2 <- update(toolik250.cca, subset = keep)
    anova(m2, by="terms", perm=999)
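Another route to the same end is to drop the incomplete rows before fitting, so that every term sees the same observations. The following is only a sketch on made-up data: the data frame `env`, its columns, and the community matrix `comm` in the comments are hypothetical stand-ins, not the real toolikenv.s or toolikotus250.ra. The runnable part uses base R only; the commented lines assume vegan is loaded.

```r
# Hypothetical stand-in for the environmental data (not the real toolikenv.s)
env <- data.frame(
  pH           = c(7.1, NA, 6.8, 7.0, 6.9),
  conductivity = c(40, 42, NA, 39, 41),
  logtemp      = c(1.2, 1.1, 1.3, NA, 1.0),
  unused       = c(NA, NA, 1, 2, 3)   # not in the model, so its NAs do not matter
)

# Only the variables that appear in the model formula should decide which rows go
vars <- c("pH", "conductivity", "logtemp")

# TRUE for rows with no missing value in any model variable
keep <- complete.cases(env[, vars])

# Environmental data restricted to the complete rows
env.cc <- env[keep, , drop = FALSE]
print(sum(keep))

# With vegan loaded, the community matrix must be subset with the same 'keep'
# ('comm' is a hypothetical community matrix with one row per row of 'env'):
# comm.cc <- comm[keep, , drop = FALSE]
# m <- cca(comm.cc ~ pH + conductivity + logtemp, data = env.cc)
# anova(m, by = "terms")
```

Restricting complete.cases() to the model variables matters here: a variable that is not in the formula should not cost you any rows.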
> Is there any way to get around having missing data when running the anovas as you can when running the CCA itself? One other question I have is when I try and run the CCA on all the data in my environmental data matrix (toolikenv.s), not just a subset of variables as I do above, using this command:
>
>     toolik250.cca <- cca(toolikotus250.ra ~ ., toolikenv.s, na.action=na.omit)
>
> I get the following error message:
>
>     Error in svd(Xbar, nu = 0, nv = 0) : a dimension is zero
>
> What might be causing this error message to be thrown?
What does sum(complete.cases(toolikenv.s)) give as a result? Does it give 0? I suspect you have so many holes that nothing is left when you remove rows with any missing values. The message is about an attempt to analyse a zero-dimensional matrix.
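To see where the holes are, it may help to count the missing values per variable alongside the number of complete rows. A small sketch on invented data (the data frame `env` is hypothetical) shows how a few scattered NAs can leave no complete rows at all:

```r
# Hypothetical data: every variable has a single NA, each in a different row
env <- data.frame(
  a = c(1, NA, 3, 4),
  b = c(NA, 2, 3, 4),
  c = c(1, 2, NA, 4),
  d = c(1, 2, 3, NA)
)

# Number of rows that would survive na.omit when all variables are used
n.complete <- sum(complete.cases(env))

# Number of missing values in each variable
na.per.var <- colSums(is.na(env))

print(n.complete)
print(na.per.var)
```

Each variable is only one quarter missing, yet no row is complete, which is the situation that a formula with all variables (~ .) runs into.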
> Thank you so much for your help. Maybe I will just have to filter out the samples with missing environmental data (or filter out some of the variables themselves if they have too much missing data), but I was just hoping to avoid having to do this.
The functions can handle missing values, but they handle them by removing the observation. Do you want to lose a huge number of rows? We won't invent values to replace missing data in cca(). Some people have suggested ways to do that, and that is not difficult: just search for imputation in R (for instance, package mice). However, the real problem is how to compare and summarize the multivariate results after imputation. Further, if you have a lot of missing values, nothing may be very reliable. It could be possible to collect together and combine permutation test results after multiple imputation, but better consult a statistician before trying to do this.

Cheers, Jari Oksanen
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland