anova.cca question / missing data in constraining matrix - R-SIG-ecology

Jari Oksanen · 2013-06-01T02:20:50Z

Jari Oksanen

Fri, May 31, 2013 7:20 PM

Hello,

On 01/06/2013, at 02:10 AM, ckellogg wrote:

> Hello,
> I am using the cca function in Vegan to examine the relationship between microbial community structure and a (large) suite of environmental variables. My constraining/environmental data matrix has a lot of holes in it, so I have been exploring using the na.action argument.
> This is the command I am currently using:
> toolik250.cca<-cca(toolikotus250.ra~julianday+logwindspd_max_1dayprior+lograin_3dayprior+sqrtwindspd_1dayprior

The number of rows has changed from term to term. That is, you have different numbers of missing values in each term (= explanatory variable), and when rows with missing values are removed for the current model, the accepted observations change from term to term. I admit the error message is not the most obvious one. I must see where it comes from, and how to make it more informative. However, it does give a hint to "remove missing values", doesn't it?
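To make the changing row count concrete, here is a minimal base R sketch using a made-up data frame (`env` and its column names are hypothetical, not the poster's data). It counts how many rows have no missing values as each variable is added in sequence, which is roughly the situation a sequential term-wise test runs into.

```r
# Hypothetical environmental data: the holes sit in different rows per variable
env <- data.frame(
  julianday = c(150, 151, 152, 153, 154, 155),
  rain      = c(0.2, NA, 0.0, 1.1, 0.4, 0.3),
  wind      = c(3.1, 2.8, NA, NA, 4.0, 3.5)
)

# Sequential models: first variable only, first two, then all three
terms_in_order <- names(env)
rows_used <- sapply(seq_along(terms_in_order), function(i) {
  vars <- terms_in_order[seq_len(i)]
  # complete.cases() is TRUE for rows with no NA in the selected columns
  sum(complete.cases(env[, vars, drop = FALSE]))
})
names(rows_used) <- terms_in_order

# Number of usable rows after adding each variable
print(rows_used)
```

In this toy case the count drops with every added variable, so models fitted for different terms would not be based on the same observations.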

If you want a term-wise test when the terms have missing values, you must refit your model on the complete cases only. Use the argument 'subset' to select a subset with no missing values. Currently I don't know any nice shortcut for doing this with the current model, but the following may work (untested), although it is not nice:

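# TRUE for rows to keep; toolikenv.s is the environmental data frame
keep <- rep(TRUE, nrow(toolikenv.s))
# mark the rows that na.action dropped from the fitted model
keep[toolik250.cca$na.action] <- FALSE
# refit on the same complete rows for every term, then test term-wise
m2 <- update(toolik250.cca, subset = keep)
anova(m2, by="terms", perm=999)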

What does sum(complete.cases(toolikenv.s)) give as a result? Does it give 0?

I suspect you have so many holes that nothing is left when you remove rows with any missing values. The message is about an attempt to analyse a zero-dimensional matrix.
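A quick way to check this is sketched below in base R. The data frame `env` and its columns are invented for illustration; with real data you would use your own environmental table in its place. The sketch counts the complete rows, then the missing values per variable, to show which variable costs the most observations.

```r
# Hypothetical table in which every row has at least one hole
env <- data.frame(
  temp = c(12.1, NA, 11.8, 13.0),
  ph   = c(NA, 6.9, 7.1, NA),
  doc  = c(4.2, 4.0, NA, 3.9)
)

# Rows with no missing value in any column
n_complete <- sum(complete.cases(env))

# Number of missing values per variable, worst first
na_per_var <- sort(colSums(is.na(env)), decreasing = TRUE)

# Complete rows left if the variable with the most holes is dropped
worst <- names(na_per_var)[1]
n_without_worst <- sum(complete.cases(env[, names(env) != worst, drop = FALSE]))

print(n_complete)
print(na_per_var)
print(n_without_worst)
```

If the first number is 0, no observation survives listwise deletion, and dropping the variables with the most holes may be the only way to keep a usable number of rows.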

The functions can handle missing values, but they handle them by removing the observation. Do you want to lose a huge number of rows? We won't invent values to replace missing data in cca(). Some people have suggested ways to do that, and that is not difficult: just search for imputation in R (for instance, package mice). However, the real problem is how to compare and summarize the multivariate results after imputation. Further, if you have a lot of missing values, nothing may be very reliable. It could be possible to collect together and combine permutation test results after multiple imputation, but it would be better to consult a statistician before trying to do this.

Cheers, Jari Oksanen

Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland

Jari Oksanen

Fri, May 31, 2013 10:18 PM

On 01/06/2013, at 05:20 AM, Jari Oksanen wrote:

Actually, there is a slightly easier way of doing this, because 'subset' can also be a vector of indices, and negative indices can be used to remove observations. If 'toolik250.cca' is a result object with missing observations, then

m2 <- update(toolik250.cca, subset = -toolik250.cca$na.action)

will remove the items listed as removed in 'na.action' (NB. the minus sign in 'subset'). The update()d model will be equal to the original model, but with the missing data removed.
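The idea behind the negative index can be illustrated in base R alone. This is only a sketch on an invented data frame (`env` is hypothetical); it assumes that the 'na.action' component of the fitted model holds row indices in the same way as the "na.action" attribute that na.omit() attaches to its result.

```r
# Hypothetical data with holes in rows 2 and 3
env <- data.frame(a = c(1, NA, 3, 4, 5), b = c(2, 3, NA, 5, 6))

# na.omit() drops incomplete rows and records which ones it dropped
kept <- na.omit(env)
dropped <- as.integer(attr(kept, "na.action"))

# Negative indices remove exactly the recorded rows
by_index <- env[-dropped, ]

# The logical 'keep' vector selects the same rows
keep <- rep(TRUE, nrow(env))
keep[dropped] <- FALSE
by_logical <- env[keep, ]

print(dropped)
print(rownames(by_index))
print(rownames(by_logical))
```

Both forms pick the same observations, so the choice between a logical vector and negative indices in 'subset' is a matter of convenience.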

Cheers, Jari Oksanen

Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland
jari.oksanen at oulu.fi, Ph. +358 400 408593, http://cc.oulu.fi/~jarioksa