
cross validation in CoCA and CCA

Jesse,

I do not know what you mean by CV in this context, but basic cross validation can be done with the vegan functions cca(), rda() and capscale(). These functions have a predict() method that accepts 'newdata', and using new data allows cross validation. They also have a calibrate() function that can directly estimate the values of predictor variables from community data, and this also accepts 'newdata'. So you can build ("train") a model and then use independent 'newdata' to test it. However, we do not have a generic crossvalidate(object, data, k, ?) function for a canned cross validation process; you have to do this by hand. Neither do we have any functions for multistep CV or structured CV where some of the external variables are known and others are predicted/calibrated. However, the basic facilities for hand crafting such models are provided. Simple things, like k-fold cross validation, are really trivial to build with ordination (but if you build the uncertainty of model building into the process, like you should, you must be very careful in collecting the data, as the variables in the model can change from one fold to the next).
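As a minimal sketch of the train/test idea with a single hold-out split: the 50-site training size and the seed are arbitrary choices made only for illustration, and the training data are subset by hand before fitting.

```r
library(vegan)
data(mite, mite.env)

set.seed(42)                        # arbitrary, only for reproducibility
Y <- decostand(mite, "hell")        # Hellinger transformation is row-wise
train <- sample(nrow(Y), 50)        # arbitrary training set of 50 sites

## build ("train") the model on the training sites only
Ytrain <- Y[train, ]
etrain <- mite.env[train, ]
mod <- rda(Ytrain ~ SubsDens + WatrCont, data = etrain)

## use ("test") the model with independent community data
wa  <- predict(mod, newdata = Y[-train, ], type = "wa")   # site scores
est <- calibrate(mod, newdata = Y[-train, ])              # estimated predictors
head(est)
```

Here 'wa' holds the ordination scores of the held-out sites and 'est' the calibrated values of SubsDens and WatrCont, which can be compared with mite.env[-train, ].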

Here is one 5-fold CV cycle with rda:

library(vegan)
data(mite, mite.env)
## 5-fold CV
k <- rep(1:5, length.out = nrow(mite))
## x is matrix to collect predictions for two vars
x <- matrix(NA, nrow=nrow(mite), ncol=2)
colnames(x) <- c("SubsDens", "WatrCont")
## shuffle for each CV
k <- sample(k)
for (i in 1:5) {
    ## fit the model without fold i
    mod <- rda(decostand(mite, "hell") ~ SubsDens + WatrCont, mite.env,
               subset = k != i)
    ## estimate the two variables for the held-out fold i
    x[k == i, ] <- calibrate(mod, newdata = decostand(mite[k == i, ], "hell"))
}

Easy, but not a very good prediction (cca would be marginally better, as it usually is).
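One way to put a number on how good the prediction is would be to compare the cross-validated estimates with the observed values. The sketch below is only an illustration: RMSE and the "training mean" baseline are my choice of yardstick, the names 'pred', 'nullp' and 'rmse' are made up here, and the seed is arbitrary.

```r
library(vegan)
data(mite, mite.env)

set.seed(1)                                  # arbitrary, only for reproducibility
vars <- c("SubsDens", "WatrCont")
Y    <- decostand(mite, "hell")
obs  <- as.matrix(mite.env[, vars])
k    <- sample(rep(1:5, length.out = nrow(Y)))

## pred: calibrated values; nullp: baseline that predicts the training mean
pred <- nullp <- matrix(NA, nrow(Y), 2, dimnames = list(NULL, vars))
for (i in 1:5) {
    test   <- k == i
    Ytrain <- Y[!test, ]
    etrain <- mite.env[!test, ]
    mod <- rda(Ytrain ~ SubsDens + WatrCont, data = etrain)
    pred[test, ]  <- calibrate(mod, newdata = Y[test, ])
    nullp[test, ] <- rep(colMeans(obs[!test, ]), each = sum(test))
}

## root mean squared error of prediction for each variable
rmse <- function(p) sqrt(colMeans((p - obs)^2))
rbind(calibrate = rmse(pred), training_mean = rmse(nullp))
## correlation between predicted and observed values
diag(cor(pred, obs))
```

If the 'calibrate' row is not clearly smaller than the 'training_mean' row, the model predicts little better than a constant. Repeating the whole cycle with a new shuffle of k shows how much the result depends on the split.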

Cheers, Jari Oksanen
On 29/03/2014, at 04:44 AM, Gavin Simpson wrote: