Using pcnm to correct for spatial autocorrelation
2 messages · Kevin McCluney, Jari Oksanen
Kevin, I'll answer only some of your technical questions. I don't want to imply that you should use PCNM; I only say how to use them if you do. For conceptual issues, you may also check B. Gilbert & J. R. Bennett, Journal of Applied Ecology, Volume 47, Issue 5, pages 1071-1082, October 2010.
On 5/11/10 22:59 PM, "Kevin McCluney" <Kevin.McCluney at colostate.edu> wrote:
I am trying to use pcnm in vegan to correct for spatial autocorrelation in analyses of influences of environmental factors on multivariate community composition, as well as univariate analyses of diversity and abundance. I have several questions. 1. PCNM requires a threshold and I am aware that the value that keeps all sites connected is most commonly used. My data has a group of 4 sites spaced <70 m apart that are over 3 km from another group of 30 sites spaced less than 105 m apart, and one more site which is 700 m from that group. It seems more reasonable to me to use a threshold of 105 m than one of >3km, especially since my study focuses on ground arthropods. This would create 3 groupings of sites. If I use the >3km threshold I get only 8 pcnm axes, whereas if I use 105 m I get 23 pcnm axes. What are the dangers of using 105 m for the threshold?
The choice of threshold is arbitrary and it will influence the results. The standard (which is also the default in vegan) is indeed to use the shortest threshold that still keeps the data connected, that is, the longest link in the minimum spanning tree of the sites. I cannot see any dangers in using other thresholds. It is no more dangerous to use a threshold of 700 m than a threshold of 3000 m. You could just as well ask what the dangers are of using the default of >3 km. The number of PCNM vectors has no relevance for the choice. If you start from the Euclidean distances of spatial locations on a plane (as the Earth is for many practical purposes), you get back two principal coordinates. In PCNM we put in an arbitrary threshold, and this non-Euclidifies the matrix. Therefore you get more than two PCNM vectors and several negative eigenvalues for locations on a plane. Having a low number of PCNM vectors and not too many negative eigenvalues is a sign of a more Euclidean space. You cannot have a completely Euclidean (= 2 dims) space for PCNM, since then you just fall back to a simple linear trend surface. With PCNM you get trickier surfaces.
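To make the effect of the threshold concrete, here is a minimal Python sketch (not vegan code) that counts how many separate groups of sites a given threshold leaves, and finds the smallest threshold that keeps every site connected as the longest edge of a minimum spanning tree. The coordinates are hypothetical and only loosely echo the layout in the question: a tight cluster, a second cluster more than 3 km away, and one site 700 m from the second cluster.

```python
import math

def pairwise_distances(coords):
    """Full Euclidean distance matrix for (x, y) coordinates in metres."""
    return [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in coords] for a in coords]

def count_groups(dist, threshold):
    """Number of connected groups when sites no farther apart than threshold are linked."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

def connecting_threshold(dist):
    """Smallest threshold that keeps all sites connected (longest MST edge, Prim's algorithm)."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest link from the growing tree to each site
    best[0] = 0.0
    longest = 0.0
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[i] = True
        longest = max(longest, best[i])
        for j in range(n):
            if not in_tree[j] and dist[i][j] < best[j]:
                best[j] = dist[i][j]
    return longest

# Hypothetical site layout (metres), for illustration only.
sites = [(0, 0), (50, 0), (0, 60), (50, 60),                # small cluster
         (3500, 0), (3600, 0), (3700, 0), (3600, 100),      # larger cluster
         (4400, 0)]                                          # isolated site
dist = pairwise_distances(sites)
for threshold in (105, 701, connecting_threshold(dist)):
    print(threshold, count_groups(dist, threshold))
```

A threshold below the connecting one simply splits the sites into several groups; whether that is acceptable is a judgment about the study, as the reply says, not something the computation decides.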
2. I know that the standard procedure for pcnm involves removing or ignoring axes with negative eigenvalues. When I use the pcnm function in vegan, I get more eigenvalues than I have axes. If I assume that the first values correspond to the first axes, then all the axes in my analyses have positive eigenvalues, but then the "extra" values are all negative. What are these "extra" eigenvalues?
It is a correct assumption that the eigenvalues and corresponding eigenvectors are ordered similarly. All PCNMs that you get are for those positive eigenvalues, and the first eigenvalue is for the first axis etc. You will normally get negative eigenvalues. It is not only a common procedure to ignore axes with negative eigenvalues, but it is about the only practical choice (and the only choice you have in vegan).
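The origin of the "extra" negative eigenvalues can be illustrated with a small, self-contained Python sketch. It is an unweighted toy version of the usual PCNM recipe, not vegan's implementation: distances above the threshold are replaced by four times the threshold, the result goes through principal coordinates analysis, and only vectors with clearly positive eigenvalues are kept. The function names, the Jacobi eigen-solver, and the tolerance for "positive" are all assumptions made for this illustration.

```python
import math

def double_centre(dist):
    """Gower-centred matrix of -0.5 * d^2, as used in principal coordinates analysis."""
    n = len(dist)
    a = [[-0.5 * d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in a]
    grand = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand
             for j in range(n)] for i in range(n)]

def jacobi_eigen(mat, max_sweeps=100, tol=1e-20):
    """Eigenvalues (descending) and eigenvectors of a symmetric matrix via Jacobi rotations."""
    n = len(mat)
    a = [row[:] for row in mat]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    scale = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(i + 1, n))
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]  # one list per eigenvector
    return values, vectors

def pcnm_sketch(coords, threshold):
    """All eigenvalues, plus only the eigenvectors whose eigenvalue is clearly positive."""
    dist = [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in coords] for a in coords]
    trunc = [[d if d <= threshold else 4.0 * threshold for d in row] for row in dist]
    values, vectors = jacobi_eigen(double_centre(trunc))
    eps = 1e-8 * max(abs(x) for x in values)  # assumed tolerance for "positive"
    kept = [vectors[i] for i, val in enumerate(values) if val > eps]
    return values, kept

# Ten hypothetical sites on a transect with unit spacing, threshold = spacing.
transect = [(float(i), 0.0) for i in range(10)]
values, kept = pcnm_sketch(transect, 1.0)
print(len(values), "eigenvalues,", len(kept), "vectors kept")
print("most negative eigenvalue:", min(values))
```

The eigenvalue list always has one entry per site, while the returned vectors cover only the leading positive ones, which is why there are more eigenvalues than axes. With a threshold large enough that nothing is truncated, a planar layout gives back just two positive eigenvalues, as described in the reply to question 1.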
3. The next step in pcnm involves selecting pcnm axes with significant effects on responses. I know that it has been suggested to use forward selection routines in cca to select these axes. I'm also aware of some of the limitations of this technique and the suggestion that forward selection should have additional criteria. Namely, that forward selection should only be used if the full model with all terms is significant and should also compare the adjusted R2 with each term added to that of the original full model. If I perform an analysis with all non-negative pcnm terms and the model is not significant, does this mean there is no spatial autocorrelation and no selection procedure is needed? If it is significant, then I need adjusted R2 values to perform the forward selection, but I don't see how to get these using cca in vegan.
There is no way of doing this in vegan. You can do it for rda(), and the development version of vegan in the repository http://r-forge.r-project.org/ has an automatic function ordiR2step for rda() or capscale() to do that forward selection (it is not yet in the release version, because I have been waiting for Guillaume Blanchet's and Pierre Legendre's blessing for the function before release). The problem is that vegan does not have adjusted R2 for CCA. I have seen a paper by Pedro Peres-Neto on calculating adjusted R2 for CCA, but the calculation is pretty tricky and slow, and we haven't implemented it in vegan. I guess you refer to Blanchet et al., Ecology 89, 2623-2632 (2008) when you write about the recommended procedure: that paper only considered RDA. With CCA you must trust your own judgment when you decide how to select your PCNMs. I hope you remembered to supply weights to the pcnm() function in vegan when you calculated your PCNM vectors. Non-significant PCNMs say nothing about spatial autocorrelation. They may say something about spatial structure that can be expressed with your set of spatial basis vectors.
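For the RDA case, the stopping rules mentioned in the question can be sketched in a few lines of Python. This is only an outline of the logic, assuming the usual (Ezekiel) adjusted R2 formula and assuming the global model with all candidate PCNMs has already been found significant; it is not the code of ordiR2step. The callables fit_r2 and p_value are hypothetical placeholders for whatever fits the constrained ordination and runs the permutation test, and the toy numbers at the end are invented.

```python
def adjusted_r2(r2, n_sites, n_terms):
    """Ezekiel's adjusted R2 for a model with n_terms explanatory variables."""
    return 1.0 - (1.0 - r2) * (n_sites - 1) / (n_sites - n_terms - 1)

def forward_select(candidates, fit_r2, p_value, n_sites, alpha=0.05):
    """Forward selection with two stopping rules.

    fit_r2(terms) -> unadjusted R2 of the model with those terms (hypothetical callable)
    p_value(selected, new) -> permutation P of adding `new` (hypothetical callable)
    """
    global_adj = adjusted_r2(fit_r2(list(candidates)), n_sites, len(candidates))
    selected, current_adj = [], 0.0
    remaining = list(candidates)
    while remaining:
        # Adjusted R2 of each one-term extension of the current model.
        scored = [(adjusted_r2(fit_r2(selected + [c]), n_sites, len(selected) + 1), c)
                  for c in remaining]
        best_adj, best = max(scored)
        if best_adj <= current_adj:         # adjusted R2 no longer improves
            break
        if best_adj > global_adj:           # would exceed the global model
            break
        if p_value(selected, best) > alpha:  # added term is not significant
            break
        selected.append(best)
        remaining.remove(best)
        current_adj = best_adj
    return selected

# Invented toy inputs: additive R2 contributions and a stand-in significance test.
contrib = {"PCNM1": 0.30, "PCNM2": 0.15, "PCNM3": 0.08, "PCNM4": 0.005}
toy_r2 = lambda terms: sum(contrib[t] for t in terms)
toy_p = lambda selected, new: 0.01 if contrib[new] > 0.05 else 0.5
print(forward_select(list(contrib), toy_r2, toy_p, n_sites=35))
```

The cap at the global model's adjusted R2 is what keeps the procedure from adding terms merely because each one passes the significance test.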
Additionally, for my multivariate analyses, I am using adonis to perform statistical tests, not cca. Therefore, should I also be using adonis to look for significant pcnm axes? I haven't seen this done, but do not see potential drawbacks. Are there any?
I wouldn't mix adonis() and pcnm(). The PCNMs were designed to be used with RDA or other ordination techniques. adonis() works with dissimilarities. In PCNM you start with distances, then you change these to basis vectors, and for adonis() you would change those back to distances. I think this is not wise. Why not use the PCNM distances directly? One reason is that you cannot get them directly in vegan, or in other functions I know, without editing vegan functions to return them or calculating them manually. That would be rather easy, since you can just copy the commands of the pcnm() function (it seems that one line is enough). I think CCA and adonis are not a good coupling: CCA is a weighted method using the Chi-square metric, and adonis is an unweighted method using the Euclidean metric. Coupling RDA and adonis, or capscale() and adonis (for the non-Euclidean case), is more natural.
Cheers, Jari Oksanen