To Philip, Carsten, Etienne, Ben and Chris,

I really want to thank all of you for the time and effort you put into answering my question. You guys rock!! It gives me such faith in the power of open-source communities like this, and makes me want to contribute in turn where possible!

Thanks, Philip, for your insightful questions and for helping me to think about the data more clearly. I was being stupid with the zeroes: yes, they do result from aggregating the data, and they do represent cases where a species did not occur in a particular sampling unit (so no cover or abundance was recorded). All records of abundance for a species have matching records of cover. Since I am mainly interested in how strongly correlated the two measures are, I think I can happily leave out the zeroes and look only at abundance vs cover where both were recorded. You have reminded me to think carefully about what the aggregation of my data means for the analysis, though.

Ben, my cover data are not in the form of point counts, so that is not an option. Also, I can't use raw counts for abundance because of unequal sampling effort/area.

I have decided that correlation coefficients are probably fine for my purposes. I have calculated Spearman and Kendall correlations, and used Pearson correlations and model II regression on log-transformed data (as you did, Etienne), as well as on ranked data. These all give a consistent picture: a strong positive correlation, and a linear relationship once the data are transformed.

Beta regression looks like a really useful tool that, even if I don't use it here, I may well use for some other aspects of my project. Thanks, Ben, for pointing me to it.

Carsten: did you imply that beta regression is necessarily model I regression (no error variance in the predictor variable)? I'd be interested to hear anyone's thoughts on how much of a limitation this is for situations where both y and x are random variables. Is it the same as for OLS regression, where OLS is acceptable if the error variance in x is less than a third of that in y?

Thanks again!

Cheers
Karen
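[Illustrative note] The comparison of Pearson correlation with model II regression on log-transformed data can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the abundance and cover values are invented, and the standardized (reduced) major axis is used as one common model II method, which is not necessarily the method used in the thread. It is written in plain Python with the standard library only; the helper name `slopes_on_log_scale` is hypothetical.

```python
import math

def slopes_on_log_scale(x, y):
    """Return (Pearson r, OLS slope, SMA slope) for log10(y) against log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    r = sxy / math.sqrt(sxx * syy)
    # Model I (OLS): treats x as measured without error.
    ols_slope = sxy / sxx
    # Model II (standardized major axis): ratio of standard deviations,
    # carrying the sign of the correlation.
    sma_slope = math.copysign(math.sqrt(syy / sxx), r)
    return r, ols_slope, sma_slope

# Hypothetical records where the species was present (no zeroes).
abundance = [3, 7, 12, 20, 45, 150]
cover = [1.5, 2.5, 4.0, 10.0, 20.0, 35.0]

r, ols_slope, sma_slope = slopes_on_log_scale(abundance, cover)
print("Pearson r on log scale:", round(r, 3))
print("OLS slope:", round(ols_slope, 3))
print("SMA slope:", round(sma_slope, 3))
```

By construction the OLS slope equals |r| times the SMA slope, so the two slopes move closer together as the correlation gets stronger.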
On Wed 27Oct10, Philip Dixon wrote:
Karen,

I suggest you step back and ask two questions:
1) What are you trying to do? (i.e. what is the real goal?)
2) What do you do it to? (i.e. what's the appropriate data?)

Do you want to construct a model or estimate correlations? Your detailed questions suggest that your real interest is the correlations. If so, you don't need an explicit model. Just estimate the correlation.

You get to choose how to define correlation. The four most common choices are Pearson correlation on the original scale, Pearson correlation on some transformed scale, Spearman (rank) correlation, or Kendall's (tau) correlation.

I presume the 0's arise from species that are absent from a site, i.e. a (0,0) pair of (abundance, cover). Is it appropriate to include these? You could define correlation in three ways:
1) Conditional on a species present at a site. That eliminates all the (0,0).
2) Conditional on a species present in the regional species pool. This MAY be the same as conditional on a species present in your data set. You clearly have the second. The BIG issue is whether 'in your data' is an adequate representation of 'in the regional species pool'.
3) Conditional on all extant species in your taxonomic group. That adds additional (0,0) pairs for all species present in other regions.

If you really want to model the relationship, this issue is still important.

Best wishes,
Philip Dixon
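[Illustrative note] The four correlation choices listed above, computed conditional on a species being present at a site (definition 1), could look roughly like this. It is a sketch only: the records are invented, the log transform is just one possible "transformed scale", and the function names are hypothetical. Everything uses the Python standard library; a statistics package would normally supply these estimators.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + tied_x_only) * (untied + tied_y_only))

# Hypothetical (abundance, cover) records, including absences as (0, 0).
records = [(0, 0.0), (3, 1.5), (12, 4.0), (0, 0.0), (45, 20.0),
           (7, 2.5), (150, 35.0), (0, 0.0), (20, 10.0)]

# Definition 1: keep only records where the species was present.
present = [(a, c) for a, c in records if a > 0 and c > 0]
abundance = [a for a, _ in present]
cover = [c for _, c in present]

print("Pearson, original scale:", pearson(abundance, cover))
print("Pearson, log scale:", pearson([math.log(a) for a in abundance],
                                     [math.log(c) for c in cover]))
print("Spearman:", spearman(abundance, cover))
print("Kendall tau-b:", kendall_tau_b(abundance, cover))
```

Note that the log-scale version cannot be computed at all once (0,0) pairs are included, which is one practical reason the choice of conditioning matters.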
_______________________________________________
R-sig-ecology mailing list
R-sig-ecology at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.