How to deal with rare species post-rarefaction - subsampling or not?
Dear list,

I have six fields with 60 samples, and I want to analyze the microbial diversity based on high-throughput sequencing. Read numbers varied by about one order of magnitude between samples (i.e. the samples with the most reads had about tenfold more than those with the fewest). I have done rarefaction based on Hill numbers (Chao and Jost, 2014) and found that I reached full coverage for n=1 and n=2, and that the curves were reaching a plateau for n=0, in all samples. My lowest sample completeness value was 0.995, for a sample with about 30,000 observations.

Here is an example from one of the six fields (from top to bottom: rarefaction based on species richness, linearized Simpson, linearized Shannon; the dots mark the end of each observed curve, beyond which the curve was extrapolated according to the aforementioned paper):
http://s21.postimg.org/fm0nhp4w7/image.png

I made species richness boxplots based on uncorrected species richness and on corrected values (for which I used the "double reference" approach; see the paper), and they were substantially different, with some of the high-read samples losing about 20% of their observed species richness.

Now, on to the question(s):

- One of my aims is to identify shared species and core species sets across all six fields or subsets of them. I would like to use my entire dataset without subsampling, since my sample coverage is so high, but this obviously affects the interpretation of the data. However, if I subsample, don't I have to do it in many permutations? And wouldn't subsampling also severely affect the interpretive power of my analysis?
- I have yet to find a nice subsampling routine in R for community data that lets me do further calculations on the entire set of n subsets, possibly in lists.
- As a bonus: if I want to use Chao-1 as an index of expected species richness, do I calculate it on subsampled datasets or on the samples as they are? I would rather do it on raw data (because this is what I have measured), but I fear for sample comparability.

I think I have now shifted the problem of subsampling to the area of the rare and very rare biosphere. Sorry for bothering, and many thanks for reading.
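To make the questions above more concrete, here is a minimal sketch of the kind of routine I have in mind. It is written in plain Python rather than R, purely for illustration. The toy counts, the function names (`subsample`, `core_species`, `chao1`, `repeated_subsamples`) and the "present in every sample" definition of a core set are my own assumptions and do not come from any package. Chao-1 is used in its bias-corrected form, S_obs + F1(F1-1) / (2(F2+1)), where F1 and F2 are the numbers of singletons and doubletons.

```python
import random

def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from one sample's counts."""
    pool = [sp for sp, n in counts.items() for _ in range(n)]
    out = {}
    for sp in rng.sample(pool, depth):
        out[sp] = out.get(sp, 0) + 1
    return out

def core_species(samples):
    """Species present in every sample (one possible 'core' definition)."""
    present = [{sp for sp, n in s.items() if n > 0} for s in samples]
    return set.intersection(*present)

def chao1(counts):
    """Bias-corrected Chao-1: S_obs + F1*(F1-1) / (2*(F2+1))."""
    values = [n for n in counts.values() if n > 0]
    f1 = sum(1 for n in values if n == 1)  # singletons
    f2 = sum(1 for n in values if n == 2)  # doubletons
    return len(values) + f1 * (f1 - 1) / (2 * (f2 + 1))

def repeated_subsamples(samples, n_perm, seed=1):
    """Return a list of n_perm replicates; each replicate is a list of
    samples rarefied to the smallest read total in the dataset."""
    rng = random.Random(seed)
    depth = min(sum(s.values()) for s in samples)
    return [[subsample(s, depth, rng) for s in samples] for _ in range(n_perm)]

# Hypothetical toy data: two samples with very different read totals.
samples = [
    {"otu_a": 500, "otu_b": 300, "otu_c": 2, "otu_d": 1, "otu_e": 1, "otu_f": 1},
    {"otu_a": 60, "otu_b": 30, "otu_c": 1},
]

replicates = repeated_subsamples(samples, n_perm=100)

# How often does each species belong to the core set across replicates?
core_count = {}
for rep in replicates:
    for sp in core_species(rep):
        core_count[sp] = core_count.get(sp, 0) + 1
core_support = {sp: k / len(replicates) for sp, k in core_count.items()}

raw_core = core_species(samples)          # core set without subsampling
raw_chao1 = [chao1(s) for s in samples]   # Chao-1 on the raw counts
```

With something like this, I could report for each species how often it falls into the core set across permutations (`core_support`) next to the core set from the raw data (`raw_core`), and compare Chao-1 on raw versus subsampled counts. Whether that is a sensible way to treat rare species is exactly what I am unsure about.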
Tim Richter-Heitmann (M.Sc.)
PhD Candidate
International Max-Planck Research School for Marine Microbiology
University of Bremen
Microbial Ecophysiology Group (AG Friedrich)
FB02 - Biologie/Chemie
Leobener Straße (NW2 A2130)
D-28359 Bremen
Tel.: 0049(0)421 218-63062
Fax: 0049(0)421 218-63069