Improved haplotype() function in pegas 0.13
Hi Jarrett,

I'm Cc'ing r-sig-genetics since we had a recent discussion on a similar topic (see below).

----- On 14 May 2020, at 9:59, Jarrett Phillips phillipsjarrett1 at gmail.com wrote:
Hello, Emmanuel Paradis has recently updated the haplotype() function in pegas 0.13 to account for base ambiguities, gaps and Ns. Thank you Emmanuel!
Base ambiguities were already considered in previous versions of pegas, but this was not explicit (or flexible).
The argument 'strict' simply considers or ignores all gaps and ambiguities, but does this also consider/ignore Ns?
Yes. 'strict' means "strict interpretation of the characters without interpreting them as base ambiguities". For instance, consider the following 9 aligned sequences with 2 sites (without labels for simplicity):

AA
AR
AM
AW
AV
AH
AD
AN
A-

By default (and with the latest version of pegas), haplotype() will return a single haplotype because it cannot be inferred whether any of sequences 2-9 is different from the first one. If strict = TRUE, nine haplotypes will be returned.
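The nine-sequence example above can be tried directly. This is a minimal sketch, assuming ape and pegas (0.13 or later) are installed; the object names (seqs, mat, x, h_default, h_strict) and the row labels are made up for illustration, and the sequences are typed in lowercase as a precaution for as.DNAbin().

```r
library(ape)
library(pegas)

# The nine two-site sequences from the example, in lowercase
seqs <- c("aa", "ar", "am", "aw", "av", "ah", "ad", "an", "a-")

# One row per sequence, one column per site
mat <- do.call(rbind, strsplit(seqs, ""))
rownames(mat) <- paste0("seq", seq_along(seqs))  # arbitrary labels
x <- as.DNAbin(mat)

# Default: ambiguity codes are interpreted as ambiguities
h_default <- haplotype(x)

# strict = TRUE: every character is compared literally
h_strict <- haplotype(x, strict = TRUE)

# Number of haplotypes found under each setting
nrow(h_default)
nrow(h_strict)
```

According to the explanation above, the first count should be 1 and the second 9.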
The 'trailingGapsAsN' argument simply treats leading and trailing gaps as Ns, ignoring internal gaps. This argument is set to TRUE by default. From the above, it appears that 'strict' ignores Ns. If 'strict' is set to TRUE, does this mean that the TRUE/FALSE value of 'trailingGapsAsN' is ignored as well?
Yes. I've added a line in the help page of haplotype() to say that 'trailingGapsAsN' has no effect if 'strict = TRUE'.
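The interaction between the two arguments can be checked on a toy alignment. This is only a sketch, assuming pegas 0.13 or later; the four sequences and all object names are invented for illustration and do not come from the thread.

```r
library(ape)
library(pegas)

# Made-up alignment with one trailing gap (s2) and one leading gap (s3)
seqs <- c(s1 = "acgt", s2 = "acg-", s3 = "-cgt", s4 = "aggt")
mat <- do.call(rbind, strsplit(seqs, ""))  # list names become row names
x <- as.DNAbin(mat)

# With strict = TRUE, 'trailingGapsAsN' should make no difference
h_strict_a <- haplotype(x, strict = TRUE, trailingGapsAsN = TRUE)
h_strict_b <- haplotype(x, strict = TRUE, trailingGapsAsN = FALSE)
identical(nrow(h_strict_a), nrow(h_strict_b))

# With strict = FALSE, the two settings can be compared the same way
h_a <- haplotype(x, strict = FALSE, trailingGapsAsN = TRUE)
h_b <- haplotype(x, strict = FALSE, trailingGapsAsN = FALSE)
c(nrow(h_a), nrow(h_b))
```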
I ask because I use haplotype() in one of my R packages (HACSim) to compute optimal sample sizes for genetic diversity assessment. Currently, my package throws a warning if missing data or base ambiguities are present within DNA alignments. Given Emmanuel's changes, it seems the warning in my package will not be needed once I set 'strict = TRUE'. I am unsure, however, how to set 'trailingGapsAsN' properly to ensure that gaps do not affect haplotype calculation if they are left in the alignment. Gaps, ambiguities and Ns will cause an overestimation of the number of haplotypes, and therefore an inflation of standing genetic variation.
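A check of the kind described above (warning when an alignment contains anything other than A, C, G or T) might look like the following. This is a hypothetical sketch, not HACSim's actual code: count_nonstandard() is an invented helper built on ape's base.freq(), and the two-sequence alignment is made up.

```r
library(ape)

# Hypothetical helper (not part of ape, pegas or HACSim):
# counts every symbol in the alignment that is not a, c, g or t
count_nonstandard <- function(x) {
  counts <- base.freq(x, freq = TRUE, all = TRUE)  # counts for all symbols
  counts[!names(counts) %in% c("a", "c", "g", "t")]
}

# Made-up alignment: s2 contains an N, an ambiguity code (r) and a gap
seqs <- c(s1 = "acgt", s2 = "anr-")
x <- as.DNAbin(do.call(rbind, strsplit(seqs, "")))

nonstd <- count_nonstandard(x)
if (sum(nonstd) > 0) {
  warning("Alignment contains gaps, Ns or ambiguity codes")
}
```

Whether such a warning is still useful depends on how 'strict' and 'trailingGapsAsN' are set in the later haplotype() call.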
Maybe the discussion we had on r-sig-genetics could be relevant here. There doesn't seem to be an easy answer to these questions. Also, the coming version of ape will include the new function latag2n (Leading and Trailing Alignment Gaps to N) which changes sequences such as "A-C-" into "A-CN". Cheers, Emmanuel
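For readers with a version of ape that already ships latag2n(), the "A-C-" example can be reproduced as follows. This is a small sketch; the object names and the row label are arbitrary, and the sequence is entered in lowercase.

```r
library(ape)

# The sequence "A-C-" as a one-row alignment
x <- as.DNAbin(matrix(c("a", "-", "c", "-"), nrow = 1,
                      dimnames = list("s1", NULL)))

# Leading and trailing alignment gaps are recoded as N;
# the internal gap at site 2 is left alone
y <- latag2n(x)

as.character(y)
```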
Can someone weigh in on this? Thanks! Cheers, Jarrett
_______________________________________________ R-sig-phylo mailing list - R-sig-phylo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo at r-project.org/