PEGAS: assignment to haplotype when missing information
(Cc. to r-sig-genetics)

I'm going to modify the algorithm in haplotype.DNAbin() as follows:

1. Find the sequences that are exactly identical, so that, e.g., the 3 sequences "A-", "AR", and "AA" would be treated as different at this step.
2. Replace the leading and trailing "-" with "N" (thus keeping the alignment gaps only in the 'middle' of sequences).
3. Compute the Hamming distances among haplotypes using 5 states (A, G, C, T, and "-") and ambiguities, so that, e.g., d(A,R) = 0, d(G,R) = 0, d(A,G) = 1, and so on.
4. If all these distances are > 0, then exit.
5. Examine each haplotype and its distances to the others:
   5a. If there is only one distance = 0, then pool the two into a single haplotype and give a warning.
   5b. If two or more distances are equal to zero, then keep them separate and give a message (possibly attached to the returned object).

There could be options to control this algorithm:
- exit after step 1.
- ignore step 2.

At step 5, it seems to make sense to start with the "shortest" sequences and pool them with the "longer" ones, i.e., "A-" would be pooled with "AA".

Comments and suggestions are welcome.

Best,
Emmanuel

-----
On 26 Feb 20, at 16:35, Emmanuel Paradis emmanuel.paradis at ird.fr wrote:
Hi Hirra,

The assignment is not random; it follows the order of the sequences in the data:

- Seqs. A and B are compared and found to be identical, so they are both assigned to haplotype I.
- Seq. C is compared to haplotype I (effectively seq. A) and found to be different, so it is assigned to haplotype II.
- Seq. D is compared to haplotype I and found to be similar, and so is assigned to haplotype I.

If you reorder your data and put Seq. C first, you'd find that C and D are assigned to the same haplotype. The same issue occurs with ambiguous bases. These situations certainly deserve an option in haplotype() to handle them properly.

Best,
Emmanuel

-----
On 25 Feb 20, at 19:31, Hirra Farooq hirra.farooq at postgrad.manchester.ac.uk wrote:
Hello,

I am using the pegas R package to assign sequences to haplotypes. I recently tried a test example with 4 sequences. Two of the sequences (A and B) are identical, and one sequence (Seq C) differs from these at only one position (pos 648). The 4th sequence (Seq D) is identical to all of them but shorter, so it has no residues at the determinant position 648. (See image below)

So pegas correctly assigns A and B to haplotype I and C to haplotype II. However, it also assigns D to I, despite there being no information on which residue is at the determinant position. I just wanted to know: in cases such as D, when there is missing information, does pegas just randomly assign to a haplotype?

names  aln (633..663)
[A]    CCCGATTTTATATCAACATTTATTT------
[D]    CCCGATTTT----------------------
[B]    CCCGATTTTATATCAACATTTATTT------
[C]    CCCGATTTTATATCACCATTTATTTTGATTT

Thanks and best wishes,
Hirra
University of Manchester Student.
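The order dependence described in Emmanuel's first reply can be reproduced with a small sketch. It is written in Python as a language-neutral illustration and is not the pegas source code. The function names (compatible, greedy_assign) are hypothetical, and the rule that a site with "-" in either sequence never counts as a difference is an assumption made only to mimic the behaviour reported in the thread.

```python
# Illustrative sketch only; this is NOT the pegas source code.
# Alignment columns 633..663, copied from the original question.
ALIGNMENT = {
    "A": "CCCGATTTTATATCAACATTTATTT------",
    "D": "CCCGATTTT----------------------",
    "B": "CCCGATTTTATATCAACATTTATTT------",
    "C": "CCCGATTTTATATCACCATTTATTTTGATTT",
}


def compatible(seq1, seq2):
    """Assumed rule: a site with '-' in either sequence is never a difference."""
    return all(a == b or "-" in (a, b) for a, b in zip(seq1, seq2))


def greedy_assign(names):
    """Assign each sequence, in the given order, to the first haplotype whose
    representative (the first sequence placed in it) is compatible with it."""
    representatives = []  # one representative sequence per haplotype
    assignment = {}
    for name in names:
        seq = ALIGNMENT[name]
        for index, rep in enumerate(representatives):
            if compatible(seq, rep):
                assignment[name] = index + 1
                break
        else:
            # No existing haplotype matched: open a new one.
            representatives.append(seq)
            assignment[name] = len(representatives)
    return assignment


print(greedy_assign(["A", "B", "C", "D"]))  # {'A': 1, 'B': 1, 'C': 2, 'D': 1}
print(greedy_assign(["C", "A", "B", "D"]))  # {'C': 1, 'A': 2, 'B': 2, 'D': 1}
```

With the original order, D lands in the same haplotype as A and B; with C placed first, D lands with C instead, which is the reordering effect described in the reply.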
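Steps 2, 3 and 5 of the revised algorithm proposed at the top of the thread can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the pegas implementation: the function names are hypothetical, sequences are assumed to be upper case, and "N" is assumed to stand for A, C, G or T only (so an internal gap facing "N" counts as a difference). The sketch stops at listing, for each haplotype, the others at distance 0; the pooling decision of steps 5a and 5b, and the order in which haplotypes are examined, are left open as in the proposal.

```python
# Illustrative sketch only; this is NOT the pegas implementation.
# IUPAC codes mapped to the states they may represent; "-" is a fifth state.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "-": "-",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",  # assumption: N does not include the gap state
}


def mask_terminal_gaps(seq):
    """Step 2: replace leading and trailing '-' with 'N'; keep internal gaps."""
    core = seq.strip("-")
    if not core:
        return "N" * len(seq)
    lead = len(seq) - len(seq.lstrip("-"))
    trail = len(seq) - len(seq.rstrip("-"))
    return "N" * lead + core + "N" * trail


def distance(seq1, seq2):
    """Step 3: Hamming distance; sites whose state sets overlap count as 0."""
    return sum(
        1 for a, b in zip(seq1, seq2)
        if not set(IUPAC[a]) & set(IUPAC[b])
    )


def zero_distance_partners(haplotypes):
    """Step 5 (first half): for each haplotype, list the others at distance 0."""
    masked = {name: mask_terminal_gaps(seq) for name, seq in haplotypes.items()}
    return {
        name: [other for other in masked
               if other != name and distance(masked[name], masked[other]) == 0]
        for name in masked
    }


# The three distinct haplotypes left after step 1 in the thread's example.
example = {
    "A": "CCCGATTTTATATCAACATTTATTT------",
    "D": "CCCGATTTT----------------------",
    "C": "CCCGATTTTATATCACCATTTATTTTGATTT",
}
print(zero_distance_partners(example))
```

Under these assumptions, D is at distance 0 from both A and C, which corresponds to case 5b (two or more zero distances), while A and C each have D as their only zero-distance partner.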
_______________________________________________ R-sig-phylo mailing list - R-sig-phylo at r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo at r-project.org/