Pegas vs Arlequin, and negative AMOVA values

9 messages · Marc Domènech Andreu, Emmanuel Paradis

#
Thanks for your answer. To compute the distance matrix I am using the
dist.dna function from the ape package, with the model set to "raw"
and pairwise.deletion = FALSE. However, I don't know exactly which equation or
formula pegas uses for AMOVA.
I am working with a mitochondrial marker, so the data are haploid.
Marc
On Wed, Apr 29, 2020 at 5:26 PM Zhian N. Kamvar <zkamvar at gmail.com> wrote:

            

  
    
#
Hi Marc,

Have you tried model = "N" in dist.dna()?

Best,

Emmanuel

----- On 4 May 20, at 16:44, Marc Domènech Andreu mdomenan at gmail.com wrote:
#
Hi,
Yes I tried it. Most of the results are very similar but some change. Do
you know the difference between those two methods?
Thanks,
Marc

On Mon, May 4, 2020 at 2:44 PM Emmanuel Paradis <emmanuel.paradis at ird.fr>
wrote:

  
    
#
----- On 5 May 20, at 0:36, Marc Domènech Andreu <mdomenan at gmail.com> wrote:
model = "N" is the Hamming distance (absolute number of differences between two sequences) 

model = "raw" is the Hamming distance divided by the sequence length (aka uncorrected distance, or p-distance) 
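As a minimal sketch of this relationship, assuming the ape package is installed and using a small made-up alignment without gaps (the sequences and the names s1 to s3 are illustrative only):

```r
library(ape)

# Toy alignment (made-up data): 3 sequences, 10 sites, no gaps
x <- as.DNAbin(rbind(
  s1 = strsplit("acgtacgtac", "")[[1]],
  s2 = strsplit("acgtacgtaa", "")[[1]],
  s3 = strsplit("tcgtacgaaa", "")[[1]]
))

d_n   <- dist.dna(x, model = "N")    # number of differing sites
d_raw <- dist.dna(x, model = "raw")  # proportion of differing sites

# With complete sequences, "raw" equals "N" divided by the alignment length
all.equal(as.vector(d_raw), as.vector(d_n) / ncol(x))
```

Once gaps are present, the number of sites actually compared can differ from the alignment length, so the two models are no longer related by a single constant factor.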

About the use of 'pairwise.deletion' in dist.dna(): in fact there is no simple or unique solution for this option. It depends very much on the data at hand and the distribution of "missing data", especially gaps. You need to check their distribution, for example with image(x) or image(x, what = "-"), where 'x' is the DNA data. You may get nonsensical results if you leave the default pairwise.deletion = FALSE and there are long gaps. Even a small number of gaps may be problematic if they are in a column (site) which is polymorphic. 
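The polymorphic-column case can be illustrated with a small sketch. It assumes the ape package is installed; the alignment is made up so that the only variable site also carries a single gap:

```r
library(ape)

# Made-up alignment: site 3 is the only polymorphic site,
# and s4 has a gap at that site
x <- as.DNAbin(rbind(
  s1 = strsplit("aaaaaa", "")[[1]],
  s2 = strsplit("aacaaa", "")[[1]],
  s3 = strsplit("aacaaa", "")[[1]],
  s4 = strsplit("aa-aaa", "")[[1]]
))

# Default (pairwise.deletion = FALSE): site 3 is dropped for every pair
d0 <- dist.dna(x, "N")

# Pairwise deletion: site 3 is dropped only for pairs involving s4
d1 <- dist.dna(x, "N", pairwise.deletion = TRUE)

as.matrix(d0)
as.matrix(d1)
```

In this toy case a single gap is enough to remove all the variation from d0, while d1 still records the difference between s1 and the two sequences carrying "c".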

Best, 

Emmanuel

1 day later
#
Oh, OK, thanks. My sequences are COI sequences, a mitochondrial
protein-coding gene, so there are no gaps in the middle. However, there are
gaps at the ends, because some sequences are longer than others. So I will set
pairwise.deletion=TRUE as you suggest.
Thanks,
Marc

On Tue, May 5, 2020 at 5:03 AM Emmanuel Paradis <emmanuel.paradis at ird.fr>
wrote:

  
    
#
To see if this has an impact, you can do this: 

d0 <- dist.dna(x, "N") 
d1 <- dist.dna(x, "N", pairwise.deletion = TRUE) 
plot(d0, d1) 
abline(0, 1, lty = 3) # draw x = y line 

Best, 

Emmanuel 

----- On 6 May 20, at 21:35, Marc Domènech Andreu <mdomenan at gmail.com> wrote:
#
Hello,
Thanks for the tip. I tried that in several species and it looks like it
does have an effect. The values with pairwise.deletion=TRUE are most of the
time a bit higher than with pairwise.deletion=FALSE, and sometimes equal.
Would you suggest using pairwise.deletion=TRUE then?
I also tried to find out whether the very negative values in the AMOVA results
(like -0.25) were due to very low genetic distances, but there doesn't
seem to be a relationship.
Thanks for your help,
Marc

On Thu, May 7, 2020 at 5:53 AM Emmanuel Paradis <emmanuel.paradis at ird.fr>
wrote:

  
    
#
I think you should go deeper into your data exploration. Here are two other diagnostics you can do: 

del.colgapsonly(x, freq.only = TRUE) 
del.rowgapsonly(x, freq.only = TRUE) 

These will give you the number of gaps for each column and row, respectively. 

Imagine the following situation: x is an alignment with 100 sequences and 1000 sites, all sequences are complete with no ambiguity, except one which has only 500 bp from the 5'-end, so it has a trail of 500 "-" on the 3'-end to be aligned with the 99 others. Doing base.freq(x, all = TRUE) will show that there are 0.5% gaps, so you may think it's OK. But that's wrong. Doing dist.dna(x) will throw away 50% of the data (even if you add more complete sequences to the alignment!). If the rates of evolution are different in the two halves of the sequence, then comparing the results from dist.dna(x) and dist.dna(x, pairwise.deletion = TRUE) is likely to be very tricky. 

That's where the two above diagnostics may help you: you may find it better to remove some sequences if they have a lot of gaps and they create more trouble than anything else. 
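The situation described above can be reproduced with a short simulation. This is only a sketch: it assumes the ape package is installed, the sequences are random made-up data, and the 10% threshold used to drop sequences is an arbitrary illustrative choice, not a recommendation.

```r
library(ape)

set.seed(1)  # arbitrary seed, only for reproducibility
n_seq <- 100
n_site <- 1000

# Random made-up alignment: 100 complete sequences of 1000 sites
m <- matrix(sample(c("a", "c", "g", "t"), n_seq * n_site, replace = TRUE),
            nrow = n_seq, dimnames = list(paste0("s", 1:n_seq), NULL))
m[1, 501:1000] <- "-"  # one sequence has a trail of 500 gaps at the 3'-end
x <- as.DNAbin(m)

# The overall gap frequency looks harmless: 500 / 100000
gap_freq <- base.freq(x, all = TRUE)["-"]

# Gap counts per site and per sequence reveal the problem
col_gaps <- del.colgapsonly(x, freq.only = TRUE)
row_gaps <- del.rowgapsonly(x, freq.only = TRUE)
n_dropped <- sum(col_gaps > 0)  # sites ignored when pairwise.deletion = FALSE
culprits <- which(row_gaps > 0) # sequence(s) responsible

# Compare the two settings, as in the plot suggested earlier in the thread
d0 <- dist.dna(x, "N")
d1 <- dist.dna(x, "N", pairwise.deletion = TRUE)

# One possible remedy: drop sequences with gaps at more than 10% of the sites
keep <- unname(row_gaps) <= 0.1 * ncol(x)
x_clean <- x[keep, ]
```

Here a single gappy sequence removes half of the sites from every pairwise comparison in d0, while removing that one sequence (x_clean) keeps all 1000 sites for the remaining 99.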

HTH 

Best, 

Emmanuel 

----- On 8 May 20, at 4:07, Marc Domènech Andreu <mdomenan at gmail.com> wrote:
4 days later
#
Thanks for the answer, the example really helped me understand it much
better. After some exploration of my data I think the case is more similar
to the example, with few but long gaps at the ends rather than many short
ones. So I think I will remove those and see if the estimations improve.
Thanks again,
Marc

On Fri, May 8, 2020 at 6:08 AM Emmanuel Paradis <emmanuel.paradis at ird.fr>
wrote: