Pegas vs Arlequin, and negative AMOVA values

9 messages · Marc Domènech Andreu, Emmanuel Paradis

Marc Domènech Andreu

Mon, May 4, 2020 2:44 AM

Thanks for your answer. For computing the distance matrix I am using the
dist.dna() function in the ape package, with model = "raw"
and pairwise.deletion = FALSE. However, I don't know exactly which equation or
formula pegas uses for AMOVA.
I am working with a mitochondrial marker, so the data are haploid.
Marc

On Wed, Apr 29, 2020 at 5:26 PM Zhian N. Kamvar <zkamvar at gmail.com> wrote:

_______________________________________________
R-sig-genetics mailing list
R-sig-genetics at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-genetics

*Marc Domènech Andreu*
PhD student at University of Barcelona.
Departament de Biologia Evolutiva, Ecologia i Ciències Ambientals.

Emmanuel Paradis

Mon, May 4, 2020 3:19 AM

Hi Marc,

Have you tried model = "N" in dist.dna()?

Best,

Emmanuel

----- On 4 May 2020, at 16:44, Marc Domènech Andreu mdomenan at gmail.com wrote:

Marc Domènech Andreu

Mon, May 4, 2020 10:36 AM

Hi,
Yes, I tried it. Most of the results are very similar, but some change. Do
you know the difference between those two methods?
Thanks,
Marc

On Mon, May 4, 2020 at 2:44 PM Emmanuel Paradis <emmanuel.paradis at ird.fr>
wrote:

On Wed, Apr 29, 2020 at 5:26 PM Zhian N. Kamvar <zkamvar at gmail.com> wrote:

This highly depends on the distance function you are using for pegas:

1. How does it treat missing data? I believe Arlequin treats missing
data by dropping them from the denominator.

2. If you have a diploid species, does it calculate distance for
haplotypes?

Both of these can affect the resulting Phi values. You might also try
poppr.amova() with method = "pegas" to automate the process.

Best,

Zhian

On 4/29/20 3:04 AM, Marc Domènech Andreu wrote:

Hello everyone,
I would like to ask for help with two questions regarding AMOVA and the
pegas package.
1. Do you know which formula or equation pegas and Arlequin use for
performing AMOVA? I only obtain almost identical results when I set
"is.squared = FALSE" in pegas and "Locus by locus AMOVA" in Arlequin.
2. I'm doing the analyses for several species, and for some of them I
obtained negative AMOVA results. I know slightly negative results are not
uncommon and, as far as I know, they should be treated as 0, but in some
cases they are very negative, such as -25%. Why can this be? Maybe
because I have too few sequences for those species?
Thanks in advance,
Marc
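
For reference, a sketch of the classical one-level AMOVA of Excoffier, Smouse & Quattro (1992), written in base R, shows where negative values come from. It is not pegas's amova() code: amova_1level() and the toy data are hypothetical, and the input is assumed to already hold squared distances (for haplotypes coded as 0/1 site vectors, the number of differences plays that role).

```r
# Minimal one-level AMOVA (populations only) by the method of moments.
# 'd2': matrix (or dist object) of SQUARED distances between individuals.
# 'pop': population label of each individual, in the same order.
# Illustrative sketch only; not the pegas implementation.
amova_1level <- function(d2, pop) {
  d2 <- as.matrix(d2)
  pop <- factor(pop)
  N <- length(pop)
  P <- nlevels(pop)
  Np <- as.vector(table(pop))

  # sums of squared deviations: total and pooled within populations
  ssd_total <- sum(d2[upper.tri(d2)]) / N
  ssd_within <- sum(sapply(levels(pop), function(p) {
    idx <- which(pop == p)
    sub <- d2[idx, idx, drop = FALSE]
    sum(sub[upper.tri(sub)]) / length(idx)
  }))
  ssd_among <- ssd_total - ssd_within

  # mean squares and the coefficient n0 for unequal sample sizes
  ms_among <- ssd_among / (P - 1)
  ms_within <- ssd_within / (N - P)
  n0 <- (N - sum(Np^2) / N) / (P - 1)

  sigma2_within <- ms_within
  sigma2_among <- (ms_among - ms_within) / n0  # a difference: can be negative
  c(sigma2_among = sigma2_among,
    sigma2_within = sigma2_within,
    phi_st = sigma2_among / (sigma2_among + sigma2_within))
}

# Toy data: two populations, each holding the same two haplotypes (h1, h2)
# that differ at one site. Order: h1, h2 (pop A), then h1, h2 (pop B).
d2 <- matrix(c(0, 1, 0, 1,
               1, 0, 1, 0,
               0, 1, 0, 1,
               1, 0, 1, 0), nrow = 4, byrow = TRUE)
pop <- c("A", "A", "B", "B")
res <- amova_1level(d2, pop)
res
# sigma2_among = -0.25, sigma2_within = 0.5, phi_st = -1
```

Because sigma2_among is a difference of mean squares divided by n0, it falls below zero whenever the among-population mean square is smaller than the within-population one. With only two sequences per population, as here, that can happen by chance and give a strongly negative Phi_ST (here -1, i.e. -100% of the variation).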

Emmanuel Paradis

Mon, May 4, 2020 8:03 PM

----- On 5 May 2020, at 0:36, Marc Domènech Andreu <mdomenan at gmail.com> wrote:

model = "N" is the Hamming distance (absolute number of differences between two sequences) 

model = "raw" is the Hamming distance divided by the sequence length (aka uncorrected distance, or p-distance) 
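
As a rough illustration of the two definitions (not ape's code; the helpers hamming() and p_distance() and the toy sequences are hypothetical), here is a base-R sketch for two aligned sequences without gaps:

```r
# Hypothetical helpers mirroring the definitions of model = "N" and "raw".
hamming <- function(a, b) {
  # number of sites at which the two aligned sequences differ
  sum(a != b)
}
p_distance <- function(a, b) {
  # Hamming distance divided by the number of sites compared
  hamming(a, b) / length(a)
}

s1 <- strsplit("ACGTACGTAC", "")[[1]]
s2 <- strsplit("ACGTTCGTAA", "")[[1]]
hamming(s1, s2)     # 2 differences (sites 5 and 10)
p_distance(s1, s2)  # 2 / 10 = 0.2
```

When every pair is compared over the same number of sites, the two matrices differ only by that constant factor; the relative values between pairs can change only when the number of compared sites varies from pair to pair, as under pairwise deletion.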

About the use of 'pairwise.deletion' in dist.dna(): in fact there is no simple/unique solution for this option. It depends very much on the data at hand and the distribution of "missing data", especially gaps. You need to check their distribution, for example with image(x) or image(x, what = "-") where 'x' is the DNA data. You may get nonsensical results if you leave the default pairwise.deletion = FALSE and there are long gaps. Even a small number of gaps may be problematic if they are in a column (site) that is polymorphic.
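
The effect of the deletion rule can be sketched in base R on a toy alignment stored as a character matrix with "-" for gaps. This mimics the idea behind the pairwise.deletion option but is not ape's implementation; count_diffs() and the sequences are hypothetical.

```r
# Toy alignment: rows are sequences, columns are sites, "-" marks a gap.
aln <- rbind(
  s1 = strsplit("ACGTACGTAC", "")[[1]],
  s2 = strsplit("ACGTTCGTAA", "")[[1]],
  s3 = strsplit("ACCTACG---", "")[[1]]  # shorter sequence, gaps at the 3' end
)

count_diffs <- function(aln, pairwise) {
  n <- nrow(aln)
  # complete deletion: keep only sites with no gap in ANY sequence
  complete_sites <- colSums(aln == "-") == 0
  d <- matrix(0, n, n, dimnames = list(rownames(aln), rownames(aln)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # pairwise deletion: keep sites with no gap in EITHER of the two sequences
      keep <- if (pairwise) aln[i, ] != "-" & aln[j, ] != "-" else complete_sites
      d[i, j] <- d[j, i] <- sum(aln[i, keep] != aln[j, keep])
    }
  }
  d
}

d_complete <- count_diffs(aln, pairwise = FALSE)
d_pairwise <- count_diffs(aln, pairwise = TRUE)
d_complete["s1", "s2"]  # 1: the difference at site 10 is discarded
d_pairwise["s1", "s2"]  # 2: all 10 sites are usable for this pair
```

Because pairwise deletion can only add usable sites for a given pair, the counts in this sketch are never smaller than under complete deletion. Dividing each count by the number of sites kept for that pair would give a p-distance whose denominator excludes the missing sites.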

Best, 

Emmanuel

Marc Domènech Andreu

Wed, May 6, 2020 7:35 AM

Oh OK, thanks. My sequences are COI sequences (a mitochondrial protein-coding
gene), so there are no gaps in the middle. However, there are gaps at the
ends, as some sequences are longer than others. So I will set
pairwise.deletion = TRUE as you suggest.
Thanks,
Marc
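
A quick way to confirm that the gaps really are confined to the ends is to split each sequence's gaps into leading, trailing, and internal ones. A minimal base-R sketch, assuming the alignment is held as a character matrix with "-" for gaps; gap_profile() and the toy sequences are hypothetical, not part of ape:

```r
# Hypothetical helper: count leading, trailing and internal gaps per sequence.
gap_profile <- function(aln) {
  t(apply(aln, 1, function(s) {
    is_gap <- s == "-"
    L <- length(s)
    if (all(is_gap)) return(c(leading = L, trailing = 0, internal = 0))
    first <- which(!is_gap)[1]      # first non-gap position
    last <- max(which(!is_gap))     # last non-gap position
    c(leading = first - 1,
      trailing = L - last,
      internal = sum(is_gap[first:last]))
  }))
}

aln <- rbind(
  s1 = strsplit("ACGTACGTAC", "")[[1]],
  s2 = strsplit("--GTTCGTAA", "")[[1]],
  s3 = strsplit("ACCT-CG---", "")[[1]]  # one internal gap, three trailing
)
gp <- gap_profile(aln)
gp
```

Any non-zero value in the internal column flags a sequence whose gaps are not only at the ends.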

On Tue, May 5, 2020 at 5:03 AM Emmanuel Paradis <emmanuel.paradis at ird.fr>
wrote:

Emmanuel Paradis

Wed, May 6, 2020 8:53 PM

To see if this has an impact, you can do this: 

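library(ape)  # provides dist.dna(); 'x' is the alignment as a DNAbin object
d0 <- dist.dna(x, "N")                            # default pairwise.deletion = FALSE
d1 <- dist.dna(x, "N", pairwise.deletion = TRUE)  # gaps dropped pair by pair
plot(d0, d1)                                      # one point per pair of sequences
abline(0, 1, lty = 3) # draw x = y line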

Best, 

Emmanuel 

----- On 6 May 2020, at 21:35, Marc Domènech Andreu <mdomenan at gmail.com> wrote:

Marc Domènech Andreu

Thu, May 7, 2020 2:07 PM

Hello,
Thanks for the tip. I tried that on several species and it looks like it
does have an effect. The values with pairwise.deletion = TRUE are most of the
time a bit higher than with pairwise.deletion = FALSE, and sometimes equal.
Would you suggest using pairwise.deletion = TRUE, then?
I also tried to see whether the very negative values in the AMOVA results
(like -0.25) were due to very low genetic distances, but there doesn't
seem to be a relationship.
Thanks for your help,
Marc

On Thu, May 7, 2020 at 5:53 AM Emmanuel Paradis <emmanuel.paradis at ird.fr>
wrote:

Emmanuel Paradis

Thu, May 7, 2020 9:08 PM

I think you should go deeper into your data exploration. Here are two other diagnostics you can do:

del.colgapsonly(x, freq.only = TRUE) 
del.rowgapsonly(x, freq.only = TRUE) 

These will give you the number of gaps for each column and row, respectively. 

Imagine the following situation: x is an alignment with 100 sequences and 1000 sites, all sequences are complete with no ambiguity, except one that has only 500 bp from the 5' end, so it has a trail of 500 "-" at the 3' end to be aligned with the 99 others. Doing base.freq(x, all = TRUE) will show that 0.5% of the characters are gaps, so you may think it's OK. But that's wrong: dist.dna(x) will throw away 50% of the data (even if you add more complete sequences to the alignment!). If the rates of evolution differ between the two halves of the sequence, then comparing the results from dist.dna(x) and dist.dna(x, pairwise.deletion = TRUE) is likely to be very tricky.
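
The scenario above can be reproduced with random toy data in base R. This is a sketch: colSums() and rowSums() on a character matrix stand in for del.colgapsonly() and del.rowgapsonly() with freq.only = TRUE, and the random bases are only placeholders.

```r
# 100 sequences x 1000 sites, all complete except one that stops after
# 500 bp (500 trailing gaps).
set.seed(1)
n_seq <- 100
n_site <- 1000
aln <- matrix(sample(c("A", "C", "G", "T"), n_seq * n_site, replace = TRUE),
              nrow = n_seq, ncol = n_site)
aln[1, 501:1000] <- "-"   # the single short sequence

gaps_per_col <- colSums(aln == "-")   # number of gaps in each site
gaps_per_row <- rowSums(aln == "-")   # number of gaps in each sequence

mean(aln == "-")          # overall gap fraction: 500 / 100000 = 0.005
sum(gaps_per_col == 0)    # sites kept under complete deletion: 500 of 1000
table(gaps_per_row)       # 99 sequences with no gap, 1 with 500
```

Only 0.5% of the cells are gaps, yet half of the columns contain at least one gap and would be dropped under complete deletion.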

That's where the two diagnostics above may help you: you may find it better to remove some sequences if they have a lot of gaps and create more trouble than anything else.
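
A minimal base-R sketch of that clean-up step, assuming the same character-matrix representation. drop_gappy_rows() and its 20% cut-off are hypothetical choices for illustration; ape's del.rowgapsonly() (used without freq.only = TRUE) is presumably the more direct route, so check its help page for the exact threshold rule.

```r
# Hypothetical filter: drop sequences whose gap fraction exceeds a cut-off.
# The 0.2 default is arbitrary, not a recommendation.
drop_gappy_rows <- function(aln, max_gap_frac = 0.2) {
  gap_frac <- rowMeans(aln == "-")
  aln[gap_frac <= max_gap_frac, , drop = FALSE]
}

aln <- rbind(
  s1 = strsplit("ACGTACGTAC", "")[[1]],
  s2 = strsplit("ACGTTCGTAA", "")[[1]],
  s3 = strsplit("ACCTA-----", "")[[1]]   # half of the sequence missing
)
kept <- drop_gappy_rows(aln)
rownames(kept)                   # "s1" "s2"
sum(colSums(kept == "-") == 0)   # all 10 sites now usable under complete deletion
```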

HTH 

Best, 

Emmanuel 

----- On 8 May 2020, at 4:07, Marc Domènech Andreu <mdomenan at gmail.com> wrote:

Marc Domènech Andreu

Tue, May 12, 2020 2:12 AM

Thanks for the answer; the example really helped me understand it much
better. After some exploration of my data, I think my case is more similar
to the example, with few but long gaps at the ends rather than many short
ones. So I think I will remove those and see if the estimates improve.
Thanks again,
Marc

On Fri, May 8, 2020 at 6:08 AM Emmanuel Paradis <emmanuel.paradis at ird.fr>
wrote:
