ape, read.dna to phase fasta file not working properly, tajima - R-SIG-Genetics

Wed, Aug 30, 2017 1:49 PM #

fasta8c18.fa
<https://drive.google.com/file/d/0B6qb8IlaQGFZX0Q0YzNUVGNHV1E/view?usp=drive_web>

Hello,

I'm trying to complete the very simple task of reading in an unphased fasta
file and phasing it using ape, and then calculating Tajima's D using pegas,
but my data don't seem to be read in correctly. Input and output are as
follows:

library("ape")
library("adegenet")
library("ade4")
library("pegas")

Warning message:
In data(DNAbin8c18) : data set ‘DNAbin8c18’ not found

## clearly the data were not read in properly, so I looked at what had been loaded

817452 DNA sequences in binary format stored in a matrix.

All sequences of same length: 96

Labels:
CLocus_12706_Sample_1_Locus_34105_Allele_0 [BayOfIslands_s08...
CLocus_12706_Sample_2_Locus_31118_Allele_0 [BayOfIslands_s08...
CLocus_12706_Sample_3_Locus_30313_Allele_0 [BayOfIslands_s09...
CLocus_12706_Sample_5_Locus_33345_Allele_0 [BayOfIslands_s09...
CLocus_12706_Sample_7_Locus_37388_Allele_0 [BayOfIslands_s09...
CLocus_12706_Sample_8_Locus_29451_Allele_0 [BayOfIslands_s09...
...

More than 10 million nucleotides: not printing base composition

## although it likely won't work, trying the Tajima's D test to see what happens

Error: cannot allocate vector of size 2489.3 Gb

I'm sending the data file along as a link as well.

Any thoughts would be much appreciated.

Ella

Ella Bowles, PhD
Postdoctoral Researcher
Department of Biology
Concordia University

Website: https://ellabowlesphd.wordpress.com/
Email: bowlese at gmail.com


Ella Bowles

Thu, Aug 31, 2017 3:11 PM #

Hello,

I wanted to send a follow-up note to say that the developer helped me with
my problem. His reply was:

The problem is that your data are too big (too many sequences) and
tajima.test() needs to compute the matrix of all pairwise distances. You
could check this by trying:

dist.dna(DNAbin8c18, "N")
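As a rough check on why that call cannot succeed, the size in the error
message can be reproduced by hand. This is only a sketch: it assumes the
pairwise distances are held as a "dist" object, i.e. n(n-1)/2 double-precision
values of 8 bytes each, with n taken from the printed summary of the object.
The variable names are illustrative.

```r
# Number of sequences, as reported when the DNAbin object was printed
n_seq <- 817452

# A "dist" object keeps only the lower triangle: one value per pair
n_pairs <- n_seq * (n_seq - 1) / 2

# Assumption: each distance is stored as an 8-byte double
bytes <- n_pairs * 8

# Convert to the units R uses in its allocation error (1 Gb = 2^30 bytes)
gb <- bytes / 2^30
round(gb, 1)   # 2489.3, the figure in "cannot allocate vector of size 2489.3 Gb"
```

Under the same assumption, 10000 sequences would need about 0.37 Gb, which is
why working on subsamples is feasible.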

One possibility for you is to sample randomly some observations, and repeat
this many times, e.g.:

tajima.test(DNAbin8c18[sample(nrow(DNAbin8c18), size = 1000), ])

This could be:

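n <- nrow(DNAbin8c18) # number of sequences in the full data set
N <- 1000 # number of repeats
RES <- matrix(NA, 3, N) # one column per repeat: D, Pval.normal, Pval.beta
for (i in 1:N)
    RES[, i] <- unlist(tajima.test(DNAbin8c18[sample(n, size = 10000), ]))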

You may adjust N and 'size =' so that it does not take too long to run. Then
you may look at the distribution of each row of RES (D and the two P-values).
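
For anyone who wants to try the resampling idea before running it on the full
file, here is a minimal sketch on the small woodmouse data set that ships with
ape, used purely as a stand-in. The helper name subsample_tajima and the
values of size and reps are my own illustrative choices, not part of ape or
pegas.

```r
library(ape)
library(pegas)

# Hypothetical helper: repeat tajima.test() on random subsets of sequences.
# x must be a DNAbin matrix (sequences in rows).
subsample_tajima <- function(x, size, reps) {
  n <- nrow(x)
  res <- matrix(NA_real_, nrow = 3, ncol = reps,
                dimnames = list(c("D", "Pval.normal", "Pval.beta"), NULL))
  for (i in seq_len(reps)) {
    idx <- sample(n, size = size)               # sample without replacement
    res[, i] <- unlist(tajima.test(x[idx, ]))   # D and its two P-values
  }
  res
}

data(woodmouse)   # 15 aligned sequences, stand-in for the real data
set.seed(1)       # make the random subsamples reproducible
RES <- subsample_tajima(woodmouse, size = 10, reps = 100)

# Summarise the distribution of D across the subsamples
summary(RES["D", ])
quantile(RES["D", ], c(0.025, 0.5, 0.975), na.rm = TRUE)
```

With the real data, woodmouse would be replaced by the DNAbin object and size
raised to whatever the available memory allows.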


