Mismatch distribution
Hello! I need your help. I am trying to calculate the pairwise differences between sequences from several FASTA files. For each of my DNA alignments (one FASTA file each), I would like to calculate the pairwise differences and then:
1. Combine the data from all files into one data set and plot a single histogram (mismatch distribution).
2. Calculate the mean for each difference across all files and again plot a mismatch distribution.
Here is the script that I wrote:
library("pegas")
library("seqinr")
library("ggplot2")
Files <- list.files(pattern="fas")
nb_files <- length(Files)
for (i in 1:nb_files) {
  Dist <- as.numeric(dist.gene(read.dna(Files[i], "fasta"),
                               method = "pairwise",
                               pairwise.deletion = FALSE, variance = FALSE))
  Data <- merge(Data, Dist, by=c("x"), all=T)
}
hist(Data, prob=TRUE)
lines(density(Data), col="blue", lwd=2)
However, the script does not work, and I do not know what to change to make it work.
Thanks in advance for your help.
Myriam
Myriam Croze, PhD
Postdoctoral researcher
Division of EcoScience, Ewha Womans University
Seoul, South Korea
Email: myriam.croze07 at gmail.com
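
One possible way to approach the two steps described above is sketched here. In the posted script, `Data` is never created before the loop, and `merge()` joins data frames rather than numeric vectors, so this sketch keeps one plain vector of differences per file in a list instead. The helper names (`pairwise_diffs`, `mean_mismatch`, `plot_mismatch`) are hypothetical, and step 2 is read here as "average the relative frequency of each difference class across files", which is an assumption about what is wanted. It needs the ape package, which provides `read.dna()` and `dist.gene()`.

```r
# Pairwise differences for one alignment, as a plain numeric vector.
# Same dist.gene() call as in the posted script (requires the ape package).
pairwise_diffs <- function(file) {
  aln <- ape::read.dna(file, format = "fasta")
  as.numeric(ape::dist.gene(aln, method = "pairwise",
                            pairwise.deletion = FALSE, variance = FALSE))
}

# Step 2 (assumed meaning): for each number of differences k, take its
# relative frequency within each file, then average over the files.
# `diffs` is a list with one numeric vector of differences per file.
mean_mismatch <- function(diffs) {
  max_k <- max(unlist(diffs))
  freq <- do.call(cbind, lapply(diffs, function(d) {
    tabulate(d + 1, nbins = max_k + 1) / length(d)  # classes 0..max_k
  }))
  setNames(rowMeans(freq), 0:max_k)
}

# Steps 1 and 2 together: one pooled histogram, one mean-frequency plot.
plot_mismatch <- function(files) {
  stopifnot(length(files) > 0)
  diffs <- lapply(files, pairwise_diffs)  # one vector per file

  # Step 1: pool the differences from all files into one vector.
  pooled <- unlist(diffs)
  hist(pooled, breaks = seq(-0.5, max(pooled) + 0.5, by = 1), freq = FALSE,
       main = "Pooled mismatch distribution", xlab = "Pairwise differences")
  lines(density(pooled), col = "blue", lwd = 2)

  # Step 2: mean relative frequency of each difference class.
  barplot(mean_mismatch(diffs), main = "Mean mismatch distribution",
          xlab = "Pairwise differences", ylab = "Mean relative frequency")
  invisible(diffs)
}
```

It could then be called as `plot_mismatch(list.files(pattern = "fas"))`. Averaging relative frequencies rather than raw counts keeps an alignment with many sequences from dominating the mean. If raw counts are preferred, the division by `length(d)` can be dropped.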