From: marchywka at hotmail.com
To: tal.galili at gmail.com; r-help at r-project.org
Subject: RE: [R] Performing basic Multiple Sequence Alignment in R?
Date: Tue, 21 Dec 2010 17:03:17 -0500
From: tal.galili at gmail.com
Date: Tue, 21 Dec 2010 20:17:18 +0200
Subject: Re: [R] Performing basic Multiple Sequence Alignment in R?
To: r-help at r-project.org
Dear Mike and Thomas,
From what I gathered here (Thanks to Joris Meys):
http://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r/4498434#4498434
There is an R interface to the MUSCLE algorithm in the bio3d package
(function seqaln()).
But not one for clustal.
I will probably end up using pairwiseAlignment on pairs of alignments
with some sort of stopping rules (I'll have to play with it to see how
it works).
http://scholar.google.com/scholar?hl=en&q=%22exact+string+matching%22+alignment
http://citeseerx.ist.psu.edu/search?q=exact+string+matching+alignment+dna&submit=Search&sort=rel
Certainly, if you are flexible and can use whatever is close in R, that
is fine. But I seem to recall that exact string matching was a fast and
interesting way to go, and maybe some of the authors above, in the interest
of promoting their work, would help implement an R version if there is demand.
I seem to recall I did something like building indexes of the strings to be aligned
first, then finding substrings that were unique within a given string, that is, that
appeared exactly once in each of the sequences to be aligned (this was the most
restrictive criterion, but you can imagine how to make it more accommodating). Now
that you got me started, up-front tokenizing or compiling of the input sequences
(usually no more than indexing them in some way) made many later operations, like
alignment, go faster. This may have ended up being similar to BLAST, but now I can't
really recall. Anyway,
my point here is that somewhere in R there may be packages that
generate intermediate forms useful across disciplines: mining data from
text, linguistics, or macromolecule analysis. In fact, the indexing process
helps find things that have migrated a long way from their original place,
and there are probably other non-alignment-related things you could
get out of the approach.
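
A minimal sketch of the indexing idea described above, written in Python
rather than R purely as a language-neutral illustration. It is not a
reconstruction of the original code. It assumes fixed-length substrings
(k-mers) as the index unit, and the function names kmer_counts and
shared_unique_anchors are hypothetical. It applies the most restrictive
criterion: a substring qualifies as an anchor only if it occurs exactly
once in every sequence.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Index one sequence by counting every length-k substring."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_unique_anchors(seqs, k):
    """Map each k-mer that occurs exactly once in every sequence
    to its list of start positions, one position per sequence."""
    indexes = [kmer_counts(s, k) for s in seqs]
    # Start from k-mers unique within the first sequence ...
    candidates = {kmer for kmer, n in indexes[0].items() if n == 1}
    # ... and keep only those that are also unique in all the others.
    for idx in indexes[1:]:
        candidates = {kmer for kmer in candidates if idx[kmer] == 1}
    # Each surviving k-mer occurs once per sequence, so find() is unambiguous.
    return {kmer: [s.find(kmer) for s in seqs] for kmer in sorted(candidates)}


if __name__ == "__main__":
    seqs = ["ACGTTGCA", "TTACGTGC", "GGACGTAA"]
    print(shared_unique_anchors(seqs, 4))  # {'ACGT': [0, 2, 2]}
```

The positions of each anchor give candidate columns to pin an alignment
around. Relaxing the criterion (for example, allowing an anchor to be
missing from some sequences) would amount to changing the `== 1` test.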