Thank you! The script is now adapted to Biostrings and it is really fast! For
example, it does:
alph_sequence <- alphabetFrequency(data$sequence, baseOnly=TRUE)
data$GCsequence <- rowSums(alph_sequence[,c("G", "C")]) /
rowSums(alph_sequence)
in the G+C computation. It is also very fast at substring extraction
(substring), reverse complementing (reverseComplement), palindrome
search (findComplementedPalindromes) and so on.
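To make the G+C step concrete, here is a minimal sketch on made-up sequences. The `toy` object and its contents are hypothetical and only stand in for data$sequence; the calls are the same ones used in the script.

```r
library(Biostrings)

# Toy sequences (hypothetical, not the real dataset)
toy <- DNAStringSet(c("GGCC", "ATAT", "ATGC"))

# One row per sequence; columns are A, C, G, T and "other"
freq <- alphabetFrequency(toy, baseOnly = TRUE)

# G+C fraction per sequence: (G + C) / (all counted letters)
gc <- rowSums(freq[, c("G", "C")]) / rowSums(freq)
print(gc)
```

Note that the denominator includes the "other" column, so ambiguous letters such as N lower the reported G+C fraction.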
Now my bottleneck is conventional string handling, because I have not yet
found how to convert a DNAStringSet to a character vector. Currently I do it
with:
dna <- character()
# append each sequence as a plain string, one element at a time
for (i in seq_along(data$dnaset)) {
dna <- c(dna, toString(data$dnaset[[i]]))
}
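A possibly faster route, sketched here and not benchmarked on the real data, is to call as.character() on the whole set instead of looping, on the assumption that it returns one string per sequence. The `dnaset` object below is a hypothetical stand-in for data$dnaset.

```r
library(Biostrings)

# Hypothetical stand-in for data$dnaset
dnaset <- DNAStringSet(c("ACGT", "GGCC", "TTAA"))

# Convert the whole set in one call: one string per sequence
dna <- as.character(dnaset)

# Element-by-element version, kept only to compare the results
dna_loop <- character(length(dnaset))
for (i in seq_along(dnaset)) {
  dna_loop[i] <- toString(dnaset[[i]])
}

# Both approaches should give the same character vector
stopifnot(identical(unname(dna), dna_loop))
```

If the set has names, as.character() carries them over as names of the resulting vector; unname() drops them if they are not wanted.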
Regards,
Retama