
Loop avoidance and logical subscripts

retama wrote:
A very efficient way to do this is

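   library(Biostrings)
   # data$sequence: the DNA sequences, one per element
   dna = DNAStringSet(data$sequence)
   # per-sequence counts of A, C, G, T and "other"
   alf = alphabetFrequency(dna, baseOnly=TRUE)
   # G + C count divided by the total letter count of each sequence
   gc = rowSums(alf[,c("G", "C")]) / rowSums(alf)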

This takes about 0.8 seconds for 3 million 36-mers, for instance.
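For anyone who wants to see what is being computed without installing anything, here is a rough base R sketch of the same loop-free idea on a toy input. The `sequences` vector and the 0.5 cutoff are made up for illustration and are not from the original data. It will likely be much slower than Biostrings on millions of reads.

```r
# Toy input; these sequences are invented for illustration only.
sequences <- c("GATTACA", "GGCC", "ATAT")

# Count G and C per sequence without a loop: delete every character
# that is not G or C, then measure the length of what remains.
gc_count <- nchar(gsub("[^GC]", "", sequences))

# GC fraction; every letter of the sequence counts in the denominator.
gc <- gc_count / nchar(sequences)

# Logical subscript: keep only sequences that are more than half G/C.
# The 0.5 threshold is an arbitrary example value.
gc_rich <- sequences[gc > 0.5]
```

The comparison `gc > 0.5` yields a logical vector the same length as `sequences`, so it can be used directly as a subscript to select the matching elements.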
Biostrings is installed with

   source('http://bioconductor.org/biocLite.R')
   biocLite('Biostrings')

Martin