Loop avoidance and logical subscripts

Martin Morgan · 2009-05-21T17:18:52Z

retama wrote:
> Patrick Burns kindly provided an article about this issue called 'The R
> Inferno'. However, I will expand my question a little, because I think it
> is not clear and, if I could improve the code, it will be more understandable
> to other users reading these messages when I paste it :)
>
> In my example, I have a data frame with several hundred DNA sequences in
> the column data$sequences (each value is a long string written in an
> alphabet of four characters, which are


A very efficient way to do this is

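   library(Biostrings)
   # Pack the character column into a DNAStringSet container
   dna = DNAStringSet(data$sequence)
   # One row per sequence; columns A, C, G, T and "other"
   alf = alphabetFrequency(dna, baseOnly=TRUE)
   # GC fraction per sequence: (G + C) / sequence length
   gc = rowSums(alf[,c("G", "C")]) / rowSums(alf)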

This takes about 0.8 seconds for 3 million 36-mers, for instance.
Biostrings is installed with

   source('http://bioconductor.org/biocLite.R')
   biocLite('Biostrings')
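
For a quick cross-check without Biostrings, the same GC fraction can be sketched in base R, again without a loop. This is only an illustration and not part of the Biostrings approach above: the toy data frame and the names gc_count and gc_frac are made up for the example, and it assumes the sequences are upper-case strings.

```r
# Hypothetical toy data standing in for the poster's data frame
data <- data.frame(sequences = c("GATTACA", "GGCC", "ATAT"),
                   stringsAsFactors = FALSE)

# Delete every character that is not G or C, then count what is left
gc_count <- nchar(gsub("[^GC]", "", data$sequences))

# Divide by the full sequence length, for all sequences at once (no loop)
gc_frac <- gc_count / nchar(data$sequences)
```

Both gsub() and nchar() are vectorized over the whole column, which is what replaces the explicit loop. On millions of reads the Biostrings version is likely to be the faster of the two.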

Martin

Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
