[Bioc-devel] [BioC] readGappedAlignments and param argument

Tue, Sep 10, 2013 3:25 AM

On 09/10/2013 12:22 AM, Elena Grassi wrote:

You can use summarizeOverlaps with a 'BamFileList' created by something like

   myFiles = dir("/some/dir", pattern="bam$")
   bfl = BamFileList(myFiles, yieldSize=1000000)
   olaps = summarizeOverlaps(features, bfl)

see the example on the help page

   method?"summarizeOverlaps,GRanges,BamFileList"

(there's helpful tab completion after 'method?"sum<tab>'). If you've also said

   library(parallel)
   options(mc.cores=detectCores())  ## how many cores to use?

then the bam files are processed in parallel (choose yieldSize so that the 
different threads are not competing for memory).

more generally a strategy is to use yieldSize and iterate through the file, like

   bf = BamFile(fl, yieldsSize=1000000)
   open(bf)
   repeat {
       curr = readBamGappedAlignments(bf) ## read up to yieldSize records
       if (length(curr) == 0)
           break ## done
       ## process chunk 'curr'
   }
   close(bf)

this is actually what summarizeOverlaps does when invoked on a BamFile (or 
BamFileList().

one small thing that came out of that thread was that

   as(seqinfo(BamFile("/some/file")), "GRanges")

gives a GRanges of all the sequence lengths.

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

[Bioc-devel] [BioC] readGappedAlignments and param argument

Thread (4 messages)