Skip to content
Prev 9542 / 21318 Next

[Bioc-devel] GenomicRanges, streaming and Tabix

On 07/21/2016 02:12 PM, Simon Anders <simon.anders at fimm.fi> (by way of
Simon Anders wrote:
There is

   Rsamtools::TabixFile() with yieldSize for iteration

   Rsamtools::scanTabix() both for iteration and 'restriction' (IRanges 
/ GRanges() queries into sorted, indexed tabix)

   Rsamtools::bgzip() and indexTabix() for indexing; there seems not to 
be a sortTabix...

and also

   GenomicFiles::reduceByYield() as a framework for iteration

rtracklayer::import() has a `text=` argument.

Together, these lead to the following illustration

suppressPackageStartupMessages({
     library(rtracklayer)
     library(Rsamtools)
     library(GenomicFiles)
})

## preparation

## cat ES_input_filtered_ucsc_chr6.bed | sort -k1,1V -k2,2n > \
##    ES_input_filtered_ucsc_chr6.sorted.bed

fl <- "ES_input_filtered_ucsc_chr6.sorted.bed"
bgz <- bgzip(fl)
indexTabix(bgz, format="bed")

## restriction

tbx0 <- TabixFile(bgz)
param <-GRanges("chr6", IRanges(c(3324254, 4324254), width=1000))
records <- scanTabix(tbx0, param=param)
gr <- import(format="bed", text =unlist(records, use.names=FALSE))
grl <- split(gr, rep(names(records), lengths(records)))

## iteration -- GenomicFiles vignette section 6.2

YIELD <- function(X) unlist(scanTabix(X, format="bed"))
MAP <- function(VALUE, ...) import(format="bed", text=VALUE)
REDUCE <- append
tbx1 <- open(TabixFile(bgz, yieldSize=100000)) # about 466k records
xx <- reduceByYield(tbx1, YIELD, MAP, REDUCE)
close(tbx1)

Martin
This email message may contain legally privileged and/or...{{dropped:2}}