
[Bioc-devel] pointer and big matrix in R

3 messages · Servant Nicolas, Vincent Carey, Martin Morgan

#
For some approaches to working with large matrices in R, please look at

http://cran.r-project.org/web/views/HighPerformanceComputing.html

and scroll down to "Large memory and out-of-memory data".

I suspect this view text should be enlarged to include rhdf5 and ncdf as
additional relevant resources.
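
To make the "out-of-memory" idea concrete, here is a minimal base-R sketch that keeps a matrix in a binary file and reads back a single column without loading the rest. It is not taken from any of the packages in the task view, and the helper name `read_column` and the file layout (raw 8-byte doubles in column-major order) are assumptions made for illustration only.

```r
# Write a numeric matrix to disk as raw doubles (column-major, 8 bytes each).
path  <- tempfile(fileext = ".bin")
n_row <- 1000L
n_col <- 200L
m <- matrix(as.numeric(seq_len(n_row * n_col)), n_row, n_col)
writeBin(as.vector(m), path)

# Hypothetical helper: read column j only, by seeking to its byte offset.
read_column <- function(path, j, n_row) {
    con <- file(path, "rb")
    on.exit(close(con))
    seek(con, where = (j - 1) * n_row * 8)
    readBin(con, what = "double", n = n_row)
}

col5 <- read_column(path, 5L, n_row)
identical(col5, m[, 5])
```

Packages such as those listed in the task view wrap this kind of access in a matrix-like interface, so in practice one would use them rather than hand-rolled file offsets.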

On Fri, Apr 12, 2013 at 1:02 PM, Servant Nicolas
<Nicolas.Servant at curie.fr> wrote:
#
On 04/12/2013 10:02 AM, Servant Nicolas wrote:
This is not a super-big object, so perhaps you're running into problems with
R's propensity to copy data? An easy solution might be to re-use the
SummarizedExperiment class, which addresses this issue by placing the 'assays'
data in a reference class.
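
As a small illustration of the copying behaviour mentioned above, this base-R sketch contrasts ordinary copy-on-modify semantics with reference semantics. The environment only stands in for a reference class; it is an assumption for illustration and not a description of how SummarizedExperiment stores its assays. The function names `set_first_value` and `set_first_ref` are hypothetical.

```r
m <- matrix(0, 1000, 1000)

# Ordinary R semantics: the function modifies a copy; the caller's matrix
# is left unchanged.
set_first_value <- function(x) {
    x[1, 1] <- 1
    x
}
m2 <- set_first_value(m)
m[1, 1]     # still 0
m2[1, 1]    # 1

# Reference semantics: an environment is passed by reference, so the
# change made inside the function is visible to the caller.
env <- new.env()
env$m <- m
set_first_ref <- function(e) {
    e$m[1, 1] <- 1
    invisible(NULL)
}
set_first_ref(env)
env$m[1, 1] # 1
```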

     library(GenomicRanges)

     .HTCexp = setClass("HTCexp", contains="SummarizedExperiment",
       representation=representation(y_intervals="GenomicRanges"))

     HTCexp <-
         function(intdata = matrix(0, 0, 0), x_intervals=GRanges(),
                  y_intervals=GRanges(), ...)
     {
         .HTCexp(SummarizedExperiment(intdata, rowData=x_intervals),
                 y_intervals=y_intervals, ...)
     }

which already gives

 > HTCexp()
class: HTCexp
dim: 0 0
exptData(0):
assays(1): ''
rownames: NULL
rowData metadata column names(0):
colnames: NULL
colData names(0):
 > m <- matrix(0, 5000, 5000,
+             dimnames=list(seq_len(5000), seq_len(5000)))
 > g <- GRanges("A", IRanges(1:5000, width=0))
 > HTCexp(m, g, g)
class: HTCexp
dim: 5000 5000
exptData(0):
assays(1): ''
rownames: NULL
rowData metadata column names(0):
colnames(5000): 1 2 ... 4999 5000
colData names(0):

I think you'd need to implement "[" and a 'y_intervals' accessor:


     setGeneric("y_intervals", function(x, ...) standardGeneric("y_intervals"))

     setMethod("y_intervals", "HTCexp", function(x, ...) {
         x@y_intervals
     })

     setMethod("[", "HTCexp", function(x, i, j, ..., drop=TRUE) {
         ## not sure that this is complete...
         if (missing(i) && missing(j))
             x
         else {
             se <- as(x, "SummarizedExperiment")
             if (missing(i))
                 initialize(x, se[,j], y_intervals=y_intervals(x)[j])
             else if (missing(j))
                 initialize(x, se[i,])
             else
                 initialize(x, se[i,j], y_intervals=y_intervals(x)[j])
         }
     })
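
The same subsetting pattern can be tried without any Bioconductor dependency. The sketch below uses a hypothetical toy class (`Toy`, with slots `data`, `x_ann` and `y_ann` invented for this example) to show a "[" method that subsets the matrix and both annotation vectors together, using initialize() as above. It is only an illustration of the idea, not a replacement for the HTCexp method.

```r
library(methods)

# Hypothetical toy class: a matrix plus one annotation vector per dimension.
setClass("Toy",
    representation(data = "matrix", x_ann = "character", y_ann = "character"))

setMethod("[", "Toy", function(x, i, j, ..., drop = TRUE) {
    # Treat a missing index as "keep everything" along that dimension.
    if (missing(i)) i <- seq_len(nrow(x@data))
    if (missing(j)) j <- seq_len(ncol(x@data))
    initialize(x,
               data  = x@data[i, j, drop = FALSE],
               x_ann = x@x_ann[i],
               y_ann = x@y_ann[j])
})

toy <- new("Toy", data = matrix(1:12, 3, 4),
           x_ann = letters[1:3], y_ann = LETTERS[1:4])
sub <- toy[2:3, c(1, 4)]
dim(sub@data)   # 2 2
sub@x_ann       # "b" "c"
sub@y_ann       # "A" "D"
```

Keeping the annotations in step with the matrix inside a single "[" method is what lets the rest of the code subset the object without tracking the pieces separately.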

Martin