Message-ID: <966846501.652474.1373409249953.JavaMail.root@fhcrc.org>
Date: 2013-07-09T22:34:09Z
From: Martin Morgan
Subject: [Bioc-devel] GenomicRanges::assays
In-Reply-To: <CAC2h7uvT0+eg020yY8W7hR5dchSg7iQNt7CpZ-zNVrReGO35kw@mail.gmail.com>
The problem is that the dimnames are stored in only one place on the object, not on the individual assays. When you ask for the assays, the dimnames are added to each one, which triggers a full copy of the data. If the dimnames are not of interest, then use
assays(BS, withDimnames=FALSE)
This is not really ideal, so I'll give some thought to a better implementation.
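For anyone following along, here is a minimal sketch of the two access routes on a toy object. It assumes the SummarizedExperiment package (the class's current home; at the time of this thread it lived in GenomicRanges), and the small matrices, row names and sample names are made up for illustration. They stand in for the 28M x 7 BSseq object.

```r
# Sketch only: toy data, not the BSseq object from the thread.
library(SummarizedExperiment)  # assumption: package providing the class today

set.seed(1)
M   <- matrix(rpois(70, 10), nrow = 10)   # hypothetical 10 x 7 assay
Cov <- matrix(rpois(70, 20), nrow = 10)   # hypothetical 10 x 7 assay
se  <- SummarizedExperiment(assays = list(M = M, Cov = Cov))
rownames(se) <- paste0("r", 1:10)         # dimnames live on the object
colnames(se) <- paste0("s", 1:7)

with_dn <- assays(se)                        # dimnames attached on retrieval
no_dn   <- assays(se, withDimnames = FALSE)  # skips attaching the dimnames

# The assay names are available either way, so names() does not need
# the dimnames to be propagated.
names(no_dn)
```

On an object this small the difference in timing is not measurable; wrapping each call in system.time() on a large object is what shows the cost of the copy.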
Martin
----- Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
> Note the final "s" in assays. It is super slow. This is a BSseq object
> with 28M rows and 7 columns, which means there are two assays M and Cov
> each being 28M x 7 (which is pretty big, on the Gb scale)
>
> These two commands retrieve the same data as far as I understand.
>
> > system.time({BS@assays$field("data")})
>    user  system elapsed
>       0       0       0
> > system.time({assays(BS)})
>    user  system elapsed
>  19.677  10.436  30.114
>
> Follow-up questions:
>
> 1) It seems that all assays are stored in a SimpleList inside a reference
> class. If I only want to replace one of the assays, like
> assay(Object, "NAME") <- value
> does this mean that all assays are being copied? Is this different from
> say eSet where each assay is a matrix in an environment?
>
> 2) I think we need a convenience function for the assay names of a
> SummarizedExperiment. (This is how I saw the issue above, I was using
> names(assays(Object)))
>
> Kasper