[Bioc-devel] change names(assays(SummarizedExperiment)) w/o copy?

2 messages · Michael Love, Martin Morgan

#
hi,

Is there a way to change the names of the assays slot of a
SummarizedExperiment without making a new copy of the data contained
within? Assume I get an SE which has already been constructed, but with
no names on the assays() SimpleList.

thanks,

Mike
> gc()
           used (Mb) gc trigger (Mb) max used (Mb)
 Ncells 1291106   69    1710298 91.4  1590760 85.0
 Vcells 1178619    9    1925843 14.7  1724123 13.2
 > m <- matrix(1:2e7, ncol=10)
 > gc()
            used (Mb) gc trigger  (Mb) max used  (Mb)
 Ncells  1291111 69.0    1967602 105.1  1590760  85.0
 Vcells 11178604 85.3   22482701 171.6 21178631 161.6

# made a ~75 Mb matrix

 > colnames(m) <- letters[1:10]
 > gc()
            used (Mb) gc trigger  (Mb) max used  (Mb)
 Ncells  1291149 69.0    1967602 105.1  1590760  85.0
 Vcells 11178679 85.3   22482701 171.6 21179851 161.6
 > se <- SummarizedExperiment(m)
 > gc()
            used (Mb) gc trigger  (Mb) max used  (Mb)
 Ncells  1302603 69.6    1967602 105.1  1623929  86.8
 Vcells 12189777 93.1   22482701 171.6 21179851 161.6

# so far no copying

 > names(assays(se)) <- "counts"
 > gc()
            used  (Mb) gc trigger  (Mb) max used  (Mb)
 Ncells  1303174  69.6    1967602 105.1  1623929  86.8
 Vcells 22190847 169.4   23686836 180.8 22203423 169.4

# last step made a copy
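
As a rough cross-check of the gc() figures above, the arithmetic can be sketched in base R. This assumes the usual sizes (4 bytes per integer, 8 bytes per Vcell, and gc() reporting Mb in units of 1024^2 bytes); the variable names are only for illustration, and the Vcells counts are copied from the transcript.

```r
# Assumed sizes: integers are 4 bytes, Vcells are 8 bytes,
# and gc() reports "Mb" in units of 1024^2 bytes.
vcell_bytes <- 8
mib <- 1024^2

# Expected size of the 2e7 integers in m
expected_mib <- 2e7 * 4 / mib

# Growth in used Vcells, taken from the gc() output in the transcript
growth_matrix_vcells   <- 11178604 - 1178619   # after creating m
growth_renaming_vcells <- 22190847 - 12189777  # after names(assays(se)) <- "counts"

growth_matrix_mib   <- growth_matrix_vcells * vcell_bytes / mib
growth_renaming_mib <- growth_renaming_vcells * vcell_bytes / mib

print(round(c(expected = expected_mib,
              matrix   = growth_matrix_mib,
              renaming = growth_renaming_mib), 1))
```

All three values come out at about 76 Mb, which is consistent with the renaming step allocating one more full copy of the matrix.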
R Under development (unstable) (2014-05-07 r65539)
 Platform: x86_64-apple-darwin12.5.0 (64-bit)

 locale:
 [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

 attached base packages:
 [1] parallel  stats     graphics  grDevices utils     datasets  methods
 [8] base

 other attached packages:
 [1] GenomicRanges_1.17.12 GenomeInfoDb_1.1.3    IRanges_1.99.13
 [4] S4Vectors_0.0.6       BiocGenerics_0.11.2

 loaded via a namespace (and not attached):
 [1] RCurl_1.95-4.1 stats4_3.2.0   XVector_0.5.6
#
On 05/07/2014 12:06 PM, Michael Love wrote:
Hi Mike --

   names(assays(se)) = "counts"

extracts the assays from se, applies the names to the resulting SimpleList, and
then re-assigns the SimpleList to the SummarizedExperiment. The memory copy (of
the big data) actually happens in the extraction step, assays(se).
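
The extract / modify / re-assign sequence described above is how R evaluates any nested replacement call. A minimal base-R sketch, where the payload() accessor pair and the plain list are hypothetical stand-ins for assays() and a SummarizedExperiment (this is not the package's code):

```r
# Hypothetical getter/setter pair standing in for assays() and `assays<-`()
payload <- function(x) x$payload
`payload<-` <- function(x, value) {
  x$payload <- value
  x
}

obj <- list(payload = list(1:3))

# Nested replacement form, as in names(assays(se)) <- "counts"
obj1 <- obj
names(payload(obj1)) <- "counts"

# The same thing written out step by step
obj2 <- obj
tmp <- payload(obj2)      # 1. extract (the getter runs here)
names(tmp) <- "counts"    # 2. modify the extracted value
payload(obj2) <- tmp      # 3. re-assign via the setter

print(identical(obj1, obj2))
```

Any work the getter does, such as attaching dimnames to a large matrix, is therefore paid for even though only the names are being changed.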

 > m = matrix(0, 0, 0); tracemem(m)
[1] "<0x3449b4e8>"
 > se = SummarizedExperiment(m)
 > a = assays(se)
tracemem[0x3449b4e8 -> 0x34ef64f0]: lapply lapply lapply lapply endoapply 
endoapply assays assays

which can actually be avoided by asking for the assays without their dimnames

 > a = assays(se, withDimnames=FALSE)
 >

and from there

   names(a) = "counts"
   assays(se) = a

verifying that we haven't actually copied the matrix

 > .Internal(inspect(assays(se, withDimnames=FALSE)[[1]]))
@3449b4e8 14 REALSXP g0c0 [NAM(2),TR,ATT] (len=0, tl=0)
ATTRIB:
   @3449b4b0 02 LISTSXP g0c0 []
     TAG: @b9c778 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
     @3449a118 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 0,0
 > .Internal(inspect(m))
@3449b4e8 14 REALSXP g0c0 [NAM(2),TR,ATT] (len=0, tl=0)
ATTRIB:
   @3449b4b0 02 LISTSXP g0c0 []
     TAG: @b9c778 01 SYMSXP g0c0 [LCK,gp=0x4000] "dim" (has value)
     @3449a118 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) 0,0
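
Put together, the steps above might look like the following sketch. It assumes a SummarizedExperiment class is available from the SummarizedExperiment package (at the time of this thread the class lived in GenomicRanges) and that R was built with memory profiling so that tracemem() works; the small matrix is just a placeholder for real data.

```r
# Sketch only: guarded so that it does nothing if the package is missing.
if (requireNamespace("SummarizedExperiment", quietly = TRUE)) {
  suppressPackageStartupMessages(library(SummarizedExperiment))

  m <- matrix(0, 2, 3)            # placeholder for a large assay matrix
  se <- SummarizedExperiment(m)

  # tracemem() needs an R build with memory profiling support
  if (capabilities("profmem")) {
    # Any "tracemem[...]" line printed after this point reports a copy of m
    tracemem(m)
  }

  # Extract without dimnames, rename, and put the list back
  a <- assays(se, withDimnames = FALSE)
  names(a) <- "counts"
  assays(se) <- a

  if (capabilities("profmem")) untracemem(m)

  print(names(assays(se, withDimnames = FALSE)))
}
```

Whether a copy is reported will depend on the installed package version, so the tracemem() output is worth checking rather than assuming.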

One would hope (a) that I had followed through on an earlier promise to apply
the dimnames up front, so that withDimnames=FALSE would not be needed to avoid
the copy (though there might then be a price to pay on the way in), and (b)
that the following would work

   names(assays(se, withDimnames=FALSE)) = "counts"

it didn't

 > names(assays(se, withDimnames=FALSE)) = "counts"
Error in slot(x, nm) :
   no slot of name "withDimnames" for this object of class "SummarizedExperiment"

but it does in GenomicRanges 1.17.13.

Martin