
[Bioc-devel] SummarizedExperiment: duplication of metadata, when modifying colData

2 messages · Felix Ernst, Hervé Pagès

Hi all,

I ran into some odd behaviour with SummarizedExperiment objects in Bioc 3.6
and 3.7. I suspect it is a bug, but I might be wrong, since the way I access
the SummarizedExperiment object is not really straightforward. Any suggestions?

library(GenomicRanges)
library(SummarizedExperiment)

nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- LETTERS[1:6]
rownames(counts) <- 1:nrows
counts2 <- counts - floor(counts)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(50, 150)),
                     IRanges(floor(runif(200, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), 200, TRUE),
                     feature_id=sprintf("ID%03d", 1:200))
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])

se <- SummarizedExperiment(assays=list(counts=counts),
                           rowRanges=rowRanges,
                           colData=colData)
colData(se)$xyz <- rep("", ncol(se))
metadata(se) <- list("meep" = "meep")

str(metadata(se))
colData(se[, 1])$xyz <- "abc"
str(metadata(se))

The first metadata() call returns a list of length 1 with the correct data.
The second call returns a list of length 2 with a duplicated entry, and every
further colData modification (i.e. replacing data) duplicates the metadata
entries again:

List of 1
 $ meep: chr "meep"
List of 2
 $ meep: chr "meep"
 $ meep: chr "meep"
List of 4
 $ meep: chr "meep"
 $ meep: chr "meep"
 $ meep: chr "meep"
 $ meep: chr "meep"
List of 8
 $ meep: chr "meep"
 $ meep: chr "meep"
 $ meep: chr "meep"
 $ meep: chr "meep"
 $ meep: chr "meep"
 $ meep: chr "meep"
 $ meep: chr "meep"
 $ meep: chr "meep"

Thanks for any advice and suggestions.

Felix



---



Felix Ernst, PhD

Université Libre de Bruxelles

RNA MOLECULAR BIOLOGY

BIOPARK Charleroi Brussels-South CAMPUS

Rue Profs Jeener & Brachet, 12

B-6041 Charleroi - Gosselies

BELGIUM

+32(2)650 9774 (office phone)

felix.ernst at ulb.ac.be
2 days later
Hi Felix,

Nice catch. This can actually be reproduced with just:

   > example(SummarizedExperiment)
   > metadata(se0) <- list(aa="aa")
   > se0[1 , ] <- se0[1 , ]
   > metadata(se0)
   $aa
   [1] "aa"

   $aa
   [1] "aa"

The culprit is this line:

   ans_metadata <- c(metadata(x), metadata(value))

in the "[<-" method for SummarizedExperiment objects.
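
For context on why the original colData(se[, 1])$xyz <- "abc" ends up in that method: R evaluates a nested replacement such as f(x[i]) <- v roughly as x <- `[<-`(x, i, value = `f<-`(x[i], value = v)), so the outermost subset replacement is always called on the full object. Below is a minimal base-R sketch of that mechanism. The S3 class "logged" and the `calls` vector are hypothetical names made up for illustration; nothing here is SummarizedExperiment code.

```r
# Record every call to the subset-replacement method of a toy S3 class.
calls <- character()

`[<-.logged` <- function(x, i, value) {
  calls <<- c(calls, "[<-")   # note that the method was invoked
  NextMethod()                # then fall through to the default behaviour
}

x <- structure(1:3, class = "logged")

# Nested replacement: R expands this to something like
#   x <- `[<-`(x, 1, value = `names<-`(x[1], value = "a"))
names(x[1]) <- "a"

calls
```

After the nested assignment, `calls` contains one "[<-" entry even though the code never writes x[1] <- ... explicitly. The same expansion is what routes a colData modification on se[, 1] through "[<-" for the whole SummarizedExperiment.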

So it looks like it was a deliberate decision to have [<- combine
the metadata of 'x' and 'value'. The problem is that this breaks
the more-than-reasonable expectation that something like
x[i , j] <- x[i , j] should be a no-op.
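
To make the doubling concrete, here is a small base-R sketch that mimics the two behaviours with a plain list standing in for the object. The helpers make_toy(), subset_cols() and replace_cols(), and the `combine` argument, are hypothetical and exist only for this illustration; they are not part of SummarizedExperiment.

```r
# Hypothetical stand-in for a SummarizedExperiment: data plus metadata.
make_toy <- function(data, metadata) list(data = data, metadata = metadata)

# Subsetting carries the metadata along, as se[, 1] does.
subset_cols <- function(x, j) make_toy(x$data[j], x$metadata)

# Subset replacement; `combine` switches between concatenating the
# metadata of `x` and `value` and keeping the metadata of `x` only.
replace_cols <- function(x, j, value, combine) {
  x$data[j] <- value$data
  if (combine) {
    x$metadata <- c(x$metadata, value$metadata)
  }
  x
}

toy <- make_toy(1:6, list(meep = "meep"))
combined <- toy
kept <- toy
for (k in 1:3) {
  # The toy equivalent of x[, 1] <- x[, 1], repeated three times.
  combined <- replace_cols(combined, 1, subset_cols(combined, 1), combine = TRUE)
  kept <- replace_cols(kept, 1, subset_cols(kept, 1), combine = FALSE)
}

length(combined$metadata)  # 8: the metadata doubles on every round
length(kept$metadata)      # 1: the metadata is left alone
identical(kept, toy)       # TRUE: the self-assignment is a no-op
```

With combine = TRUE the metadata length goes 1, 2, 4, 8, matching the str() output in the original report; with combine = FALSE the self-assignment leaves the object unchanged.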

I replaced the above line with:

   ans_metadata <- metadata(x)

in SummarizedExperiment 1.9.5 (devel). With this change [<-
leaves metadata(x) intact and x[i , j] <- x[i , j] behaves like
a no-op:

https://github.com/Bioconductor/SummarizedExperiment/commit/e4fcb99c442e2f17b0ccddfb05df9f160e0bbe40

Will port to release soon.

Cheers,
H.
On 12/12/2017 01:05 AM, Felix Ernst wrote: