-----Original Message-----
From: Bioc-devel [mailto:bioc-devel-bounces at r-project.org] On Behalf Of
Herv? Pag?s
Sent: Sunday, November 15, 2015 2:54 PM
To: Aaron Lun; bioc-devel at r-project.org
Subject: Re: [Bioc-devel] Unexpected behaviour with Assays and Vector
classes
Hi Aaron,
On 11/15/2015 10:59 AM, Aaron Lun wrote:
Hello all,
I've encountered some unexpected behaviour with some of the base
classes while developing stuff for genomic interactions. The first
issue lies with the subset replacement in the Vector class. Let's say
I make a derived class "foo" inheriting from Vector, as below:
> require(S4Vectors)
> setClass("foo", contains="Vector", slots=c(blah="integer")) >
setMethod("parallelSlotNames", "foo", function(x) {
> c("blah", callNextMethod())
> })
> setMethod("c", "foo", function(x, ..., recursive=TRUE) {
> new.blah <- do.call(c, lapply(list(x, ...), FUN=slot, name="blah"))
> new.mcols <- do.call(rbind, lapply(list(x, ...), FUN=mcols))
> new("foo", blah=new.blah, metadata=metadata(x),
> elementMetadata=new.mcols)
> })
[1] "c"
Construction gives what you'd expect:
> a <- new("foo", blah=1:5, elementMetadata=DataFrame(stuff=1:5))
> a at blah
[1] 1 2 3 4 5
However, if I try to do subset replacement, I get this:
> a[1] <- a[2]
> a at blah
[1] 1 2 3 4 5
So, "blah" is replaced properly, but "elementMetadata" is not. This is
attributable to a line in "replaceROWS" which preserves the mcols of
the original object during replacement (also for "names"). Should this
line be removed to give expected behaviour for the elementMetadata?
For the treatment of the metadata columns we are usually mimicking how
names are treated in base R:
> x <- c(a=1, b=2, c=3, d=4)
> x[1] <- x[2]
> x
a b c d
2 2 3 4
The names are not affected.
IRanges objects are following that model:
> library(IRanges)
> ir <- IRanges(11:14, 20, names=letters[1:4]) > mcols(ir) <-
DataFrame(stuff=1:4) > ir IRanges of length 4
start end width names
[1] 11 20 10 a
[2] 12 20 9 b
[3] 13 20 8 c
[4] 14 20 7 d
DataFrame with 4 rows and 1 column
stuff
<integer>
1 1
2 2
3 3
4 4
IRanges of length 4
start end width names
[1] 12 20 9 a
[2] 12 20 9 b
[3] 13 20 8 c
[4] 14 20 7 d
DataFrame with 4 rows and 1 column
stuff
<integer>
1 1
2 2
3 3
4 4
However it seems that GRanges objects are not:
> gr <- GRanges("chr1", ir)
> gr
GRanges object with 4 ranges and 1 metadata column:
seqnames ranges strand | stuff
<Rle> <IRanges> <Rle> | <integer>
a chr1 [12, 20] * | 1
b chr1 [12, 20] * | 2
c chr1 [13, 20] * | 3
d chr1 [14, 20] * | 4
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths > gr[1] <-
gr[2] > gr GRanges object with 4 ranges and 1 metadata column:
seqnames ranges strand | stuff
<Rle> <IRanges> <Rle> | <integer>
a chr1 [12, 20] * | 2
b chr1 [12, 20] * | 2
c chr1 [13, 20] * | 3
d chr1 [14, 20] * | 4
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
So we have an inconsistency within our Vector-based classes.
We need to fix that. It seems that you would have expected the metadata
columns to be altered by [<-. Is this what most people feel?
The other issue is that r/cbind'ing doesn't seem to work properly for
unnamed multi-matrix Assays objects. Consider:
> require(SummarizedExperiment)
> whee <- Assays(list(x1=matrix(1, 3, 4), x2=matrix(2, 3, 4))) >
whee2 <- Assays(list(x1=matrix(3, 3, 4), x2=matrix(4, 3, 4))) >
rbind(whee, whee2) Reference class object of class
"ShallowSimpleListAssays"
Field "data":
List of length 2
names(2): x1 x2
>
> names(whee) <- names(whee2) <- NULL > rbind(whee, whee2)
class object of class "ShallowSimpleListAssays"
Field "data":
List of length 1
So, unnaming and rbind'ing results in the loss of a matrix. This is
the same issue I reported for unnamed multi-matrix assays when
rbinding multiple SummarizedExperiment objects; I recall that being
resolved by r/cbind'ing based on position. Should this be done here as
well? If not, perhaps we should force people to name their assays.
Maybe Martin or Val can chime in for this one.