
[Bioc-devel] Unexpected behaviour with Assays and Vector classes

2 messages · Hervé Pagès, Aaron Lun

#
Hi Aaron,
On 11/15/2015 10:59 AM, Aaron Lun wrote:
For the metadata columns, we usually mimic how names are treated
in base R:

 > x <- c(a=1, b=2, c=3, d=4)
 > x[1] <- x[2]
 > x
a b c d
2 2 3 4

The names are not affected.

IRanges objects follow that model:

 > library(IRanges)
 > ir <- IRanges(11:14, 20, names=letters[1:4])
 > mcols(ir) <- DataFrame(stuff=1:4)
 > ir
IRanges of length 4
     start end width names
[1]    11  20    10     a
[2]    12  20     9     b
[3]    13  20     8     c
[4]    14  20     7     d
 > mcols(ir)
DataFrame with 4 rows and 1 column
       stuff
   <integer>
1         1
2         2
3         3
4         4
 > ir[1] <- ir[2]
 > ir
IRanges of length 4
     start end width names
[1]    12  20     9     a
[2]    12  20     9     b
[3]    13  20     8     c
[4]    14  20     7     d
 > mcols(ir)
DataFrame with 4 rows and 1 column
       stuff
   <integer>
1         1
2         2
3         3
4         4

However, GRanges objects do not seem to:

 > gr <- GRanges("chr1", ir)
 > gr
GRanges object with 4 ranges and 1 metadata column:
     seqnames    ranges strand |     stuff
        <Rle> <IRanges>  <Rle> | <integer>
   a     chr1  [12, 20]      * |         1
   b     chr1  [12, 20]      * |         2
   c     chr1  [13, 20]      * |         3
   d     chr1  [14, 20]      * |         4
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths
 > gr[1] <- gr[2]
 > gr
GRanges object with 4 ranges and 1 metadata column:
     seqnames    ranges strand |     stuff
        <Rle> <IRanges>  <Rle> | <integer>
   a     chr1  [12, 20]      * |         2
   b     chr1  [12, 20]      * |         2
   c     chr1  [13, 20]      * |         3
   d     chr1  [14, 20]      * |         4
   -------
   seqinfo: 1 sequence from an unspecified genome; no seqlengths

So we have an inconsistency among our Vector-based classes, and we
need to fix it. It seems that you expected the metadata columns to
be altered by [<-. Is this what most people would expect?
Maybe Martin or Val can chime in on this one.

Thanks,
H.

  
    
#
Hi Herve,

I would have expected the GRanges behaviour, where the metadata is
affected by the replacement. For example, with GRanges objects I often
use the metadata columns to store statistics or descriptors for each
genomic interval, e.g., peak scores, log-fold changes and so on. If I
did a replacement to change the genomic coordinates, I would expect the
associated metadata to come along for the ride. The same argument would
apply to other classes derived from Vector.
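For comparison, here is a base R sketch that is not part of the original thread: if each element's coordinates and its per-element metadata are stored as ordinary columns of a data.frame, row subassignment copies every column of the source row while the row names stay where they are. That is the "metadata comes along for the ride" behaviour described above, with the names still following the base R rule from the first message. The column names start, end and stuff are chosen only to mirror the IRanges example.

```r
# Base R analogue (illustration only, not from the thread): one row per
# element, with coordinates and per-element metadata as ordinary columns.
df <- data.frame(start = 11:14, end = 20L, stuff = 1:4,
                 row.names = letters[1:4])

# Row subassignment: copy element 2 over element 1.
df[1, ] <- df[2, ]

# Every column of row 2 (including "stuff") was copied into row 1,
# but the row names were left untouched.
print(df)
#   start end stuff
# a    12  20     2
# b    12  20     2
# c    13  20     3
# d    14  20     4
```

This only illustrates the semantics in base R; it says nothing about how Vector-derived classes in Bioconductor behave.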

I should add that I encountered this problem in the context of doing
subset replacement for metadata. To re-use your IRanges example:
[1] 1 2 3 4

I would expect the first entry of "stuff" to now be 2. But because
the metadata isn't replaced during subset replacement, it remains at 1.
This is problematic: if I wanted to update a subset of entries in the
metadata, I would use something similar to the above replacement and
expect it to change the existing object.

Cheers,

Aaron
Hervé Pagès wrote: