[Bioc-devel] 'GRangesList' does not keep metadata of items

7 messages · Julian Gehring, Michael Lawrence, Hervé Pagès

#
Hi,

It seems to me that constructing a 'GRangesList' object containing 
'GRanges' with metadata does not keep the metadata.  As an example:

Example 1:

library(GenomicRanges)
gr = GRanges()
metadata(gr) = list(a = "1")
metadata(gr) ## the metadata was stored
grl = GRangesList(gr, gr) ## put it in a 'GRangesList'
metadata(grl[[1]]) ## no metadata anymore


Also, concatenating 'GRangesList' objects does not seem to keep them:

Example 2:

grl = GRangesList(gr, gr)
metadata(grl) = list(b = "2")
metadata(grl) ## it's there
grlc = c(grl, grl)
metadata(grlc) ## now it's gone


The second case would be hard to handle in a general way since it is not 
clear how to combine different metadata lists.  However, the first case 
does not look like expected behavior.

Best wishes
Julian
#
Hi Michael,

The use case is storing experimental metadata that does not fit the 
tabular structure of a GRanges object together with that object.  And 
at a later stage, storing several of these annotated GRanges objects 
together as a list/GRangesList.

Best wishes
Julian
#
Hi Michael,

Thanks, using 'GenomicRangesList' instead of 'GRangesList' essentially 
solves my issues.  Could you please add a small note to the 
documentation that mentions the different behaviors for the two classes?

Best wishes
Julian
On 09/03/2013 03:34 PM, Michael Lawrence wrote:
#
Hi Julian, Michael,

Alternatively a trick is to use the outer mcols of the GRangesList
object. If the experimental metadata of each GRanges has the same
structure/fields, and those fields contain single values:

   library(GenomicRanges)
   gr1 <- GRanges()
   metadata(gr1) = list(a="1", b="hello")
   gr2 <- GRanges()
   metadata(gr2) = list(a="2", b="world")

   grl <- GRangesList(gr1, gr2)
   mcols(grl) <- DataFrame(a=c(metadata(gr1)$a, metadata(gr2)$a),
                           b=c(metadata(gr1)$b, metadata(gr2)$b))

Then:

   > mcols(grl)
   DataFrame with 2 rows and 2 columns
               a           b
     <character> <character>
   1           1       hello
   2           2       world

If the experimental metadata fields are going to be completely
arbitrary:

   metadata(gr1) = list(a="1", b="hello")
   metadata(gr2) = list(a=c("2", "3"), z="foo", y=letters[1:3])

   grl <- GRangesList(gr1, gr2)
   mcols(grl) <- DataFrame(metadata=I(list(metadata(gr1), metadata(gr2))))

Then:

   > mcols(grl)
   DataFrame with 2 rows and 1 column
     metadata
       <list>
   1 ########
   2 ########

'mcols(grl)$metadata' is a list of lists:

   > mcols(grl)$metadata
   [[1]]
   [[1]]$a
   [1] "1"

   [[1]]$b
   [1] "hello"


   [[2]]
   [[2]]$a
   [1] "2" "3"

   [[2]]$z
   [1] "foo"

   [[2]]$y
   [1] "a" "b" "c"
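
To get the metadata of a single list element back, one can index into
that list column. A minimal sketch, assuming the same gr1 and gr2 as
above; the names 'grs' and 'md2' are only illustrative, and lapply() is
used here in place of spelling out each metadata() call by hand:

```r
library(GenomicRanges)

# Two empty GRanges with differently structured metadata (as above)
gr1 <- GRanges()
metadata(gr1) <- list(a = "1", b = "hello")
gr2 <- GRanges()
metadata(gr2) <- list(a = c("2", "3"), z = "foo", y = letters[1:3])

# Collect the per-element metadata in a list column of the outer mcols
grs <- list(gr1, gr2)                      # illustrative helper list
grl <- GRangesList(grs)
mcols(grl) <- DataFrame(metadata = I(lapply(grs, metadata)))

# Retrieve the metadata list that belonged to the second GRanges
md2 <- mcols(grl)$metadata[[2]]
md2$y
```

This only keeps the metadata next to each element; 'metadata(grl[[2]])'
itself is still empty.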

Cheers,
H.
On 09/03/2013 06:47 AM, Julian Gehring wrote:
#
Related to the storage of a list inside a DataFrame (as a column),
I found 2 issues:

   df <- DataFrame(A=I(list(a=1:3, b="BB")))

1. The name of the column is not as specified:

     > df
     DataFrame with 2 rows and 1 column
              X
         <list>
     1 ########
     2 ########

2. rbind() doesn't work as expected:

     > rbind(df, df)
     DataFrame with 3 rows and 4 columns
             X.a         X.b     X.a.1       X.b.1
       <integer> <character> <integer> <character>
     1         1          BB         1          BB
     2         2          BB         2          BB
     3         3          BB         3          BB

   or it can break:

     > df <- DataFrame(A=I(list(a=1:3, b=character(0))))
     > rbind(df, df)
     Error in DataFrame(cols) : cannot coerce class "list" to a DataFrame

This last issue will break c() on GRangesList objects that have mcols
of the kind I showed previously.

Cheers,
H.


 > sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicRanges_1.13.39 XVector_0.1.0         IRanges_1.19.28
[4] BiocGenerics_0.7.4

loaded via a namespace (and not attached):
[1] stats4_3.0.1 tools_3.0.1
On 09/03/2013 02:40 PM, Hervé Pagès wrote: