Hi,

It seems to me that constructing a 'GRangesList' object from 'GRanges' objects with metadata does not keep the metadata. As an example:

Example 1:

library(GenomicRanges)
gr = GRanges()
metadata(gr) = list(a = "1")
metadata(gr) ## the metadata was stored
grl = GRangesList(gr, gr) ## put it in a 'GRangesList'
metadata(grl[[1]]) ## no metadata anymore

Also, concatenating 'GRangesList' objects does not seem to keep the metadata:

Example 2:

grl = GRangesList(gr, gr)
metadata(grl) = list(b = "2")
metadata(grl) ## it's there
grlc = c(grl, grl)
metadata(grlc) ## now it's gone

The second case would be hard to handle in a general way, since it is not clear how to combine different metadata lists. However, the first case does not look like expected behavior.

Best wishes
Julian
[Bioc-devel] 'GRangesList' does not keep metadata of items
7 messages · Julian Gehring, Michael Lawrence, Hervé Pagès
Hi Michael,

The use case is storing experimental metadata together with a GRanges object, where that metadata does not fit the tabular structure of a GRanges. At a later stage, multiple of these annotated GRanges objects are stored together as a list/GRangesList.

Best wishes
Julian
This second case is exactly what happens to the individual GRanges that constitute the list. They are concatenated to form a single GRanges, which is stored alongside a partitioning that defines the individual elements. There are no longer two separate GRanges objects, so there is no easy way to keep the metadata around. It's unfortunate that an implementation detail is exposed in this way, but it would take some effort to support this feature. This is a property of all CompressedList derivatives. What's the use case?
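To make the point about the compressed representation concrete, here is a minimal toy model in base R. It is only a sketch of the idea (one concatenated vector plus a partitioning), not the actual CompressedList implementation; the `meta` attribute and the helpers `compress()` and `get_elt()` are hypothetical names invented for this illustration.

```r
# Toy base-R model of a "compressed" list; NOT the Bioconductor implementation.
# Each element is an integer vector carrying a hypothetical "meta" attribute.
x1 <- structure(1:3, meta = list(a = "1"))
x2 <- structure(4:5, meta = list(a = "2"))

# Compressed storage: one concatenated vector plus the end index of each element.
compress <- function(...) {
  elts <- list(...)
  list(unlistData = unlist(lapply(elts, as.vector)),  # attributes are dropped here
       ends = cumsum(lengths(elts)))
}

# Element i is rebuilt on the fly from the concatenated data.
get_elt <- function(cl, i) {
  start <- if (i == 1) 1L else cl$ends[i - 1] + 1L
  n <- cl$ends[i] - start + 1L
  cl$unlistData[seq_len(n) + start - 1L]
}

cl <- compress(x1, x2)
first <- get_elt(cl, 1)
attr(first, "meta")          # NULL: per-element metadata has nowhere to live

# A "simple" list keeps each element as a separate object, attributes included.
simple <- list(x1, x2)
attr(simple[[1]], "meta")$a  # "1"
```

In this model the per-element attribute disappears as soon as the elements are concatenated, while a plain list that stores each element separately keeps it.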
Hi Michael, Thanks, using 'GenomicRangesList' instead of 'GRangesList' essentially solves my issues. Could you please add a small note to the documentation that mentions the different behaviors for the two classes? Best wishes Julian
On 09/03/2013 03:34 PM, Michael Lawrence wrote:
If the number of GRanges is small (not thousands), and you don't need the semantic of treating each GRanges as a "compound range", then use GenomicRangesList(). It's a SimpleList, so metadata should be preserved. It's the data structure for storing per-sample GRanges. Michael
Hi Julian, Michael,
Alternatively a trick is to use the outer mcols of the GRangesList
object. If the experimental metadata of each GRanges has the same
structure/fields, and those fields contain single values:
library(GenomicRanges)
gr1 <- GRanges()
metadata(gr1) = list(a="1", b="hello")
gr2 <- GRanges()
metadata(gr2) = list(a="2", b="world")
grl <- GRangesList(gr1, gr2)
mcols(grl) <- DataFrame(a=c(metadata(gr1)$a, metadata(gr2)$a),
b=c(metadata(gr1)$b, metadata(gr2)$b))
Then:
> mcols(grl)
DataFrame with 2 rows and 2 columns
            a           b
  <character> <character>
1           1       hello
2           2       world
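With more than a couple of elements, writing out one `c(...)` per field becomes tedious. The following base-R sketch builds the same kind of table from a list of metadata lists; the name `meta` is hypothetical, and the code assumes every element has the same fields with single values, as stated above.

```r
# Hypothetical per-element metadata lists, mirroring the example above.
meta <- list(list(a = "1", b = "hello"),
             list(a = "2", b = "world"))

# One row per element; assumes identical field names and length-1 values.
rows <- lapply(meta, function(m) {
  stopifnot(all(lengths(m) == 1L))
  as.data.frame(m, stringsAsFactors = FALSE)
})
meta_df <- do.call(rbind, rows)
meta_df

# With GenomicRanges loaded, the result could presumably be attached with
#   mcols(grl) <- DataFrame(meta_df)
# (not run here; it requires the Bioconductor packages).
```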
If the experimental metadata fields are going to be completely
arbitrary:
metadata(gr1) = list(a="1", b="hello")
metadata(gr2) = list(a=c("2", "3"), z="foo", y=letters[1:3])
grl <- GRangesList(gr1, gr2)
mcols(grl) <- DataFrame(metadata=I(list(metadata(gr1), metadata(gr2))))
Then:
> mcols(grl)
DataFrame with 2 rows and 1 column
  metadata
    <list>
1 ########
2 ########
'mcols(grl)$metadata' is a list of lists:
> mcols(grl)$metadata
[[1]]
[[1]]$a
[1] "1"
[[1]]$b
[1] "hello"
[[2]]
[[2]]$a
[1] "2" "3"
[[2]]$z
[1] "foo"
[[2]]$y
[1] "a" "b" "c"
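The same list-column idea can be tried out with a plain base-R data.frame, which may help when experimenting without the Bioconductor classes. This is only an analogue of the DataFrame approach; `get_field()` is a hypothetical helper, not part of any package.

```r
m1 <- list(a = "1", b = "hello")
m2 <- list(a = c("2", "3"), z = "foo", y = letters[1:3])

# I() keeps the list as a single column with one entry per row.
df <- data.frame(metadata = I(list(m1, m2)))

# Hypothetical helper: fetch one field of one element; NULL if the field is absent.
get_field <- function(df, i, field) df$metadata[[i]][[field]]

get_field(df, 2, "y")  # "a" "b" "c"
get_field(df, 1, "z")  # NULL, since the first element has no field 'z'
```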
Cheers,
H.
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
Related to the storage of a list inside a DataFrame (as a column),
I found 2 issues:
df <- DataFrame(A=I(list(a=1:3, b="BB")))
1. The name of the col is not as specified:
> df
DataFrame with 2 rows and 1 column
         X
    <list>
1 ########
2 ########
2. rbind() doesn't work as expected:
> rbind(df, df)
DataFrame with 3 rows and 4 columns
        X.a         X.b     X.a.1       X.b.1
  <integer> <character> <integer> <character>
1         1          BB         1          BB
2         2          BB         2          BB
3         3          BB         3          BB
or it can break:
> df <- DataFrame(A=I(list(a=1:3, b=character(0))))
> rbind(df, df)
Error in DataFrame(cols) : cannot coerce class "list" to a DataFrame
This last issue will break c() on GRangesList objects that have mcols
of the kind I showed previously.
Cheers,
H.
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] GenomicRanges_1.13.39 XVector_0.1.0 IRanges_1.19.28
[4] BiocGenerics_0.7.4
loaded via a namespace (and not attached):
[1] stats4_3.0.1 tools_3.0.1