do we have a facility for this? if not, we have

https://github.com/vjcitn/biocMultiAssay/blob/master/R/exs2se.R
https://github.com/vjcitn/biocMultiAssay/blob/master/man/coerce-methods.Rd

it occurred to me that we might want something like this in GenomicRanges (that's where SummarizedExperiment is managed, right?) and I will add it if there are no objections

the arguments are currently

    assayname = "exprs",    # for naming SimpleList element
    fngetter = function(z) rownames(exprs(z)),    # extract usable feature names
    annDbGetter = function(z) {
        # obtain resource for mapping feature names to coordinates
        clnanno = sub(".db", "", annotation(z))
        stopifnot(require(paste0(annotation(z), ".db"), character.only=TRUE) )
        get(paste0(annotation(z), ".db"))
    },
    probekeytype = "PROBEID",    # chipDb field to use
    duphandler = function(z) {
        # action to take to process duplicated features
        if (any(isd <- duplicated(z[,"PROBEID"])))
            return(z[!isd,,drop=FALSE])
        z
    },
    signIsStrand = TRUE,    # verify that signs of addresses define strand
    ucsdChrnames = TRUE     # prefix 'chr' to chromosome token
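To make the roles of duphandler, signIsStrand and ucsdChrnames concrete, here is a minimal base-R sketch on an invented mapping table. It assumes the table uses PROBEID, CHR, CHRLOC and CHRLOCEND columns and that a negative address marks the minus strand; the helper toCoords is hypothetical and is not taken from exs2se.R, which may handle these steps differently.

```r
# Toy stand-in for a probe-to-coordinate mapping table (all values invented).
map <- data.frame(
  PROBEID   = c("p1", "p1", "p2", "p3"),
  CHR       = c("1", "1", "X", "7"),
  CHRLOC    = c(1000L, 5000L, -2000L, 300L),
  CHRLOCEND = c(1500L, 5500L, -2600L, 900L),
  stringsAsFactors = FALSE
)

# duphandler as posted: keep only the first row for each duplicated probe ID.
duphandler <- function(z) {
  if (any(isd <- duplicated(z[, "PROBEID"])))
    return(z[!isd, , drop = FALSE])
  z
}

# Hypothetical helper (an assumption, not the package's code): read strand
# from the sign of the address and prefix "chr" to the chromosome token.
toCoords <- function(z, signIsStrand = TRUE, ucsdChrnames = TRUE) {
  data.frame(
    probe    = z$PROBEID,
    seqnames = if (ucsdChrnames) paste0("chr", z$CHR) else z$CHR,
    start    = abs(z$CHRLOC),
    end      = abs(z$CHRLOCEND),
    strand   = if (signIsStrand) ifelse(z$CHRLOC < 0, "-", "+") else "*",
    stringsAsFactors = FALSE
  )
}

coords <- toCoords(duphandler(map))
print(coords)
```

Dropping all but the first row per probe is the simplest policy; passing a different duphandler would allow other choices, such as keeping the widest interval.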
[Bioc-devel] coerce ExpressionSet to SummarizedExperiment
6 messages · Vincent Carey, Sean Davis, Martin Morgan +3 more
Hi, Vince.

Looks like a good start. I'd probably pull all the assays from ExpressionSet into SummarizedExperiment as the default, avoiding data coercion methods that are unnecessarily lossy. Also, as it stands, the assayname argument is not used anyway?

Sean
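The idea of carrying over every assay by default can be sketched with toy S4 classes. ToyExpressionSet and ToySummarizedExperiment below are hypothetical stand-ins, not the Biobase or GenomicRanges classes, and the slot names are assumptions chosen only for illustration.

```r
library(methods)

# Hypothetical stand-ins for the real classes (an assumption for illustration).
setClass("ToyExpressionSet",
         representation(assayData = "list", pheno = "data.frame"))
setClass("ToySummarizedExperiment",
         representation(assays = "list", colData = "data.frame"))

# Coercion that copies every assay matrix, not only the one named "exprs".
setAs("ToyExpressionSet", "ToySummarizedExperiment", function(from) {
  new("ToySummarizedExperiment",
      assays  = from@assayData,
      colData = from@pheno)
})

m <- matrix(1:6, nrow = 3,
            dimnames = list(paste0("f", 1:3), c("s1", "s2")))
es <- new("ToyExpressionSet",
          assayData = list(exprs = m, se.exprs = m / 10),
          pheno = data.frame(group = c("a", "b"),
                             row.names = c("s1", "s2")))

se <- as(es, "ToySummarizedExperiment")
print(names(se@assays))
```

Because the whole list is copied, no assay is silently dropped, and an argument such as assayname would only be needed if a caller wanted to rename or select elements.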
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
I think there will be some resistance to uniting the 'Biobase' and 'IRanges' realms under 'GenomicRanges'; considerable effort has gone into making a rational hierarchy of package dependencies [perhaps Herve will point to some of his ASCII art on the subject].

I have some recollection of (recent) discussion related to this topic in the DESeq2 realm, but am drawing a blank; presumably Michael or Wolfgang or ... will chime in.

Martin
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793
Andrzej was working on a conversion function for DESeqDataSet <=> DGEList. I don't think it was for SummarizedExperiment and eSet.

Mike
This would be very useful in epivizr. Right now we have a little bit of code that can be made much more general:

https://github.com/epiviz/epivizr/blob/master/R/register-methods.R#L73
1 day later
Hi,
On 09/20/2014 11:14 AM, Martin Morgan wrote:
I think there will be some resistance to uniting the 'Biobase' and 'IRanges' realms under 'GenomicRanges';
This coercion method could be defined (1) in Biobase (where ExpressionSet is defined), (2) in GenomicRanges (where SummarizedExperiment is defined), or (3) in a package that depends on Biobase and GenomicRanges.

Since it's probably undesirable to make Biobase depend on GenomicRanges or vice-versa, we would need to use Suggests for (1) or (2). That means we would get a note like this at installation time:

    ** preparing package for lazy loading
    in method for ‘coerce’ with signature ‘"ExpressionSet","SummarizedExperiment"’: no definition for class “SummarizedExperiment”

Not very clean but it works.

(3) is a cleaner solution but then the coercion method would not necessarily be available to the user when s/he needs it (unless s/he knows what extra package to load). The obvious advantage of putting the method in Biobase is that if a user has an ExpressionSet, then s/he necessarily has Biobase attached and the method is already in her/his search path.

Another solution would be (4) to move SummarizedExperiment somewhere else. That would be in a package that depends on GenomicRanges and Biobase, and the coercion method would be defined there.

H.
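One practical consequence of options (1) and (2): a package listed only under Suggests may not be installed when the method is called, so the coercion would likely need a run-time guard. A minimal sketch follows; withSuggested is a hypothetical helper, and "stats" and "noSuchPackage123" are placeholders standing in for the package that defines the target class.

```r
# Hypothetical wrapper: run a converter only if the package that defines the
# target class can be loaded; otherwise stop with an informative message.
withSuggested <- function(pkg, converter, x) {
  if (!requireNamespace(pkg, quietly = TRUE))
    stop("package '", pkg, "' is needed for this coercion; please install it",
         call. = FALSE)
  converter(x)
}

# "stats" ships with R, so the converter runs.
ok <- withSuggested("stats", function(x) x * 2, 21)

# A package that is not installed triggers the error path instead.
err <- tryCatch(withSuggested("noSuchPackage123", identity, 1),
                error = function(e) conditionMessage(e))
print(err)
```

With options (3) or (4) the guard is unnecessary, since the defining package would depend on both Biobase and GenomicRanges.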
Hervé Pagès Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M1-B514 P.O. Box 19024 Seattle, WA 98109-1024 E-mail: hpages at fhcrc.org Phone: (206) 667-5791 Fax: (206) 667-1319