[Bioc-devel] coerce ExpressionSet to SummarizedExperiment - Bioc-devel

Sat, Sep 20, 2014 7:38 AM #

do we have a facility for this?

if not, we have

https://github.com/vjcitn/biocMultiAssay/blob/master/R/exs2se.R

https://github.com/vjcitn/biocMultiAssay/blob/master/man/coerce-methods.Rd

it occurred to me that we might want something like this in GenomicRanges
(that's where SummarizedExperiment is managed, right?) and I will add it
if there are no objections

the arguments are currently

     assayname = "exprs",    # for naming SimpleList element
     fngetter =
           function(z) rownames(exprs(z)),   # extract usable feature names
     annDbGetter =
          function(z) {
              # strip any trailing ".db" so the suffix is never doubled
              clnanno = sub("\\.db$", "", annotation(z))
              pkg = paste0(clnanno, ".db")
              stopifnot(require(pkg, character.only=TRUE))
              get(pkg)  # obtain resource for mapping feature names to coordinates
              },
     probekeytype = "PROBEID",   # chipDb field to use
     duphandler = function(z) {    # action to take to process duplicated features
          if (any(isd <- duplicated(z[,"PROBEID"])))
              return(z[!isd,,drop=FALSE])
          z
          },
     signIsStrand = TRUE,   # verify that signs of addresses define strand
     ucsdChrnames = TRUE    # prefix 'chr' to chromosome token
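
To make the last three arguments concrete, here is a minimal sketch in base R. The table `toy_map` and its `CHR` and `CHRLOC` columns are hypothetical stand-ins for what a chip annotation lookup might return; they are not taken from exs2se.R. The sketch also assumes the convention that a negative address marks the minus strand.

```r
# Hypothetical probe-to-location table; every column name other than PROBEID
# is an assumption made for this illustration only.
toy_map <- data.frame(
  PROBEID = c("p1", "p2", "p2", "p3"),
  CHR     = c("1", "X", "X", "7"),
  CHRLOC  = c(1500L, -42000L, -42500L, 98000L),
  stringsAsFactors = FALSE
)

# Same logic as the default duphandler: keep the first row for each probe
duphandler <- function(z) {
  if (any(isd <- duplicated(z[, "PROBEID"])))
    return(z[!isd, , drop = FALSE])
  z
}

dedup <- duphandler(toy_map)

# Assumed convention behind signIsStrand: negative address means minus strand
strand <- ifelse(dedup$CHRLOC < 0, "-", "+")
start  <- abs(dedup$CHRLOC)

# What ucsdChrnames = TRUE is described as doing: prefix 'chr' to the token
seqnames <- paste0("chr", dedup$CHR)

result <- data.frame(PROBEID = dedup$PROBEID, seqnames, start, strand,
                     stringsAsFactors = FALSE)
print(result)
```

Dropping all but the first row per probe is only one possible policy; a caller who prefers, say, the longest mapping could pass a different duphandler.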

Sean Davis

Sat, Sep 20, 2014 10:43 AM #

Hi, Vince.

Looks like a good start.  I'd probably pull all the assays from
ExpressionSet into SummarizedExperiment as the default, avoiding data
coercion methods that are unnecessarily lossy.  Also, as it stands, the
assayname argument is not used anyway?

Sean
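
A rough sketch of the "carry every assay across" idea, in base R only. The environment `toy_assay_data` stands in for an ExpressionSet's assay data and the plain named list stands in for the SimpleList of assays; both are assumptions for illustration, not the Biobase or GenomicRanges API.

```r
# Toy stand-in for assay data that holds more than one matrix
m <- matrix(1:6, nrow = 3,
            dimnames = list(paste0("f", 1:3), c("s1", "s2")))
toy_assay_data <- new.env()
assign("exprs", m, envir = toy_assay_data)
assign("se.exprs", m / 10, envir = toy_assay_data)

# Lossy: only the element called "exprs" survives the conversion
lossy <- list(exprs = get("exprs", envir = toy_assay_data))

# Lossless: every element is carried over under its own name
all_names <- ls(toy_assay_data)
lossless <- mget(all_names, envir = toy_assay_data)

print(names(lossy))
print(names(lossless))
```

With this shape an assayname argument would only be needed when the caller wants to rename an element, which may be why it looks unused at present.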


On Sat, Sep 20, 2014 at 10:38 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Martin Morgan

Sat, Sep 20, 2014 11:14 AM #

On 09/20/2014 10:43 AM, Sean Davis wrote:

I think there will be some resistance to uniting the 'Biobase' and 'IRanges'
realms under 'GenomicRanges'; considerable effort has gone into making a
rational hierarchy of package dependencies [perhaps Herve will point to some of
his ASCII art on the subject].

I have some recollection of (recent) discussion related to this topic in the 
DESeq2 realm, but am drawing a blank; presumably Michael or Wolfgang or ... will 
chime in.

Martin

Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

Michael Love

Sat, Sep 20, 2014 2:19 PM #

Andrzej was working on a conversion function for DESeqDataSet <=>
DGEList. I don't think it was for SummarizedExperiment and eSet.

Mike

Hector Corrada Bravo

Sat, Sep 20, 2014 2:45 PM #

Would be very useful in epivizr. Right now we have a little bit of code
that can be made much more general:
https://github.com/epiviz/epivizr/blob/master/R/register-methods.R#L73


Hervé Pagès

Sun, Sep 21, 2014 10:54 PM #

Hi,

On 09/20/2014 11:14 AM, Martin Morgan wrote:

This coercion method could be defined (1) in Biobase (where
ExpressionSet is defined), (2) in GenomicRanges (where
SummarizedExperiment is defined), or (3) in a package that
depends on Biobase and GenomicRanges.

Since it's probably undesirable to make Biobase depend on GenomicRanges
or vice-versa, we would need to use Suggests for (1) or (2). That
means we would get a note like this at installation time:

  ** preparing package for lazy loading
  in method for ‘coerce’ with signature ‘"ExpressionSet","SummarizedExperiment"’:
  no definition for class ‘SummarizedExperiment’

Not very clean but it works.

(3) is a cleaner solution but then the coercion method would
not necessarily be available to the user when s/he needs it (unless
s/he knows what extra package to load). The obvious advantage of
putting the method in Biobase is that if a user has an ExpressionSet,
then s/he necessarily has Biobase attached and the method is already
in her/his search path.

Another solution would be (4) to move SummarizedExperiment somewhere
else. That would be in a package that depends on GenomicRanges and
Biobase, and the coercion method would be defined there.
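
Whichever package ends up hosting it, the method itself is a single setAs() call. Below is a minimal sketch using two toy S4 classes, `ToyESet` and `ToySE`, which are hypothetical stand-ins invented here so that the example runs with only the methods package; the real classes carry far more structure.

```r
library(methods)

# Toy classes standing in for ExpressionSet and SummarizedExperiment;
# the real classes live in Biobase and GenomicRanges respectively.
setClass("ToyESet",
         representation(exprs = "matrix", annotation = "character"))
setClass("ToySE", representation(assays = "list"))

# The coercion method: once defined, as(x, "ToySE") dispatches to it
setAs("ToyESet", "ToySE", function(from) {
  new("ToySE", assays = list(exprs = from@exprs))
})

es <- new("ToyESet",
          exprs = matrix(rnorm(6), nrow = 3,
                         dimnames = list(paste0("f", 1:3), c("s1", "s2"))),
          annotation = "hgu95av2")
se <- as(es, "ToySE")
print(is(se, "ToySE"))
```

Here both classes are defined in the same session, so no note appears. The note quoted above arises in options (1) and (2) because one of the two classes is not yet defined when the package holding the setAs() call is installed.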

H.

Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319