
[Bioc-devel] coerce ExpressionSet to SummarizedExperiment

6 messages · Vincent Carey, Sean Davis, Martin Morgan +3 more

#
do we have a facility for this?

if not, we have

https://github.com/vjcitn/biocMultiAssay/blob/master/R/exs2se.R

https://github.com/vjcitn/biocMultiAssay/blob/master/man/coerce-methods.Rd

it occurred to me that we might want something like this in GenomicRanges
(that's where SummarizedExperiment is managed, right?) and I will add it
if there are no objections

the arguments are currently

     assayname = "exprs",    # for naming SimpleList element
     fngetter =
           function(z) rownames(exprs(z)),   # extract usable feature names
     annDbGetter =
          function(z) {
              clnanno = sub(".db", "", annotation(z))
              stopifnot(require(paste0(annotation(z), ".db"), character.only=TRUE))
              # obtain resource for mapping feature names to coordinates
              get(paste0(annotation(z), ".db"))
              },
     probekeytype = "PROBEID",   # chipDb field to use
     duphandler = function(z) {    # action to take to process duplicated features
          if (any(isd <- duplicated(z[,"PROBEID"])))
              return(z[!isd,,drop=FALSE])
          z
          },
     signIsStrand = TRUE,   # verify that signs of addresses define strand
     ucsdChrnames = TRUE    # prefix 'chr' to chromosome token
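
To make the roles of duphandler, signIsStrand and ucsdChrnames concrete, here is a minimal base-R sketch of how they might act on a probe-to-coordinate table. The table, its probe IDs and its coordinates are invented for illustration, and the column names (CHR, CHRLOC, CHRLOCEND) are an assumption about what the chip annotation lookup returns. The helper annPkgName is hypothetical: the quoted annDbGetter computes clnanno but never uses it, and this is one reading of the intent, not the code in the repository.

```r
# Hypothetical helper: build the annotation package name whether or not
# annotation(z) already carries the ".db" suffix.
annPkgName <- function(a) paste0(sub("\\.db$", "", a), ".db")

# Toy mapping table standing in for the result of an annotation lookup;
# probe IDs and coordinates are made up.
map <- data.frame(
  PROBEID   = c("p1", "p2", "p2", "p3"),
  CHR       = c("1", "X", "X", "7"),
  CHRLOC    = c(1000L, -2500L, -2600L, 400L),
  CHRLOCEND = c(1500L, -3000L, -3100L, 900L),
  stringsAsFactors = FALSE
)

# duphandler as quoted above: keep the first row seen for each probe
duphandler <- function(z) {
  if (any(isd <- duplicated(z[, "PROBEID"])))
    return(z[!isd, , drop = FALSE])
  z
}
map <- duphandler(map)

# signIsStrand: a negative address is read as the minus strand,
# and the absolute value is used as the coordinate
coords <- data.frame(
  chrom  = paste0("chr", map$CHR),          # ucsdChrnames: prefix 'chr'
  start  = abs(map$CHRLOC),
  end    = abs(map$CHRLOCEND),
  strand = ifelse(map$CHRLOC < 0, "-", "+"),
  row.names = map$PROBEID,
  stringsAsFactors = FALSE
)
```

A table shaped like coords is what would then be turned into the row ranges of the SummarizedExperiment; that last step needs GenomicRanges and is left out of the sketch.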
#
Hi, Vince.

Looks like a good start.  I'd probably pull all the assays from
ExpressionSet into SummarizedExperiment as the default, avoiding data
coercion methods that are unnecessarily lossy.  Also, as it stands, the
assayname argument is not used anyway?

Sean


On Sat, Sep 20, 2014 at 10:38 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

  
  
#
On 09/20/2014 10:43 AM, Sean Davis wrote:
I think there will be some resistance to uniting the 'Biobase' and 'IRanges' 
realms under 'GenomicRanges'; considerable effort has gone into making a 
rational hierarchy of package dependencies [perhaps Herve will point to some of 
his ASCII art on the subject].

I have some recollection of (recent) discussion related to this topic in the 
DESeq2 realm, but am drawing a blank; presumably Michael or Wolfgang or ... will 
chime in.

Martin

  
    
#
On Sep 20, 2014 2:15 PM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:
Andrzej was working on a conversion function for DESeqDataSet <=>
DGEList. I don't think it was for SummarizedExperiment and eSet.

Mike

  
  
#
Would be very useful in epivizr. Right now we have a little bit of code
that can be made much more general:
https://github.com/epiviz/epivizr/blob/master/R/register-methods.R#L73


On Sat, Sep 20, 2014 at 5:19 PM, Michael Love <michaelisaiahlove at gmail.com>
wrote:

  
  
1 day later
#
Hi,
On 09/20/2014 11:14 AM, Martin Morgan wrote:
This coercion method could be defined (1) in Biobase (where
ExpressionSet is defined), (2) in GenomicRanges (where
SummarizedExperiment is defined), or (3) in a package that
depends on Biobase and GenomicRanges.

Since it's probably undesirable to make Biobase depend on GenomicRanges
or vice-versa, we would need to use Suggests for (1) or (2). That
means we would get a note like this at installation time:

  ** preparing package for lazy loading
  in method for ‘coerce’ with signature ‘"ExpressionSet","SummarizedExperiment"’:
  no definition for class “SummarizedExperiment”

Not very clean but it works.

(3) is a cleaner solution but then the coercion method would
not necessarily be available to the user when s/he needs it (unless
s/he knows what extra package to load). The obvious advantage of
putting the method in Biobase is that if a user has an ExpressionSet,
then s/he necessarily has Biobase attached and the method is already
in her/his search path.

Another solution would be (4) to move SummarizedExperiment somewhere
else. That would be in a package that depends on GenomicRanges and
Biobase, and the coercion method would be defined there.

H.
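
For readers unfamiliar with how such a coercion method is registered, here is a minimal sketch using only the methods package. The two classes are toy stand-ins with invented slots, not the real ExpressionSet or SummarizedExperiment definitions; the point is only that the setAs() call refers to both classes, which is why the package holding it has to see both definitions (or accept the installation note quoted above).

```r
library(methods)

# Toy stand-ins, NOT the real Biobase / GenomicRanges classes
setClass("ToyExpressionSet",
         representation(exprs = "matrix", annotation = "character"))
setClass("ToySummarizedExperiment",
         representation(assays = "list", rowNames = "character"))

# The coercion method: this call names both classes, so both must be
# known to whichever package contains it
setAs("ToyExpressionSet", "ToySummarizedExperiment", function(from) {
  new("ToySummarizedExperiment",
      assays   = list(exprs = from@exprs),
      rowNames = rownames(from@exprs))
})

# Build a small toy object and coerce it with as()
es <- new("ToyExpressionSet",
          exprs = matrix(1:6, nrow = 3,
                         dimnames = list(c("f1", "f2", "f3"), c("s1", "s2"))),
          annotation = "toychip")
se <- as(es, "ToySummarizedExperiment")
```

Once the method is registered, the user only calls as(); which of options (1) to (4) is chosen decides which package has to be loaded for that call to find the method.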