
[Bioc-devel] coerce ExpressionSet to SummarizedExperiment

16 messages · Levi Waldron, Martin Morgan, Michael Lawrence +3 more

#
I just dug up this old thread because I realized we still don't have a
coercion method as(sample.ExpressionSet, "SummarizedExperiment"). Since we
do have SummarizedExperiment(sample.ExpressionSet), could the coercion
method also be added easily?
example("ExpressionSet")
dim: 500 26
metadata(0):
assays(1): ''
rownames(500): AFFX-MurIL2_at AFFX-MurIL10_at ... 31738_at 31739_at
rowData names(0):
colnames(26): A B ... Y Z
colData names(0):
> as(sample.ExpressionSet, "SummarizedExperiment")
Error in as(sample.ExpressionSet, "SummarizedExperiment") :
  no method or default for coercing “ExpressionSet” to “SummarizedExperiment”
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils
datasets  methods   base

other attached packages:
[1] SummarizedExperiment_1.7.5 DelayedArray_0.3.16
matrixStats_0.52.2
[4] GenomicRanges_1.29.6       GenomeInfoDb_1.13.4
IRanges_2.11.7
[7] S4Vectors_0.15.5           Biobase_2.37.2
BiocGenerics_0.23.0

loaded via a namespace (and not attached):
 [1] lattice_0.20-35         bitops_1.0-6            grid_3.4.0
 [4] zlibbioc_1.23.0         XVector_0.17.0          Matrix_1.2-11
 [7] tools_3.4.0             RCurl_1.95-4.8          compiler_3.4.0
[10] GenomeInfoDbData_0.99.1

        
On Mon, Sep 22, 2014 at 1:54 AM, Hervé Pagès <hpages at fhcrc.org> wrote:

            

  
    
#
On 09/10/2017 08:38 PM, Levi Waldron wrote:
Try as(sample.ExpressionSet, "RangedSummarizedExperiment"); see
?makeSummarizedExperimentFromExpressionSet
#
Thanks Martin! I see the RangedSummarizedExperiment coercion method works
when there are no mappable ranges (for example curatedMetagenomicData
ExpressionSet objects), although the rowRanges is a GRangesList of empty
elements. It might be worth also having a SummarizedExperiment coercion
method if it's not a problematic or big job. And now I suppose I can ask
the question I *really* wanted to know, which is why can't I coerce an
object that extends eSet? I can still use the SummarizedExperiment()
constructor, but for example:
attr(,"package")
[1] "metagenomeSeq"
> is(mouseData, "ExpressionSet")
[1] FALSE
> is(mouseData, "eSet")
[1] TRUE
dim: 10172 139
metadata(0):
assays(1): ''
rownames(10172): Prevotellaceae:1 Lachnospiraceae:1 ... Bryantella:103
  Parabacteroides:956
rowData names(0):
colnames(139): PM1:20080107 PM1:20080108 ... PM9:20080225 PM9:20080303
colData names(0):
> as(mouseData, "RangedSummarizedExperiment")
Error in as(mouseData, "RangedSummarizedExperiment") :
  no method or default for coercing “MRexperiment” to “RangedSummarizedExperiment”
> as(mouseData, "SummarizedExperiment")
Error in as(mouseData, "SummarizedExperiment") :
  no method or default for coercing “MRexperiment” to “SummarizedExperiment”
> as(mouseData, "ExpressionSet")
Error in updateOldESet(from, "ExpressionSet") :
  no slot of name "pData" for this object of class "AnnotatedDataFrame"




On Mon, Sep 11, 2017 at 6:58 AM, Martin Morgan <
martin.morgan at roswellpark.org> wrote:

            

  
    
#
I guess we discussed this with Davide Risso @Bioc2017 in the
MultiAssayExperiment workshop.
The call `SummarizedExperiment(sample.ExpressionSet)` puts the eSet
(rather counterintuitively) into `assays` of `SummarizedExperiment`; it
does not really coerce it to SummarizedExperiment, e.g. `fData` and
`pData` are not transferred to rowData and colData, respectively.

While I can understand that this is by design of `SummarizedExperiment`, I
really wonder whether there are use cases where somebody would want to put
an `ExpressionSet` into the `assays` of a `SummarizedExperiment` rather than
coerce it.

Furthermore, if you would indeed like to have several `ExpressionSet`s in
a `SummarizedExperiment`, haven't you already arrived at a scenario where
use of `MultiAssayExperiment` is indicated?

  
    
#
On Mon, Sep 11, 2017 at 11:56 AM, Ludwig Geistlinger <
Ludwig.Geistlinger at bio.ifi.lmu.de> wrote:

            
Right, I had forgotten about that - this isn't a coercion but a
construction, which should be obvious from the use of a constructor
function. This behavior is intuitive if you remember that
SummarizedExperiment(assays, ...) is a constructor that accepts as assays
any object or list of objects supporting square bracket matrix-like
subsetting. Sorry for my brain hiccup there.
I think the behavior of the constructor SummarizedExperiment() here is
correct and expected; the issue is that we're actually looking for
coercion methods.
#
It's probably good keeping coercion and construction distinct,
although we have violated that recently with GRanges(). It now
attempts to coerce its first argument to a GRanges. Don't want to
derail the discussion, but it's another data point.

Michael

On Mon, Sep 11, 2017 at 9:26 AM, Levi Waldron
<lwaldron.research at gmail.com> wrote:
#
Hi,

I added coercion from ExpressionSet to SummarizedExperiment in
SummarizedExperiment 1.7.6.

The current behavior of the SummarizedExperiment() constructor
when called on an ExpressionSet object doesn't make much sense to
me. I'd rather have it consistent with what the coercion does.
Will fix it.

Cheers,
H.
On 09/11/2017 09:58 AM, Michael Lawrence wrote:

  
    
#
On Mon, Sep 11, 2017 at 2:02 PM, Hervé Pagès <hpages at fredhutch.org> wrote:

            
Thank you Hervé!
Thank you, again.

A couple more questions while I'm at it, that may expose the limitations in
my understanding of inheritance and project history... 1) Why have some
developers chosen to extend eSet instead of ExpressionSet (definition
<https://github.com/Bioconductor/Biobase/blob/536f137165ca08b3be22819e51e055b3e7afe86d/R/DataClasses.R#L166>),
and 2) why are these coercion methods developed for ExpressionSet rather
than eSet? Wouldn't an eSet coercion method be preferable because it would
cover ExpressionSet as well as all the classes that extend eSet?
#
Concerning 1) Why have some developers chosen to extend eSet instead of
ExpressionSet:

As far as I understand it, ExpressionSet was thought to exclusively
represent a microarray experiment (MIAME = Minimum Information About a
Microarray Experiment).

Thus, back in the days when more and more people started using RNA-seq and
there was no SummarizedExperiment, developers extended eSet with e.g.
assayData slots called `counts` instead of `exprs` to represent RNA-seq
data.

  
    
#
An ExpressionSet is an eSet that is guaranteed to have an "exprs" assay.
That makes no sense for example for methylation where we have (say)
Green/Red assays or Meth/Unmeth assays (or transformations of these).

Best,
Kasper

On Mon, Sep 11, 2017 at 3:31 PM, Ludwig Geistlinger <
Ludwig.Geistlinger at bio.ifi.lmu.de> wrote:

            

  
  
#
Thanks Ludwig and Kasper. This old presentation from Martin also helped me
a lot:

https://www.bioconductor.org/packages/devel/bioc/vignettes/Biobase/inst/doc/BiobaseDevelopment.pdf

But I still wonder, why provide the coercion for ExpressionSet, if
providing it for eSet would work not only for ExpressionSet but for
everything else derived from eSet? The coercion function seems to work fine
on the eSet-derived NChannelSet-class {the assays=as.list(assayData(from))
 seems to work regardless of the storage mode}:
attr(,"package")
[1] "Biobase"
> is(obj, "eSet")
[1] TRUE
class: RangedSummarizedExperiment
dim: 10 3
metadata(3): experimentData annotation protocolData
assays(2): G R
rownames(10): 1 2 ... 9 10
rowData names(0):
colnames(3): A B C
colData names(3): ChannelRData ChannelGData ChannelRAndG
Error in as(obj, "RangedSummarizedExperiment") :
  no method or default for coercing “NChannelSet” to “RangedSummarizedExperiment”

  
  
#
I don't know the reasons behind this choice, I didn't implement
these methods. It would make sense to have these coercions defined
for the eSet,SummarizedExperiment and eSet,RangedSummarizedExperiment
signatures if they only access the eSet part of the object.
I'll look into this.
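
For illustration only, here is a minimal base-R sketch of why a coercion registered on the parent class would cover every subclass, while one registered on a single subclass would not. All class names (ToyESet, ToyExpressionSet, ToyMRexperiment, ToySE, ToyRSE) are hypothetical stand-ins, not the real Biobase or SummarizedExperiment classes, and the method bodies are not the actual implementation.

```r
library(methods)

# Hypothetical stand-ins: ToyESet plays the role of eSet, the two classes
# below extend it, and ToySE / ToyRSE are the coercion targets.
setClass("ToyESet", representation(assayData = "list"))
setClass("ToyExpressionSet", contains = "ToyESet")
setClass("ToyMRexperiment", contains = "ToyESet")
setClass("ToySE", representation(assays = "list"))
setClass("ToyRSE", representation(assays = "list"))

# Coercion registered on the parent class; it only touches parent slots.
setAs("ToyESet", "ToySE",
      function(from) new("ToySE", assays = from@assayData))

# Coercion registered on one subclass only.
setAs("ToyExpressionSet", "ToyRSE",
      function(from) new("ToyRSE", assays = from@assayData))

expr_obj <- new("ToyExpressionSet", assayData = list(exprs = matrix(1:4, 2)))
mr_obj <- new("ToyMRexperiment", assayData = list(counts = matrix(5:8, 2)))

# Both subclasses inherit the parent-level method.
from_expr <- as(expr_obj, "ToySE")
from_mr <- as(mr_obj, "ToySE")

# The subclass-level method is not found for the sibling class.
narrow <- tryCatch(as(mr_obj, "ToyRSE"), error = function(e) e)
```

The last call fails with a "no method or default for coercing" error, which mirrors what was reported above for MRexperiment and NChannelSet.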

H.
On 09/11/2017 04:50 PM, Levi Waldron wrote:

  
    
#
I see how an eSet maps to a SummarizedExperiment; one can look at the
assays in the object.  I agree with Levi/Herve that this is the natural
(correct) choice.  It is less clear how you get the ranges for a
RangedSummarizedExperiment without making assumptions.
On Mon, Sep 11, 2017 at 8:09 PM, Hervé Pagès <hpages at fredhutch.org> wrote:

            

  
  
1 day later
#
Coercing vice versa, i.e. from SummarizedExperiment to ExpressionSet,
which is defined in

SummarizedExperiment/R/makeSummarizedExperimentFromExpressionSet.R

as follows:

setAs("SummarizedExperiment", "ExpressionSet", function(from)
    as(as(from, "RangedSummarizedExperiment"), "ExpressionSet")
)

also seems to be a bit problematic, as it makes you lose your rowData/fData.



Here is an example:

## Constructing the SE similar to examples of ?SummarizedExperiment
row.names=LETTERS[1:6])


## some rowData with simulated gene IDs
1:200))
colData=colData, rowData=rowData)

# this is how it looks
DataFrame with 200 rows and 1 column
     EntrezID
    <integer>
1         289
2         476
3         608
4         998
5         684
...       ...
196       331
197       590
198       445
199        95
200       129

(Why did I actually lose the rownames g1-g200 here?)


## Coercing to ExpressionSet makes me lose the rowData/fData
data frame with 0 columns and 200 rows


## So where is the problem?
## Apparently in the coercion
##    from SummarizedExperiment to RangedSummarizedExperiment
DataFrame with 200 rows and 0 columns
#
Hi Ludwig,

Excellent catch! Thanks for the report.

This should be fixed in SummarizedExperiment release (1.6.4) and devel
(1.7.7).

Cheers,
H.
On 09/13/2017 02:54 PM, Ludwig Geistlinger wrote:

  
    
#
One more thing. See below...
On 09/13/2017 02:54 PM, Ludwig Geistlinger wrote:
Your rownames were moved to the names of the object:

 > head(names(se))
[1] "g1" "g2" "g3" "g4" "g5" "g6"

The rowData() accessor (like the mcols() accessor, note that rowData()
is just an alias for mcols) does not restore them by default, unless
you use 'use.names=TRUE'.

 > rowData(se, use.names=TRUE)
DataFrame with 200 rows and 1 column
       EntrezID
      <integer>
g1         616
g2          45
g3         944
g4         632
g5         270
...        ...
g196       827
g197       943
g198       291
g199       432
g200       106

All Vector derivatives do that (e.g. GRanges), not just
SummarizedExperiment.

The reason for this design is that the rownames must be unique
(this is a base R requirement). By moving them from the DataFrame
containing the metadata columns to the names of the object, Vector
derivatives can be subsetted in a way that repeats some of their
elements. If the rownames were on the DataFrame containing the
metadata columns, these subsetting operations wouldn't be
possible.
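
As a rough illustration of the base R constraint only (a plain data.frame and a named vector, not the S4Vectors classes themselves; the object names df and x are made up for this sketch):

```r
# A data.frame keyed by rownames, using the first three values shown above.
df <- data.frame(EntrezID = c(616L, 45L, 944L),
                 row.names = c("g1", "g2", "g3"))

# Base R refuses duplicated row names outright.
dup_attempt <- tryCatch({
  rownames(df) <- c("g1", "g1", "g2")
  "ok"
}, error = function(e) "error")

# Subsetting with a repeated index therefore has to rename the rows.
repeated_df <- df[c(1, 1, 2), , drop = FALSE]
rownames(repeated_df)          # "g1" "g1.1" "g2"

# Names on a vector carry no uniqueness requirement, so they survive intact.
x <- c(g1 = 616L, g2 = 45L, g3 = 944L)
repeated_x <- x[c(1, 1, 2)]
names(repeated_x)              # "g1" "g1" "g2"
```

Keeping the identifiers as names of the object, rather than as rownames of the metadata table, is what lets the repeated element keep its original label.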

Hope this makes sense,
H.