[Bioc-devel] SummarizedExperiment vs ExpressionSet
One thing that's become apparent working on epivizr is that it may be useful to think about "rowData" in a SummarizedExperiment as having two distinct components: row coordinates and row metadata. In the current class rowData is a "GenomicRanges" which contains both coordinates (the ranges) and metadata (mcols(rowData)). In metagenomics (the other application my group works a lot with), we think of the taxonomy as providing coordinates. The distinction is worthwhile thinking about since there are certain operations we do on coordinates that we don't do with metadata (and conversely). Thinking about it this way, the "ExpressionSet" object would be data without coordinates. So, I would avoid making "GenomicRanges" behave like "DataFrame" since this distinction between coordinates and metadata is lost. The "emptyRanges" proposal gets closer to this since this corresponds to "no coordinates", but it may be worth thinking in the long term on making the coordinate/metadata distinction more general.
Hector
On Wed, Nov 26, 2014 at 12:38 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
so as a simple experiment, I did the following:

library(GenomicRanges)
bar <- matrix(rnorm(100), ncol=10)
colnames(bar) <- as.character(1:10)
rownames(bar) <- letters[1:10]
foo <- SummarizedExperiment(assays=list(bar=bar))
rowData(foo)
## GRangesList object of length 10:
## $a
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames    ranges strand
##       <Rle> <IRanges>  <Rle>
##
## $b
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## $c
## GRanges object with 0 ranges and 0 metadata columns:
##    seqnames ranges strand
##
## ...
## <7 more elements>
colData(foo)
## DataFrame with 10 rows and 0 columns

This got me to thinking, why not have an emptyRanges class, or else the ability to index a bunch of NULL ranges without a lot of hoohah? The defaults mostly do what they're supposed to; why not have a compact representation of empty rowData as for empty colData (i.e., a DataFrame with 0 rows)? Or is a GRangesList of empty GRanges as compact as it is practicable to get for this purpose?
Just pondering what the lowest-impact solution to the problem at hand might be.

Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty <haverty.peter at gene.com> wrote:
Hi all,
I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays "exprs" as a work-around.
Regards,
Pete
____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com
On Wed, Nov 26, 2014 at 7:19 AM, Laurent Gatto <lg390 at cam.ac.uk> wrote:
On 26 November 2014 14:59, Wolfgang Huber wrote:
A colleague and I are designing a package for quantitative proteomics
data, and we are debating whether to base it on the
SummarizedExperiment or the ExpressionSet class.
There is no immediate use for the ranges aspect of
SummarizedExperiment, so that would have to be carried around with
NAs, and this is a parsimony argument for using ExpressionSet
instead. OTOH, the interface of SummarizedExperiment is cleaner, its
code more modern and more likely to be updated, and users of the
Bioconductor project are likely to benefit from having to deal with a
single interface that works the same or similarly across packages,
rather than a variety of formats; which argues that new packages
should converge towards SummarizedExperiment('s interface).
Are there any pertinent insights from this group?
Instead of ExpressionSet, you could use MSnbase::MSnSet, which is
essentially an ExpressionSet for quantitative proteomics (i.e., it has a
MIAPE slot, instead of MIAME for example).
Ideally, a SummarizedExperiment for proteomics would use peptide/protein
ranges, which is in the pipeline, as far as I am concerned. When that
becomes available, there should be infrastructure to coerce an MSnSet
(and/or other relevant data) into a SummarizedExperiment.
Hope this helps.
Best wishes,
Laurent
Thanks and best wishes
Wolfgang
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
Laurent Gatto
http://cpu.sysbiol.cam.ac.uk/
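
To make the coordinate/metadata distinction from the first message of this thread concrete, here is a minimal sketch, assuming the GenomicRanges package is installed. The feature names, positions and scores are made up for illustration, and the variable names (gr, coords, meta, query, hits, high) are hypothetical. It only shows that a GRanges carries both parts and that some operations act on the ranges while others act on mcols(); it is not a proposal for how SummarizedExperiment should be structured.

```r
library(GenomicRanges)

# Three hypothetical features; extra named arguments become metadata columns.
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2"),
  ranges   = IRanges(start = c(100, 500, 200), width = 50),
  gene     = c("geneA", "geneB", "geneC"),
  score    = c(0.2, 1.5, 0.9)
)

# Coordinates only: granges() drops the metadata columns by default.
coords <- granges(gr)

# Metadata only: a DataFrame with one row per range.
meta <- mcols(gr)

# An operation on coordinates: keep features overlapping a query region.
query <- GRanges("chr1", IRanges(start = 90, end = 160))
hits <- subsetByOverlaps(gr, query)

# An operation on metadata: keep features by an annotation value.
high <- gr[gr$score > 1]
```

In this sketch the overlap query has no meaning for an object without ranges, whereas the filter on score would work equally well on a plain DataFrame of row annotations, which is the asymmetry the thread is discussing.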