A colleague and I are designing a package for quantitative proteomics data, and we are debating whether to base it on the SummarizedExperiment or the ExpressionSet class.
There is no immediate use for the ranges aspect of SummarizedExperiment, so that would have to be carried around with NAs, and this is a parsimony argument for using ExpressionSet instead. OTOH, the interface of SummarizedExperiment is cleaner, its code more modern and more likely to be updated, and users of the Bioconductor project are likely to benefit from having to deal with a single interface that works the same or similarly across packages, rather than a variety of formats; which argues that new packages should converge towards SummarizedExperiment('s interface).
Are there any pertinent insights from this group?
Thanks and best wishes
Wolfgang
[Bioc-devel] SummarizedExperiment vs ExpressionSet
10 messages · Wolfgang Huber, Laurent Gatto, Michael Lawrence +4 more
In reply to Wolfgang Huber, 26 November 2014 14:59:
Instead of ExpressionSet, you could use MSnbase::MSnSet, which is essentially an ExpressionSet for quantitative proteomics (i.e., it has a MIAPE slot instead of a MIAME slot, for example).
Ideally, a SummarizedExperiment for proteomics would use peptide/protein ranges, which is in the pipeline, as far as I am concerned. When that becomes available, there should be infrastructure to coerce an MSnSet (and/or other relevant data) into a SummarizedExperiment.
Hope this helps.
Best wishes,
Laurent
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Laurent Gatto http://cpu.sysbiol.cam.ac.uk/
In reply to Laurent Gatto <lg390 at cam.ac.uk>, Wed, Nov 26, 2014 at 7:19 AM:
Hi all,
I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData? (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.) I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays "exprs" as a work-around.
Regards,
Pete
____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com
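The container Pete asks for (a collection of rectangular data with metadata on the rows and columns) could be sketched roughly as follows, using only base R's S4 system. The class name `SimpleAssaySet`, its slots, and the example data are hypothetical and chosen for illustration; plain `data.frame` stands in for `DataFrame`, and nothing here is taken from Biobase, GenomicRanges, or any other Bioconductor package.

```r
library(methods)

# Hypothetical container: a list of same-shaped matrices plus one
# data.frame of annotations per row and one per column.
setClass("SimpleAssaySet",
         slots = c(assays  = "list",
                   rowData = "data.frame",
                   colData = "data.frame"))

# Every assay must be a matrix whose shape matches both annotation tables.
setValidity("SimpleAssaySet", function(object) {
  nr <- nrow(object@rowData)
  nc <- nrow(object@colData)
  for (a in object@assays) {
    if (!is.matrix(a) || nrow(a) != nr || ncol(a) != nc)
      return("every assay must be a matrix matching rowData and colData")
  }
  TRUE
})

# 2D extraction subsets the assays and both annotation tables together.
setMethod("[", "SimpleAssaySet", function(x, i, j, ..., drop = TRUE) {
  if (missing(i)) i <- seq_len(nrow(x@rowData))
  if (missing(j)) j <- seq_len(nrow(x@colData))
  new("SimpleAssaySet",
      assays  = lapply(x@assays, function(a) a[i, j, drop = FALSE]),
      rowData = x@rowData[i, , drop = FALSE],
      colData = x@colData[j, , drop = FALSE])
})

# Made-up example data: 4 proteins measured in 3 samples.
set.seed(1)
intensity <- matrix(rnorm(12), nrow = 4,
                    dimnames = list(paste0("prot", 1:4), paste0("s", 1:3)))
x <- new("SimpleAssaySet",
         assays  = list(intensity = intensity),
         rowData = data.frame(length = c(120L, 85L, 300L, 42L),
                              row.names = rownames(intensity)),
         colData = data.frame(group = c("ctl", "ctl", "trt"),
                              row.names = colnames(intensity)))

sub <- x[1:2, "s3"]
dim(sub@assays$intensity)   # 2 rows, 1 column
```

The design choice in this sketch is that the row annotations carry no notion of ranges at all, so a ranged variant would have to add that on top rather than the other way round.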
On Wed, Nov 26, 2014 at 9:07 AM, Peter Haverty <haverty.peter at gene.com> wrote:
> Hi all, I believe there is a strong need for an object that organizes a collection of rectangular data (matrices, etc.) with metadata on the rows and columns. Can SummarizedExperiment inherit from something simpler that has a DataFrame as rowData?
> (I believe GenomicRanges should inherit from DataTable, rather than Vector, and subset as x[i,j], but maybe that's getting a bit off topic.)
Have to disagree on that. A GRanges is a vector of ranges; a table is a list of vectors all of the same length. Different things. There was a lot of thought invested in that. But it does subset as x[i,j], so in theory SummarizedExperiment could be generalized to contain something with the contract of 2D extraction.
> I often see people stuffing arbitrary data into an ExpressionSet and calling one of the assays "exprs" as a work-around. Regards, Pete
In reply to Peter Haverty <haverty.peter at gene.com>, Wed, Nov 26, 2014 at 9:07 AM:
So as a simple experiment, I did the following:
    library(GenomicRanges)
    bar <- matrix(rnorm(100), ncol=10)
    colnames(bar) <- as.character(1:10)
    rownames(bar) <- letters[1:10]
    foo <- SummarizedExperiment(assays=list(bar=bar))
    rowData(foo)
    ## GRangesList object of length 10:
    ## $a
    ## GRanges object with 0 ranges and 0 metadata columns:
    ##    seqnames    ranges strand
    ##       <Rle> <IRanges>  <Rle>
    ##
    ## $b
    ## GRanges object with 0 ranges and 0 metadata columns:
    ##    seqnames ranges strand
    ##
    ## $c
    ## GRanges object with 0 ranges and 0 metadata columns:
    ##    seqnames ranges strand
    ##
    ## ...
    ## <7 more elements>
    colData(foo)
    ## DataFrame with 10 rows and 0 columns
This got me to thinking, why not have an emptyRanges class, or else the ability to index a bunch of NULL ranges without a lot of hoohah? The defaults mostly do what they're supposed to; why not have a compact representation of empty rowData as for empty colData (i.e., a DataFrame with 0 rows)? Or is a GRangesList of empty GRanges as compact as it is practicable to get for this purpose?
Just pondering what the lowest-impact solution to the problem at hand might be.
Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
In reply to Tim Triche, Jr. <tim.triche at gmail.com>, Wed, Nov 26, 2014 at 9:37 AM:
GRangesList is very compact, so this would definitely get the job done. But having an empty range is not the same as an NA, nor does it mean that ranges are "irrelevant". There are definitely times, especially as we extend beyond genomics, when we need something more general, as Pete suggests.
As an aside, I think there is an interesting structural relationship between something like an eSet and a pivot table in a spreadsheet, except an eSet has multiple measurement tables and the column/row annotations are not just for aggregation. If we start to think more broadly, we should consider such specializations and try to unify them into a single framework.
In reply to Tim Triche, Jr. <tim.triche at gmail.com>, Wed, Nov 26, 2014 at 12:38 PM:
One thing that's become apparent working on epivizr is that it may be useful to think about "rowData" in a SummarizedExperiment as having two distinct components: row coordinates and row metadata. In the current class rowData is a "GenomicRanges" which contains both coordinates (the ranges) and metadata (mcols(rowData)). In metagenomics (the other application my group works a lot with), we think of the taxonomy as providing coordinates. The distinction is worthwhile thinking about since there are certain operations we do on coordinates that we don't do with metadata (and conversely). Thinking about it this way, the "ExpressionSet" object would be data without coordinates.
So, I would avoid making "GenomicRanges" behave like "DataFrame" since this distinction between coordinates and metadata is lost. The "emptyRanges" proposal gets closer to this since this corresponds to "no coordinates", but it may be worth thinking, in the long term, about making the coordinate/metadata distinction more general.
Hector
In reply to Hector Corrada Bravo, 11/26/2014 11:40 AM:
Hi guys,
I like the idea of separating the row data from the row ranges. This could be formalized with 2 distinct accessors: rowData() and rowRanges(). The former would return a DataFrame, and the latter NULL or a range-based object (GRanges or GRangesList). I don't think there is the need for an emptyRanges class.
H.
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319
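The two-accessor design Hervé proposes (rowData() returning a table, rowRanges() returning NULL or a range-based object) could look roughly like the following in base R. This is only a sketch under stated assumptions: `ToyRanges`, `ToyRangesOrNULL`, and `ToyExperiment` are hypothetical stand-ins, plain `data.frame` replaces `DataFrame`, and the generics are defined locally, so this is not how any Bioconductor package implements the idea.

```r
library(methods)

# Hypothetical stand-in for a range-based class such as GRanges.
setClass("ToyRanges",
         slots = c(seqnames = "character",
                   start    = "integer",
                   end      = "integer"))

# Ranges are optional, so the slot accepts either a ToyRanges or NULL.
setClassUnion("ToyRangesOrNULL", c("ToyRanges", "NULL"))

setClass("ToyExperiment",
         slots = c(assay     = "matrix",
                   rowData   = "data.frame",
                   rowRanges = "ToyRangesOrNULL"))

# Local generics for this sketch only; loading a package that exports the
# same names would interact with these definitions.
setGeneric("rowData",   function(x, ...) standardGeneric("rowData"))
setGeneric("rowRanges", function(x, ...) standardGeneric("rowRanges"))
setMethod("rowData",   "ToyExperiment", function(x, ...) x@rowData)
setMethod("rowRanges", "ToyExperiment", function(x, ...) x@rowRanges)

# Made-up example data: 3 features measured in 2 samples.
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("p1", "p2", "p3"), c("a", "b")))
meta <- data.frame(score = c(0.9, 0.5, 0.7), row.names = rownames(m))

# Proteomics-style data: row metadata but no coordinates.
proteomics <- new("ToyExperiment", assay = m, rowData = meta,
                  rowRanges = NULL)

# Genomics-style data: the same metadata plus one range per row.
genomics <- new("ToyExperiment", assay = m, rowData = meta,
                rowRanges = new("ToyRanges",
                                seqnames = rep("chr1", 3),
                                start    = c(1L, 50L, 200L),
                                end      = c(10L, 80L, 260L)))

is.null(rowRanges(proteomics))   # TRUE
```

With this split, code that only needs row annotations can call rowData() on either object, while range-based operations first check whether rowRanges() is NULL.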
In reply to Michael Lawrence <lawrence.michael at gene.com>, Wed, Nov 26, 2014 at 9:33 AM:
OK, GRanges as vector that does overlap stuff makes sense, but I think putting a DataFrame of metadata on that confuses the purpose of the object. How about a "GRangesTable" that inherits from both GenomicRanges and DataTable? It would be a DataFrame with a fancy index. The DataFrame API would make stuff like colnames work (rather than needing colnames(mcols(x)) ). If this were used as the rowData for SummarizedExperiment, then a plain DataFrame could be made to work too.
Pete
____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phaverty at gene.com
In reply to Peter Haverty <haverty.peter at gene.com>, Wed, Nov 26, 2014 at 1:55 PM:
The two objects have conflicting APIs. For example, 1D extraction indexes into the ranges for a GRanges, but into the columns for a table. So I would not recommend multiple inheritance. Instead, define something new with the semantics you want and use composition. Maybe just a subclass of DataFrame that adds a GenomicRanges slot.
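The conflict between vector-like and table-like 1D extraction, and the composition alternative, can be illustrated with base R analogues. In this sketch a named character vector stands in for a GRanges and a `data.frame` for a DataFrame; the class `RangedTable` is hypothetical, and it keeps the ranges and the table in two slots rather than subclassing DataFrame as suggested above, purely to keep the example short.

```r
library(methods)

# Vector-like stand-in for ranges, table-like stand-in for metadata.
ranges <- c(r1 = "chr1:1-10", r2 = "chr1:50-80", r3 = "chr2:5-9")
annot  <- data.frame(gene = c("A", "B", "C"), score = c(0.9, 0.5, 0.7))

ranges[2]   # 1D extraction on a vector: the second element (one range)
annot[2]    # 1D extraction on a table: the second column, not the second row

# Composition: one object holds both parts and defines its own semantics.
setClass("RangedTable",
         slots = c(ranges = "character", table = "data.frame"),
         validity = function(object) {
           if (length(object@ranges) != nrow(object@table))
             return("need exactly one range per table row")
           TRUE
         })

# Design choice for this sketch: i always selects rows (ranges and table
# rows together) and j selects table columns.
setMethod("[", "RangedTable", function(x, i, j, ..., drop = TRUE) {
  if (missing(i)) i <- seq_along(x@ranges)
  if (missing(j)) j <- seq_len(ncol(x@table))
  new("RangedTable",
      ranges = x@ranges[i],
      table  = x@table[i, j, drop = FALSE])
})

rt <- new("RangedTable", ranges = ranges, table = annot)
picked <- rt[2:3, "score"]
names(picked@ranges)   # "r2" "r3"
```

Because the new class does not inherit from either component, it is free to pick one meaning for `x[i]` instead of inheriting two incompatible ones.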