Hi, I'm trying to build a MultiAssayExperiment. However, in my case each assay should ideally include two matrices: one with a statistic and another with the corresponding p-value. I'm currently managing each of them simply as a list of two matrices, but the assay class expects table-like data. I must also be able to quickly extract entire rows or columns from each matrix. Is there a suitable way to model this in a MultiAssayExperiment? Thank you, Francesco
[Bioc-devel] Modeling (statistic, p-value) pairs in MultiAssayExperiment
13 messages · Vincent Carey, Levi Waldron, Gmail +2 more
No answers yet? Would it work to put your matrices as separate assays in a SummarizedExperiment? As long as they are conformant in dimensions and dimnames, I think that would work. That SummarizedExperiment would then work well in an MAE.
(In reply to Francesco Napolitano <franapoli at gmail.com>, Mon, Oct 23, 2017 at 1:00 PM.)
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
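Vincent's condition (same dimensions and same dimnames) can be checked before building the object. A minimal base-R sketch; `check_conformant()` is a hypothetical helper, not part of SummarizedExperiment, and the toy matrices are made up for illustration.

```r
# Hypothetical helper: TRUE when two matrices can share one
# SummarizedExperiment, i.e. same dimensions and same dimnames
check_conformant <- function(a, b) {
  identical(dim(a), dim(b)) && identical(dimnames(a), dimnames(b))
}

# toy statistic matrix: 4 pathways x 3 cell lines
set.seed(1)
stat <- matrix(rnorm(12), nrow = 4,
               dimnames = list(paste0("pathway", 1:4), paste0("cell", 1:3)))
pval <- 2 * pnorm(-abs(stat))   # toy two-sided p-values; dimnames are carried over

check_conformant(stat, pval)         # TRUE
check_conformant(stat, pval[1:3, ])  # FALSE: different number of rows
```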
Are you discussing statistics of the same dimension as the data (unusual) or summary statistics? We should think about an MAE version of summary statistics, but that is not captured in the current representation, I would say. Best, Kasper
(In reply to Vincent Carey <stvjc at channing.harvard.edu>, Mon, Oct 23, 2017 at 4:50 PM.)
Hi, thanks for the suggestion. The point is that I have multiple assays, each of which is made of statistic/p-value pairs. Within each assay the two matrices of course have the same rows and columns, but different assays have different rows (and the same columns). So I would have to flatten everything and model both the matrices of the same experiment and the matrices of different experiments as separate assays of a MultiAssayExperiment, which sounds rather stretched, doesn't it? Francesco
(In reply to Vincent Carey, 23/10/2017 22:50.)
I'm converting gene expression profiles to "pathway expression profiles" (https://doi.org/10.1093/bioinformatics/btv536), so for each pathway I have an enrichment score and a p-value. I guess it would be like modeling gene expression data after limma-like preprocessing, where you have a fold-change/p-value pair for each gene. Isn't there a data model for that? thanks, Francesco
(In reply to Kasper Daniel Hansen <kasperdanielhansen at gmail.com>, Tue, Oct 24, 2017 at 3:15 AM.)
Just realized my answer yesterday went to Francesco and not the list: Since it sounds like you have two matrices of the same dimensions, why not represent these as two assays in a SummarizedExperiment? E.g.:
library(SummarizedExperiment)
# toy data: a 20 x 5 matrix of statistics and matching stand-in p-values
statvals <- matrix(rnorm(100), ncol=5)
pvals <- pnorm(statvals)
se <- SummarizedExperiment(list(statvals = statvals, pvals = pvals))
se
class: SummarizedExperiment
dim: 20 5
metadata(0):
assays(2): statvals pvals
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
If you then have more than one of these, with different dimensions, then MultiAssayExperiment would be of use to you. (PS: this question is probably better suited for support.bioconductor.org)
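Francesco also needs to pull out whole rows or columns quickly. With the two matrices stored as assays of one SummarizedExperiment, a row can be read from either assay by name, and subsetting the object subsets both assays at once. A minimal sketch reusing Levi's toy data, with pathway and cell-line names added for illustration:

```r
library(SummarizedExperiment)

# Levi's toy data, with pathway and cell-line names added
set.seed(1)
statvals <- matrix(rnorm(100), ncol = 5,
                   dimnames = list(paste0("pathway", 1:20), paste0("cell", 1:5)))
pvals <- pnorm(statvals)   # toy p-values; pnorm() keeps the dimnames
se <- SummarizedExperiment(assays = list(statvals = statvals, pvals = pvals))

# one whole row (pathway) from each matrix, looked up by name
stat_row <- assay(se, "statvals")["pathway3", ]
pval_row <- assay(se, "pvals")["pathway3", ]

# subsetting the object by column keeps both assays in step
se_cell2 <- se[, "cell2"]
dim(se_cell2)          # 20 1
assayNames(se_cell2)   # "statvals" "pvals"
```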
Nice paper, thanks for the link! Could you explain the problem a little more using the terminology of your paper? I see your enrichment values matrix (Fig. 1c, ES_ij) of pathways x cell lines, and imagine additional associated matrices of p-values and ranks, but where do assays with different rows come in?
(In reply to Francesco Napolitano <franapoli at gmail.com>, Oct 24, 2017 6:14 AM.)
Thank you! Fig. 1 shows the pipeline for a single database of pathways, but we used 10 different databases (GO, KEGG, Reactome...). Currently we use all of MSigDB, which includes 24 subcategories, and we have a matrix of ES and a matrix of p-values for each. The columns are always the same drugs, but the rows are different pathways. Keeping them separate is necessary (you don't want to rank pathways across unrelated databases). On the other hand, if I build one SummarizedExperiment for each database, I have to replicate the common metadata across all of them, and I also lose most of the features that made modeling my data with SE worth the burden :-/. Note that I'm considering all this for a package under review, to possibly improve its interoperability with existing packages.
(In reply to Levi Waldron <lwaldron.research at gmail.com>, Tue, Oct 24, 2017 at 2:45 PM.)
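The layout Francesco describes (one list of two matrices per pathway database, same drugs on the columns) can be converted mechanically into one SummarizedExperiment per database inside a single MultiAssayExperiment, so the shared drug metadata is stored once in `colData` and each database stays a separate experiment. A minimal sketch under these assumptions: each database entry is a list with elements named `es` and `pval` (hypothetical names), all matrices share the same column names, and `pairs_to_mae()` and the `moa` metadata column are hypothetical, not part of either package.

```r
library(SummarizedExperiment)
library(MultiAssayExperiment)

# Hypothetical helper. `pairs`: named list, one element per pathway database,
# each list(es = <matrix>, pval = <matrix>) with identical dimnames.
# `coldat`: shared per-drug metadata whose rownames are the matrix column names.
pairs_to_mae <- function(pairs, coldat) {
  ses <- lapply(pairs, function(p) {
    stopifnot(identical(dimnames(p$es), dimnames(p$pval)))
    SummarizedExperiment(assays = list(es = p$es, pval = p$pval))
  })
  # each database stays a separate experiment, so pathways are never mixed
  MultiAssayExperiment(experiments = ses, colData = coldat)
}

# toy input: two databases with different pathways, the same three drugs
set.seed(1)
drugs <- paste0("drug", 1:3)
make_pair <- function(prefix, n) {
  es <- matrix(rnorm(n * 3), nrow = n,
               dimnames = list(paste0(prefix, 1:n), drugs))
  list(es = es, pval = 2 * pnorm(-abs(es)))   # toy two-sided p-values
}
pairs <- list(KEGG = make_pair("kegg", 4), GO = make_pair("go", 6))
coldat <- DataFrame(moa = c("x", "y", "z"), row.names = drugs)

mae <- pairs_to_mae(pairs, coldat)
mae
```

Because ranking or adjusting within a database only ever touches one experiment, this keeps the separation Francesco needs while the column metadata is not replicated.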
OK, I think I'm understanding better now. The best immediate solution that
I can think of is a SummarizedExperiment for each signatures database, then
pasting those SummarizedExperiments together with a MultiAssayExperiment.
Something like this:
library(SummarizedExperiment)  # also attaches S4Vectors, which provides DataFrame()
library(MultiAssayExperiment)
set.seed(1)
# toy data: statistics for 20 pathways x 5 cell lines, plus stand-in p-values
statvals <- matrix(rnorm(100), ncol=5)
rownames(statvals) <- paste0("pathway", 1:nrow(statvals))
colnames(statvals) <- paste0("cell", 1:ncol(statvals))
pvals <- pnorm(statvals)
# sample metadata, stored once for all databases
coldat <- DataFrame(name=letters[1:ncol(statvals)])
rownames(coldat) <- colnames(statvals)
# one SummarizedExperiment per database: pathways 1-12 and 13-20
se1 <- SummarizedExperiment(list(statvals = statvals[1:12, ], pvals = pvals[1:12, ]))
se2 <- SummarizedExperiment(list(statvals = statvals[13:20, ], pvals = pvals[13:20, ]))
mae <- MultiAssayExperiment(list(database1=se1, database2=se2), colData=coldat)
Then you can extract with assays() or integrate with wideFormat(); examples
below. The wideFormat example currently extracts only the statvals assay, but you
should be able to select between assays in wideFormat too; I've just
opened an issue
<https://github.com/waldronlab/MultiAssayExperiment/issues/221> for this.
> assays(mae, i="statvals")
List of length 2
names(2): database1 database2
> assays(mae, i="pvals")
List of length 2
names(2): database1 database2
> head(assays(mae, i="pvals")[["database2"]])
               cell1      cell2     cell3     cell4     cell5
pathway13 0.26722067 0.65087047 0.6334933 0.7293096 0.8770575
pathway14 0.01339034 0.47854525 0.1293723 0.1751268 0.7581031
pathway15 0.86969085 0.08424692 0.9240745 0.1049876 0.9437248
pathway16 0.48208011 0.33907294 0.9761707 0.6146450 0.7117439
pathway17 0.49354130 0.34668349 0.3567269 0.3287773 0.1008731
pathway18 0.82737332 0.47635125 0.1482116 0.5004410 0.2832325
> (res <- wideFormat(mae[1, , ], colDataCols="name"))
DataFrame with 5 rows and 4 columns
   primary        name database1_pathway1 database2_pathway13
  <factor> <character>          <numeric>           <numeric>
1    cell1           a         -0.6264538          -0.6212406
2    cell2           b          0.9189774           0.3876716
3    cell3           c         -0.1645236           0.3411197
4    cell4           d          2.4016178           0.6107264
5    cell5           e         -0.5686687           1.1604026
(In reply to Francesco Napolitano <franapoli at gmail.com>, Tue, Oct 24, 2017 at 9:43 AM.)
Levi Waldron
http://www.waldronlab.org
Assistant Professor of Biostatistics
CUNY School of Public Health
US: +1 646-364-9616
Skype: levi.waldron
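While `wideFormat()` works on one assay at a time, one workaround is to walk the experiments directly. The sketch below rebuilds Levi's toy `mae` and collects the statistic and p-value of every pathway, across all databases, for one cell line into a long `data.frame`. `pairs_for_sample()` is a hypothetical helper; it assumes every experiment contains the requested sample as a column and has assays named `statvals` and `pvals`.

```r
library(SummarizedExperiment)
library(MultiAssayExperiment)

# rebuild Levi's toy object
set.seed(1)
statvals <- matrix(rnorm(100), ncol = 5,
                   dimnames = list(paste0("pathway", 1:20), paste0("cell", 1:5)))
pvals <- pnorm(statvals)
coldat <- DataFrame(name = letters[1:5], row.names = colnames(statvals))
se1 <- SummarizedExperiment(list(statvals = statvals[1:12, ], pvals = pvals[1:12, ]))
se2 <- SummarizedExperiment(list(statvals = statvals[13:20, ], pvals = pvals[13:20, ]))
mae <- MultiAssayExperiment(list(database1 = se1, database2 = se2), colData = coldat)

# Hypothetical helper: statistic and p-value of every pathway in every
# database for one sample, stacked into a long data.frame
pairs_for_sample <- function(mae, sample) {
  exps <- experiments(mae)
  parts <- lapply(names(exps), function(db) {
    se <- exps[[db]]
    data.frame(database = db,
               pathway = rownames(se),
               stat = assay(se, "statvals")[, sample],
               pval = assay(se, "pvals")[, sample],
               row.names = NULL, stringsAsFactors = FALSE)
  })
  do.call(rbind, parts)
}

res <- pairs_for_sample(mae, "cell2")
head(res)
```

The long layout keeps each statistic next to its own p-value, which is the pairing the thread is about; the `database` column preserves which pathway collection each row came from.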
What do you have in mind, Kasper? I assumed that summary statistics could usually be kept in the rowData of a SummarizedExperiment.
(In reply to Kasper Daniel Hansen <kasperdanielhansen at gmail.com>, Mon, Oct 23, 2017 at 9:15 PM.)
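Levi's suggestion can be illustrated in a few lines: per-pathway summaries go into `rowData` as extra columns, next to the per-sample assay, and stay aligned when rows are subset. The toy matrix and the two summary columns (`mean_es`, `n_positive`) are made up for this sketch.

```r
library(SummarizedExperiment)

# toy enrichment scores: 4 pathways x 5 cell lines
set.seed(1)
es <- matrix(rnorm(20), nrow = 4,
             dimnames = list(paste0("pathway", 1:4), paste0("cell", 1:5)))
se <- SummarizedExperiment(assays = list(es = es))

# per-pathway summary statistics: one value per row, stored in rowData
rowData(se)$mean_es <- rowMeans(es)
rowData(se)$n_positive <- rowSums(es > 0)
rowData(se)

# subsetting rows keeps rowData aligned with the assay
rowData(se["pathway2", ])$mean_es
```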
That's a great help, Levi. I will try your suggestions.
thank you,
francesco
(In reply to Levi Waldron, 25/10/2017 00:28.)
I think analysis of multi-assay experiments will often consist of integration following assay-specific models. Not necessarily, but it will be a use case. Organizing multiple model fits together could be useful for downstream comparison/integration. Say you find DMRs and DE genes. Now you want to do something with them. Right now we have multiple objects floating around. Would it be useful to have a collector for this? Best, Kasper
(In reply to Levi Waldron <lwaldron.research at gmail.com>, Tue, Oct 24, 2017 at 6:30 PM.)
Model results could be stored in another SE: the contrasts are treated as samples, and quantities such as p-values and effect sizes as assays. The question is whether those should just be tacked onto an MAE, kept as separate objects, or stored along with the MAE in a larger analysis-level workflow object.
(In reply to Kasper Daniel Hansen <kasperdanielhansen at gmail.com>, Wed, Oct 25, 2017 at 8:57 AM.)
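A sketch of the layout described here, kept as a separate object: one row per feature, one column per contrast, with effect sizes, p-values and adjusted p-values as assays. The contrast names, the simulated numbers, the fixed standard error of 0.5, and the Benjamini-Hochberg adjustment are illustrative assumptions; in practice the values would come from a fitted model such as limma.

```r
library(SummarizedExperiment)

set.seed(2)
genes <- paste0("gene", 1:50)
contrast_ids <- c("treatedA_vs_ctrl", "treatedB_vs_ctrl")   # hypothetical contrasts

# simulated model output: effect sizes, features x contrasts
logfc <- matrix(rnorm(100), nrow = 50, dimnames = list(genes, contrast_ids))
# toy two-sided p-values assuming a standard error of 0.5 for every estimate
pval <- 2 * pnorm(-abs(logfc / 0.5))
# Benjamini-Hochberg adjustment done separately within each contrast
padj <- apply(pval, 2, p.adjust, method = "BH")

# contrasts play the role of samples; colData describes each contrast
fit_se <- SummarizedExperiment(
  assays = list(logFC = logfc, pvalue = pval, padj = padj),
  colData = DataFrame(reference = c("ctrl", "ctrl"), row.names = contrast_ids)
)

# features significant in the first contrast at a 5% FDR
sig <- rownames(fit_se)[assay(fit_se, "padj")[, "treatedA_vs_ctrl"] < 0.05]
```

Because the columns are contrasts rather than biological samples, this object does not map onto an MAE's `colData` directly, which is part of why the question of where to store it remains open.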