[Bioc-devel] Modeling (statistic, p-value) pairs in MultiAssayExperiment

13 messages · Vincent Carey, Levi Waldron, Gmail +2 more

#
Hi,

I'm trying to build a MultiAssayExperiment. However, in my case each
assay should ideally include two matrices: one of statistics and
another of the corresponding p-values. I'm currently managing
each of them simply as a list of two matrices, but the assay class expects
table-like data. I must also be able to quickly extract entire rows or
columns from each matrix.

Is there a suitable way to model this into a MultiAssayExperiment?

Thank you,
Francesco
#
No answers yet? Would it work to put your matrices as separate assays in a
SummarizedExperiment? As long as they are conformant in dimensions and
dimnames, I think that would work. That SummarizedExperiment would then
work well in an MAE.
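A minimal R sketch of this suggestion, assuming the Bioconductor package SummarizedExperiment is installed. The 4 x 3 matrices, their names and their values are made up for illustration. Both matrices share dimensions and dimnames, so they can sit side by side as two named assays; whole rows or columns come out of either one by name, and a pair of matrices with different dimensions is refused when the object is built.

```r
library(SummarizedExperiment)

set.seed(1)
# Illustrative data: 4 features x 3 samples with matching dimnames
stat <- matrix(rnorm(12), nrow = 4,
               dimnames = list(paste0("feature", 1:4), paste0("sample", 1:3)))
pval <- 2 * pnorm(-abs(stat))  # two-sided p-values: same dims and dimnames

# The two conformant matrices become two named assays of one object
se <- SummarizedExperiment(assays = list(statvals = stat, pvals = pval))

# A whole row or column from either assay, selected by assay name
assay(se, "statvals")["feature2", ]
assay(se, "pvals")[, "sample3"]

# Non-conformant matrices are refused when the object is built
bad <- tryCatch(
  SummarizedExperiment(assays = list(statvals = stat, pvals = pval[1:3, ])),
  error = function(e) NULL)
is.null(bad)
```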

On Mon, Oct 23, 2017 at 1:00 PM, Francesco Napolitano <franapoli at gmail.com>
wrote:

  
  
#
Are you discussing statistics of the same dimension as the data (unusual)
or summary statistics? We should think about an MAE version of summary
statistics, but that is not captured in the current representation, I would say.

Best,
Kasper

On Mon, Oct 23, 2017 at 4:50 PM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

  
  
#
Hi,

Thanks for the suggestion. The point is that I have multiple assays, each
of which is made of statistic/p-value pairs. Within each assay, the two
matrices of course have the same rows and columns, but different assays
have different rows (and the same columns). So I would have to flatten
everything and model both the matrices of the same experiment and the
matrices of different experiments as separate assays of a
MultiAssayExperiment, which seems rather stretched, doesn't it?

Francesco

On 23/10/2017 22:50, Vincent Carey wrote:

#
I'm converting gene expression profiles to "pathway expression
profiles" (https://doi.org/10.1093/bioinformatics/btv536), so for each
pathway I have an enrichment score and a p-value. I guess it would be
like modeling gene expression data where limma-like preprocessing was
performed, so you have a fold-change/p-value pair for each gene.
Isn't there a data model for that?

thanks,
Francesco


On Tue, Oct 24, 2017 at 3:15 AM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:
#
Just realized my answer yesterday went to Francesco and not the list:

Since it sounds like you have two matrices of the same dimensions, why not
represent these as two assays in a SummarizedExperiment?  E.g.:
class: SummarizedExperiment
dim: 20 5
metadata(0):
assays(2): statvals pvals
rownames: NULL
rowData names(0):
colnames: NULL
colData names(0):
If you then have more than one of these, with different dimensions, then
MultiAssayExperiment would be of use to you.

(PS: this question is probably better suited for support.bioconductor.org)
On Oct 23, 2017 4:50 PM, "Vincent Carey" <stvjc at channing.harvard.edu> wrote:

            

  
  
#
On Oct 24, 2017 6:14 AM, "Francesco Napolitano" <franapoli at gmail.com> wrote:

Nice paper, thanks for the link! Could you explain the problem a little
more using the terminology of your paper? I see your enrichment values
matrix (Fig. 1c, ES_ij) of pathways x cell lines, and imagine additional
associated matrices of p-values and ranks, but where do assays with
different rows come in?
#
Thank you!

Fig. 1 shows the pipeline for a single database of pathways, but we
used 10 different databases (GO, KEGG, Reactome...). Currently we use
all of MSigDB, which includes 24 subcategories, and we have a matrix
of ES and a matrix of p-values for each. The columns always hold the
same drugs, but the rows hold different pathways. Keeping them separate
is necessary (you don't want to rank pathways across unrelated
databases). On the other hand, if I build one SummarizedExperiment for
each database, I have to replicate the common metadata across all of
them, and I also lose most of the benefits that justified the effort of
modeling my data as an SE in the first place :-/.

Note I'm considering all this for a package under review to possibly
improve its interoperability with existing packages.


On Tue, Oct 24, 2017 at 2:45 PM, Levi Waldron
<lwaldron.research at gmail.com> wrote:
#
OK, I think I'm understanding better now. The best immediate solution that
I can think of is a SummarizedExperiment for each signatures database, then
pasting those SummarizedExperiments together with a MultiAssayExperiment.
Something like this:

library(SummarizedExperiment)  # also attaches S4Vectors, which provides DataFrame()
library(MultiAssayExperiment)

set.seed(1)
statvals <- matrix(rnorm(100), ncol=5)
rownames(statvals) <- paste0("pathway", 1:nrow(statvals))
colnames(statvals) <- paste0("cell", 1:ncol(statvals))
pvals <- pnorm(statvals)  # toy p-values, one per statistic

# colData: one row per cell line, shared by every database
coldat <- DataFrame(name=letters[1:ncol(statvals)])
rownames(coldat) <- colnames(statvals)

# one SummarizedExperiment per database: same 5 cells, different pathways
se1 <- SummarizedExperiment(list(statvals = statvals[1:12, ], pvals = pvals[1:12, ]))
se2 <- SummarizedExperiment(list(statvals = statvals[13:20, ], pvals = pvals[13:20, ]))
mae <- MultiAssayExperiment(list(database1=se1, database2=se2),
                            colData=coldat)

Then you can extract with assays() or integrate with wideFormat(), examples
below. The wideFormat example currently only extracts the statvals but you
should be able to select between assays for wideFormat too; I've just
opened an issue
<https://github.com/waldronlab/MultiAssayExperiment/issues/221> for this.
names(2): database1 database2
> assays(mae, i="pvals")
List of length 2
names(2): database1 database2
> head(assays(mae, i="pvals")[["database2"]])
               cell1      cell2     cell3     cell4     cell5
pathway13 0.26722067 0.65087047 0.6334933 0.7293096 0.8770575
pathway14 0.01339034 0.47854525 0.1293723 0.1751268 0.7581031
pathway15 0.86969085 0.08424692 0.9240745 0.1049876 0.9437248
pathway16 0.48208011 0.33907294 0.9761707 0.6146450 0.7117439
pathway17 0.49354130 0.34668349 0.3567269 0.3287773 0.1008731
pathway18 0.82737332 0.47635125 0.1482116 0.5004410 0.2832325

   primary        name database1_pathway1 database2_pathway13
  <factor> <character>          <numeric>           <numeric>
1    cell1           a         -0.6264538          -0.6212406
2    cell2           b          0.9189774           0.3876716
3    cell3           c         -0.1645236           0.3411197
4    cell4           d          2.4016178           0.6107264
5    cell5           e         -0.5686687           1.1604026
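For the original need to pull whole rows or columns quickly, here is a sketch that rebuilds the object from the message above and extracts one cell line and one pathway across databases, assuming SummarizedExperiment and MultiAssayExperiment are installed. It relies on the usual mae[i, j, ] bracket subsetting (rows by rowname, columns by colData rowname) plus experiments() and assay(); the variable names one_cell, pv_cell3, one_path and stat_p13 are illustrative only.

```r
library(SummarizedExperiment)  # also attaches S4Vectors (DataFrame)
library(MultiAssayExperiment)

# Rebuild the toy object from the message above
set.seed(1)
statvals <- matrix(rnorm(100), ncol = 5)
rownames(statvals) <- paste0("pathway", 1:nrow(statvals))
colnames(statvals) <- paste0("cell", 1:ncol(statvals))
pvals <- pnorm(statvals)
coldat <- DataFrame(name = letters[1:ncol(statvals)],
                    row.names = colnames(statvals))
se1 <- SummarizedExperiment(list(statvals = statvals[1:12, ], pvals = pvals[1:12, ]))
se2 <- SummarizedExperiment(list(statvals = statvals[13:20, ], pvals = pvals[13:20, ]))
mae <- MultiAssayExperiment(list(database1 = se1, database2 = se2),
                            colData = coldat)

# One column (cell line) across every database: subset the MAE by column,
# then take the p-value assay of each experiment
one_cell <- mae[, "cell3", ]
pv_cell3 <- lapply(experiments(one_cell),
                   function(se) assay(se, "pvals")[, "cell3"])

# One row (pathway): subset by rowname; only database2 contains pathway13
one_path <- mae["pathway13", , ]
stat_p13 <- assay(experiments(one_path)[["database2"]], "statvals")["pathway13", ]
```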


On Tue, Oct 24, 2017 at 9:43 AM, Francesco Napolitano <franapoli at gmail.com>
wrote:

  
    
#
On Mon, Oct 23, 2017 at 9:15 PM, Kasper Daniel Hansen <
kasperdanielhansen at gmail.com> wrote:

            
What do you have in mind Kasper? I assumed that summary statistics could
usually be kept in the rowData of a SummarizedExperiment.
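A small sketch of that idea, assuming SummarizedExperiment is installed: with a single contrast, each gene has one fold change and one p-value, so the results fit as rowData columns next to the data rather than as extra assays. The column names logFC and P.Value mimic the columns of limma's topTable() output, and all numbers are placeholders.

```r
library(SummarizedExperiment)

set.seed(2)
# Illustrative expression matrix: 5 genes x 4 samples
expr <- matrix(rnorm(20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), paste0("s", 1:4)))
se <- SummarizedExperiment(assays = list(expr = expr))

# Placeholder per-gene results for ONE contrast: one value per row,
# so they fit as rowData columns (names mimic limma's topTable output)
rowData(se)$logFC   <- c(1.2, -0.4, 0.1, 2.3, -1.7)
rowData(se)$P.Value <- c(0.001, 0.40, 0.90, 0.0004, 0.02)

# Rows can then be filtered on the stored statistics
sig <- se[rowData(se)$P.Value < 0.05, ]
rownames(sig)
```

Once there are several contrasts per gene, one value per row no longer suffices, which is where the per-contrast layout discussed later in the thread comes in.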
#
That's great help, Levi, I will try your suggestions.

thank you,

francesco

On 25/10/2017 00:28, Levi Waldron wrote:

#
I think analysis of multi-assay experiments will often consist of
integration following assay-specific models. Not necessarily, but it will
be a use case. Organizing multiple model fits together could be useful for
downstream comparison / integration.

Say you find DMRs (differentially methylated regions) and DE genes. Now you
want to do something with them. Right now we have multiple objects floating
around. Would it be useful to have a collector for this?

Best,
Kasper

On Tue, Oct 24, 2017 at 6:30 PM, Levi Waldron <lwaldron.research at gmail.com>
wrote:

  
  
#
Model results could be stored in another SE. The contrasts are treated as
samples, and stuff like p-values, effect sizes, etc. as assays. The question
is whether those should just be tacked onto an MAE, kept as separate
objects, or stored along with the MAE in a larger analysis-level workflow
object.
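A sketch of the first option described here, assuming SummarizedExperiment is installed: contrasts become the columns, effect sizes and p-values become assays, and colData describes each contrast. The gene names, contrast names and numbers are hypothetical; how such an object should relate to an MAE is the open question in this thread and is not addressed.

```r
library(SummarizedExperiment)  # also attaches S4Vectors (DataFrame)

# Hypothetical results: 3 genes x 2 contrasts (all values are placeholders)
genes     <- paste0("gene", 1:3)
contrasts <- c("treated_vs_ctrl", "knockout_vs_ctrl")

# matrix() fills column by column: first contrast, then second
effect <- matrix(c(1.1, -0.3, 0.8,   0.2, 1.5, -0.9), nrow = 3,
                 dimnames = list(genes, contrasts))
pval   <- matrix(c(0.01, 0.50, 0.03,  0.70, 0.002, 0.04), nrow = 3,
                 dimnames = list(genes, contrasts))

# Contrasts act as the "samples" (columns); each statistic is an assay
fits <- SummarizedExperiment(
  assays  = list(effect = effect, pvals = pval),
  colData = DataFrame(comparison = c("treated vs control", "knockout vs control"),
                      row.names = contrasts))

# Genes with p < 0.05 in each contrast
sig <- lapply(setNames(colnames(fits), colnames(fits)),
              function(k) rownames(fits)[assay(fits, "pvals")[, k] < 0.05])
sig
```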

On Wed, Oct 25, 2017 at 8:57 AM, Kasper Daniel Hansen <
kasperdanielhansen at gmail.com> wrote: