[Bioc-devel] requirement for named assays in SummarizedExperiment
Hi,

After talking with others the vote was against enforcing names on assays() and for positional matching if all names are NULL. A mixture of names and NULL throws an error.

example(SummarizedExperiment)

## all named
> se2 = se1
> assays(cbind(se1, se2))
List of length 1
names(1): counts

## mixture of names and NULL -> error
> names(assays(se1)) = NULL
> assays(cbind(se1, se2))
Error in assays(cbind(se1, se2)) :
  error in evaluating the argument 'x' in selecting a method for function 'assays': Error in .bind.arrays(args, cbind, "assays") :
  elements in 'assays' must have the same names

## all NULL -> positional matching
> names(assays(se2)) = NULL
> assays(cbind(se1, se2))
List of length 1

If we find common use cases where positional matching is needed with a mixture of names and NULL we can always relax this constraint. Changes are in 1.19.46.

Valerie
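The rule above can be sketched in base R on plain lists of matrices. This is only an illustration under stated assumptions, not the code in GenomicRanges: `bind_assays_sketch` is a hypothetical helper, both inputs are assumed to hold the same number of assays, and each list is assumed to be either fully named (with unique names) or to have no names at all.

```r
# Hypothetical helper, not the GenomicRanges implementation: combine two
# lists of assay matrices column-wise under the rule described above.
bind_assays_sketch <- function(a, b) {
  stopifnot(length(a) == length(b))
  na <- names(a)
  nb <- names(b)

  if (is.null(na) && is.null(nb)) {
    # all names NULL -> match assays by position
    return(Map(cbind, a, b))
  }
  if (is.null(na) || is.null(nb) || !setequal(na, nb)) {
    # mixture of names and NULL, or differing names -> refuse to guess
    stop("elements in 'assays' must have the same names")
  }

  # all named -> match by name, so the order of elements in 'b' is irrelevant
  out <- lapply(na, function(n) cbind(a[[n]], b[[n]]))
  names(out) <- na
  out
}

m1 <- matrix(1:4, nrow = 2)
m2 <- matrix(5:8, nrow = 2)

named_xy <- list(x = m1, y = m2)
named_yx <- list(y = m2, x = m1)   # same assays, different order
unnamed  <- list(m1, m2)

by_name     <- bind_assays_sketch(named_xy, named_yx)
by_position <- bind_assays_sketch(unnamed, unnamed)
mixed_error <- tryCatch(bind_assays_sketch(unnamed, named_xy),
                        error = function(e) conditionMessage(e))
```

Matching by name keeps `x` paired with `x` even though the second list stores its assays in a different order, which is the situation described for the VCF class earlier in the thread.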
On 03/06/2015 08:20 AM, Valerie Obenchain wrote:
Hi Aaron,

Thanks for catching this. I favor enforcing names in 'assays'. Combining by position alone is too dangerous. I'm thinking of the VCF class where the genome information is stored in 'assays' and the fields are rarely in the same order.

Looks like we also need a more informative error message when names don't match.
> assays(se1)
List of length 1 names(1): counts1
> assays(se2)
List of length 1 names(1): counts2
> cbind(se1, se2)
Error in sQuote(accessorName) :
  argument "accessorName" is missing, with no default

Valerie

On 03/05/2015 11:09 PM, Aaron Lun wrote:
Dear all, I stumbled upon some unexpected behaviour with cbind'ing SummarizedExperiment objects with unnamed assays:
require(GenomicRanges)
nrows <- 5; ncols <- 4
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowData <- GRanges("chr1", IRanges(1:nrows, 1:nrows))
colData <- DataFrame(Treatment=1:ncols, row.names=LETTERS[1:ncols])
sset <- SummarizedExperiment(counts, rowData=rowData, colData=colData)
sset
class: SummarizedExperiment
dim: 5 4
exptData(0):
assays(1): ''
rownames: NULL
rowData metadata column names(0):
colnames(4): A B C D
colData names(1): Treatment
cbind(sset, sset)
dim: 5 8
exptData(0):
assays(0):
rownames: NULL
rowData metadata column names(0):
colnames(8): A B ... C1 D1
colData names(1): Treatment

Upon cbind'ing, the assays in the SE object are lost. I think this is due to the fact that the cbind code matches up assays by their names. Thus, if there are no names, the code assumes that there are no assays.

I guess this could be prevented by enforcing naming of assays in the SummarizedExperiment constructor. Or, the binding code could be modified to work positionally when there are no assay names, e.g., by cbind'ing the first assays across all SE objects, then the second assays, etc.

Any thoughts?

Regards,
Aaron
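The suspected mechanism can be reproduced in base R without any Bioconductor classes. The sketch below is an assumption about how a name-only binder might behave, not the actual GenomicRanges code, and `bind_by_name_only` is a hypothetical function: it iterates over `names()`, which is NULL for an unnamed list, so the loop runs zero times and the assay silently disappears.

```r
# Hypothetical name-only binder, for illustration; not the GenomicRanges code.
bind_by_name_only <- function(a, b) {
  nms <- names(a)   # NULL when the assays are unnamed
  out <- lapply(nms, function(n) cbind(a[[n]], b[[n]]))
  names(out) <- nms
  out
}

counts <- matrix(1:20, nrow = 5)
unnamed_assays <- list(counts)
named_assays   <- list(counts = counts)

n_unnamed <- length(bind_by_name_only(unnamed_assays, unnamed_assays))
n_named   <- length(bind_by_name_only(named_assays, named_assays))

n_unnamed  # 0: the unnamed assay is dropped without any warning
n_named    # 1
```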
sessionInfo()
R Under development (unstable) (2014-12-14 r67167)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] GenomicRanges_1.19.42 GenomeInfoDb_1.3.13   IRanges_2.1.41
[4] S4Vectors_0.5.21      BiocGenerics_0.13.6

loaded via a namespace (and not attached):
[1] XVector_0.7.4
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, Seattle, WA 98109 Email: vobencha at fredhutch.org Phone: (206) 667-3158