[Bioc-devel] SummarizedExperiments
For symmetry, could we get granges<- added? It is confusing that granges() works but the replacement function does not.
Thanks,
Kasper
On Wed, Sep 19, 2012 at 3:17 PM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
For extending SummarizedExperiments it would be convenient to have something like Biobase::assayDataValidMembers. We might also consider putting Biobase::validMsg into BiocGenerics.
Kasper
On Fri, Sep 14, 2012 at 5:19 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 09/14/2012 11:46 AM, Kasper Daniel Hansen wrote:
On Fri, Sep 14, 2012 at 2:25 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
For what it's worth I already wrote a CombineSEwithNAs() function to do this on the disjoint ranges for RRBS. It assumes that there isn't any additional colData or elementMetadata of interest (for reasons that will become clear) and further assumes that the user will want to smooth.
I have one in bsseq as well (as I said earlier), but this would still be nice to think about in the most general case possible.
seqlevels, seqlengths, genome are implemented via seqinfo as of the latest GenomicRanges package (went in some time ago, thanks to MM):
There is something I am missing here. It clearly works. But
showMethods("genome") tells me that methods are defined for ANY and
Seqinfo, but if I use
example(SummarizedExperiment)
to get sset defined, I still get
is(sset, "Seqinfo")
to be FALSE. I thought this would check for inheritance.
The 'ANY' method on genome is implemented so that if you have a seqinfo,SummarizedExperiment-method, you get 'genome' for free. Another example is 'rownames' and 'colnames', which are provided for free when a dimnames,SummarizedExperiment-method is defined.
selectMethod("genome", "SummarizedExperiment")
Method Definition:
function (x)
genome(seqinfo(x))
<environment: namespace:GenomicRanges>
Signatures:
        x
target  "SummarizedExperiment"
defined "ANY"
Martin
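The fallback Martin describes can be reproduced with base R's methods package alone. The sketch below is only an illustration: ToySeqinfo, ToySE, toySeqinfo() and toyGenome() are hypothetical stand-ins, not the GenomicRanges classes or generics. It shows why is() returns FALSE while the call still works: the object does not inherit from the seqinfo class, it simply falls through to the 'ANY' method.

```r
library(methods)

# Toy stand-ins (hypothetical names; NOT the GenomicRanges classes).
setClass("ToySeqinfo", representation(genome = "character"))
setClass("ToySE", representation(seqinfo = "ToySeqinfo"))

setGeneric("toySeqinfo", function(x) standardGeneric("toySeqinfo"))
setGeneric("toyGenome", function(x) standardGeneric("toyGenome"))

# Direct method: a ToySeqinfo object knows its own genome.
setMethod("toyGenome", "ToySeqinfo", function(x) x@genome)

# Fallback: any object with a toySeqinfo() method gets toyGenome() for free.
setMethod("toyGenome", "ANY", function(x) toyGenome(toySeqinfo(x)))

# The only method written specifically for ToySE.
setMethod("toySeqinfo", "ToySE", function(x) x@seqinfo)

se <- new("ToySE",
          seqinfo = new("ToySeqinfo", genome = c(chr1 = "hg19", chr2 = "hg19")))

is(se, "ToySeqinfo")                  # no inheritance between the two classes
existsMethod("toyGenome", "ToySE")    # no method defined directly for ToySE
toyGenome(se)                         # still works, via the 'ANY' method
```

Dispatch on ToySE finds no direct method and no superclass method, so it lands on 'ANY', which delegates to toySeqinfo().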
R> head(genome(LAML))
  chr1  chr10  chr11  chr12  chr13  chr14
"hg19" "hg19" "hg19" "hg19" "hg19" "hg19"
R> head(seqlengths(LAML))
     chr1     chr10     chr11     chr12     chr13     chr14
249250621 135534747 135006516 133851895 115169878 107349540
I wrote a trivial addSeqinfo(x) function that, given a genome, will populate
an SE's seqinfo automatically from a BSgenome (if there is one). The function calls
rtracklayer:::SeqinfoForBSGenome(unique(na.omit(genome(x))))[seqlevels(x)]
to get the correct information.
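The unique(na.omit(genome(x))) step in that call only gives a usable answer when the non-NA entries all agree. A minimal base-R sketch of that check follows. The helper singleGenome() is hypothetical and is not part of addSeqinfo(), rtracklayer or GenomicRanges; a plain named character vector stands in for the result of genome(x).

```r
# Hypothetical helper (an assumption for illustration only): reduce a
# per-chromosome genome vector to one genome name, ignoring NA entries.
singleGenome <- function(genomes) {
    g <- unique(na.omit(genomes))
    if (length(g) != 1L)
        stop("expected exactly one genome, found: ", paste(g, collapse = ", "))
    as.character(g)
}

# Stand-in for genome(x): NA entries are dropped, the rest must agree.
genomes <- c(chr1 = "hg19", chr2 = NA, chr3 = "hg19")
resolved <- singleGenome(genomes)

# A spiked-in sequence with its own genome label makes the lookup ambiguous.
mixed <- c(chr1 = "hg19", chrLambda = "lambda")
failed <- inherits(try(singleGenome(mixed), silent = TRUE), "try-error")
```

Failing loudly on mixed labels is one possible design; a spike-in workflow might instead prefer to resolve each genome separately.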
I hate the fact that there can be NA or differing genomes specified
per-chromosome for SummarizedExperiments. It makes me sad.
I don't like that you can mix hg18/hg19, but on the other hand we routinely spike in lambda phage, and that is not really part of the human genome.
On Fri, Sep 14, 2012 at 10:54 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
Thanks for all the additional methods. I still miss seqlevels, seqlengths, genome.
Below,
On Wed, Sep 12, 2012 at 3:33 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:
On 09/12/2012 12:15 PM, Kasper Daniel Hansen wrote:
One thing I have in my package that I find indispensable is combine and (my own) combineList. The latter is for combining more than two objects, which offers a lot of possibilities for speed-up compared with Reduce(combine, LIST), especially if (very commonly) all the objects have the same rowData. Use case: you need to add additional samples to your SummarizedExperiment.
I found it difficult in Biobase to write combine methods for eSet, where you're really requiring a lot from the user (about the phenoData / featureData structured in the same way) or going through contortions to make it the same in a reasonable-but-ad-hoc way (e.g., when two columns are factors with the same set of levels but encoded differently). Maybe the effort required is proportional to the utility of the function provided... I'll give it some more thought.
In the abstract case it is hard to imagine combining different SummarizedExperiments. My use case is almost always "additional samples from the same experiment", and for that situation it is a lot easier to imagine combining them. You still need to check that the granges are similar (and if not, expand some of the assayData with zeroes or NAs), since the new samples may have coverage in locations not assayed earlier. Clearly factors are hard to handle, and I assume there are other hard-to-handle cases. Nevertheless, I find such a function incredibly useful. I think it is entirely OK to assume that the user knows what (s)he is doing.
Kasper
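The "additional samples from the same experiment" case can be sketched with plain matrices. This is a hedged illustration only, not the bsseq or GenomicRanges implementation: combineAssays() is a hypothetical helper, row names stand in for genomic ranges, and colData and factor handling are ignored. Locations missing from a batch are padded with NA.

```r
# Hypothetical helper (an assumption for illustration only): combine a list
# of assay matrices over the union of their rows, padding gaps with `fill`.
combineAssays <- function(matrices, fill = NA_real_) {
    allRows <- Reduce(union, lapply(matrices, rownames))
    padded <- lapply(matrices, function(m) {
        out <- matrix(fill, nrow = length(allRows), ncol = ncol(m),
                      dimnames = list(allRows, colnames(m)))
        out[rownames(m), ] <- m   # copy observed values into matching rows
        out
    })
    do.call(cbind, padded)        # one cbind for the whole list
}

# Existing samples s1 and s2, assayed at two locations.
old <- matrix(c(1, 2, 3, 4), nrow = 2,
              dimnames = list(c("chr1:100", "chr1:200"), c("s1", "s2")))
# A new sample s3 with coverage at a location not assayed earlier.
extra <- matrix(c(5, 6), nrow = 2,
                dimnames = list(c("chr1:200", "chr1:300"), "s3"))

combined <- combineAssays(list(old, extra))
```

Taking the whole list at once means the row union is computed once and the result is assembled in a single cbind, rather than repeating the work at every pairwise step.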
-- A model is a lie that helps you see the truth. Howard Skipper
-- Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793