[Bioc-devel] SummarizedExperiments

Thu, Oct 4, 2012 11:13 AM

For symmetry, could we get
  granges<-
added?  It is confusing that granges() works, but not the replacement function.
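As a rough sketch of the requested symmetry, the toy example below pairs an accessor generic with a matching replacement generic in plain S4. The class ToyRangedSE, the generics toyGranges() / toyGranges<-, and the use of a data.frame in place of a GRanges are all hypothetical stand-ins, not the GenomicRanges API.

```r
library(methods)

## Hypothetical toy class: a data.frame stands in for the GRanges row data
setClass("ToyRangedSE", representation(rowData = "data.frame"))

## Accessor generic and its matching replacement generic
setGeneric("toyGranges", function(x) standardGeneric("toyGranges"))
setGeneric("toyGranges<-", function(x, value) standardGeneric("toyGranges<-"))

setMethod("toyGranges", "ToyRangedSE", function(x) x@rowData)

## Replacement method: refuse to change the number of rows (ranges)
setReplaceMethod("toyGranges", "ToyRangedSE", function(x, value) {
    if (nrow(value) != nrow(x@rowData))
        stop("replacement must have the same number of ranges")
    x@rowData <- value
    validObject(x)
    x
})

se <- new("ToyRangedSE",
          rowData = data.frame(chr = c("chr1", "chr2"), start = c(1L, 10L)))
toyGranges(se) <- data.frame(chr = c("chr1", "chr2"), start = c(5L, 20L))
toyGranges(se)$start
```

The length check is one plausible design choice: in a real SummarizedExperiment the assay rows are tied to the ranges, so a replacement should not silently change their number.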

Thanks,
Kasper

On Wed, Sep 19, 2012 at 3:17 PM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:

When extending SummarizedExperiment, it would be convenient to have
something like Biobase::assayDataValidMembers.

We might also consider putting Biobase::validMsg into BiocGenerics.
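A hedged sketch of how a validity method for an extending class might use helpers of this kind. requiredAssaysPresent() and appendValidMsg() are hypothetical base-R stand-ins written in the spirit of Biobase::assayDataValidMembers and Biobase::validMsg (not their actual code), and ToyBSseq, with required assays "M" and "Cov", is a made-up class holding a plain list of matrices.

```r
library(methods)

## Hypothetical helper (spirit of Biobase::assayDataValidMembers):
## TRUE if all required assay names are present, else a message
requiredAssaysPresent <- function(assayNames, required) {
    missing <- setdiff(required, assayNames)
    if (length(missing) == 0L) TRUE
    else paste0("missing required assay(s): ", paste(missing, collapse = ", "))
}

## Hypothetical helper (spirit of Biobase::validMsg):
## append a message only when 'result' is a character vector
appendValidMsg <- function(msg, result) {
    if (is.character(result)) c(msg, result) else msg
}

setClass("ToyBSseq", representation(assays = "list"),
         validity = function(object) {
             msg <- NULL
             msg <- appendValidMsg(msg,
                 requiredAssaysPresent(names(object@assays), c("M", "Cov")))
             if (is.null(msg)) TRUE else msg
         })

ok <- new("ToyBSseq", assays = list(M = matrix(0, 2, 2), Cov = matrix(1, 2, 2)))
bad <- tryCatch(new("ToyBSseq", assays = list(M = matrix(0, 2, 2))),
                error = function(e) conditionMessage(e))
```

Accumulating messages this way lets a validity method report every problem at once instead of stopping at the first one.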

Kasper

On Fri, Sep 14, 2012 at 5:19 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:

On 09/14/2012 11:46 AM, Kasper Daniel Hansen wrote:

On Fri, Sep 14, 2012 at 2:25 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:

For what it's worth, I already wrote a CombineSEwithNAs() function to do
this on the disjoint ranges for RRBS.  It assumes that there isn't any
additional colData or elementMetadata of interest (for reasons that will
become clear) and further assumes that the user will want to smooth.


I have one in bsseq as well (as I said earlier), but this would still
be nice to think about in the most general case possible.

seqlevels, seqlengths, and genome are implemented via seqinfo as of the
latest GenomicRanges package (this went in some time ago, thanks to MM):


There is something I am missing here.  It clearly works.  But
showMethods("genome") tells me that methods are defined for ANY and
Seqinfo, but if I use
   example(SummarizedExperiment)
to get sset defined, I still find that
   is(sset, "Seqinfo")
is FALSE.  I thought this would check for inheritance.


is() returns FALSE because SummarizedExperiment does not inherit from
Seqinfo.  Instead, the 'ANY' method on genome is implemented as
genome(seqinfo(x)), so if you have a seqinfo,SummarizedExperiment-method,
you get 'genome' for free.  Other examples are 'rownames' and 'colnames',
which are provided for free when a dimnames,SummarizedExperiment-method is
defined.

selectMethod("genome", "SummarizedExperiment")

Method Definition:

function (x)
genome(seqinfo(x))
<environment: namespace:GenomicRanges>

Signatures:
        x
target  "SummarizedExperiment"
defined "ANY"
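The same mechanism can be reproduced with toy classes in plain S4. In this hedged sketch, ToySeqinfo, ToySE, toySeqinfo() and toyGenome() are hypothetical stand-ins for Seqinfo, SummarizedExperiment, seqinfo() and genome(). The 'ANY' method delegates through toySeqinfo(), so any class with a toySeqinfo() method gets toyGenome() without inheriting from ToySeqinfo; likewise, defining a dimnames() method is enough for base R's rownames(), which calls dimnames() internally.

```r
library(methods)

## Hypothetical stand-ins for Seqinfo and SummarizedExperiment
setClass("ToySeqinfo", representation(genome = "character"))
setClass("ToySE", representation(si = "ToySeqinfo", ids = "character"))

setGeneric("toySeqinfo", function(x) standardGeneric("toySeqinfo"))
setMethod("toySeqinfo", "ToySE", function(x) x@si)

setGeneric("toyGenome", function(x) standardGeneric("toyGenome"))
setMethod("toyGenome", "ToySeqinfo", function(x) x@genome)
## Fallback for any other class: delegate through toySeqinfo()
setMethod("toyGenome", "ANY", function(x) toyGenome(toySeqinfo(x)))

## Defining dimnames() is enough for base rownames() to work
setMethod("dimnames", "ToySE", function(x) list(x@ids, NULL))

se <- new("ToySE",
          si = new("ToySeqinfo", genome = c(chr1 = "hg19", chr2 = "hg19")),
          ids = c("r1", "r2"))
toyGenome(se)            # found via the "ANY" method
is(se, "ToySeqinfo")     # FALSE: no inheritance involved
rownames(se)             # "r1" "r2", via the dimnames() method
```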

Martin

R> head(genome(LAML))
   chr1  chr10  chr11  chr12  chr13  chr14
"hg19" "hg19" "hg19" "hg19" "hg19" "hg19"
R> head(seqlengths(LAML))
      chr1     chr10     chr11     chr12     chr13     chr14
249250621 135534747 135006516 133851895 115169878 107349540

I wrote a trivial addSeqinfo(x) function that, given a genome, will
populate an SE's seqinfo automatically from a BSgenome (if there is one).
The function calls
   rtracklayer:::SeqinfoForBSGenome(unique(na.omit(genome(x))))[seqlevels(x)]
to get the correct information.

I hate the fact that there can be NA or differing genomes specified
per-chromosome for SummarizedExperiments.  It makes me sad.
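The lookup step in the addSeqinfo() call above can be sketched in base R. Here knownSeqlengths is a hypothetical named list standing in for a BSgenome (filled only with hg19 lengths printed earlier in the thread), and uniqueGenome() / lookupSeqlengths() are made-up helpers, not rtracklayer functions. The sketch makes the single-genome assumption explicit: NA labels are dropped, and any remaining disagreement between per-chromosome genomes is an error.

```r
## Hypothetical stand-in for a BSgenome; lengths copied from the hg19 output above
knownSeqlengths <- list(
    hg19 = c(chr1 = 249250621, chr10 = 135534747, chr11 = 135006516)
)

## Drop NA genome labels and insist that exactly one genome remains
uniqueGenome <- function(genomes) {
    g <- unique(as.character(na.omit(genomes)))
    if (length(g) != 1L)
        stop("expected exactly one non-NA genome, found ", length(g))
    g
}

## Look up lengths for the object's own seqlevels, in the object's order
lookupSeqlengths <- function(genomes, seqlevels, table = knownSeqlengths) {
    g <- uniqueGenome(genomes)
    if (!g %in% names(table))
        stop("no reference lengths available for genome ", g)
    table[[g]][seqlevels]
}

lookupSeqlengths(c(chr10 = "hg19", chr1 = NA, chr11 = "hg19"),
                 c("chr10", "chr1"))
```

A strict check like this would also reject a legitimate mixture such as a lambda phage spike-in, so it encodes a policy choice rather than a hard requirement.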


I don't like that you can mix hg18 and hg19, but on the other hand we
routinely spike in lambda phage, which is not really part of the human genome.



On Fri, Sep 14, 2012 at 10:54 AM, Kasper Daniel Hansen
<kasperdanielhansen at gmail.com> wrote:


Thanks for all the additional methods.  I still miss
   seqlevels, seqlengths, genome

Below,

On Wed, Sep 12, 2012 at 3:33 PM, Martin Morgan <mtmorgan at fhcrc.org> wrote:

On 09/12/2012 12:15 PM, Kasper Daniel Hansen wrote:

One thing I have in my package that I find indispensable is combine
and (my own) combineList.  The latter is for combining more than two
objects; compared with Reduce(combine, LIST) it leaves a lot of room for
speed-up, especially in the (very common) case where all the objects
have the same rowData.  Use case: you need to add samples to your
SummarizedExperiment.



I found it difficult in Biobase to write combine methods for eSet,
where you're either requiring a lot from the user (that the phenoData /
featureData be structured in the same way) or going through contortions
to make them the same in a reasonable-but-ad-hoc way (e.g., when two
columns are factors with the same set of levels but encoded differently).
Maybe the effort required is proportional to the utility of the function
provided...  I'll give it some more thought.


In the abstract case it is hard to imagine combining different
SummarizedExperiments.  My use case is almost always "additional
samples from the same experiment", and in that situation it is a lot
easier to imagine combining them.  You still need to check that the
granges are similar (and if not, expand some of the assayData with
zeroes or NAs), since the new samples may have coverage in locations
not assayed earlier.  Clearly factors are hard to handle, and I assume
there are other hard-to-handle cases.  Nevertheless, I find such a
function incredibly useful.
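A minimal sketch of the "additional samples" case on plain matrices, assuming rows are keyed by unique row names (standing in for ranges) and columns are samples. combineAssayList() is hypothetical and is not the bsseq or Biobase combine code. It takes a fast path of a single cbind when every matrix has identical rows, and otherwise pads locations a sample did not cover with NA before binding, instead of folding pairwise with Reduce().

```r
## Hypothetical helper: combine a list of assay matrices (rows = locations,
## columns = samples); assumes row names are unique within each matrix
combineAssayList <- function(mats) {
    allRows <- unique(unlist(lapply(mats, rownames)))
    sameRows <- vapply(mats, function(m) identical(rownames(m), allRows),
                       logical(1))
    if (all(sameRows))
        return(do.call(cbind, mats))   # fast path: one cbind, no padding
    padded <- lapply(mats, function(m) {
        out <- matrix(NA_real_, nrow = length(allRows), ncol = ncol(m),
                      dimnames = list(allRows, colnames(m)))
        out[rownames(m), ] <- m        # locations not covered stay NA
        out
    })
    do.call(cbind, padded)
}

a <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("s1", "s2")))
b <- matrix(5:6, nrow = 2, dimnames = list(c("r2", "r3"), "s3"))
combineAssayList(list(a, b))
```

Padding with NA rather than 0 keeps "not assayed" distinct from "zero coverage"; which fill value is right depends on the assay.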

I think it is entirely ok to assume that the user knows what (s)he is
doing.

Kasper





--
A model is a lie that helps you see the truth.

Howard Skipper


--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793