[Bioc-devel] 'semantically rich' subsetting of SummarizedExperiments

Hi,
On 10/11/2014 02:25 PM, Vincent Carey wrote:
Also coming a little late to the party, but I also have a preference
for Kasper's proposal of using subsetByXXX.

Supporting 'txdb[GeneList]' is arbitrarily making gene ids special,
when a TxDb contains other ids (transcript and exon ids).

Also, if you push this concept a little bit, you quickly run into
some semantic headaches:

   - First, let's keep in mind that for a common track like the
     "UCSC Genes" track, a lot of transcripts are not linked to any
     gene.

   - Then, allowing subsetting a TxDb by a character vector means
     a TxDb has names. At least conceptually. So it's tempting to
     also support 'names(txdb)' (would return all the gene ids).

   - Finally, the names being unique, it seems natural to expect that
     'txdb[names(txdb)]' is a no-op. But it won't be, because
     'txdb[names(txdb)]' will drop all the transcripts that are not
     linked to a gene.
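
To make the last point concrete, here is a minimal sketch in base R. It
does not use the real TxDb class: the data frame 'tx' and the helpers
'toy_names' and 'toy_subset' are hypothetical stand-ins for a TxDb,
'names(txdb)' and 'txdb[ids]', and the ids are made up.

```r
# Toy stand-in for a TxDb: one row per transcript; NA means "no gene".
tx <- data.frame(
  tx_id   = c("tx1", "tx2", "tx3", "tx4"),
  gene_id = c("g1", "g1", NA, "g2"),
  stringsAsFactors = FALSE
)

# Hypothetical 'names(txdb)': the unique gene ids.
toy_names <- function(tx) unique(tx$gene_id[!is.na(tx$gene_id)])

# Hypothetical 'txdb[ids]': keep the transcripts linked to the given genes.
toy_subset <- function(tx, ids) tx[tx$gene_id %in% ids, , drop = FALSE]

roundtrip <- toy_subset(tx, toy_names(tx))

nrow(tx)         # 4 transcripts before
nrow(roundtrip)  # 3 transcripts after: tx3 has no gene and is dropped
```

So under these assumptions the round trip silently loses data, which is
not what one expects from 'x[names(x)]'.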

But before any TxDb subsetting can happen (via [ or subsetByXXX), we
need to bring back the classic (and healthier) pass-by-value semantics
on these objects. (Right now TxDb is a reference class and thus TxDb
objects have pass-by-reference semantics.)
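
A small illustration of the difference, again only a sketch: 'RefBox' is
a hypothetical toy reference class made with setRefClass() from the
methods package, not the actual TxDb definition, and a plain list plays
the role of an ordinary pass-by-value object.

```r
library(methods)

# Hypothetical toy reference class (not the real TxDb).
RefBox <- setRefClass("RefBox", fields = list(ids = "character"))

# The same "drop the first id" operation, written for each kind of object.
drop_first_ref <- function(x) { x$ids <- x$ids[-1]; invisible(x) }
drop_first_val <- function(x) { x$ids <- x$ids[-1]; x }

ref <- RefBox$new(ids = c("a", "b", "c"))
drop_first_ref(ref)
length(ref$ids)   # 2: the caller's object was modified in place

val  <- list(ids = c("a", "b", "c"))
val2 <- drop_first_val(val)
length(val$ids)   # 3: the original is untouched
length(val2$ids)  # 2: only the returned copy is subsetted
```

With reference semantics, a subsetting function can change the object
the caller is still holding, which is why it needs sorting out first.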

H.