
[Bioc-devel] 'semantically rich' subsetting of SummarizedExperiments

11 messages · Tim Triche, Jr., Gabriel Becker, Sean Davis +2 more

#
https://github.com/vjcitn/biocMultiAssay/blob/master/vignettes/SEresolver.Rnw

shows some modifications to [ that allow subsetting of SE by
gene or pathway name

it may be premature to work at the [ level.  Kasper suggested defining
a suite of subsetBy operations that would accomplish this

i think we could get something along these lines into the release without
too much more work.  votes?
#
Hi, Vince.

I'm coming a little late to the party, but I agree with Kasper's sentiment
that the less "magical" approach of using subsetByXXX might be the cleaner
way to go for the time being.

Sean


On Sat, Sep 20, 2014 at 10:42 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

  
  
#
Agreed with Sean, having tried implementing the "magical" alternative

--t
#
OK by me to leave [ alone.  We could start with subsetByEntrez,
subsetByKEGG, subsetBySymbol, subsetByGOTERM, subsetByGOID.

Utilities to generate GRanges for queries in each of these vocabularies
should, perhaps, be in the OrganismDb space?  Once those are in place
no additional infrastructure is necessary?
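
[Editorial sketch, not from the thread.] The subsetByXXX idea above can be
illustrated in base R on a toy stand-in for a SummarizedExperiment. The
structure (`toy`, `rowAnno`) and the worker `subsetByField` are hypothetical
names, and the sketch matches on per-row annotation columns rather than on
GRanges generated from an OrganismDb, which is what the message actually
proposes.

```r
# Toy stand-in for a SummarizedExperiment: an assay matrix plus per-row
# annotation. All object and function names here are hypothetical.
assay <- matrix(1:12, nrow = 4,
                dimnames = list(c("p1_at", "p2_at", "p3_at", "p4_at"),
                                c("s1", "s2", "s3")))
rowAnno <- data.frame(symbol = c("BRAF", "BRCA1", "BRAF", "TP53"),
                      entrez = c("673", "672", "673", "7157"),
                      row.names = rownames(assay),
                      stringsAsFactors = FALSE)
toy <- list(assay = assay, rowAnno = rowAnno)

# Shared worker: keep the rows whose annotation in `field` matches `ids`.
subsetByField <- function(x, ids, field) {
  keep <- x$rowAnno[[field]] %in% ids
  list(assay   = x$assay[keep, , drop = FALSE],
       rowAnno = x$rowAnno[keep, , drop = FALSE])
}

# One explicit, non-"magical" entry point per vocabulary.
subsetBySymbol <- function(x, symbols) subsetByField(x, symbols, "symbol")
subsetByEntrez <- function(x, ids)     subsetByField(x, ids, "entrez")

braf <- subsetBySymbol(toy, "BRAF")
rownames(braf$assay)
```

Because each vocabulary gets its own function name, the caller states the
intended meaning of the identifiers and `[` keeps its ordinary behaviour.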

On Sat, Sep 20, 2014 at 12:49 PM, Tim Triche, Jr. <tim.triche at gmail.com>
wrote:

  
  
#
Hey all,

We are in the (very) early stages of experimenting with something that
seems relevant here: classed identifiers. We are using them for
database/mart queries, but the same concept could be useful for the cases
you're describing I think.

E.g.
An object of class "GeneSymbol"
[1] "BRAF"  "BRCA1"
...


This approach has the benefit of being declarative instead of heuristic
(people won't be able to accidentally invoke it), while still giving most
of the convenience I believe you are looking for.

The object classes inherit directly from character, so should "just work"
most of the time, but as I said it's early days; lots more testing for
functionality and usefulness is needed.

~G
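
[Editorial sketch, not from the thread.] A minimal version of the classed
identifier idea, assuming S4 classes that extend character. The class
`EntrezId`, the constructor functions, and the generic `annotationField` are
hypothetical names chosen for illustration; the experimental code referred to
above is not shown in the thread and may differ.

```r
library(methods)

# Identifier classes that inherit directly from character.
setClass("GeneSymbol", contains = "character")
setClass("EntrezId",   contains = "character")

# Hypothetical convenience constructors.
GeneSymbol <- function(x) new("GeneSymbol", as.character(x))
EntrezId   <- function(x) new("EntrezId", as.character(x))

# Dispatch on the identifier class decides which vocabulary is meant.
# A plain character vector keeps its usual meaning (row names).
setGeneric("annotationField",
           function(ids) standardGeneric("annotationField"))
setMethod("annotationField", "GeneSymbol", function(ids) "symbol")
setMethod("annotationField", "EntrezId",   function(ids) "entrez")
setMethod("annotationField", "character",  function(ids) "rowname")

ids <- GeneSymbol(c("BRAF", "BRCA1"))
ids                        # default show method for the classed vector
annotationField(ids)       # resolved from the class
annotationField("982_at")  # unclassed input is treated as a row name
```

The semantic lookup only happens when the caller has explicitly wrapped the
identifiers, which is the declarative (rather than heuristic) property
described above.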


On Sat, Sep 20, 2014 at 11:38 AM, Vincent Carey <stvjc at channing.harvard.edu>
wrote:

  
    
#
On Sat, Sep 20, 2014 at 3:11 PM, Gabe Becker <becker.gabe at gene.com> wrote:

            
This approach has the flavor of some of the functionality that Martin put
together for the GSEABase package (EntrezIdentifier, etc.).

Sean

  
  
#
On Sat, Sep 20, 2014 at 3:11 PM, Gabe Becker <becker.gabe at gene.com> wrote:

            
yes, there has been some code around of that nature ... seems reasonable,
but perhaps a bit heavy.

there are identifier grouping translation facilities in GSEABase that
are also pertinent.

  
  
#
Sean and Vincent,

The goal of what we are doing builds off of what Martin has in GSEABase. We
were looking to see how much benefit we can get with something
lighter-weight that lies between indistinguishable character vectors and
the full machinery of GeneSets.

Either way, it seems like formalizing the semantic information is a way to
do what you want. Furthermore, these classed id objects can be created
automatically when there is contextual information e.g. during queries to
databases (or db-like objects), and then simply added to metadata
DataFrames and re-used.

~G
On Sat, Sep 20, 2014 at 12:19 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:

            

  
    
#
Sounds very nice.  Anything for the impending release?
On Sat, Sep 20, 2014 at 11:34 PM, Gabe Becker <becker.gabe at gene.com> wrote:

            

  
  
1 day later
#
Hi,
>
 >shows some modifications to [ that allow subsetting of SE by
 >gene or pathway name

Without having read the code: do you intend that SE[i,j], when i is provided as a character vector, will subset those rows where the name of the GRanges == i?
>a suite of subsetBy operations that would accomplish this

A reminder of a little background on where we are now, with warnings of lurking dragons:

	http://thread.gmane.org/gmane.science.biology.informatics.conductor/52971/focus=52993

 >
 >i think we could get something along these lines into the release without
 >too much more work.  votes?
 >
Abstaining for now. 

Thanks - the topic is dear to me too.
#
On Mon, Sep 22, 2014 at 10:17 AM, Cook, Malcolm <MEC at stowers.org> wrote:

            
By 'the GRanges' do you mean rowData(SE)?

My code breaks that capability -- e.g.,  ALLse["982_at",] now fails.
But I think that could be handled if desired.
Thanks for the reminder.  I understand that training dragons can cause
tears to be shed.
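
[Editorial sketch, not from the thread.] One way the broken row-name case
might be handled, assuming a resolver that tries row names first and only
falls back to symbols when some subscript is not a row name. The function
`resolveRows`, the toy row names, and the placeholder symbols are
hypothetical; this is not the code in the SEresolver vignette.

```r
# Hypothetical toy data: probe-style row names and placeholder symbols.
rowNames <- c("982_at", "1000_at", "1001_at")
symbols  <- c("GENE_A", "GENE_B", "GENE_B")

# Resolve a character subscript to row indices.
# Step 1: if every element of `i` is a row name, behave like ordinary `[`.
# Step 2: otherwise treat `i` as symbols and return all matching rows.
resolveRows <- function(i, rowNames, symbols) {
  byName <- match(i, rowNames)
  if (all(!is.na(byName))) return(byName)
  hits <- which(symbols %in% i)
  if (length(hits) == 0L)
    stop("subscript matches neither row names nor symbols")
  hits
}

resolveRows("982_at", rowNames, symbols)  # row-name lookup still works
resolveRows("GENE_B", rowNames, symbols)  # falls back to symbol lookup
```

A fallback of this kind is exactly the heuristic behaviour the thread is wary
of: an identifier that is valid in both vocabularies silently resolves as a
row name.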