https://github.com/vjcitn/biocMultiAssay/blob/master/vignettes/SEresolver.Rnw shows some modifications to [ that allow subsetting of an SE by gene or pathway name. It may be premature to work at the [ level. Kasper suggested defining a suite of subsetBy operations that would accomplish this. I think we could get something along these lines into the release without too much more work. Votes?
[Bioc-devel] 'semantically rich' subsetting of SummarizedExperiments
11 messages · Tim Triche, Jr., Gabriel Becker, Sean Davis +2 more
Hi, Vince.

I'm coming a little late to the party, but I agree with Kasper's sentiment that the less "magical" approach of using subsetByXXX might be the cleaner way to go for the time being.

Sean

On Sat, Sep 20, 2014 at 10:42 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
> https://github.com/vjcitn/biocMultiAssay/blob/master/vignettes/SEresolver.Rnw shows some modifications to [ that allow subsetting of SE by gene or pathway name. It may be premature to work at the [ level. Kasper suggested defining a suite of subsetBy operations that would accomplish this. I think we could get something along these lines into the release without too much more work. Votes?
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Agreed with Sean, having tried implementing the "magical" alternative.

--t
On Sep 20, 2014, at 9:31 AM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> Hi, Vince. I'm coming a little late to the party, but I agree with Kasper's sentiment that the less "magical" approach of using subsetByXXX might be the cleaner way to go for the time being. Sean
OK by me to leave [ alone. We could start with subsetByEntrez, subsetByKEGG, subsetBySymbol, subsetByGOTERM, subsetByGOID. Utilities to generate GRanges for queries in each of these vocabularies should, perhaps, be in the OrganismDb space? Once those are in place no additional infrastructure is necessary?

On Sat, Sep 20, 2014 at 12:49 PM, Tim Triche, Jr. <tim.triche at gmail.com> wrote:
> Agreed with Sean, having tried implementing the "magical" alternative --t
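A rough sketch of the two-step design in the subsetBy proposal above (resolve identifiers to ranges, then subset by overlap), written in base R so it runs without Bioconductor. Everything here is an assumption for illustration: toySE is a plain list standing in for a SummarizedExperiment, symbolRanges is a made-up lookup table playing the role an OrganismDb-style resolver might play, and the gene names, probe names and coordinates are invented.

```r
# Made-up lookup table standing in for an OrganismDb-style resolver (hypothetical).
symbolRanges <- data.frame(
  symbol = c("GENE_A", "GENE_B"),
  chr    = c("chr1", "chr2"),
  start  = c(100, 500),
  end    = c(200, 900),
  stringsAsFactors = FALSE
)

# Toy stand-in for a SummarizedExperiment: an assay matrix plus one range per row.
toySE <- list(
  assay = matrix(1:8, nrow = 4,
                 dimnames = list(paste0("probe", 1:4), c("s1", "s2"))),
  rowRanges = data.frame(
    chr   = c("chr1", "chr1", "chr2", "chr2"),
    start = c(120, 300, 450, 950),
    end   = c(150, 350, 520, 990),
    stringsAsFactors = FALSE
  )
)

# Step 1: translate a vocabulary (here, symbols) into query ranges.
rangesForSymbols <- function(symbols, lookup = symbolRanges) {
  lookup[lookup$symbol %in% symbols, c("chr", "start", "end")]
}

# Step 2: keep the rows whose range overlaps any query range.
subsetByRanges <- function(se, query) {
  rr <- se$rowRanges
  keep <- vapply(seq_len(nrow(rr)), function(k) {
    any(query$chr == rr$chr[k] &
        query$start <= rr$end[k] &
        query$end >= rr$start[k])
  }, logical(1))
  list(assay = se$assay[keep, , drop = FALSE],
       rowRanges = rr[keep, , drop = FALSE])
}

# An explicit, non-magical entry point: the vocabulary is in the function name.
subsetBySymbol <- function(se, symbols) {
  subsetByRanges(se, rangesForSymbols(symbols))
}

hits <- subsetBySymbol(toySE, c("GENE_A", "GENE_B"))
rownames(hits$assay)
```

In this arrangement only the resolver would differ between subsetBySymbol, subsetByEntrez and the others; the overlap step is shared, which is one reading of "no additional infrastructure is necessary".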
Hey all,

We are in the (very) early stages of experimenting with something that seems relevant here: classed identifiers. We are using them for database/mart queries, but I think the same concept could be useful for the cases you're describing. E.g.

    mysyms = GeneSymbol(c("BRAF", "BRCA1"))
    mysyms
    An object of class "GeneSymbol"
    [1] "BRAF" "BRCA1"
    yourSE[mysyms, ]
    ...

This approach has the benefit of being declarative instead of heuristic (people won't be able to accidentally invoke it), while still giving most of the convenience I believe you are looking for. The object classes inherit directly from character, so they should "just work" most of the time, but as I said it's early days; lots more testing for functionality and usefulness is needed.

~G

On Sat, Sep 20, 2014 at 11:38 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
> OK by me to leave [ alone. We could start with subsetByEntrez, subsetByKEGG, subsetBySymbol, subsetByGOTERM, subsetByGOID. Utilities to generate GRanges for queries in each of these vocabularies should, perhaps, be in the OrganismDb space? Once those are in place no additional infrastructure is necessary?
--
Computational Biologist
Genentech Research
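One way the classed-identifier idea could be wired up, sketched with S4 classes from the methods package. ToySE is a hypothetical stand-in, not the real SummarizedExperiment class, and this GeneSymbol definition is only a guess at the shape of the class described above, not the code under development. The row names p1 to p3 and the symbol column are likewise invented. The point of the sketch is that dispatch on the subscript's class leaves plain character row names working while GeneSymbol subscripts go through the annotation.

```r
library(methods)

# Hypothetical classed identifier: a character vector that carries its vocabulary.
setClass("GeneSymbol", contains = "character")
GeneSymbol <- function(x) new("GeneSymbol", as.character(x))

# Toy stand-in for a SummarizedExperiment (NOT the real class):
# an assay matrix plus one row of annotation per feature.
setClass("ToySE", representation(assay = "matrix", rowData = "data.frame"))

# Plain character subscripts keep their usual meaning: match on row names.
# The column subscript j is ignored in this sketch.
setMethod("[", signature(x = "ToySE", i = "character"),
  function(x, i, j, ..., drop = TRUE) {
    idx <- match(i, rownames(x@assay))
    if (anyNA(idx)) {
      stop("unknown row name(s): ", paste(i[is.na(idx)], collapse = ", "))
    }
    new("ToySE",
        assay   = x@assay[idx, , drop = FALSE],
        rowData = x@rowData[idx, , drop = FALSE])
  })

# GeneSymbol subscripts are resolved through the annotation instead.
# Because GeneSymbol is more specific than character, this method wins
# whenever the subscript was explicitly declared to be a gene symbol.
setMethod("[", signature(x = "ToySE", i = "GeneSymbol"),
  function(x, i, j, ..., drop = TRUE) {
    idx <- which(x@rowData$symbol %in% i@.Data)
    new("ToySE",
        assay   = x@assay[idx, , drop = FALSE],
        rowData = x@rowData[idx, , drop = FALSE])
  })

mat <- matrix(1:6, nrow = 3,
              dimnames = list(c("p1", "p2", "p3"), c("s1", "s2")))
se <- new("ToySE",
          assay   = mat,
          rowData = data.frame(symbol = c("BRAF", "TP53", "BRCA1"),
                               row.names = rownames(mat),
                               stringsAsFactors = FALSE))

bySymbol <- se[GeneSymbol(c("BRAF", "BRCA1")), ]  # resolved via rowData$symbol
byName   <- se["p2", ]                            # ordinary row-name lookup
```

A bare string such as "BRAF" is never treated as a symbol here; it would be looked up as a row name and fail, which is the "declarative instead of heuristic" behaviour described in the message.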
On Sat, Sep 20, 2014 at 3:11 PM, Gabe Becker <becker.gabe at gene.com> wrote:
> Hey all, We are in the (very) early stages of experimenting with something that seems relevant here: classed identifiers. We are using them for database/mart queries, but the same concept could be useful for the cases you're describing I think. E.g.
>
>     mysyms = GeneSymbol(c("BRAF", "BRCA1"))
>     mysyms
>     An object of class "GeneSymbol"
>     [1] "BRAF" "BRCA1"
>     yourSE[mysyms, ]
>     ...

This approach has the flavor of some of the functionality that Martin put together for the GSEABase package (EntrezIdentifier, etc.).

Sean
On Sat, Sep 20, 2014 at 3:11 PM, Gabe Becker <becker.gabe at gene.com> wrote:
> Hey all, We are in the (very) early stages of experimenting with something that seems relevant here: classed identifiers. We are using them for database/mart queries, but the same concept could be useful for the cases you're describing I think. E.g.
>
>     mysyms = GeneSymbol(c("BRAF", "BRCA1"))
>     mysyms
>     An object of class "GeneSymbol"
>     [1] "BRAF" "BRCA1"
>     yourSE[mysyms, ]

Yes, there has been some code around of that nature ... seems reasonable, but perhaps a bit heavy. There are identifier grouping and translation facilities in GSEABase that are also pertinent.
Sean and Vincent,

The goal of what we are doing builds off of what Martin has in GSEABase. We were looking to see how much benefit we can get with something lighter-weight that lies between indistinguishable character vectors and the full machinery of GeneSets. Either way, it seems like formalizing the semantic information is a way to do what you want. Furthermore, these classed id objects can be created automatically when there is contextual information, e.g. during queries to databases (or db-like objects), and then simply added to metadata DataFrames and re-used.

~G

On Sat, Sep 20, 2014 at 12:19 PM, Sean Davis <sdavis2 at mail.nih.gov> wrote:
> This approach has the flavor of some of the functionality that Martin put together for the GSEABase package (EntrezIdentifier, etc.). Sean
Sounds very nice. Anything for the impending release?

On Sat, Sep 20, 2014 at 11:34 PM, Gabe Becker <becker.gabe at gene.com> wrote:
> Sean and Vincent, The goal of what we are doing builds off of what Martin has in GSEABase. We were looking to see how much benefit we can get with something lighter-weight that lies between indistinguishable character vectors and the full machinery of GeneSets. Either way, it seems like formalizing the semantic information is a way to do what you want. Furthermore, these classed id objects can be created automatically when there is contextual information e.g. during queries to databases (or db-like objects), and then simply added to metadata DataFrames and re-used. ~G
1 day later
Hi,

> shows some modifications to [ that allow subsetting of SE by gene or pathway name

Without reading the code, do you intend that SE[i,j], if i is provided as a vector of strings, will subset those rows where the name of the GRanges == i?

> it may be premature to work at the [ level. Kasper suggested defining a suite of subsetBy operations that would accomplish this

Reminder of a little background to where we are now, with warnings of lurking dragons: http://thread.gmane.org/gmane.science.biology.informatics.conductor/52971/focus=52993

> i think we could get something along these lines into the release without too much more work. votes?

Abstaining for now. Thanks - the topic is dear to me too.
On Mon, Sep 22, 2014 at 10:17 AM, Cook, Malcolm <MEC at stowers.org> wrote:
> Hi,
>
> > shows some modifications to [ that allow subsetting of SE by gene or pathway name
>
> Without reading the code, do you intend that SE[i,j] will , if i is provided as vector of string, will subset those rows where the name of the GRanges == i?

By 'the GRanges' do you mean rowData(SE)? My code breaks that capability -- e.g., ALLse["982_at",] now fails. But I think that could be handled if desired.

> > it may be premature to work at the [ level. Kasper suggested defining a suite of subsetBy operations that would accomplish this
>
> Reminder of a little background to where we are now, with warnings of lurking dragons:

Thanks for the reminder. I understand that training dragons can cause tears to be shed.

> > i think we could get something along these lines into the release without too much more work. votes?
>
> Abstaining for now. Thanks - the topic is dear to me too.
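For contrast with the explicit approaches discussed earlier in the thread, here is a sketch of how a heuristic [ might keep row-name lookups such as ALLse["982_at",] working: try row names first and fall back to symbols only for whatever is left over. The helper resolveRows, the probe IDs other than 982_at, and the symbols are all hypothetical; this is not the code in the SEresolver vignette.

```r
# Hypothetical helper: turn a character subscript into row indices.
# Row names take precedence; anything unmatched is then tried as a symbol.
resolveRows <- function(assay, rowSymbols, i) {
  byName   <- match(i, rownames(assay))
  leftover <- i[is.na(byName)]
  unknown  <- setdiff(leftover, rowSymbols)
  if (length(unknown) > 0) {
    stop("cannot resolve: ", paste(unknown, collapse = ", "))
  }
  bySymbol <- which(rowSymbols %in% leftover)
  sort(unique(c(byName[!is.na(byName)], bySymbol)))
}

# Toy data: three probes, two of which map to the same (invented) symbol.
assay <- matrix(1:6, nrow = 3,
                dimnames = list(c("982_at", "1000_at", "1001_at"),
                                c("s1", "s2")))
rowSymbols <- c("GENE_A", "GENE_B", "GENE_A")

assay[resolveRows(assay, rowSymbols, "982_at"), , drop = FALSE]  # by row name
assay[resolveRows(assay, rowSymbols, "GENE_A"), , drop = FALSE]  # by symbol
```

The catch is that a string which happens to be both a row name and a symbol silently resolves as a row name, which is the kind of ambiguity the explicit subsetBy functions and classed identifiers avoid.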