Hi,
Using select/mapIds to annotate probe IDs is an additional step that confuses many users. May I suggest that we automatically populate fData with minimal annotation (probe ID, Entrez ID, symbol) when a known platform is detected? We would record the version and parameters (e.g., mapIds with multiVals = "first") used to create fData. But for beginners I think it would be a helpful start.
What do you think?
Aedin
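A minimal sketch of the proposal, written in Python only to show the logic. In Bioconductor, the lookup itself would be AnnotationDbi::mapIds() against the platform's .db package. The probe IDs, gene IDs, version string, and the names `annotate_probes` and `FeatureAnnotation` are all hypothetical. `multi_vals` imitates two of the mapIds `multiVals` policies: "first" keeps the first gene for a multi-mapping probe, and "asNA" reports none. The policy and source version are stored next to the table, as suggested above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical probe -> [(Entrez ID, symbol), ...] table for one platform.
# In Bioconductor this information would come from the platform's .db package.
PROBE_TO_GENES: Dict[str, List[Tuple[str, str]]] = {
    "p1": [("1001", "GENE_A"), ("1002", "GENE_B")],  # maps to two genes
    "p2": [("1003", "GENE_C")],                      # maps to one gene
    "p3": [],                                        # no known gene
}

Row = Tuple[str, Optional[str], Optional[str]]  # (probe ID, Entrez ID, symbol)


@dataclass
class FeatureAnnotation:
    rows: List[Row]
    provenance: Dict[str, str] = field(default_factory=dict)


def annotate_probes(probe_ids, lookup, source_version, multi_vals="first"):
    """Build minimal (probe, Entrez, symbol) annotation plus a provenance record.

    multi_vals imitates two mapIds() policies for multi-mapping probes:
    "first" keeps the first hit; "asNA" reports no gene at all.
    """
    if multi_vals not in ("first", "asNA"):
        raise ValueError(f"unsupported multi_vals: {multi_vals!r}")
    rows: List[Row] = []
    for probe in probe_ids:
        hits = lookup.get(probe, [])  # unknown probes behave like "no hit"
        if not hits or (multi_vals == "asNA" and len(hits) > 1):
            rows.append((probe, None, None))
        else:
            entrez, symbol = hits[0]
            rows.append((probe, entrez, symbol))
    # Record how the table was made, so the choice of policy is not silent.
    provenance = {"source_version": source_version, "multi_vals": multi_vals}
    return FeatureAnnotation(rows, provenance)


ann = annotate_probes(["p1", "p2", "p3"], PROBE_TO_GENES, "hypothetical-1.0")
print(ann.rows)
print(ann.provenance)
```

Switching `multi_vals` to "asNA" turns probe p1 into an unannotated row, which is why keeping the provenance record alongside the table matters.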
[Bioc-devel] Bioc-devel Digest, Vol 145, Issue 60
5 messages · Aedin Culhane, Vincent Carey, Martin Morgan
I am in favor of simplifying the binding of useful metadata to our genome-wide objects. Before we automate this, I think we should define a widely applicable procedure for this task ... and see how it works in examples from the ExperimentData library and ExperimentHub. Using fData for ExpressionSet, rowData for SummarizedExperiment, and rowRanges for RangedSummarizedExperiment might also be susceptible of simplification. fAnno?
On Wed, Apr 20, 2016 at 4:10 PM, Aedin Culhane <aedin at jimmy.harvard.edu> wrote:
Hi,
Using select/mapIds to annotate probe IDs is an additional step that confuses many users. May I suggest that we automatically populate fData with minimal annotation (probe ID, Entrez ID, symbol) when a known platform is detected? We would record the version and parameters (e.g., mapIds with multiVals = "first") used to create fData. But for beginners I think it would be a helpful start.
What do you think?
Aedin
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hi Vince,
Agreed, fData is an over-simplification. But if data have an associated "annotation", the feature annotation associated with it should be available.
I was recently trying to map between the HuGene 1.0 ST and PrimeView arrays. The first has multiple .db packages (including hugene10stprobeset.db and hugene10sttranscriptcluster.db), and it's not very clear which is the correct one to use. There is no .db package for PrimeView, so I had to download the .csv file from the Affymetrix website and build the package.
The probe genome co-ordinates would allow better merging of platforms (as opposed to mapping identifiers to a common Entrez gene ID/transcript ID). Moreover, with GRanges, we could use mapToTranscripts, findOverlaps, and countOverlaps to map between platforms.
A.
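A rough sketch of coordinate-based cross-platform mapping, in plain Python rather than GRanges: two probes are linked when their genomic intervals share at least one base. The probe names and coordinates are invented, intervals are treated as 1-based and closed (the GRanges convention), and strand is ignored. In Bioconductor, findOverlaps() on two GRanges objects does this with an index instead of the per-chromosome scan used here.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Probe(NamedTuple):
    probe_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlaps(a: Probe, b: Probe) -> bool:
    """True if two closed intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def map_between_platforms(probes_a: List[Probe], probes_b: List[Probe]) -> Dict[str, List[str]]:
    """Map each platform-A probe to the platform-B probes it overlaps."""
    by_chrom: Dict[str, List[Probe]] = defaultdict(list)
    for b in probes_b:
        by_chrom[b.chrom].append(b)
    # Linear scan within a chromosome: fine for a sketch, not for whole arrays.
    return {
        a.probe_id: [b.probe_id for b in by_chrom.get(a.chrom, []) if overlaps(a, b)]
        for a in probes_a
    }


# Invented coordinates for two hypothetical platforms.
platform_a = [Probe("a1", "chr1", 100, 150), Probe("a2", "chr1", 400, 450), Probe("a3", "chr2", 10, 60)]
platform_b = [Probe("b1", "chr1", 140, 190), Probe("b2", "chr1", 451, 500), Probe("b3", "chr2", 1, 10)]
mapping = map_between_platforms(platform_a, platform_b)
print(mapping)
```

Here a2 and b2 are adjacent (ending at 450, starting at 451) but do not overlap, so a2 maps to an empty list; a probe can also map to several probes on the other platform.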
On 4/20/16 22:20, Vincent Carey wrote:
I am in favor of simplifying the binding of useful metadata to our
genome-wide objects. Before we automate this I think we
should define a widely applicable procedure for this task ... and see
how it works in examples from the ExperimentData library and
ExperimentHub. Using fData for ExpressionSet and rowData for
SummarizedExperiment and rowRanges for RangedSummarizedExperiment
might also be susceptible of simplification. fAnno?
On 04/20/2016 11:15 PM, Aedin Culhane wrote:
Hi Vince,
Agreed, fData is an over-simplification. But if data have an associated "annotation", the feature annotation associated with it should be available.
I was recently trying to map between the HuGene 1.0 ST and PrimeView arrays. The first has multiple .db packages (including hugene10stprobeset.db and hugene10sttranscriptcluster.db), and it's not very clear which is the correct one to use. There is no .db package for PrimeView, so I had to download the .csv file from the Affymetrix website and build the package.
So auto-filling annotations wouldn't have helped here, because there is no automatic choice between alternative packages, and because the PrimeView array doesn't have an annotation package?
The probe genome co-ordinates would allow better merging of platforms (as opposed to mapping identifiers to a common Entrez gene ID/transcript ID). Moreover, with GRanges, we could use mapToTranscripts, findOverlaps, and countOverlaps to map between platforms.
The Brainarray project makes it clear that there's more than one way to map a probe to a gene. All we do is report what the manufacturer says. I don't think we have the resources (technical expertise, in addition to labor) to reinvent and maintain our own mappings.
Martin
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
On Thu, Apr 21, 2016 at 8:35 AM, Martin Morgan <martin.morgan at roswellpark.org> wrote:
The Brainarray project makes it clear that there's more than one way to map a probe to a gene. All we do is report what the manufacturer says. I don't think we have the resources (technical expertise, in addition to labor) to reinvent and maintain our own mappings.
Yes. I think there are two threads emerging here.
First, support for Affymetrix arrays. We've lost momentum here, but we may still be the system of first resort for people who want to preprocess and analyze them. More vignettes and benchmarking for newer platforms such as HTA and PrimeView would probably be welcomed by researchers who use them (120 GDS in GEO for PrimeView, one of which cites affy/RMA; 82 GDS for the HTA 2.0 gene version). I think someone who is heavily invested in these platforms would have to step up to fill the gaps in our support.
Second, streamlining the attachment of annotation information to array- or sequence-based quantifications. Here we need data from the community about current gaps and successes. I think it would be helpful (and feasible) to have a generic that addresses this, but the number of resources that must be consulted to annotate a given platform varies, and there is no natural choice of resource for various types of feature. So programming and documentation at the user level seem inevitable for any solution, and centralized efforts may not pay off.
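One way to picture such a generic, as a hypothetical Python sketch: it annotates automatically only when exactly one resource is registered for a platform, and otherwise requires the user to choose. The registry keys, the function `resolve_resource`, and the exception `UnresolvedAnnotation` are invented; the entries mirror the cases in this thread (two HuGene 1.0 ST packages, no PrimeView package as of April 2016), plus hgu133plus2.db as a single-package example.

```python
from typing import Dict, List, Optional

# Hypothetical registry: platform key -> candidate annotation packages.
# Entries mirror the situations described in this thread.
ANNOTATION_RESOURCES: Dict[str, List[str]] = {
    "hgu133plus2": ["hgu133plus2.db"],
    "hugene10st": ["hugene10stprobeset.db", "hugene10sttranscriptcluster.db"],
    "primeview": [],  # no .db package, per the thread
}


class UnresolvedAnnotation(Exception):
    """Raised when a platform has no resource, or several and no choice was given."""


def resolve_resource(platform: str, choice: Optional[str] = None,
                     registry: Dict[str, List[str]] = ANNOTATION_RESOURCES) -> str:
    candidates = registry.get(platform, [])
    if choice is not None:
        # An explicit user choice must be one of the registered candidates.
        if choice not in candidates:
            raise UnresolvedAnnotation(f"{choice!r} is not registered for {platform!r}")
        return choice
    if len(candidates) == 1:
        return candidates[0]  # the only case that can be automated safely
    if not candidates:
        raise UnresolvedAnnotation(f"no annotation resource registered for {platform!r}")
    raise UnresolvedAnnotation(
        f"{platform!r} has several resources {candidates}; pass choice= explicitly"
    )


print(resolve_resource("hgu133plus2"))
print(resolve_resource("hugene10st", choice="hugene10sttranscriptcluster.db"))
```

The design choice is to fail loudly rather than guess, so the automated path covers only the unambiguous platforms and the rest stays a documented, user-level decision.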