Hi,

I'm in the final stage of preparing an mzIdentML parser for submission to Bioconductor (https://github.com/thomasp85/mzID). The parser is intended to be quite sparse and not to interpret the content of the mzIdentML file that much. One feature I would like to include, though, is that each scan gets annotated with an mzR-compatible acquisition number for better interoperability between the two parsers.

The HUPO specification for the mzIdentML format states that each scan in the file is labelled with a spectrumID and a reference to the ms data file. Furthermore, each ms data file should have a spectrum ID format specified according to the controlled vocabulary. The content of the spectrumID can thus be e.g. 'scanID=<someInteger>', 'spectrum=<someInteger>', 'scan=<someInteger>' or something more elaborate such as 'sample=<someInteger> period=<someInteger> cycle=<someInteger> experiment=<someInteger>', depending on the machine producing the ms data.

When an ms data file gets parsed by mzR, all of this is conveniently dropped and replaced by an acquisitionNum that uniquely identifies the scan. This is quite easy to handle for spectrumIDs consisting of only e.g. 'scan=<someInteger>', but for spectrumIDs with more than one identifier it gets a bit more fuzzy, and I don't like guessing.

So the question is: how can I ensure that I extract the right value from the spectrumID for an mzR-compatible acquisitionNum? I realize that the generation of the acquisitionNum in mzR is probably handled by the RAMP module, but I hope some of the mzR folks (or others) can help.

best
Thomas Pedersen, PhD student at the Technical University of Denmark (DTU)
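To make the ambiguity concrete, here is a minimal sketch of splitting a spectrumID into its key=value parts. It is written in Python purely for illustration; it is not mzID or mzR code, the function names are hypothetical, and it makes no claim about which value RAMP actually uses as acquisitionNum. It only separates the unambiguous single-identifier case from the multi-identifier case described above.

```python
import re

def parse_spectrum_id(spectrum_id):
    """Split a spectrumID such as 'scan=42' into a dict of key -> integer.

    Hypothetical helper: assumes every field has the form key=<integer>.
    """
    pairs = re.findall(r"(\w+)=(\d+)", spectrum_id)
    return {key: int(value) for key, value in pairs}

def single_identifier(spectrum_id):
    """Return the integer when exactly one key=value pair is present.

    Returns None for multi-field IDs, where choosing a value would be a guess.
    """
    fields = parse_spectrum_id(spectrum_id)
    if len(fields) == 1:
        return next(iter(fields.values()))
    return None
```

For 'scan=42' the second function returns the integer directly, while an ID of the form 'sample=... period=... cycle=... experiment=...' yields None, which is exactly the case where the mapping to acquisitionNum is unclear.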
[Bioc-devel] Extracting mzR compatible acquisition number from mzIdentML files
3 messages · Thomas Lin Pedersen, Neumann, Steffen, Laurent Gatto
Hi,
On Wed, 2013-05-01 at 14:43 +0200, Thomas Dybdal Pedersen wrote:
...
I'm in the final stage of preparing an mzIdentML parser for submission to Bioconductor (https://github.com/thomasp85/mzID) The parser is intended to be quite sparse and not interpret the content of the mzIdentML file that much.
That's great news to hear that PSI standards see more adoption in BioC. [...]
So the question is: How can I ensure that I extract the right value from the spectrumID for an mzR compatible acquisitionNum? I realize that the generation of the acquisitionNum in mzR is probably handled by the RAMP module, but I hope some of the mzR folks (or others) can help.
RAMP is indeed the "problem" here. mzR has been developed with the concept of pluggable backends in mind, and netCDF and RAMP are the backends currently implemented. Please see https://github.com/sneumann/mzR/wiki/Extending-mzR for our ideas on adding a "proper" pwiz backend, which directly wraps the pwiz msdata C++ object instead of going indirectly through RAMP. That new backend would solve your problems. Note that Laurent Gatto also envisioned that the same approach could add a backend for mzIdentML. Yours, Steffen
IPB Halle AG Massenspektrometrie & Bioinformatik Dr. Steffen Neumann http://www.IPB-Halle.DE Weinberg 3 http://msbi.bic-gh.de 06120 Halle Tel. +49 (0) 345 5582 - 1470 +49 (0) 345 5582 - 0 sneumann(at)IPB-Halle.DE Fax. +49 (0) 345 5582 - 1409
Dear Thomas,
On 1 May 2013 13:43, Thomas Dybdal Pedersen <thomasp85 at gmail.com> wrote:
Hi I'm in the final stage of preparing an mzIdentML parser for submission to Bioconductor (https://github.com/thomasp85/mzID) The parser is intended to be quite sparse and not interpret the content of the mzIdentML file that much.
That sounds very promising and a welcome addition to the PSI format tools.
One feature I would like to include though, is that each scan gets annotated with an mzR compatible acquisition number for better interoperability between the two parsers. The HUPO specifications for the mzIdentML format specifies that each scan in the file is labelled with a spectrumID and a reference to the ms data file. Furthermore each ms data file should have a spectrum ID format specified according to the controlled vocabulary. The content of the spectrumID can thus be either e.g. 'scanID=<someInteger>' , 'spectrum=<someInteger>', 'scan=<someInteger>' or even more elaborate: 'sample=<someInteger> period=<someInteger> cycle=<someInteger> experiment=<someInteger>', depending on the machine producing the ms data. When an ms data file gets parsed by mzR it is all conveniently dropped and replaced by an acquisitionNum, that uniquely identifies the scan. This is quite easy to handle for spectrumID's consisting of only e.g. 'scan=<someInteger>' but for spectrumID's with more than one identifier it gets a bit more fuzzy and I don't like guessing. So the question is: How can I ensure that I extract the right value from the spectrumID for an mzR compatible acquisitionNum? I realize that the generation of the acquisitionNum in mzR is probably handled by the RAMP module, but I hope some of the mzR folks (or others) can help.
You are right about RAMP. I am going to dig a bit more into a concrete example before saying anything silly about spectrumID and acquisitionNum, and test mzID at the same time. Best wishes, Laurent
best Thomas Pedersen, PhD student at the Technical University of Denmark (DTU)