memDecompress and zlib compressed base64 encoded string

3 messages · Johannes Graumann, Brian Ripley

Thu, Jan 14, 2010 5:03 AM

Hi,

I have zlib compressed strings (example is attached) and would like to 
decompress them using memDecompress ...

I try this:

Error in memDecompress(as.raw(compressed), type = "g") :
  internal error -3 in memDecompress(2)
In addition: Warning messages:
1: In memDecompress(as.raw(compressed), type = "g") :
  NAs introduced by coercion
2: In memDecompress(as.raw(compressed), type = "g") :
  out-of-range values treated as 0 in coercion to raw

Can anyone nudge me in the right direction regarding this?

Thanks, Joh
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: compressed.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100114/e6345fea/attachment.txt>

Brian Ripley

Thu, Jan 14, 2010 1:37 PM

On Thu, 14 Jan 2010, Johannes Graumann wrote:

What is that file? Not gzip compression:

gannet% file compressed.txt
compressed.txt: ASCII text, with very long lines

since gzip uses a magic header that 'file' knows about.  And even if 
the header was stripped, such files are 8-bit and yours is ASCII.
Try

[1] "x\x9c\xf3\xca\xcfH\xcc\xcbK-Vp/J,\xcd\0052\001:\n\006\x90"

to see what a real gzipped string looks like.
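
For illustration, a minimal sketch of what in-memory compression yields in R. The sample text and variable names are made up here and are not from the thread. It assumes that memCompress with type = "gzip" emits a zlib-format stream, which typically begins with byte 0x78, whereas a gzip file begins with the magic bytes 0x1f 0x8b.

```r
# Compress a short sample string in memory (the text is invented for this sketch).
sample_text <- "Hello, hello, hello, compression!"
z <- memCompress(charToRaw(sample_text), type = "gzip")

# The result is a raw vector of 8-bit values, not printable ASCII text.
print(z)

# A zlib stream typically starts with 0x78; a gzip *file* starts with 0x1f 0x8b.
starts_like_zlib <- z[1] == as.raw(0x78)
starts_like_gzip_file <- length(z) >= 2 &&
  z[1] == as.raw(0x1f) && z[2] == as.raw(0x8b)

# Round trip back to the original text.
restored <- memDecompress(z, type = "gzip", asChar = TRUE)
```

Inspecting the first few bytes this way is a quick check of whether a vector could be compressed data at all before handing it to memDecompress.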

You have not told us the 'at a minimum' information requested in the 
posting guide.  But you should not expect that to read a binary file, 
especially not in a MBCS locale.  We have readBin for that purpose.

I don't think you know what as.raw does: it does not convert bytes in 
a character string to raw (for which you need charToRaw).
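
The difference can be seen on a small input. This is a sketch with an arbitrary string chosen for illustration: as.raw() coerces its argument numerically, so non-numeric text becomes NA and then 00 (with the two warnings quoted in the original post), while charToRaw() returns the bytes of the string itself.

```r
s <- "x9c"  # arbitrary non-numeric text, chosen only for this sketch

# as.raw() tries a numeric coercion: "x9c" -> NA -> 00, with warnings
# (suppressed here so the sketch runs quietly).
a <- suppressWarnings(as.raw(s))

# charToRaw() gives one byte per character of the string.
b <- charToRaw(s)

print(a)  # a single zero byte
print(b)  # the bytes for 'x', '9', 'c'
```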

It is always a good idea to look at each stage of your computation:

 [1] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[26] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Johannes Graumann

Fri, Jan 15, 2010 2:48 AM

Prof Brian Ripley wrote:

I am dealing with mass spectrometric data in an XML file format (mzXML). The 
biggest part of the contained data is actual mass spectra that are base64 
encoded and optionally compressed using http://zlib.net (saving quite some 
storage space). When they are compressed I just get an XML node that looks 
like this:
   <peaks>CONTENT OF THE ORIGINAL ATTACHMENT HERE</peaks>
I would like to be able to decompress that string and thought that 
memDecompress was the right tool to do so ...
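
One way this could be approached, sketched in base R under stated assumptions: the node text is first base64-decoded into a raw vector, and only then passed to memDecompress. The helpers base64_to_raw and decode_peaks are hypothetical names written for this sketch, not functions from R or from the thread. It also assumes the payload is a plain zlib stream, which memDecompress's "gzip" type is expected to handle. A base64 package could replace the hand-written decoder.

```r
# Hypothetical helper: minimal base64 decoder using base R only.
base64_to_raw <- function(x) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")
  # Drop padding, whitespace and anything else outside the alphabet.
  chars <- strsplit(gsub("[^A-Za-z0-9+/]", "", x), "")[[1]]
  vals <- match(chars, alphabet) - 1L  # one 6-bit value per character
  # Expand every value to 6 bits, most significant bit first.
  bits <- as.integer(vapply(
    vals,
    function(v) rev(as.integer(intToBits(v))[1:6]),
    integer(6)
  ))
  # Keep whole bytes only and pack 8 bits into each byte.
  n_bytes <- length(bits) %/% 8L
  bits <- bits[seq_len(n_bytes * 8L)]
  as.raw(colSums(matrix(bits, nrow = 8L) * 2^(7:0)))
}

# Hypothetical helper: base64 text from a <peaks> node -> decompressed bytes.
decode_peaks <- function(peaks_text) {
  memDecompress(base64_to_raw(peaks_text), type = "gzip")
}

# Quick check of the decoder alone: "SGVsbG8=" is base64 for "Hello".
hello <- rawToChar(base64_to_raw("SGVsbG8="))
```

The result of decode_peaks is a raw vector; how those bytes map to peak values depends on the file format and is not covered by this sketch.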

I'm actually reading this in as a string from the XML file ...

Yup, that was plain stupid: I was just trying to make memDecompress run at 
all (since handing it the character string also resulted in an error).

R version 2.10.1 (2009-12-14) 
x86_64-pc-linux-gnu 

locale:
 [1] LC_CTYPE=en_US.UTF-8          LC_NUMERIC=C                 
 [3] LC_TIME=en_US.UTF-8           LC_COLLATE=en_US.UTF-8       
 [5] LC_MONETARY=en_US.UTF-8       LC_MESSAGES=en_US.UTF-8      
 [7] LC_PAPER=en_US.UTF-8          LC_NAME=en_US.UTF-8          
 [9] LC_ADDRESS=en_US.UTF-8        LC_TELEPHONE=en_US.UTF-8     
[11] LC_MEASUREMENT=en_US.UTF-8    LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rkward_0.5.1

loaded via a namespace (and not attached):
[1] tools_2.10.1

Thanks for any further hints, Joh