Message-ID: <CABdHhvGYjNavx695yz32NBYw0Er_6oN60KnVXKQU=Vu8F724yA@mail.gmail.com>
Date: 2012-05-02T16:27:04Z
From: Hadley Wickham
Subject: Decompressing raw vectors in memory
In-Reply-To: <4FA15DDD.4040607@stats.ox.ac.uk>

> Well, it seems what you get there depends on the client, but I did
>
> tystie% curl -o foo "http://httpbin.org/gzip"
> tystie% file foo
> foo: gzip compressed data, last modified: Wed May  2 17:06:24 2012, max
> compression
>
> and the final part worried me: I do not know if memDecompress() knows about
> that format.  The help page does not claim it can do anything other than
> de-compress the results of memCompress() (although past experience has shown
> that it can in some cases).  gzfile() supports a much wider range of
> formats.
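
That distinction can be checked directly: memDecompress() is documented
to invert memCompress(), so a round trip through those two is safe,
even if arbitrary gzip files from the wild are not.  A minimal sketch:

  x <- charToRaw("hello, world")
  z <- memCompress(x, type = "gzip")   # zlib-style stream, not a gzip *file*
  y <- memDecompress(z, type = "gzip")
  identical(x, y)                      # TRUE

Note the output of memCompress(type = "gzip") lacks the file header that
the `file` utility reported above, which is presumably why gzfile() is
the safer route for data fetched over HTTP.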

Ah, ok.  Thanks.  Then in that case it's probably just as easy to save
it to a temp file and read that.

  con <- file(tmp) # R automatically detects compression
  open(con, "rb")
  on.exit(close(con), add = TRUE)

  # n must be an upper bound on the decompressed size
  readBin(con, raw(), file.info(tmp)$size * 10)

The only challenge is figuring out what n to give readBin. Is there a
good general strategy for this?  Guess based on the file size and then
iterate until the result of readBin has length less than n?

  n <- file.info(tmp)$size * 2
  content <- readBin(con, raw(), n)
  n_read <- length(content)
  while (n_read == n) {
    more <- readBin(con, raw(), n)
    content <- c(content, more)
    n_read <- length(more)
  }

Which is not great style, but there shouldn't be many reads.
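
If the number of reads ever did matter, one variant (a sketch only; the
helper name read_all_raw is made up here) accumulates the chunks in a
list and concatenates once at the end, so content is not copied on
every iteration by c():

  read_all_raw <- function(con, chunk_size = 65536L) {
    chunks <- list()
    repeat {
      piece <- readBin(con, raw(), chunk_size)
      if (length(piece) == 0L) break
      chunks[[length(chunks) + 1L]] <- piece
      # a short read means we hit end-of-file
      if (length(piece) < chunk_size) break
    }
    do.call(c, chunks)
  }

Used with an already-open binary connection, e.g.
read_all_raw(gzfile(tmp, "rb")), it returns the whole decompressed
stream without needing n up front.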

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/