Message-ID: <CABdHhvGYjNavx695yz32NBYw0Er_6oN60KnVXKQU=Vu8F724yA@mail.gmail.com>
Date: 2012-05-02T16:27:04Z
From: Hadley Wickham
Subject: Decompressing raw vectors in memory
In-Reply-To: <4FA15DDD.4040607@stats.ox.ac.uk>
> Well, it seems what you get there depends on the client, but I did
>
> tystie% curl -o foo "http://httpbin.org/gzip"
> tystie% file foo
> foo: gzip compressed data, last modified: Wed May  2 17:06:24 2012, max
> compression
>
> and the final part worried me: I do not know if memDecompress() knows about
> that format.  The help page does not claim it can do anything other than
> de-compress the results of memCompress() (although past experience has shown
> that it can in some cases).  gzfile() supports a much wider range of
> formats.
Ah, ok. Thanks. Then in that case it's probably just as easy to save
it to a temp file and read that.
con <- file(tmp)     # file() sniffs the magic bytes and picks the decompressor
open(con, "rb")
on.exit(close(con), TRUE)   # assumes this snippet lives inside a function
readBin(con, raw(), file.info(tmp)$size * 10)  # n must bound the decompressed size
The only challenge is figuring out what n to give readBin. Is there a
good general strategy for this? Guess based on the file size and then
iterate until result of readBin has length less than n?
n <- file.info(tmp)$size * 2
content <- readBin(con, raw(), n)
n_read <- length(content)
while (n_read == n) {
  more <- readBin(con, raw(), n)
  content <- c(content, more)
  n_read <- length(more)
}
Which is not great style, but there shouldn't be many reads.
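For what it's worth, a variant along these lines sidesteps guessing n
entirely: keep pulling fixed-size chunks until readBin() comes back empty,
collect them in a list, and concatenate once at the end (which avoids the
quadratic cost of repeatedly c()-ing a growing raw vector). An untested
sketch; read_all_raw and the 64K chunk size are just names and numbers I
made up here:

```r
# Hypothetical helper: drain a binary connection in fixed-size chunks.
# readBin() on a connection may return fewer bytes than requested, so we
# stop only when it returns a zero-length raw vector (end of stream).
read_all_raw <- function(con, chunk_size = 65536L) {
  chunks <- list()
  repeat {
    piece <- readBin(con, raw(), chunk_size)
    if (length(piece) == 0L) break
    chunks[[length(chunks) + 1L]] <- piece
  }
  if (length(chunks) == 0L) return(raw(0))
  do.call(c, chunks)
}

# Round trip: gzip some bytes to a temp file, then read them back.
payload <- as.raw(rep(0:255, length.out = 1000L))
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, "wb")
writeBin(payload, gz)
close(gz)

con <- gzfile(tmp, "rb")   # gzfile() decompresses transparently
content <- read_all_raw(con)
close(con)
stopifnot(identical(content, payload))
```

No guessing, and each chunk is appended exactly once.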
Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/