compressing data without writing output to file
2 messages · Markus Loecher, Brian Ripley
What do you want the compressed R object to be? (It is not an R object.) The Omegahat package Rcompression may help you, but it returns a raw vector (and that has overheads such as the header: you could use its length if appropriate).
On Sat, 7 Feb 2009, Markus Loecher wrote:
This might seem like a strange question
It is more than a little imprecise ....
but is there any way to compress an
R object (such as a matrix) and know its resulting size in bytes ?
Clearly, I could implement this in the following way (if x is my matrix):
zz <- gzfile(fname,"w");
write.table(x,zz);
close(zz);
file.info(fname)[,"size"];
Hmm, that calculates the size of a compressed character representation of the object. So do you want the size of an object or of its character representation? object.size() calculates the first.
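To make the distinction concrete, here is a minimal sketch comparing the two quantities for a small matrix. The matrix `x` and the variable names are illustrative assumptions, not taken from the original post; the text size is estimated from the lines that write.table() would emit, plus one newline per line.

```r
# Illustrative matrix (an assumption; any numeric matrix would do)
x <- matrix(rnorm(1000), nrow = 100)

# Size of the R object itself, in bytes
obj_bytes <- as.numeric(object.size(x))

# Size of the character representation that write.table() would produce,
# captured in memory rather than written to a file
txt <- capture.output(write.table(x))
txt_bytes <- sum(nchar(txt, type = "bytes")) + length(txt)  # +1 newline per line

c(object = obj_bytes, text = txt_bytes)
```

The two numbers generally differ, which is why the question needs to say which one is wanted.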
However, I need to do this for hundreds of thousands of objects and the overhead in terms of disk access due to the actual file creation is prohibitive.
The overheads of finding a character representation and of allocating an R object for the result would also be large.
I guess, I would like a modified object.size() function that returns the size of the compressed (e.g. gzip) version of the object.
I don't see the point of calculating the size of something you will not use. And anything involving 'hundreds of thousands of objects' is better done in C code. So why not just write a C function to do whatever it is you really want (but have not told us)? In fact the way lazy-loading is implemented is pretty close to what you describe -- that uses an on-disk database and is not slow for 100,000 objects.
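For readers on more recent versions of R, one way to get a compressed size without touching the disk is sketched below. It assumes base R's memCompress() is available (it was not at the time of this thread) and that the serialized form of the object, rather than its character representation, is what should be measured. The helper name compressed_size() is hypothetical.

```r
# Hypothetical helper: bytes needed for the gzip-compressed serialized object.
# Assumes a version of R that provides memCompress() in base.
compressed_size <- function(obj, type = "gzip") {
  raw_bytes <- serialize(obj, connection = NULL)  # raw vector, built in memory
  length(memCompress(raw_bytes, type = type))     # length of the compressed raw vector
}

x <- matrix(0, nrow = 100, ncol = 100)  # illustrative, highly compressible
compressed_size(x)
length(serialize(x, connection = NULL))  # uncompressed serialized size, for comparison
```

This still allocates a raw vector per object, so the overheads mentioned above apply; for hundreds of thousands of objects a C implementation may well remain the faster route.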
Thanks! Markus [[alternative HTML version deleted]]
PLEASE do read the posting guide (belatedly) and do not send HTML as you were asked.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595