Object and file sizes
On 28/06/2019 7:35 a.m., Göran Broström wrote:
Hello, I have two large data frames, 'liss' (170 million obs, 8 variables) and 'fobb' (52 million obs, the same 8 variables as 'liss'), and checking their sizes I get:
> object.size(liss)
7477492552 bytes
> object.size(fobb)
2494591736 bytes
Fair enough, but when I save them to disk (saveRDS), the size relation is reversed: 'fobb.rds' takes up 273 MB while 'liss.rds' uses 146 MB! I was puzzled by this and thought that I had made a mistake in creating them, but the only explanation I can find for this is that 'liss' contains a lot more missing values.
saveRDS() uses compression by default. Compression works best if there are a lot of repetitive values; every NA is the same, so that would help compression. Other values may also be repeated. If you use saveRDS(compress=FALSE), you'll get much larger files, probably roughly proportional to the object.size() results.
Duncan Murdoch
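The effect described above can be checked on a small scale. The following is only a sketch and not part of the original exchange: the data frames 'dense' and 'sparse', their length, and the 90% share of NA are made up for illustration. It compares the in-memory size with the size on disk, with and without compression.

```r
# Toy data (hypothetical): two one-column data frames of equal length,
# one without missing values and one where about 90% of values are NA.
set.seed(1)
n <- 1e5
dense <- data.frame(x = rnorm(n))
x <- rnorm(n)
x[runif(n) < 0.9] <- NA
sparse <- data.frame(x = x)

f_dense      <- tempfile(fileext = ".rds")
f_sparse     <- tempfile(fileext = ".rds")
f_sparse_raw <- tempfile(fileext = ".rds")

saveRDS(dense, f_dense)                           # compressed (default)
saveRDS(sparse, f_sparse)                         # compressed (default)
saveRDS(sparse, f_sparse_raw, compress = FALSE)   # no compression

# In memory, both objects take the same space: an NA is stored as a
# full 8-byte double, just like any other value.
print(object.size(dense))
print(object.size(sparse))

# On disk, the NA-heavy object compresses far better, while the
# uncompressed file is close to the in-memory size.
print(file.size(f_dense))
print(file.size(f_sparse))
print(file.size(f_sparse_raw))
```

To see whether missing values account for the difference between 'liss' and 'fobb', something like colMeans(is.na(liss)) and colMeans(is.na(fobb)) would give the share of NA per variable.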