Large discrepancies in the same object being saved to .RData

5 messages · Julian.Taylor at csiro.au, Duncan Murdoch, Paul Johnson +1 more

#
On 06/07/2010 9:04 PM, Julian.Taylor at csiro.au wrote:
I haven't worked through your example, but in general the way that local 
objects get captured is when part of the return value includes an 
environment.  Examples of things that include an environment are locally 
created functions and formulas.  It's probably the latter that you're 
seeing.  When R computes the result of "y ~ ." or a similar formula, it 
attaches a pointer to the environment in which the calculation took 
place, so that later when the formula is used, it can look up y there.  
For example, in your line

lm(y ~ ., data = dat)


from your code, the formula "y ~ ." needs to be computed before R knows 
that you've explicitly listed a dataframe holding the data, and before 
it knows whether the variable y is in that dataframe or is just a local 
variable in the current function.

Since these are just pointers to the environment, this doesn't take up 
much space in memory, but when you save the object to disk, a copy of 
the whole environment will be made, and that can end up wasting a lot 
of space if the environment contains a lot of things that aren't needed 
by the formula.
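
The effect described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name make_formula and the variable junk are made up for the example, and the serialized length is used as a rough stand-in for the size that save() would write.

```r
# Hypothetical example: a formula created inside a function keeps a
# pointer to that function's local environment.
make_formula <- function() {
  junk <- rnorm(1e5)   # large local object that the formula never uses
  y ~ x
}

f <- make_formula()
env <- environment(f)  # the captured local environment
print(ls(env))         # lists the local objects that travel with the formula

# Serialized size with the captured environment (junk is included)
size_captured <- length(serialize(f, NULL))

# Pointing the formula at the global environment drops the local objects;
# the global environment is serialized only as a reference.
g <- f
environment(g) <- globalenv()
size_stripped <- length(serialize(g, NULL))

print(c(captured = size_captured, stripped = size_stripped))
```

Whether resetting the environment is safe depends on whether the formula's variables need to be looked up there later, so treat it as one possible workaround rather than a general fix.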

Duncan Murdoch
3 days later
#
On Wed, Jul 7, 2010 at 7:12 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
Hi, can I ask a follow-up question?

Is there a tool to browse *.Rdata files without loading them into R?

In HDF5 (a data storage format we use sometimes), there is a CLI
program "h5dump" that will spit out line-by-line all the contents of a
storage entity.  It will literally track through all the metadata, all
the vectors of scores, etc.  I've found that handy to "see what's
really in there" in cases like the one that the OP asked about.
Sometimes, we find that there are things that are "in there" by
mistake, as Duncan describes, and then we can try to figure why they
are in there.

pj
#
On 10/07/2010 2:33 PM, Paul Johnson wrote:
I don't know of one.  You can load the whole file into an empty 
environment, but then you lose information about "where did it come from"?
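
A minimal sketch of loading a file into a separate environment, so that nothing in the workspace is overwritten. The objects a and b and the temporary file are assumptions made up for the example.

```r
# Write a small .RData file to inspect (hypothetical contents).
path <- tempfile(fileext = ".RData")
src <- new.env()
assign("a", 1:10, envir = src)
assign("b", letters, envir = src)
save(list = ls(src), file = path, envir = src)

# Load into a fresh environment instead of the global workspace.
e <- new.env(parent = emptyenv())
loaded <- load(path, envir = e)   # load() returns the names it restored

print(sort(loaded))
# Class of each restored object, looked up only in e
print(sapply(ls(e), function(nm) class(get(nm, envir = e))))

unlink(path)
```

This shows what objects the file holds, but environments captured inside those objects are restored as anonymous environments, which is the loss of "where did it come from" information mentioned above.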

Duncan Murdoch
#
I'm still a bit puzzled by the original question.  I don't think it
has much to do with .RData files and their sizes.  For me the puzzle
comes much earlier.  Here is an example of what I mean, using a little
session:
[1] 96345

### Now look at what happens when a function returns a formula as the
### value, with a big item floating around in the function closure:
+ junk <- rnorm(10000000)
+ y ~ x
+ }
[1] 10096355
y ~ x
### the extra Vcells are located.
372 bytes

### Does v0 have an enclosing environment?
<environment: 0x021cc538>
[1] "junk"
[1] 96355

### Now consider a second example where the object
### is not a formula, but contains one.
+ junk <- rnorm(10000000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }
[1] 10096455

### in this case, though, there is no 
### (obvious) enclosing environment
NULL
7744 bytes
Error in ls(envir = environment(v1)) : invalid 'envir' argument
[1] 96366

And in this second case, as noted by Julian Taylor, if you save() the
object the .RData file is also huge.  There is an environment attached
to the object somewhere, but it appears to be occluded and entirely
inaccessible.  (I have poked around the object components trying to
find the thing but without success.)

Have I missed something?
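
One place worth checking is the terms component: in the versions of R this sketch assumes, a fitted lm object stores the formula's environment as the ".Environment" attribute of its terms, not on the object itself. The function name fit_in_function is hypothetical, and junk is smaller than in the session above to keep the example quick.

```r
# Hypothetical re-creation of the second example with a smaller junk vector.
fit_in_function <- function() {
  junk <- rnorm(1e5)  # large local object that the model never uses
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}

v1 <- fit_in_function()

# The lm object itself has no ".Environment" attribute ...
print(is.null(environment(v1)))

# ... but its terms component does, and that environment holds junk.
env <- attr(v1$terms, ".Environment")
print(ls(env))

# Rough serialized size, as a stand-in for what save() would write
size_fit <- length(serialize(v1, NULL))
print(size_fit)
```

If this holds for the object in question, the terms attribute would account for both the retained memory and the large .RData file.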

Bill Venables.

-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
Sent: Sunday, 11 July 2010 10:36 AM
To: Paul Johnson
Cc: r-devel at r-project.org
Subject: Re: [Rd] Large discrepancies in the same object being saved to .RData
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel