
Large discrepancies in the same object being saved to .RData

6 messages · Bill Venables, Tony Plate, Brian Ripley +2 more

#
Well, I have answered one of my questions below.  The hidden
environment is attached to the 'terms' component of v1.

To see this:
$coefficients
NULL

$residuals
NULL

$effects
NULL

$rank
NULL

$fitted.values
NULL

$assign
NULL

$qr
NULL

$df.residual
NULL

$xlevels
NULL

$call
NULL

$terms
<environment: 0x021b9e18>

$model
NULL
[1] 96532
This is still a bit of a trap for young (and old!) players...

I think the main question in my mind is why object.size()
excludes enclosing environments from its reckoning.

Bill Venables.
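A minimal sketch of the finding above, assuming a current R session. It reuses Bill's f1 (from the second example below) but with `junk` shrunk from 1e7 to 1e6 values, and it uses `length(serialize(x, NULL))` as a rough stand-in for what save() writes before compression. It shows that the environment hangs off `v1$terms`, and that removing `junk` from that shared frame shrinks the object:

```r
# Bill's f1, with junk shrunk from 1e7 to 1e6 values to keep the demo light
f1 <- function() {
  junk <- rnorm(1e6)
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1()

# The fitted object has no environment of its own (NULL, as in Bill's session)...
print(environment(v1))
# ...but its terms component is a formula that remembers f1's evaluation frame
frame_env <- environment(v1$terms)
print(ls(frame_env))   # "junk" "x" "y"

# Serialized size is a rough proxy for what save() writes before compression
size_before <- length(serialize(v1, NULL))

# v1$terms and attr(v1$model, "terms") share this one environment, so removing
# junk from it shrinks every reference at once
rm("junk", envir = frame_env)
size_after <- length(serialize(v1, NULL))
print(c(before = size_before, after = size_after))
```

Note that rm() here edits the captured frame in place, so any other object sharing that environment is affected too.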

-----Original Message-----
From: Venables, Bill (CMIS, Cleveland) 
Sent: Sunday, 11 July 2010 11:40 AM
To: 'Duncan Murdoch'; 'Paul Johnson'
Cc: 'r-devel at r-project.org'; Taylor, Julian (CMIS, Waite Campus)
Subject: RE: [Rd] Large discrepancies in the same object being saved to .RData

I'm still a bit puzzled by the original question.  I don't think it
has much to do with .RData files and their sizes.  For me the puzzle
comes much earlier.  Here is an example of what I mean, using a little
session:
[1] 96345

### Now look at what happens when a function returns a formula as the
### value, with a big item floating around in the function closure:
+ junk <- rnorm(10000000)
+ y ~ x
+ }
[1] 10096355
y ~ x
### the extra Vcells are located.
372 bytes

### Does v0 have an enclosing environment?
<environment: 0x021cc538>
[1] "junk"
[1] 96355

### Now consider a second example where the object
### is not a formula, but contains one.
+ junk <- rnorm(10000000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }
[1] 10096455

### in this case, though, there is no 
### (obvious) enclosing environment
NULL
7744 bytes
Error in ls(envir = environment(v1)) : invalid 'envir' argument
[1] 96366
And in this second case, as noted by Julian Taylor, if you save() the
object, the .RData file is also huge.  There is an environment attached
to the object somewhere, but it appears to be occluded and entirely
inaccessible.  (I have poked around the object's components trying to
find it, but without success.)

Have I missed something?

Bill Venables.

-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
Sent: Sunday, 11 July 2010 10:36 AM
To: Paul Johnson
Cc: r-devel at r-project.org
Subject: Re: [Rd] Large discrepancies in the same object being saved to .RData
On 10/07/2010 2:33 PM, Paul Johnson wrote:
I don't know of one.  You can load the whole file into an empty 
environment, but then you lose information about "where did it come from"?
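Loading into an empty environment, as suggested above, could be sketched as follows, for example to rank the restored objects by size. The file and the object names `small` and `big` are hypothetical demo data, and the sizes are uncompressed serialized sizes, not the compressed byte counts save() puts on disk:

```r
# Hypothetical demo file with one small and one large object
rdata_file <- tempfile(fileext = ".RData")
small <- 1:10
big <- rnorm(1e6)
save(small, big, file = rdata_file)
rm(small, big)

# Load into a fresh environment so nothing in the workspace is overwritten;
# load() returns the names of the objects it restored
scratch <- new.env()
loaded <- load(rdata_file, envir = scratch)

# Rank objects by uncompressed serialized size (save() compresses by default,
# so these are not the on-disk byte counts)
sizes <- vapply(loaded,
                function(nm) as.numeric(length(serialize(get(nm, envir = scratch), NULL))),
                numeric(1))
print(sort(sizes, decreasing = TRUE))
```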

Duncan Murdoch
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
#
On 10/07/2010 10:10 PM, Bill.Venables at csiro.au wrote:
I think the idea is that the environment is not part of the object; it 
is just referenced by the object. In fact, there are at least two 
references to the environment in your second example:

environment(v1$terms)

and

attr(v1$terms, ".Environment")

both refer to it. So you can't just add the size of an environment every 
time you come across it; you would need to keep track of whether it had 
already been counted. So, as ?object.size says,

"Associated space (e.g. the environment of a function and what the
pointer in a ‘EXTPTRSXP’ points to) is not included in the
calculation."
Duncan Murdoch
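The gap between object.size() and what ends up in a saved file can be seen directly. This is a minimal sketch assuming Bill's first example, wrapped in a hypothetical function `f0` with `junk` reduced to 1e6 values. serialize() is used here as a proxy for save(): it follows the formula's reference into the function frame, and (like save()) it writes each environment only once, using back-references for repeats.

```r
# Hypothetical function mirroring Bill's first example (junk reduced to 1e6)
f0 <- function() {
  junk <- rnorm(1e6)
  y ~ x
}
v0 <- f0()

# object.size() counts only the formula itself...
sz_obj <- as.numeric(object.size(v0))
# ...whereas serialization follows the reference into f0's frame
sz_ser <- length(serialize(v0, NULL))
print(c(object.size = sz_obj, serialized = sz_ser))
print(ls(environment(v0)))   # "junk"
```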
#
Another way of seeing the environments referenced in an object is using 
str(), e.g.:

 > f1 <- function() {
+ junk <- rnorm(10000000)
+ x <- 1:3
+ y <- rnorm(3)
+ lm(y ~ x)
+ }
 > v1 <- f1()
 > object.size(f1)
1636 bytes
 > grep("Environment", capture.output(str(v1)), value=TRUE)
[1] "  .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
[2] "  .. .. ..- attr(*, \".Environment\")=<environment: 0x01f11a30> "
 >
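str() shows each reference, so the same environment can appear more than once. A small recursive walker that de-duplicates by identity addresses the bookkeeping problem Duncan describes. `collect_envs` is a hypothetical helper, not part of base R. It is a sketch that only looks at list elements, attributes, and closure environments, and it does not descend into environments or follow parent environments.

```r
# Hypothetical helper (not part of base R): walk list elements, attributes and
# closures, returning each distinct environment found exactly once.
collect_envs <- function(x, acc = list()) {
  # identical() compares environments by identity, which de-duplicates them
  seen <- function(e) any(vapply(acc, identical, logical(1), y = e))
  if (is.environment(x)) {
    if (!seen(x)) acc <- c(acc, list(x))
    return(acc)                       # do not look inside environments
  }
  if (is.function(x)) {
    fe <- environment(x)              # NULL for primitives
    if (!is.null(fe) && !seen(fe)) acc <- c(acc, list(fe))
  }
  for (a in attributes(x)) acc <- collect_envs(a, acc)
  if (is.list(x)) for (el in x) acc <- collect_envs(el, acc)
  acc
}

# Tony's f1, with a small junk vector since only the references matter here
f1 <- function() {
  junk <- rnorm(10)
  x <- 1:3
  y <- rnorm(3)
  lm(y ~ x)
}
v1 <- f1()
envs <- collect_envs(v1)
print(length(envs))                                # one distinct environment
print(identical(envs[[1]], environment(v1$terms))) # the frame of f1
```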

-- Tony Plate
On 7/10/2010 10:10 PM, Bill.Venables at csiro.au wrote:
#
On Sun, 11 Jul 2010, Tony Plate wrote:

            
'Some of the environments in a few cases': remember environments have 
environments (and so on), and that namespaces and packages are also 
environments.  So we need to know about the environment of 
environment(v1$terms), which also gets saved (either as a reference or 
as an environment, depending on what it is).
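The chain of enclosing environments can be listed with parent.env() and environmentName(). In this sketch, `env_chain` and `make_formula` are hypothetical names. When serializing, named environments such as the global environment, attached package environments, namespaces, and base are written as references, while anonymous frames like the one below are written out in full, contents included.

```r
# Hypothetical helper: list the chain of enclosing environments of `e`,
# stopping at the empty environment.
env_chain <- function(e) {
  out <- character()
  while (!identical(e, emptyenv())) {
    nm <- environmentName(e)          # "" for anonymous environments
    out <- c(out, if (nzchar(nm)) nm else "<anonymous>")
    e <- parent.env(e)
  }
  out
}

make_formula <- function() {          # hypothetical; defined at top level
  junk <- rnorm(10)
  y ~ x
}
fo <- make_formula()
# Typically: an anonymous frame, R_GlobalEnv, the attached packages, then base
print(env_chain(environment(fo)))
```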

And this approach does not work for many of the commonest cases:
+ x <- pi
+ g <- function() print(x)
+ return(g)
+ }
function ()
  - attr(*, "source")= chr "function() print(x)"
[1] "g" "x"

In fact I think it works only for formulae.
Well, not really hidden.  A terms component is a formula (see 
?terms.object), and a formula has an environment just as a closure 
does.  In neither case does the print() method tell you about it -- 
but ?formula does.
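The parallel between closures and formulas can be checked directly. This sketch wraps Ripley's closure example in a hypothetically named function `make_g` and builds a formula the same way in `make_fo` (also hypothetical). In both cases the object keeps the calling frame alive.

```r
# Ripley's closure example, wrapped in a (hypothetically named) function
make_g <- function() {
  x <- pi
  g <- function() print(x)
  return(g)
}
g <- make_g()
print(ls(environment(g)))     # "g" "x": the closure keeps make_g's frame

# A formula created the same way carries the same kind of reference
make_fo <- function() {
  x <- pi
  y ~ x
}
fo <- make_fo()
print(ls(environment(fo)))    # "x"
# For a formula, environment() reads the ".Environment" attribute
print(identical(environment(fo), attr(fo, ".Environment")))
```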

  
    
#
On 11/07/2010 1:30 PM, Prof Brian Ripley wrote:
I've just changed the default print method for formulas to display the 
environment if it is not globalenv(), which is the rule used for 
closures as well.  So now in R-devel:

 > as.formula("y ~ x")
y ~ x

as before, but

 > as.formula("y ~ x", env=new.env())
y ~ x
<environment: 01f83400>

Duncan Murdoch
#

        
DM> On 11/07/2010 1:30 PM, Prof Brian Ripley wrote:
[........................]
>>> On 7/10/2010 10:10 PM, Bill.Venables at csiro.au wrote:
>>>> Well, I have answered one of my questions below.  The hidden
    >>>> environment is attached to the 'terms' component of v1.

    >> Well, not really hidden.  A terms component is a formula
    >> (see ?terms.object), and a formula has an environment
    >> just as a closure does.  In neither case does the print()
    >> method tell you about it -- but ?formula does.

    DM> I've just changed the default print method for formulas to display the 
    DM> environment if it is not globalenv(), which is the rule used for 
    DM> closures as well.  So now in R-devel:

    >> as.formula("y ~ x")
    DM> y ~ x

    DM> as before, but

    >> as.formula("y ~ x", env=new.env())
    DM> y ~ x
    DM> <environment: 01f83400>

I see that our print.formula() has not actually fulfilled
our own rule about print methods:

?print  has
 > Description:
 > 
 >      ‘print’ prints its argument and returns it _invisibly_ 
 >      ..........

Further, I completely agree that it's good to mention the
environment; however, it can be a nuisance when it's part of a
larger print(.) method, so I'd like to allow suppressing it,
and hence I've committed the current

print.formula <- function(x, showEnv = !identical(e, .GlobalEnv), ...)
{
    e <- environment(.x <- x) ## return(.) original x
    attr(x, ".Environment") <- NULL
    print.default(unclass(x), ...)
    if (showEnv) print(e)
    invisible(.x)
}
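A usage sketch, assuming an R version that includes this change (current releases document the `showEnv` argument of the formula print method under ?formula). It shows the environment line, its suppression, and the invisible return of the original formula:

```r
fo <- as.formula("y ~ x", env = new.env())

print(fo)                    # the formula, then an <environment: ...> line
print(fo, showEnv = FALSE)   # the formula only, e.g. inside a larger print method

# print methods should return their argument invisibly, unmodified
out <- capture.output(res <- print(fo))
print(identical(res, fo))    # the ".Environment" attribute is kept on the result
```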

--
Martin Maechler, ETH Zurich