
variable scope

17 messages · Rui Barradas, Bert Gunter, Marc Schwartz +4 more

#
At the end of a for loop its variables are still present:

for (i in 1:10) {
  x <- vector(length=100000000)
}
ls()

will print "i" and "x".
This means that at the end of the for loop body I have to write

  rm(x)
  gc()

Is there a more elegant way to handle this?

Thanks.
#
On Tue, Aug 28, 2012 at 1:29 PM, Sam Steingold <sds at gnu.org> wrote:
Wrap the loop in local() scope perhaps? This might get tricky if you
need to save some results from the loop, but I think you're ok if they
are initialized outside the loop and you use super-assignment. Almost
always you shouldn't need manual garbage collection.

Something like: # Terribly impractical, but gets the point across

y <- numeric(100)
local({
  for(i in 1:10){
      x <- rnorm(10)
      y[10*(i-1) + 1:10] <<- x
  }
})

print(x) # Error
print(y)

I doubt that works out to be significantly more elegant, however.

Michael
#
Hello,

Maybe local().
Continue your example with

#?local
local(for (i in 1:10) {
   x <- vector(length=100000000)
})

ls()  # not 'i' nor 'x'

Hope this helps,

Rui Barradas

On 28-08-2012 19:29, Sam Steingold wrote:
#
Perhaps I'm dense, but huh*?

-- Bert
*e.g. What are you trying to do? R does its own garbage collection --
why do you think you need it?
And, as a general comment which may or may not be applicable, if you
create variables in a function they are local only to the function --
they disappear once the function returns. But I'm not sure this is
relevant to your query.
On Tue, Aug 28, 2012 at 11:29 AM, Sam Steingold <sds at gnu.org> wrote:
#
On Tue, Aug 28, 2012 at 1:37 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
Elaborating a little more: the difficulty in this approach is that, if
you need results from the loop in a variable (instead of just doing
something n times and printing the results), R has to know which
variables you intend to keep and which can be thrown away in the loop:
since a loop doesn't define its own scope in R, unlike in some languages
(a practice that always seemed strange to me), you have to resort to
tricks like `<<-` to move variables outside the local() scope.

The other answer is to use functions / apply statements like the good
lord and John Chambers intended :-)

M
#
On Aug 28, 2012, at 1:29 PM, Sam Steingold <sds at gnu.org> wrote:
It is not clear why you want 'x' to be created and overwritten 10 times, but perhaps I am missing something. You end up with 1 'x' after the loop, not 10 objects.

More generally, I can think of a few options, all of which use functions to create your desired object, so that there are no other objects created during execution.

Use sapply() rather than a for() loop:

  NewObject <- sapply(seq(10), DoSomethingHere...)

Use replicate(), which will return an array by default:

  NewObject <- replicate(10, DoSomethingHere...)

Or...just create a function that takes requisite arguments and runs the for() loop within the function body and returns the object you actually need. That way, any variables created within the scope of the function are gone when the function exits.
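Those three options might be sketched like this (the squaring task and all names below are purely illustrative):

```r
# Hypothetical task: compute the squares of 1..10.

# Option 1: sapply() over an index vector
sq1 <- sapply(seq(10), function(i) i^2)

# Option 2: replicate() re-evaluates its expression n times
# (best for simulation-style tasks that do not depend on an index)
draws <- replicate(10, rnorm(5))   # a 5 x 10 matrix

# Option 3: wrap the for() loop in a function; 'i' and 'x' die on return
squares <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) {
    x <- i^2          # local to the function, not to the loop
    out[i] <- x
  }
  out
}
sq3 <- squares(10)

identical(sq1, sq3)   # TRUE, and neither 'i' nor 'x' leaks into ls()
```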

Regards,

Marc Schwartz
#
On 28/08/2012 2:29 PM, Sam Steingold wrote:
You should put most code in functions, so  i and x will be locals and 
will be automatically collected when your function returns.  You rarely 
need to call gc() explicitly; R will do automatic collections.

There are exceptions to the automatic collection.  For example, if your 
return value is a function, its environment will include all the locals, 
and they will persist as long as the returned function does.  If you 
have big local values like your x and you don't need them as part of the 
environment of your function, then you might want to remove them 
explicitly.  The only reason I ever call gc() is for debugging, but 
others may have good reasons for asking space to be freed sooner rather 
than later.
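A minimal sketch of that exception (the names here are illustrative):

```r
# Returning a function keeps its whole defining environment alive,
# so a big local like 'x' would persist with it unless removed.
make_summary <- function() {
  x <- vector(length = 1e6)    # big temporary
  n <- length(x)
  rm(x)                        # drop it before returning the closure
  function() n                 # only 'n' remains in the environment
}
f <- make_summary()
f()                                                    # 1e+06
exists("x", envir = environment(f), inherits = FALSE)  # FALSE
```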

Duncan Murdoch
#
My observation is that gc in R sucks.
(It cannot release small objects.)
This is not specific to R; OCaml suffers too.
Every level of indentation has its own scope.
Seems reasonable.
So explicit loops are "deprecated" in some sense?

thanks for your kind and informative reply!
#
On Tue, Aug 28, 2012 at 4:55 PM, Sam Steingold <sds at gnu.org> wrote:
That may be (I don't know enough about gc's to really say one way or
another), but if I remember correctly, allocation triggers gc, so
manual triggering shouldn't be too important. In my experience, the
one point I've needed it was after freeing multiple very large objects
when hitting memory limits. Rewriting that code to use functions
rather than as one long imperative slog was a real performance win.

Note that if you are compiling locally, you can modify the gc
parameters (frequency of various sweeps) for different performance
characteristics -- grep src/main/memory.c for "Tuning Constants" and
the lines that follow.

I guess I see something like

for(i in 1:2){
   f(i)
}

as little more than short hand for

f(1)
f(2)

rather than as something "more meaningful". I suppose you're thinking
of something like Ruby blocks here? Those correspond more closely to
anonymous functions in my mind. (scope wise)

Certainly, "to each his own" applies here.

No, not when they are really necessary (truly iterative computation),
but it's generally considered clearer/idiomatic to use higher order
functions like *apply (which I suppose is really just Map by another
name) for brevity. The fact that *apply is, in turn, a function call
means you get the "new scope" benefits for free. Looking forward,
apply() statements are stateless and hence much easier to parallelize
than loops. (In fact, the parallel package available with R >= 2.14
uses that exact abstraction.)
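For example (a sketch using the parallel package; the cluster size and function names are arbitrary):

```r
library(parallel)

slow_square <- function(i) i^2   # stand-in for real per-element work

serial  <- lapply(1:8, slow_square)

cl <- makeCluster(2)             # two worker processes
par_res <- parLapply(cl, 1:8, slow_square)
stopCluster(cl)

identical(serial, par_res)       # TRUE: same stateless abstraction
```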

As a note, some folks worry about function calls in R: they do have
some cost, but R keeps a copy-on-write+lazy eval behavior for function
arguments so

x <- seq_len(1e7)
y <- 3

f <- function(a,b) print(b)

f(x,y)

doesn't actually copy x.

It's an implementation detail that I'm not sure is documented anywhere
(I may in fact just be making it up), but it makes some folks feel
better about using many small functions.
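One way to check it yourself is tracemem(), which requires an R built with memory profiling (release builds usually are):

```r
x <- seq_len(1e7)
y <- 3
f <- function(a, b) print(b)

tracemem(x)   # report any duplication of x from here on
f(x, y)       # prints [1] 3; tracemem stays silent, so x was not copied

# Copy-on-write: a duplication can occur only if the argument is modified
g <- function(a) { a[1] <- 0; invisible(NULL) }
g(x)          # here tracemem may report a copy
untracemem(x)
```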

Cheers,
Michael
#
Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
Sent from my phone. Please excuse my brevity.
Sam Steingold <sds at gnu.org> wrote:
Algorithms that work on small objects suck, too. Vectorize, vectorize, vectorize. Then you won't worry about these limitations.

Reasonable... to you. To me, as well, but the scoping in R has certain advantages for ad-hoc analyses, so leave your grumpy preconceptions behind and learn about environments and their parent environments.

Notice the emoticon. Cf. comments above on vectorizing. Loops that should be vectorized are strongly discouraged. Apply functions do looping tasks like for loops, and have similar inefficiencies compared to vectorized code. However, they involve functions, and since any function, anonymous or named, comes with a new environment, your comfort level of scoping should be appeased.
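The three styles side by side, as a sketch (the squaring task is illustrative):

```r
x <- rnorm(1e5)

# Explicit loop: discouraged when a vectorized form exists
s1 <- numeric(length(x))
for (i in seq_along(x)) s1[i] <- x[i]^2

# Apply form: still element-by-element, but each step runs in a
# fresh function environment
s2 <- vapply(x, function(v) v^2, numeric(1))

# Vectorized form: one call into compiled code
s3 <- x^2

identical(s1, s3) && identical(s2, s3)   # TRUE
```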
#
On Tue, Aug 28, 2012 at 5:13 PM, R. Michael Weylandt
<michael.weylandt at gmail.com> wrote:
Note that according to R FAQ 3.3.3, this is actually something
intentionally different from S(+), so someone must have decided it was
important at one point. Without any reference, I'd imagine this
matches Scheme, as most R scoping rules do.

http://cran.r-project.org/doc/FAQ/R-FAQ.html#R-and-S

Michael
#
On 12-08-28 5:55 PM, Sam Steingold wrote:
Sorry, I didn't realize you were just a troll.  Please ignore my 
previous reply.

Duncan Murdoch
#
That's what I am doing.
Interesting.
People who compare R to Scheme are flattering R.  :-)
R might be a step in the right direction from S, but no cigar yet.
#
I did vectorize.
No loops.
However, gsub/strsplit/substring &c allocate a lot of small objects
which are never GCed.

Absolutely!

for (...) {
  x <- f()
  g(x)
}

when g fails, having access to x is a HUGE bonus.
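A sketch of that debugging benefit (f and g here are hypothetical stand-ins):

```r
f <- function() -1
g <- function(x) if (x < 0) stop("g failed") else sqrt(x)

res <- try(for (i in 1:3) {
  x <- f()
  g(x)
}, silent = TRUE)

# The loop body ran in the calling environment, so after the failure
# 'x' is still there for a post-mortem:
x   # -1
```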
I should have written "morally deprecated" :-)

Should I also vectorize loops like

for (z in c("a","b","c")) {
  x <- read.table(z)
  ...
}

?

Again, thanks for your insight!
#
I am not.

I am referring here to a very specific deficiency which plagues all
non-moving GCs.

I am sorry if I offended you somehow, I do appreciate your insight.
#
On 29/08/2012 12:50 AM, Sam Steingold wrote:
I don't think you're a troll because you offended me, I think you're a 
troll because you're making false statements, such as that gc in R 
cannot release small objects, without any evidence in support of them.

Duncan Murdoch
#
I guess "non-compacting GC" might be a more common expression.
This is common knowledge, discussed, e.g., here:
http://article.gmane.org/gmane.comp.lang.r.general:256174

Whether R GC "cannot release small objects" or "cannot reuse the
fragmented memory after it releases the small objects" is
inconsequential: R consumes RAM which it cannot use.

Again, this is a common deficiency in all memory management systems
which do not compact their storage; something studied in CS101.