understanding lexical scope

2008/12/18  <joseph.g.boyer at gsk.com>:
I am trying to understand the concept of lexical scope in "An Introduction
to R" by the R Core development team.

I'd appreciate it if someone would explain why the following example does
not work:

q <- function(y) {x + y}; w <- function(x){q(x)}; w(2);

According to the discussion of Scope on page 46, it seems to me that R
will interpret the free variable x in q as the parameter x in w, and so
will give w(2) = 2+2.

Joe Boyer
Statistical Sciences
Renaissance Bldg 510, 3233-D
Mail Stop RN0320
8-275-3661
cell: (610) 209-8531

Why? R will look at the enclosing environment, which here is the
workspace. Maybe you meant:

w <- function(x){ q <- function(y) x+y; q(x)}; w(2)

which works as you said.

HTH,
Antonio.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Antonio, Fabio Di Narzo
Ph.D. student at
Department of Statistical Sciences
University of Bologna, Italy

I am trying to understand the concept of lexical scope in "An Introduction
to R" by the R Core development team.

I'd appreciate it if someone would explain why the following example does
not work:

q <- function(y) {x + y}; w <- function(x){q(x)}; w(2);

According to the discussion of Scope on page 46, it seems to me that R
will interpret the free variable x in q as the parameter x in w, and so
will give w(2) = 2+2.

No, not at all. The function q() is not defined inside w(); it is defined
in the global environment. Inside q(), x is first looked up as a local
variable, without success, and then looked up in the environment where q()
was defined (the global environment), also without success.

There is an x in the calling environment of q(), i.e., inside w(), but
finding things in the calling environment is dynamic scope rather than
lexical scope.
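
The lookup order described above can be tried directly. What follows is a minimal sketch, assuming it is run at top level in a session where no object named x is visible from the global environment. The names w_nested, q_inner, q_dynamic and w_dynamic are made up for illustration, and the parent.frame() version is shown only as a contrast with lexical scope, not as a recommendation.

```r
# q() is created at top level, so its enclosing environment is the place
# where this code runs (normally the global environment), not w().
q <- function(y) x + y
w <- function(x) q(x)

# The x inside w() is invisible to q(); without a visible x this is an error.
res <- tryCatch(w(2), error = function(e) e)
print(inherits(res, "error"))

# Lexical fix: define the inner function inside w_nested(), so that the
# call frame of w_nested() becomes its enclosing environment.
w_nested <- function(x) {
  q_inner <- function(y) x + y
  q_inner(x)
}
print(w_nested(2))

# Contrast: emulate dynamic scope by reading x from the caller's frame.
q_dynamic <- function(y) {
  caller <- parent.frame()
  get("x", envir = caller) + y
}
w_dynamic <- function(x) q_dynamic(x)
print(w_dynamic(2))
```

With the stated assumption, the first call fails with an "object 'x' not found" error, while the nested and the caller-frame versions both return 4.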

       -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of joseph.g.boyer at gsk.com
Sent: Friday, December 19, 2008 7:41 AM
To: Thomas Lumley
Cc: r-help at r-project.org
Subject: Re: [R] understanding lexical scope

Thomas, Jeff, Mark, Antonio,

Thank you for your answers. They have helped me clarify how R functions
work. They work differently from SAS functions (which SAS calls macros.)

Well, SAS macros are not functions in the traditional sense. The SAS macro
language for the most part just does text substitution prior to the SAS code
being sent to the SAS "compiler"/interpreter. So your description of
rewriting the "function body" in step 1 below is fairly accurate for SAS
macros, but it is not accurate for R. If you try to fit R functions into a
SAS macro language mold, you will only confuse yourself on both counts. I
will leave the technical details of R functions to the R experts.

If you know SAS, consider the following code:

*********************
%macro q(y);

data one;
outvar = &y. + &x.; output;
call symputx("outvar", outvar, "G");
run;

%mend;

%macro w(x);

%q(&x.);
%put &outvar.;

%mend;
**************

Then %w(2); will result in the value 4 being placed in the SAS log.

To me, while the coding is quite awkward, the execution is logical. The
variable x has been defined by the call to the macro w, so there is no
problem when SAS encounters a reference to x in the macro q.

But in the equivalent code in R, 

q <- function(y) y +x; w <- function(x) q(x); w(2);

when R can't find the second argument of q in the local environment of the
macro q, it doesn't look in the local environment of the macro w, it goes
all the way back to the global environment, as you have all pointed out.

If you want to try to compare the R language to SAS language (not favorable
to SAS for most on this list), the better comparison for understanding is
the data step language, not SAS macro.

So in my little model of how R functions work, when a function is called

1. R rewrites the body of the function, replacing all of the parameter
names with the values given to them in the function call.

2. R then tries to execute the expressions. But R only "remembers" the
assignment of values to parameter names during step 1. Thus in our example
it has to go to the global environment to find a value for "x" referenced
in q.

Is this right?
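
One way to test this model from the R prompt is to check whether the stored body of a function changes when it is called, and to look at what a call frame contains. This is only a sketch: the global x <- 10 is added here so that q() can find a value, and probe() is a hypothetical helper written for this illustration.

```r
x <- 10                      # a global x so that q() can find one
q <- function(y) y + x

body_before <- deparse(body(q))
value <- q(2)
body_after <- deparse(body(q))

print(value)                               # 2 + 10
print(identical(body_before, body_after))  # the stored body is not rewritten

# probe() reports on the environment created for one particular call.
probe <- function(y) {
  arg_names <- ls()          # names bound in this call's environment
  list(
    arg_names = arg_names,
    parent_is_defining_env = identical(parent.env(environment()),
                                       environment(probe))
  )
}
str(probe(2))
```

The body is the same before and after the call, which suggests that nothing is substituted into it. Each call instead gets a fresh environment that holds the arguments, and the parent of that environment is the one in which the function was defined. Any name not found in the call's own environment is looked up through that parent chain.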

I bet one of the expeRts on the list will provide you with more detail than you could have ever hoped for.

Dan

Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA  98504-5204
Thomas, Jeff, Mark, Antonio,

Thank you for your answers. They have helped me clarify how R functions 
work. They work differently from SAS functions (which SAS calls macros.)

To me, while the coding is quite awkward, the execution is logical. The 
variable x has been defined by the call to the macro w, so there is no 
problem when SAS encounters a reference to x in the macro q.

...

But in the equivalent code in R, 

q <- function(y) y +x; w <- function(x) q(x); w(2);

when R can't find the second argument of q in the local environment of the 
macro q, it doesn't look in the local environment of the macro w, it goes 
all the way back to the global environment, as you have all pointed out.

When you think of it as "all the way back to the global environment", you're
introducing confusion. The lexical scoping way of doing it means that you
can look at q right now and tell where it's going to look for x: first in q,
then in the environment where you are defining it (global in this instance),
etc. There's nothing dynamic about where it finds x. It does not matter how
you call q or what w -- which might not even exist -- might or might not do.

The way you previously preferred depends entirely on how q is called.
Imagine you have not just w, but w1, w2, w3, w4, ..., w77, and each one does
something different -- many do not have x as a parameter -- and several of
them call each other before calling q. You cannot begin to tell ahead of
time where x will come from, and it would be extremely hard to work out
the order of calls that actually occurs in order to figure out which x
you're going to get.

In your very simple example where q is always called by w and w never
attempts to do anything tricky, it's not hard to see, but in the real world,
it would make your head spin.

You're still thinking in the dynamic sense when you talk about "all the way
back".
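
A small sketch of this point, with a global x added so that the lookup succeeds; w1, w2 and w3 are made-up callers standing in for the w1 ... w77 imagined above:

```r
x <- 100
q <- function(y) x + y

w1 <- function(x) q(x)                   # has its own x
w2 <- function(z) q(z)                   # has no x at all
w3 <- function(x) {                      # changes its own x, then goes via w1
  x <- x * 1000
  w1(1)
}

results <- c(w1(1), w2(1), w3(5))
print(results)
```

All three calls give 101. The x that q() sees is the one visible from where q was defined, whichever caller is on the stack and whatever that caller has done to its own x.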
View this message in context: http://www.nabble.com/understanding-lexical-scope-tp21084267p21101765.html
Sent from the R help mailing list archive at Nabble.com.
Well, SAS macros are not functions in the traditional sense. The SAS
macro language for the most part just does text substitution prior to
the SAS code being sent to the SAS "compiler"/interpreter. So your
description of rewriting the "function body" in step 1 below is fairly
accurate for SAS macros, but it is not accurate for R. If you try to fit
R functions into a SAS macro language mold, you will only confuse
yourself on both counts. I will leave the technical details of R
functions to the R experts.
[....]

I bet one of the expeRts on the list will provide you with more detail
than you could have ever hoped for.

Not much, I think. It's one of those cases where you too easily end up 
rewriting manuals or even books. The text above is quite accurate: 
Macro-based languages substitute text, structured languages call 
functions with parameters. And some do a bit of each. And every now and 
again you wish that the language at hand would do the opposite of what 
it actually does.

One distinction is if you have things like

#define f(x) 2*x
#define g(y) f(y+2)

(in the C language preprocessor syntax), then you end up with g(y) as 
y+2*2 (i.e., y+4), whereas the corresponding function calls give 
2*(y+2). Also, and the flip side of the original question: Macros have 
difficulties with encapsulation; with a bit of bad luck, arguments given 
to f() can modify its internal variables.

In R there are things that you want to do that are macro-like, and you
can generally achieve the same effect with substitute/match.call/eval
constructions, but it does get a bit contorted (lines 3-10 of the lm
function are required reading if you want to understand these matters).
Some of us occasionally ponder whether it would be cleaner to have a
real (LISP-style) macro facility, but nothing really convincing has come
up so far.
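
As a rough illustration of the substitute/match.call/eval constructions mentioned above, here is a minimal sketch. It is not the code from lm; show_expr() and summarise_column() are hypothetical helpers, and df is made-up data.

```r
# substitute() captures the argument as unevaluated code; eval() then
# evaluates that code in an environment of our choosing.
show_expr <- function(expr) {
  code <- substitute(expr)
  list(text = deparse(code), value = eval(code, parent.frame()))
}

a <- 3
out <- show_expr(a * 2 + 1)
print(out$text)
print(out$value)

# match.call() records the call as written. Evaluating the captured column
# expression inside the data frame is the macro-like part.
summarise_column <- function(data, column) {
  cl <- match.call()
  col_expr <- substitute(column)
  values <- eval(col_expr, data, parent.frame())
  list(call = deparse(cl), mean = mean(values))
}

df <- data.frame(height = c(1, 2, 3))
res <- summarise_column(df, height * 2)
print(res$call)
print(res$mean)
```

Here height is never a variable in the caller's workspace. It is found only because the captured expression is evaluated with the data frame as its environment, which is roughly the effect a macro would give.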
    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
using c macros, you end up with g(y) substituted by 2*y+2, rather than
y+2*2, as you say (and rather than 2*(y+2), which you'd effectively get
using a function).

that's why you'd typically include all occurrences of all macro
'parameters' in the macro 'body' in parentheses:

#define f(x) 2*(x)
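
the precedence problem can be imitated in R by treating macro expansion as plain string pasting. this is only a sketch: f_macro, f_macro_safe and g_macro are hypothetical helpers that mimic the two #define variants above; they are not a real macro facility.

```r
# Text-substitution "macros": paste the argument text into a template.
f_macro      <- function(arg) paste0("2*", arg)         # like #define f(x) 2*x
f_macro_safe <- function(arg) paste0("2*(", arg, ")")   # like #define f(x) 2*(x)
g_macro      <- function(arg, f) f(paste0(arg, "+2"))   # like #define g(y) f(y+2)

# Ordinary functions, where y + 2 is evaluated before f_fun() sees it.
f_fun <- function(x) 2 * x
g_fun <- function(y) f_fun(y + 2)

y <- 3
expanded      <- g_macro("y", f_macro)
expanded_safe <- g_macro("y", f_macro_safe)

print(expanded)
print(eval(parse(text = expanded)))
print(expanded_safe)
print(eval(parse(text = expanded_safe)))
print(g_fun(y))
```

with y = 3, the unparenthesised expansion is the text 2*y+2 and evaluates to 8, while the parenthesised expansion and the function call both give 10.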

some consider using c macros as not-so-good practice and favour inline
functions.  but macros are not always bad; in scheme, for example, you
have a hygienic macro system which lets you use the benefits of macros
while avoiding some of the risks.

vQ
Oops. Yes. I suppose I had x*2 there at some point....

    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
  (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907