scoping/non-standard evaluation issue

Dear r-devel list members,

On a couple of occasions I've encountered the issue illustrated by the
following examples:

--------- snip -----------
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed + 
+         Armed.Forces + Population + Year, data=longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)
all.equal(mod.1, mod.2)
[1] TRUE
f <- function(mod){
+     subs <- 1:10
+     update(mod, subset=subs)
+     }
f(mod.1)
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00
f(mod.2)
Error in eval(expr, envir, enclos) : object 'subs' not found

--------- snip -----------

I *almost* understand what's going on -- that is, clearly mod.1 and mod.2, or
the formulas therein, are associated with different environments, but I
don't quite see why.

Anyway, here are two "solutions" that work, but neither is in my view
desirable:

--------- snip -----------
f1 <- function(mod){
+     assign(".subs", 1:10, envir=.GlobalEnv)
+     on.exit(remove(".subs", envir=.GlobalEnv))
+     update(mod, subset=.subs)
+     }
f1(mod.1)
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00
f1(mod.2)
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00
f2 <- function(mod){
+     env <- new.env(parent=.GlobalEnv)
+     attach(NULL)
+     on.exit(detach())
+     assign(".subs", 1:10, pos=2)
+     update(mod, subset=.subs)
+     }
f2(mod.1)
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00
f2(mod.2)
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00  

--------- snip -----------

The problem with f1() is that it will clobber a variable named .subs in the
global environment; the problem with f2() is that .subs can be masked by a
variable in the global environment.

Is there a better approach?

Thanks,
 John

--------------------------------
John Fox
Senator William McMaster 
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

I think the best way would be to modify the environment of the formula. Something like the below, except that it doesn't actually work...

f3 <- function(mod) {
  f <- formula(mod)
  environment(f) <- e <-  new.env(parent=environment(f))
  mod <- update(mod, formula=f)
  evalq(.subs <- 1:10, e)
  update(mod, subset=.subs)
}

The catch is that it is not quite so easy to update the formula of a model.
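
One way the idea might be made to work is to skip update() for the formula and patch the stored call directly, since update() appears to hand back a formula that carries the old environment rather than the new one. This is only a sketch: f3b is a made-up name, and it assumes that lm() looks up the subset argument in the environment of the formula it receives, which is what the error for mod.2 suggests.

```r
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
            Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)

# Hypothetical variant of f3(): edit the model's call instead of using update()
f3b <- function(mod) {
  fm <- formula(mod)
  # private child environment, so nothing is written to the global environment
  e <- new.env(parent = environment(fm))
  assign(".subs", 1:10, envir = e)
  environment(fm) <- e
  cl <- mod$call              # the call that produced the model
  cl$formula <- fm            # formula object that carries environment e
  cl$subset <- quote(.subs)   # resolved through the formula's environment
  eval(cl, parent.frame())
}

f3b(mod.1)
f3b(mod.2)
```

Because .subs lives only in the new environment, it neither clobbers nor is masked by anything in the global environment. The price is that the call stored in the returned model contains a formula object rather than the expression the user typed.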
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

I think there is something wrong with R here since the formula in the
call component of mod.1 has a "call" class whereas the corresponding
call component of mod.2 has "formula" class:
class(mod.1$call[[2]])
[1] "call"
class(mod.2$call[[2]])
[1] "formula"

If we reset call[[2]] to have "call" class then it works:
mod.2a <- mod.2
mod.2a$call[[2]] <- as.call(as.list(mod.2a$call[[2]]))
f(mod.2a)
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
    Population + Year, data = longley, subset = subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03
  Population          Year
   1.164e+00    -1.911e+00
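
The class difference can be probed a little further. The sketch below re-evaluates the formula stored in each call from inside a function and checks which environment the result is attached to. The helper name formula_env_is_local is made up for illustration, and the comments describe what I would expect rather than a documented guarantee.

```r
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
            Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)

# Hypothetical helper: does the formula stored in the call pick up the
# environment of the function that re-evaluates it?
formula_env_is_local <- function(mod) {
  fm <- eval(mod$call$formula)   # evaluated in this function's frame
  identical(environment(fm), environment())
}

# mod.1 stores an unevaluated call to `~`, so a fresh formula is created
# inside the helper and is attached to the helper's frame
formula_env_is_local(mod.1)

# mod.2 stores a formula object, which keeps the environment it already had
formula_env_is_local(mod.2)
```

If that reading is right, it would explain the behaviour of f(): for mod.1 the formula is rebuilt inside f() and so can see subs, while for mod.2 the formula still points at the environment where mod.2 was created.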
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
Dear Peter,

I played around a bit with your suggestion but wasn't able to get it to
work. 

Thanks for this.

John

-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of peter dalgaard
Sent: January-04-11 6:05 PM
To: John Fox
Cc: 'Sanford Weisberg'; r-devel at r-project.org
Subject: Re: [Rd] scoping/non-standard evaluation issue

Dear Gabor,

I used str() to look at the two objects but missed the difference that you
found. What I didn't quite understand was why one model worked but not the
other when both were defined at the command prompt in the global
environment.

Thanks,
 John

-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Gabor Grothendieck
Sent: January-04-11 6:56 PM
To: John Fox
Cc: Sanford Weisberg; r-devel at r-project.org
Subject: Re: [Rd] scoping/non-standard evaluation issue

I kind of suspect that the bug is that mod.1 works... I.e., I can vaguely make out the contours of why mod.2 is not supposed to work, and if that is true, neither should mod.1. However, if so, something clearly needs more work. Possibly, some of the people who worked on implementing formula environments may want to chime in? (It's been a while, though.)

Peter Dalgaard
Dear Peter,

You hit the nail on the head: I didn't (and don't) understand why mod.1
works -- which I attributed to my imperfect understanding of non-standard
evaluation. Even if there's a bug allowing mod.1 to work, I wonder about the
consequences of fixing it. That might break a lot of code. It would seem
desirable, though, for mod.1 and mod.2 to behave the same.

Best,
 John
-----Original Message-----
From: peter dalgaard [mailto:pdalgd at gmail.com]
Sent: January-05-11 10:51 AM
To: John Fox
Cc: 'Gabor Grothendieck'; 'Sanford Weisberg'; r-devel at r-project.org
Subject: Re: [Rd] scoping/non-standard evaluation issue


the following seems an easy solution:

f1 <- function(mod){
    subs <- 1:10
    toeval <- quote(update(mod, subset=subs))
    toeval$subset<-subs
    eval(toeval)
    }

f1(mod.2)
Tere, Kenn!

Yes, enforcing pass-by-value by pre-evaluating the argument will certainly defeat the nonstandard evaluation issues. Another version of the same idea is

eval(bquote(update(mod, subset=.(subs))))

The only thing is that if the argument is ever deparsed, you might get a messy display. E.g., try eval(bquote(plot(.(rnorm(20)))))
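
For concreteness, here is a small sketch of the bquote() version applied to the models from the start of the thread. The function name f4 is made up for illustration. The last line shows the deparsing effect: the stored call holds the values themselves rather than the name subs.

```r
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
            Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)

# Hypothetical wrapper: substitute the value of subs into the call up front
f4 <- function(mod) {
  subs <- 1:10
  # .(subs) is replaced by the value of subs before update() sees the call,
  # so no lookup of the name "subs" is needed later
  eval(bquote(update(mod, subset = .(subs))))
}

fit <- f4(mod.2)
length(residuals(fit))   # number of observations actually used in the fit
fit$call$subset          # the values, not the symbol subs
```

With a short integer vector the stored call stays readable; with a long or random vector it would print every element.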

-pd