scoping/non-standard evaluation issue

9 messages · Gabor Grothendieck, Peter Dalgaard, John Fox +1 more

#
Dear r-devel list members,

On a couple of occasions I've encountered the issue illustrated by the
following examples:

--------- snip -----------
> mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
+         Armed.Forces + Population + Year, data=longley)
[1] TRUE
+     subs <- 1:10
+     update(mod, subset=subs)
+     }
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00
Error in eval(expr, envir, enclos) : object 'subs' not found

--------- snip -----------

I *almost* understand what's going on -- that is, clearly mod.1 and mod.2, or
the formulas therein, are associated with different environments, but I
don't quite see why.
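
A minimal sketch of the setup, for anyone who wants to reproduce it. The definition of mod.2 (update() with a formula argument) and the wrapper name f are assumptions, not taken from the transcript above. The point is only that a model whose stored call holds an already-evaluated formula carries that formula's environment along with it.

```r
# Fit directly: the stored call keeps the formula as unevaluated code.
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
              Armed.Forces + Population + Year, data = longley)

# Assumed construction: update() with a formula argument stores a
# formula *object* (environment attached) in the model's call.
mod.2 <- update(mod.1, . ~ . - Year + Year)

# Hypothetical wrapper: 'subs' exists only in the function's frame.
f <- function(mod) {
  subs <- 1:10
  update(mod, subset = subs)
}

fit.1 <- f(mod.1)                       # refits on rows 1:10
res.2 <- try(f(mod.2), silent = TRUE)   # fails: 'subs' is not found

# The stored formula of mod.2 remembers where it was created; 'subs'
# is looked up in 'data' and then in that environment, not in f()'s frame.
class(mod.2$call$formula)
environment(mod.2$call$formula)
```

For mod.1 the formula is re-evaluated inside f() when the call is re-run, so its environment becomes the frame of f(), which is where subs lives.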

Anyway, here are two "solutions" that work, but neither is in my view
desirable:

--------- snip -----------
> f1 <- function(mod){
+     assign(".subs", 1:10, envir=.GlobalEnv)
+     on.exit(remove(".subs", envir=.GlobalEnv))
+     update(mod, subset=.subs)
+     }
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00
> f2 <- function(mod){
+     env <- new.env(parent=.GlobalEnv)
+     attach(NULL)
+     on.exit(detach())
+     assign(".subs", 1:10, pos=2)
+     update(mod, subset=.subs)
+     }
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces + 
    Population + Year, data = longley, subset = .subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces  
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03  
  Population          Year  
   1.164e+00    -1.911e+00  

--------- snip -----------

The problem with f1() is that it will clobber a variable named .subs in the
global environment; the problem with f2() is that .subs can be masked by a
variable in the global environment.

Is there a better approach?

Thanks,
 John

--------------------------------
John Fox
Senator William McMaster 
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox
#
On Jan 4, 2011, at 22:35, John Fox wrote:

I think the best way would be to modify the environment of the formula. Something like the below, except that it doesn't actually work...

f3 <- function(mod) {
  f <- formula(mod)
  environment(f) <- e <-  new.env(parent=environment(f))
  mod <- update(mod, formula=f)
  evalq(.subs <- 1:10, e)
  update(mod, subset=.subs)
}

The catch is that it is not quite so easy to update the formula of a model.
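
One possible way around that catch, offered as a sketch under stated assumptions rather than as the approach intended above: skip update() for the formula step and edit the stored call directly. The helper name f4 is hypothetical, and the code assumes the model stores its call in mod$call with a named formula argument, as lm() does.

```r
# Hypothetical helper: refit on rows 1:10 without writing to the
# global environment.
f4 <- function(mod) {
  fo <- formula(mod)
  e <- new.env(parent = environment(fo))   # child of the formula's own env
  assign(".subs", 1:10, envir = e)
  environment(fo) <- e

  cl <- mod$call              # the call that created the model
  cl$formula <- fo            # formula object carrying environment 'e'
  cl$subset <- quote(.subs)   # resolved in 'data', then in 'e'
  eval(cl, parent.frame())
}

mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
              Armed.Forces + Population + Year, data = longley)
fit <- f4(mod.1)
```

Because .subs lives in a private environment whose parent is the formula's original environment, nothing in the global environment is clobbered, and a global variable of the same name cannot mask it.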
#
On Tue, Jan 4, 2011 at 4:35 PM, John Fox <jfox at mcmaster.ca> wrote:
I think there is something wrong with R here since the formula in the
call component of mod.1 has a "call" class whereas the corresponding
call component of mod.2 has "formula" class:
[1] "call"
[1] "formula"

If we reset call[[2]] to have "call" class then it works:
Call:
lm(formula = Employed ~ GNP.deflator + GNP + Unemployed + Armed.Forces +
    Population + Year, data = longley, subset = subs)

Coefficients:
 (Intercept)  GNP.deflator           GNP    Unemployed  Armed.Forces
   3.641e+03     8.394e-03     6.909e-02    -3.971e-03    -8.595e-03
  Population          Year
   1.164e+00    -1.911e+00
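
The commands behind this output are not shown above, so what follows is a hedged sketch rather than the original code. It assumes mod.2 was produced by update() with a formula argument and uses a hypothetical wrapper f(). Stripping the attributes from the stored formula is one way to turn it back into a plain call.

```r
mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
              Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)   # assumed definition

class(mod.1$call[[2]])   # unevaluated formula in the stored call
class(mod.2$call[[2]])   # evaluated formula object

# Drop the class and environment attributes so the stored formula
# becomes an ordinary call again, re-evaluated wherever lm() is re-run.
mod.2a <- mod.2
fo <- mod.2a$call[[2]]
attributes(fo) <- NULL
mod.2a$call[[2]] <- fo

f <- function(mod) {     # hypothetical wrapper
  subs <- 1:10
  update(mod, subset = subs)
}
fit <- f(mod.2a)         # 'subs' is now found in f()'s frame
```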
#
Dear Peter,

I played around a bit with your suggestion but wasn't able to get it to
work. 

Thanks for this.

John

#
Dear Gabor,

I used str() to look at the two objects but missed the difference that you
found. What I didn't quite understand was why one model worked but not the
other when both were defined at the command prompt in the global
environment.

Thanks,
 John

#
On Jan 5, 2011, at 14:44, John Fox wrote:

I kind of suspect that the bug is that mod.1 works... I.e., I can vaguely make out the contours of why mod.2 is not supposed to work, and if that is true, neither should mod.1. However, if so, something clearly needs more work. Possibly, some of the people who worked on implementing formula environments may want to chime in? (It's been a while, though.)
#
Dear Peter,

You hit the nail on the head: I didn't (and don't) understand why mod.1
works -- which I attributed to my imperfect understanding of non-standard
evaluation. Even if there's a bug allowing mod.1 to work, I wonder about the
consequences of fixing it. That might break a lot of code. It would seem
desirable, though, for mod.1 and mod.2 to behave the same.

Best,
 John
#
On Jan 6, 2011, at 13:11, Kenn Konstabel wrote:

Tere, Kenn!

Yes, enforcing pass-by-value by pre-evaluating the argument will certainly defeat the nonstandard evaluation issues. Another version of the same idea is

eval(bquote(update(mod, subset = .(subs))))

The only thing is that if the argument is ever deparsed, you might get a messy display. E.g., try eval(bquote(plot(.(rnorm(20)))))
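
A minimal sketch of the bquote() idea inside a wrapper. The name f5 and the definition of mod.2 are assumptions made for illustration.

```r
# Hypothetical wrapper: substitute the *value* of 'subs' into the call
# before update() sees it, so no variable lookup is needed later.
f5 <- function(mod) {
  subs <- 1:10
  eval(bquote(update(mod, subset = .(subs))))
}

mod.1 <- lm(Employed ~ GNP.deflator + GNP + Unemployed +
              Armed.Forces + Population + Year, data = longley)
mod.2 <- update(mod.1, . ~ . - Year + Year)   # assumed definition

fit.1 <- f5(mod.1)
fit.2 <- f5(mod.2)

# The values themselves are now stored in the call, which is what can
# make the printed call messy for longer vectors.
fit.2$call$subset
```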


-pd