
Lexical scoping for step and add1 functions

4 messages · Louisell, Paul T PW, Jeff Newmiller, S Ellison

#
Hi,

I've run into a problem calling the step function from within a function; I sent this to the R development list first, but the moderator said it was better suited to R help. My OS is Windows 7 and I'm using R version 3.2.3.

Here's a simple function to help reproduce the error:
      > test.FN
      function(dfr, scope, k=2){
      temp.lm=lm(scope$lower, data=dfr)
      step(temp.lm, scope=scope$upper, k=k)
      }

And here's the code that gives the error when calling the function above:
      # Begin by setting the rng seed.
      > set.seed(523)

      # Generate a design matrix and response.
      > X.des=matrix(abs(rnorm(50*20, sd=4)), nrow=50)
      > Y=20 + X.des[, 1:3] %*% matrix(c(3, -4, 2), nrow=3) + rnorm(50)
      > X.des=cbind(as.data.frame(X.des), Y)

      # Create the lower and upper formula components of a list.
      > test.scope=list(lower=as.formula(Y ~ 1), upper=as.formula(paste("Y ~ ", paste(names(X.des)[1:20], collapse=" + "), sep="")))

      # Run 'test.FN'.
      > test.FN(dfr=X.des, scope=test.scope)
      Start:  AIC=257.58
      Y ~ 1

      Error in is.data.frame(data) : object 'dfr' not found
      > traceback()
      11: is.data.frame(data)
      10: model.frame.default(formula = Y ~ V1 + V2 + V3 + V4 + V5 + V6 +
              V7 + V8 + V9 + V10 + V11 + V12 + V13 + V14 + V15 + V16 +
              V17 + V18 + V19 + V20, data = dfr, drop.unused.levels = TRUE)
      9: stats::model.frame(formula = Y ~ V1 + V2 + V3 + V4 + V5 + V6 +
             V7 + V8 + V9 + V10 + V11 + V12 + V13 + V14 + V15 + V16 +
             V17 + V18 + V19 + V20, data = dfr, drop.unused.levels = TRUE)
      8: eval(expr, envir, enclos)
      7: eval(fcall, env)
      6: model.frame.lm(fob, xlev = object$xlevels)
      5: model.frame(fob, xlev = object$xlevels)
      4: add1.lm(fit, scope$add, scale = scale, trace = trace, k = k,
             ...)
      3: add1(fit, scope$add, scale = scale, trace = trace, k = k, ...)
      2: step(temp.lm, scope = scope$upper, k = k) at #3
      1: test.FN(dfr = X.des, scope = test.scope)

The traceback indicates that add1 doesn't see the data frame dfr passed to test.FN. The step function runs fine when I do everything in the global environment without using test.FN. I know the scoping rules work differently for objects involving model formulae, but despite a fair amount of experimentation, I haven't found any way to make the step / add1 functions see the data frame that's passed to test.FN. Any help would be greatly appreciated.

Thanks,

Paul Louisell
Statistical Specialist
Paul.Louisell at pw.utc.com
860-565-8104

Still, tomorrow's going to be another working day, and I'm trying to get some rest.
That's all, I'm trying to get some rest.
Paul Simon, "American Tune"
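The mechanism can be reproduced on a much smaller scale. The sketch below is an illustration under stated assumptions: the data frame d and the helper fit_in_function are hypothetical, and the code is meant to be run at the top level of an R session. lm() succeeds inside the function, but add1() (the call that fails inside step() in the traceback) rebuilds the model frame by re-evaluating the stored data = dfr argument in the environment attached to the formula, which here is the top-level environment.

```r
# Hypothetical toy data, created at the top level of the session.
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20))

# Both formulas are created here, so they carry this (top-level) environment.
lower <- y ~ 1
upper <- y ~ x1 + x2

fit_in_function <- function(dfr) {
  # lm() itself succeeds: it evaluates `data = dfr` in this function's frame.
  fit <- lm(lower, data = dfr)
  # add1() rebuilds the model frame by re-evaluating the stored call in
  # environment(lower), where no `dfr` exists; catch and return that error.
  tryCatch(add1(fit, upper), error = function(e) conditionMessage(e))
}

msg <- fit_in_function(d)
print(msg)   # a message saying that object 'dfr' was not found
```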
1 day later
#
In a nutshell, formulas carry the environment in which they are 
defined along with the variable names. Your dfr was defined in the 
test.FN environment, but the formulas were defined in the global 
environment, and when add1 rebuilds the model frame it re-evaluates 
data = dfr in the formula's environment, where dfr does not exist. I got 
this to work by defining the formula character strings in the global 
environment and then converting those strings to formulas in the 
function. I don't think you can trick lm into referring to the global 
environment from within test.FN so that summaries refer to the X.des 
data frame instead of dfr (but someone could prove me wrong).
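The claim that a formula carries its defining environment can be checked directly. In this minimal sketch, f_top, make_formula, f_inner and local_var are illustrative names only; the formula built inside make_formula() keeps a reference to that function's frame, so local_var remains reachable through environment(f_inner):

```r
# A formula created at the top level.
f_top <- y ~ x

# A formula created inside a function keeps that function's frame.
make_formula <- function() {
  local_var <- "only defined in here"
  y ~ x
}
f_inner <- make_formula()

# Same variables, different environments.
identical(environment(f_top), environment(f_inner))                 # FALSE
exists("local_var", envir = environment(f_inner), inherits = FALSE) # TRUE
exists("local_var", envir = environment(f_top), inherits = FALSE)   # FALSE
```

This is also why converting the strings inside test.FN works: as.formula() on a character string attaches the environment it was called from (its env argument defaults to parent.frame()), which in test.FN is the frame holding dfr.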

################################

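test.FN <- function( dfr, scope, k = 2 ) {
   # as.formula() is called here, so the formulas get this function's
   # environment, which is where dfr lives.
   scp <- list( lower = as.formula( scope$lower )
              , upper = as.formula( scope$upper )
              )
   temp.lm <- lm( scp$lower
                , data = dfr
                )
   step( temp.lm
       , scope = scp
       , k = k
       )
}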

# Begin by setting the rng seed.
set.seed( 523 )

# Generate a design matrix and response.
X.des <- matrix( abs( rnorm( 50 * 20, sd = 4 ) ), nrow = 50 )
Y <- 20 + X.des[, 1:3 ] %*% matrix( c( 3, -4, 2 ), nrow = 3 ) + rnorm( 50 )
X.des <- cbind( as.data.frame( X.des ), Y )

# Create the lower and upper formulas as character strings in a list.
test.scope <- list( lower = "Y ~ 1"
                  , upper = paste( "Y ~"
                                 , paste( names( X.des )[ 1:20 ]
                                        , collapse = " + "
                                        )
                                 , sep = ""
                                 )
                  )
# Run 'test.FN'.
test.FN( dfr = X.des
        , scope = test.scope
        )
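A related variant, not one of the solutions posted in this thread and offered only as a sketch: keep passing real formulas, but reset their environment inside the function with environment<- so that the refit inside add1() looks for dfr in the function's own frame. The names test.FN2, sim, sim.scope and res are hypothetical, and the simulated data only loosely follow the original example (the response depends strongly on V1).

```r
test.FN2 <- function(dfr, scope, k = 2) {
  lower <- scope$lower
  upper <- scope$upper
  # Point both formulas at this function's frame, where `dfr` exists.
  environment(lower) <- environment()
  environment(upper) <- environment()
  temp.lm <- lm(lower, data = dfr)
  # trace = 0 silences step()'s per-step printout.
  step(temp.lm, scope = list(lower = lower, upper = upper), k = k,
       trace = 0)
}

# Hypothetical data: the response depends strongly on V1 only.
set.seed(523)
sim <- as.data.frame(matrix(rnorm(50 * 5), nrow = 50))
sim$Y <- 20 + 3 * sim$V1 + rnorm(50)

sim.scope <- list(lower = Y ~ 1, upper = Y ~ V1 + V2 + V3 + V4 + V5)
res <- test.FN2(dfr = sim, scope = sim.scope)
formula(res)
```

Resetting the environment this way hides any objects the formulas might have picked up from the caller's local variables; objects in the global environment are still found through test.FN2's enclosing environment, assuming it is defined at top level.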
On Mon, 7 Mar 2016, Louisell, Paul T PW wrote:

            
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                       Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
#
If you want a function to refer to something in the global environment, just refer to the global object in the function. If an object name isn't defined in the function's own scope, R looks for it in the function's enclosing environment (here the global environment, where test.FN was defined). So the original code works if you write
test.FN <- function(scope, k = 2) {
  temp.lm <- lm(scope$lower, data = X.des)  ## X.des is found in the enclosing (global) environment
  step(temp.lm, scope = scope$upper, k = k)
}

Admittedly, I wouldn't regard that kind of thing as a good idea; it is fragile and inflexible. But if you are clear about scope, it does work.
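Strictly speaking, the lookup described above is lexical: a free variable is resolved in the environment where the function was defined, not in the frame it was called from. A small sketch with made-up names (x, show_x, caller):

```r
x <- "top level"
show_x <- function() x    # `x` is a free variable in show_x

caller <- function() {
  x <- "caller's local"   # not visible to show_x
  show_x()                # still finds the top-level x
}

caller()                  # returns "top level"
```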

Another way to proceed, somewhat more safely, is to wrap the whole thing in a function that takes X.des as dfr, and to define test.FN (and the scope formulas) inside that outer function, so that you know where dfr is going to come from. Something along the lines of:

test.step <- function(dfr) {
  test.FN <- function(scope, k = 2) {
    temp.lm <- lm(scope$lower, data = dfr)  ## dfr comes from test.step's frame
    print(temp.lm)
    step(temp.lm, scope = as.formula(scope$upper), k = k)
  }
  ## Formulas created here carry test.step's environment, which contains dfr.
  scope <- list(lower = as.formula('Y~1'),
                upper = as.formula(paste('Y~', paste(names(dfr[1:20]), collapse = "+"))))
  test.FN(scope = scope)
}

test.step(X.des)


S Ellison



#
So in both my solution and your second option, the printed lm call shows that the regression was evaluated in a function context (data = dfr), which the user of the function might prefer not to see (they know the data as X.des). Your first solution avoids that but hardcodes access to the global variable, so if the user wants to use a different data frame, a different function has to be defined. I am OK with that, but I thought there might be a way to indirectly tell lm to use the global environment via the parameter dfr.
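One possible answer, sketched here as an assumption rather than something confirmed in the thread: capture the caller's expression for dfr with substitute(), splice it into the lm() call with bquote(), and evaluate that call, and step() itself, in the caller's frame (parent.frame()), with the formulas' environment set to the same frame. The printed call then shows the caller's data name instead of dfr. The function test.FN3 and the data X.sim are hypothetical, and the approach only makes sense when dfr is passed as a name (or expression) that is meaningful in the caller's frame.

```r
test.FN3 <- function(dfr, scope, k = 2) {
  caller <- parent.frame()          # frame test.FN3 was called from
  data.expr <- substitute(dfr)      # the caller's expression, e.g. X.des
  lower <- as.formula(scope$lower)
  upper <- as.formula(scope$upper)
  # Attach the caller's frame to both formulas, so any refit looks up
  # data.expr there rather than inside this function.
  environment(lower) <- caller
  environment(upper) <- caller
  # Build lm(<lower>, data = <data.expr>) and evaluate it in the caller's frame.
  fit <- eval(bquote(lm(.(lower), data = .(data.expr))), caller)
  # step() refits its updated calls in the frame it was called from,
  # so call it from the caller's frame as well.
  do.call("step", list(fit, scope = list(lower = lower, upper = upper),
                       k = k, trace = 0), envir = caller)
}

# Hypothetical data with the same column naming as the original example.
set.seed(523)
X.sim <- as.data.frame(matrix(rnorm(50 * 5), nrow = 50))
X.sim$Y <- 20 + 3 * X.sim$V1 + rnorm(50)

res3 <- test.FN3(dfr = X.sim,
                 scope = list(lower = "Y ~ 1",
                              upper = "Y ~ V1 + V2 + V3 + V4 + V5"))
res3$call   # the data argument shows X.sim rather than dfr
```

The design trades a little extra machinery for a nicer printed call: because the fits are evaluated where the user's data lives, the refits inside step() see the same object the user named.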