
lm looking for weights outside of the user-defined function

8 messages · Dimitri Liakhovitski, William Dunlap, David Winsemius

#
Dear R'ers,

I am fighting with a problem that is driving me crazy. I use "lm" in
my user-defined function, but it seems to be looking for weights
outside of my function's environment:

### Generating example data:
x<-data.frame(y=rnorm(100,0,1),a=rnorm(100,1,1),b=rnorm(100,2,1))
myweights<-runif(100)
data.for.regression<-x[1:3]

### Creating function "weighted.reg":
weighted.reg=function(formula, MyData, filename,WeightsVector)
{
	print(dim(MyData))
	print(filename)
	print(length(WeightsVector))
	regr.f<-lm(formula,MyData,weights=WeightsVector,na.action=na.omit)
	results<-as.data.frame(round(summary(regr.f)$coeff,3))
	write.csv(results,file=filename)
	return(results)
}

### Running "weighted.reg" with my data:
reg2<-weighted.reg(y~., MyData=x, WeightsVector=myweights, filename="TEST.csv")


I get an error: Error in eval(expr, envir, enclos) : object
'WeightsVector' not found
Notice that the function correctly prints length(WeightsVector). But
it looks like "lm" (in the 4th line of the function body) is looking
for the weights OUTSIDE the function and does not see WeightsVector.
Why is it looking outside the function for an object that has just
been defined inside the function?


Thank you very much!
#
On Oct 22, 2010, at 9:01 AM, Dimitri Liakhovitski wrote:

            
Have you tried putting WeightsVector in the "x" dataframe? That would  
seem to reduce the potential for environmental conflation.

From the Details section of help(lm):
"All of weights, subset and offset are evaluated in the same way as  
variables in formula, that is first in data and then in the  
environment of formula."
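
A minimal sketch of that suggestion, assuming the weights are stored as a column of the data frame. The column name "w" and the function name "weighted_fit" are made up for illustration; the formula is written out as y ~ a + b because y ~ . would pull the weights column in as a predictor.

```r
# Sketch only: "w" and "weighted_fit" are illustrative names, not from the thread.
set.seed(1)
d <- data.frame(y = rnorm(100), a = rnorm(100, 1), b = rnorm(100, 2))
d$w <- runif(100)                       # weights live in the data frame

weighted_fit <- function(formula, MyData) {
    # 'w' is found in MyData (the data argument), so the environment of
    # the formula never has to be searched for it.
    lm(formula, data = MyData, weights = w)
}

fit <- weighted_fit(y ~ a + b, MyData = d)
coef(fit)
```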
David Winsemius, MD
West Hartford, CT
#
David,
I understand - and I am sure what you are suggesting should work. But I
just can't understand why it's not grabbing things INSIDE the
environment of the formula first.
I've already tried to define the weights outside of the function - and
it finds them.

But shouldn't it go in this order?
1. Look in the data frame
2. Look in the environment of the user-defined function
3. Look outside.

Dimitri
On Fri, Oct 22, 2010 at 9:15 AM, David Winsemius <dwinsemius at comcast.net> wrote:

  
    
#
On Oct 22, 2010, at 9:18 AM, Dimitri Liakhovitski wrote:

            
I am not sure that either one of us understands what is meant by "the  
environment of the formula".
Hey, I only work here, I don't make the rules, I just follow them. I  
agree that one might guess that to be the search order, but it is not  
what is documented.
#
As you suggested, David, the code below works.
Now it can find the weights - because they are in the data frame x.
But how can I be sure now that it actually grabs the data from the
data frame "variables" and not the data frame x?

x<-data.frame(y=rnorm(100,0,1),a=rnorm(100,1,1),b=rnorm(100,2,1),myweights=runif(100))
names(x)

weighted.reg=function(formula, MyData, filename,WeightsVector)
{
	variables<-MyData[1:(length(MyData)-1)]    # creating a data frame without the weights
	print(dim(MyData))
	print(filename)
	print(length(WeightsVector))
	regr.f<-lm(formula,variables,weights=WeightsVector,na.action=na.omit)
	results<-as.data.frame(round(summary(regr.f)$coeff,3))
	write.csv(results,file=filename)
	return(results)
}

reg2<-weighted.reg(y~., MyData=x, filename="TEST.csv",
WeightsVector=x$myweights)

Dimitri
On Fri, Oct 22, 2010 at 9:15 AM, David Winsemius <dwinsemius at comcast.net> wrote:

  
    
#
"The environment of the formula" is the output of
   environment(formula)
which is assigned to the current environment when the
formula is created.  The modelling functions look for
variables (in the formula, weights, and subset arguments)
in the order
   1) the data argument (usually an environment or a list)
   2) environment of the formula
When an environment is searched for a name, the search
continues through all ancestral environments until the
name is found or until you run out of ancestors.
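
To make the lookup rule concrete, here is a small sketch that inspects the environment attached to a formula. The helper "make_formula" and the variable "local_w_demo" are hypothetical names used only for this illustration.

```r
# Sketch only: make_formula and local_w_demo are illustrative names.
f_outer <- y ~ a                        # formula created in the calling environment

make_formula <- function() {
    local_w_demo <- 1:3                 # exists only in this call's frame
    y ~ a                               # the formula remembers this frame
}
f_inner <- make_formula()

# The frame of make_formula() is the formula's environment, so the local
# variable is visible from there ...
exists("local_w_demo", envir = environment(f_inner), inherits = FALSE)  # TRUE

# ... but not from the environment of the formula created outside.
exists("local_w_demo", envir = environment(f_outer))

# The search continues through ancestral environments, so names on the
# search path (here stats::lm) are still reachable from the inner frame.
exists("lm", envir = environment(f_inner))
```

This is the situation in the original weighted.reg: the formula y~. was created by the caller, so its environment is the caller's, and WeightsVector, which lives in the function's frame, is not among its ancestors.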

You can reassign the environment of a formula.  E.g.,
compare the following two:

  > wr0 <- function(formula, MyData, WeightsVector) {
  +     lm(formula, data=MyData, weights=WeightsVector)
  + }
  > wr1 <- function(formula, MyData, WeightsVector) {
  +     environment(formula) <- environment()
  +     lm(formula, data=MyData, weights=WeightsVector)
  + }
  > wr0(mpg~cyl, MyData=mtcars, WeightsVector=sqrt(1:32))
  Error in eval(expr, envir, enclos) : object 'WeightsVector' not found
  > wr1(mpg~cyl, MyData=mtcars, WeightsVector=sqrt(1:32))

  Call:
  lm(formula = formula, data = MyData, weights = WeightsVector)

  Coefficients:
  (Intercept)          cyl
       38.567       -2.966

Reassigning the environment can lead to the sort of
surprises that dynamic scoping gives you.
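
One such surprise, as a sketch: "wr1_k" is a hypothetical variant of wr1 that happens to have a local variable k. A caller who writes k into the formula gets the function's k, not their own.

```r
# Sketch only: wr1_k is a made-up variant of wr1 with an unrelated local 'k'.
wr1_k <- function(formula, MyData, WeightsVector) {
    k <- 10                                 # internal detail of the function
    environment(formula) <- environment()   # names now resolve in this frame first
    lm(formula, data = MyData, weights = WeightsVector)
}

k <- 2                                      # the caller's k, which the formula was written against
fit_surprise <- wr1_k(mpg ~ I(k * cyl), MyData = mtcars,
                      WeightsVector = sqrt(1:32))
fit_plain <- lm(mpg ~ cyl, data = mtcars, weights = sqrt(1:32))

# Rescaling the slope by 10 recovers the plain cyl slope, which shows that
# the function's k = 10 was used rather than the caller's k = 2.
unname(coef(fit_surprise)[2]) * 10
unname(coef(fit_plain)[2])
```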

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
On Oct 22, 2010, at 12:17 PM, William Dunlap wrote:

            
The wr0 call created a formula but the weights vector was "outside"  
that environment? And that was because the formula creation was at the  
stage of evaluating the function arguments, when they wouldn't "see"  
each other?  Except this works:

 > xtoy <- function(x = 1:2, y=x){y}
 > xtoy()
[1] 1 2

I'm trying to figure out what makes the wr0 version fail and that  
xtoy() function succeed.
#
...
The basic reason is that lm() calls eval() to evaluate
things in the formula, weights, and subset arguments
in a nonstandard (but well defined) way and xtoy() uses
the standard argument evaluation rules.
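
A rough sketch of the difference, using toy functions rather than lm() itself. "nse" and "wrapper" are hypothetical names; nse only imitates the lookup order described above (data first, then the environment of the formula) and is not the actual lm() code.

```r
# Sketch only: nse() and wrapper() are illustrative, not the lm() internals.

# Standard evaluation: the default for y is evaluated inside the function's
# own frame, where x is visible.
xtoy <- function(x = 1:2, y = x) { y }
xtoy()

# Nonstandard evaluation: take the *expression* supplied for w and evaluate
# it in data, then in environment(formula), never in the caller's frame.
nse <- function(formula, data, w) {
    eval(substitute(w), data, environment(formula))
}

wrapper <- function(formula, data, WeightsVector) {
    nse(formula, data, w = WeightsVector)   # passes the name, like wr0 does
}

# The formula is created by the caller, so WeightsVector (local to wrapper)
# is not reachable from its environment and the lookup fails.
res <- tryCatch(wrapper(mpg ~ cyl, mtcars, WeightsVector = sqrt(1:32)),
                error = function(e) conditionMessage(e))
res
```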
 
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com