
works in R-1.1.1 but not in R-development; why?

2 messages · Luke Tierney, Ramon Diaz-Uriarte

#
Peter Dalgaard BSA wrote:
Not really, but I guess I have no choice :-).  Here is my take on this:

The simple solution is to use

lapply(split(datai,datai$counter),
       function(datos,formula) {lm(formula = formula, data = datos,
                                   weights = x2)},
       formula = formula) 

i.e. use x2 instead of datos$x2 as the weights argument.  This works
in both 1.1.1 and in the devel branch.

A long-winded explanation:

What makes using lm and friends in functions difficult is that some of
its arguments are used for value and some are used for expression (for
lack of better terms).  Arguments used for value are ordinary function
arguments that are evaluated internally by the standard function
evaluation process; the data argument is one.  (For value arguments
you can almost think of them as being computed before the function
call and only their values are passed.)  Expression arguments are not
evaluated directly.  Instead their expressions are captured (by
substitute or something similar), those expressions are then examined,
possibly modified, and then possibly evaluated through an explicit
call to eval using some context. The weights argument is an expression
argument.
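A minimal R sketch of the value/expression distinction. It assumes a hypothetical function show_args (not part of R or of lm) that treats data as a value argument and weights as an expression argument, roughly the way lm and model.frame do:

```r
# Hypothetical function: 'data' is used for value, 'weights' for expression.
show_args <- function(data, weights) {
  # Value argument: forcing it evaluates it normally in the caller's frame.
  n <- nrow(data)
  # Expression argument: capture the unevaluated expression instead.
  w_expr <- substitute(weights)
  # Evaluate the captured expression with the data frame searched first
  # and the caller's environment as the fallback.
  w <- eval(w_expr, data, parent.frame())
  list(n = n, expr = w_expr, value = w)
}

d <- data.frame(y = 1:3, x2 = c(0.5, 1, 2))
res <- show_args(d, x2)   # no global x2 is needed
res$expr    # the symbol x2, not a value
res$value   # 0.5 1.0 2.0, looked up in d
```

Because weights is never forced as an ordinary argument, the call works even though no variable x2 exists outside the data frame.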

The key to understanding how expression arguments work is knowing, and
perhaps controlling, the environment used for evaluating them (*and*
knowing which arguments they are; the documentation isn't as helpful
as it could be here).  The changes Robert made to lm and related
functions are a first step in trying to make the context in which
expression arguments are used a bit more rational and controllable, in
particular when no explicit data frames are supplied.

The 1.1.1 rules were to evaluate expression arguments in an
environment consisting of the data frame and the environment of the
caller of lm.  The (intended at least) new rules, which may still
change, are that the evaluation environment consists of the data frame
and the environment in which the formula was constructed.
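The two lookup rules can be imitated directly with eval(expr, data, enclos), which searches data first and then enclos. In this sketch the function compare_rules and the variable wt_factor are hypothetical; compare_rules plays the role of the function that calls lm, and it is assumed that no global object named wt_factor exists:

```r
# Formula created at top level, so environment(f) is the global environment.
f <- y ~ x1
d <- data.frame(y = 1:3, x1 = c(2, 4, 7))

# Hypothetical caller: 'wt_factor' exists only in this function's frame.
compare_rules <- function(f, d) {
  wt_factor <- 10
  expr <- quote(x1 * wt_factor)   # x1 is a column of d; wt_factor is not
  # 1.1.1-style rule: data frame first, then the calling function's frame.
  old <- tryCatch(eval(expr, d, environment()), error = function(e) NA)
  # Newer rule: data frame first, then the environment of the formula.
  new <- tryCatch(eval(expr, d, environment(f)), error = function(e) NA)
  list(old = old, new = new)
}

r <- compare_rules(f, d)
r$old   # 20 40 70: wt_factor found in the caller's frame
r$new   # NA: wt_factor is not visible from the global environment
```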

The two approaches both use the data frame, if supplied, as the first
place to find variables; they only differ in how they handle the case
where the data frame does not contain the values.

The call used in this example is:

	lm(formula = formula, data = datos, weights = datos$x2)

An explicit data argument is given but the data frame datos only
contains the components y, x1, x2, x3, and counter.  The expression
provided as the weights argument is datos$x2.  When the expression is
evaluated, datos is not found in the data frame provided, so the
default environment is used.  Under 1.1.1 that is the caller's
environment and you get what you want. In the devel branch it is the
environment in which the formula was created, which is the global
environment.
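A sketch reproducing the failing call under the formula-environment rule. The data frame below is a made-up stand-in for datai, with the columns named in the thread. As far as we can tell, current versions of R follow the newer rule (model.frame looks in data, then in environment(formula)), and since datos exists neither in the data frame nor in the global environment, the call stops with an error rather than silently using the weights:

```r
# Hypothetical stand-in for 'datai' from the original thread.
datai <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
                    x1 = 1:6,
                    x2 = c(1, 2, 1, 2, 1, 2),
                    x3 = c(3, 1, 4, 1, 5, 9),
                    counter = rep(1:2, each = 3))
formula <- y ~ x1   # created at top level: environment(formula) is global

fits <- tryCatch(
  lapply(split(datai, datai$counter),
         function(datos, formula) {
           # 'datos$x2' is looked up in datos, then in environment(formula);
           # the name 'datos' is found in neither place.
           lm(formula = formula, data = datos, weights = datos$x2)
         },
         formula = formula),
  error = function(e) e)

conditionMessage(fits)   # reports that object 'datos' was not found
```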

With all that, back to the simple solution: The weights argument x2 is
an expression argument and is intended to refer to the x2 component of
the data frame.  When the expression is evaluated, the first place the
evaluation looks for variable values, both in 1.1.1 and in devel, is
the data frame.  So you get the answer you want in both cases.
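A sketch of the simple solution on the same made-up data, checking that the weights really are used: the per-group fit is compared against a direct top-level weighted fit on the same rows, and against an unweighted fit, which should differ:

```r
# Hypothetical stand-in for 'datai' from the original thread.
datai <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
                    x1 = 1:6,
                    x2 = c(1, 2, 1, 2, 1, 2),
                    x3 = c(3, 1, 4, 1, 5, 9),
                    counter = rep(1:2, each = 3))
formula <- y ~ x1

fits <- lapply(split(datai, datai$counter),
               function(datos, formula) {
                 # 'x2' is found as a column of datos under either rule.
                 lm(formula = formula, data = datos, weights = x2)
               },
               formula = formula)

group1 <- datai[datai$counter == 1, ]
manual <- lm(y ~ x1, data = group1, weights = x2)   # weighted reference fit
unweighted <- lm(y ~ x1, data = group1)             # ignores the weights

all.equal(coef(fits[["1"]]), coef(manual))          # TRUE
isTRUE(all.equal(coef(fits[["1"]]), coef(unweighted)))  # FALSE
```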

luke
#
Thanks a lot to Thomas Lumley, Peter Dalgaard, and Luke Tierney for their
answers. 


I think I understand why the function failed, and the workarounds. However,
would it be appropriate if a call such as:


my.function <- function(formula, data){
# other stuff
    lm(formula = formula, data = data, weights = data$x2)
} 

my.function(formula, datai)


were to return an error, instead of simply ignoring the weights=data$x2
argument? (especially since such a thing used to work and might be used in
other code)

Ramon
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._