
problem with lapply(x, subset, ...) and variable select argument

8 messages · Thomas Lumley, Joerg van den Hoff, Dimitris Rizopoulos +2 more

#
I need to extract identically named columns from several data frames in 
a list. the column name is held in a variable (i.e. not known in advance). 
the whole thing occurs within a function body. I'd like to use lapply with 
a variable 'select' argument.


example:

tt <- function (n) {
    x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
    for (xx in x) print(subset(xx, select = n))   ### works
    print (lapply(x, subset, select = a))   ### works
    print (lapply(x, subset, select = "a"))  ### works
    print (lapply(x, subset, select = n))  ### does not work as intended
}
n = "b"
tt("a")  # runs, but the last lapply call selects column "b" instead of "a"
rm(n)
tt("a")  # the last lapply call (the one using 'n') now fails: 'n' is not found


question: how can I force evaluation of the variable n so that
the lapply call works? I suspect it has something to do with eval and
specifying the correct evaluation frame, but how? ....


many thanks

joerg
#
On Mon, 10 Oct 2005, joerg van den hoff wrote:

            
You would probably be better off using "[" rather than subset().

tt <- function (n) {
     x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
     print(lapply(x,"[",n))
}

seems to do what you want.

 	-thomas
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
The problem is that subset evaluates select in its parent frame, but in
this case the parent frame is not the environment of tt but the environment
of lapply, since tt does not call subset directly; lapply does.

Try this, which is the same except that we have added the line beginning
with environment before the print statement.

tt <- function (n) {
   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
   environment(lapply) <- environment()
   print(lapply(x, subset, select = n))
}

n <- "b"
tt("a")

What this does is create a local copy of lapply whose
enclosing environment is the environment of tt.
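
A small probe, not part of the original reply, can make the lookup visible. It is only a sketch and assumes that lapply calls FUN from its own frame, which holds the arguments X and FUN. The names probe, direct and via_lapply are made up for illustration.

```r
# Hypothetical helper: reports whether the frame it was called from
# contains a variable named FUN (as the frame of lapply() does).
probe <- function(x) {
    caller <- parent.frame()
    exists("FUN", envir = caller, inherits = FALSE)
}

direct     <- function() probe(1)               # probe() is called by direct()
via_lapply <- function() lapply(1, probe)[[1]]  # probe() is called by lapply()

direct()      # FALSE: the frame of direct() has no FUN
via_lapply()  # TRUE: the caller is the frame of lapply(), not of via_lapply()
```

This is why a variable that exists only in the function calling lapply is not seen by subset: the search starts in the frame of lapply and continues from there to the base namespace and the global environment.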
#
Gabor Grothendieck wrote:
many thanks to thomas and gabor for their help. both solutions solve my 
problem perfectly.

but just as an attempt to improve my understanding of the inner workings 
of R (similar problems are sure to come up ...) two more questions:

1.
why does the call of the "[" function (thomas' solution) behave 
differently from "subset" in that the lookup of the variable "n" works 
without providing lapply with the current environment (which is nice)?

2.
using 'subset' in this context becomes more cumbersome if sapply is 
used. it seems that I then need
...
environment(sapply) <- environment(lapply) <- environment()
sapply(x, subset, select = n)
...
to get it working (and that means you must know that sapply uses 
lapply). or can I somehow avoid the additional explicit definition of 
the lapply environment?


again: many thanks

joerg
#
As Gabor said, the issue here is that subset.data.frame() evaluates 
the value of the `select' argument in the parent.frame(). Thus, if you 
define a local function in the call to lapply() (or sapply()), it works:

tt <- function (n) {
    x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
    print(lapply(x, function(y, n) subset(y, select = n), n = n))
    print(sapply(x, function(y, n) subset(y, select = n), n = n))
}

tt("a")


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
     http://www.student.kuleuven.be/~m0390867/dimitris.htm



#
"Dimitris Rizopoulos" <dimitris.rizopoulos at med.kuleuven.be> writes:
It's more complicated than that: It evaluates the select argument in a
named list with names duplicating those of the data frame, and *then*
in parent.frame. This is convenient for command line use, because you
can specify ranges of variables as in

  dfsub <- subset(dfr,select=c(sex:treat, x_pre:x_24))

but it is quite risky to try and do this inside a function - if you're
passing in a variable, the result depends on whether there is a
variable of the same name in the data frame! You can probably get
around it using substitute() constructions, but I think it is safer to
avoid using functions with nonstandard semantics inside functions.
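
The name clash described above can be reproduced with a small sketch. The data frame dfr and the function pick are made up for illustration; the third column is deliberately called n.

```r
# Hypothetical data: the third column happens to be called n.
dfr <- data.frame(a = 1, b = 2, n = 99)

# Inside pick(), the argument n is meant to hold a column name.
pick <- function(n) subset(dfr, select = n)

# subset() first looks up n among the column names, where it stands for
# column position 3, so the argument value "a" is never used.
names(pick("a"))                    # "n"

# Plain indexing uses the value of the argument.
names(dfr["a"])                     # "a"

# The same lookup is what makes ranges of columns work interactively.
names(subset(dfr, select = a:b))    # "a" "b"
```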

  
    
#
On Tue, 11 Oct 2005, joerg van den hoff wrote:
"[" behaves like nearly all functions in R: the value of the argument is 
passed.   subset() does some tricky things to subvert the usual argument 
passing.  Quite a few of the modelling functions do similar tricky things, 
and they do sometimes get confused when passed as arguments to another 
function.
You really don't want to go around playing with environment() on 
functions. That way lies madness.  Use subset at the command line and [ or 
[[ in programming.  I don't think I have ever set environment() on a 
function (only on formulas).
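
As a minimal sketch of that advice (the function names get_cols and get_vals are made up for illustration), both indexing functions can be passed to lapply or sapply by name, and the column name is passed as an ordinary value:

```r
x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))

# "[" keeps the data frame structure, "[[" extracts the column contents.
get_cols <- function(x, n) lapply(x, "[", n)
get_vals <- function(x, n) sapply(x, "[[", n)

n <- "b"             # a global n has no influence on the result

get_cols(x, "a")     # list of two one-column data frames, each with column a
get_vals(x, "a")     # 1 3
```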


 	-thomas
#
Just one simple shortening of DR's solution:

tt <- function (n) {
   x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
   print(sapply(x, function(...) subset(...), select = n))
}

n <- "b"
tt("a")