I need to extract identically named columns from several data frames in
a list. The column name is a variable (i.e. not known in advance), and the
whole thing happens inside a function body. I'd like to use lapply with a
variable 'select' argument.
Example:
tt <- function (n) {
  x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
  for (xx in x) print(subset(xx, select = n))   ### works
  print(lapply(x, subset, select = a))           ### works
  print(lapply(x, subset, select = "a"))         ### works
  print(lapply(x, subset, select = n))           ### does not work as intended
}
n <- "b"
tt("a")   # runs, but the last lapply call selects column b instead of a
rm(n)
tt("a")   # the lapply call that uses the variable 'n' now fails
Question: how can I force evaluation of the variable n so that
the lapply call works? I suspect it has something to do with eval and
specifying the correct evaluation frame, but how?
many thanks
joerg
problem with lapply(x, subset, ...) and variable select argument
8 messages · Thomas Lumley, Joerg van den Hoff, Dimitris Rizopoulos +2 more
On Mon, 10 Oct 2005, joerg van den hoff wrote:
I need to extract identically named columns from several data frames in a list. the column name is a variable (i.e. not known in advance). the whole thing occurs within a function body. I'd like to use lapply with a variable 'select' argument.
You would probably be better off using "[" rather than subset().
tt <- function (n) {
x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
print(lapply(x,"[",n))
}
seems to do what you want.
-thomas
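A slightly fuller sketch of the "[" approach, for anyone who wants to check the result. The helper name pick_column and the sample data are illustrative assumptions, not part of the original post.

```r
# Hypothetical helper: extract the column named by `n` from every
# data frame in the list `x`.
pick_column <- function(x, n) {
  # "[" is an ordinary function, so `n` is evaluated in this function's
  # frame and lapply() simply passes its value along.
  lapply(x, "[", n)
}

x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
res <- pick_column(x, "a")
print(res)   # a list of two one-column data frames
```

Each element stays a data frame because single-argument "[" indexing on a data frame selects columns and keeps the data frame structure.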
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle
The problem is that subset looks up the variable in its parent frame, but
in this case the parent frame is not the environment in tt but the
environment in lapply, since tt does not call subset directly; lapply does.
Try this, which is the same except that we have added the line beginning
with environment before the print statement.
tt <- function (n) {
x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
environment(lapply) <- environment()
print(lapply(x, subset, select = n))
}
n <- "b"
tt("a")
What this does is create a local copy of lapply whose
enclosing environment is the environment in tt.
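The parent-frame point can be checked directly. The sketch below is only an illustration (frame_demo and is_caller are made-up names): it asks a small function whether its parent frame is the frame of the enclosing function, once when called directly and once when called through lapply.

```r
# Hypothetical demo: who is the caller of a function handed to lapply()?
frame_demo <- function() {
  here <- environment()          # the frame of this call to frame_demo()

  is_caller <- function(y) {
    pf <- parent.frame()         # frame from which is_caller() was called
    identical(pf, here)
  }

  direct     <- is_caller(1)                 # called by frame_demo() itself
  via_lapply <- lapply(1, is_caller)[[1]]    # called by lapply() for us
  c(direct = direct, via_lapply = via_lapply)
}

res_frames <- frame_demo()
print(res_frames)
```

The direct call reports TRUE and the call through lapply reports FALSE, which is why a function that looks things up in parent.frame() (as subset does for select) ends up searching from inside lapply rather than from inside tt.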
Many thanks to Thomas and Gabor for their help. Both solutions solve my problem perfectly. But just as an attempt to improve my understanding of the inner workings of R (similar problems are sure to come up ...), two more questions:
1. Why does the call of the "[" function (Thomas' solution) behave differently from "subset", in that the lookup of the variable "n" works without providing lapply with the current environment (which is nice)?
2. Using 'subset' in this context becomes more cumbersome if sapply is used. It seems that I then need
...
environment(sapply) <- environment(lapply) <- environment()
sapply(x, subset, select = n)
...
to get it working (and that means you must know that sapply uses lapply). Or can I somehow avoid the additional explicit definition of the lapply environment?
Again: many thanks
joerg
As Gabor said, the issue here is that subset.data.frame() evaluates
the value of the `select' argument in the parent.frame(); Thus, if you
create a local function within lapply() (or sapply()) it works:
tt <- function (n) {
x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))
print(lapply(x, function(y, n) subset(y, select = n), n = n))
print(sapply(x, function(y, n) subset(y, select = n), n = n))
}
tt("a")
I hope it helps.
Best,
Dimitris
----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven
Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://www.med.kuleuven.be/biostat/
http://www.student.kuleuven.be/~m0390867/dimitris.htm
"Dimitris Rizopoulos" <dimitris.rizopoulos at med.kuleuven.be> writes:
As Gabor said, the issue here is that subset.data.frame() evaluates the value of the `select' argument in the parent.frame(); Thus, if you create a local function within lapply() (or sapply()) it works:
It's more complicated than that: It evaluates the select argument in a named list with names duplicating those of the data frame, and *then* in parent.frame. This is convenient for command line use, because you can specify ranges of variables as in
dfsub <- subset(dfr,select=c(sex:treat, x_pre:x_24))
but it is quite risky to try and do this inside a function - if you're passing in a variable, the result depends on whether there is a variable of the same name in the data frame!
You can probably get around it using substitute() constructions, but I think it is safer to avoid using functions with nonstandard semantics inside functions.
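To make the risk concrete, here is a small sketch. The wrapper get_col and the two data frames are invented for illustration; the only real function involved is subset.

```r
# Hypothetical wrapper that passes a variable on to subset()'s select.
get_col <- function(df, n) subset(df, select = n)

plain <- data.frame(a = 1, b = 2)
clash <- data.frame(a = 1, n = 2)   # has a column that is itself called "n"

ok  <- names(get_col(plain, "a"))   # `n` is not a column, so the variable is used
bad <- names(get_col(clash, "a"))   # the column `n` shadows the variable `n`
print(ok)    # "a"
print(bad)   # "n"

# The convenient command-line use: a range of columns by name.
wide <- data.frame(a = 1, b = 2, c = 3)
rng <- names(subset(wide, select = a:b))
print(rng)   # "a" "b"

# Ordinary indexing only ever sees the value of the variable.
n <- "a"
idx <- names(clash[n])
print(idx)   # "a"
```

The same call, get_col(df, "a"), returns a different column depending on the column names of the data frame, which is the hazard described above.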
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
On Tue, 11 Oct 2005, joerg van den hoff wrote:
many thanks to thomas and gabor for their help. both solutions solve my problem perfectly. but just as an attempt to improve my understanding of the inner workings of R (similar problems are sure to come up ...) two more question: 1. why does the call of the "[" function (thomas' solution) behave different from "subset" in that the look up of the variable "n" works without providing lapply with the current environment (which is nice)?
"[" behaves like nearly all functions in R: the value of the argument is passed. subset() does some tricky things to subvert the usual argument passing. Quite a few of the modelling functions do similar tricky things, and they do sometimes get confused when passed as arguments to another function.
2. using 'subset' in this context becomes more cumbersome, if sapply is used. it seems that than I need ... environment(sapply) <- environment(lapply) <- environment() sapply(x, subset, select = n)) ... to get it working (and that means you must know, that sapply uses lapply). or can I somehow avoid the additional explicit definition of the lapply-environment?
You really don't want to go around playing with environment() on functions. That way lies madness. Use subset at the command line and [ or [[ in programming. I don't think I have ever set environment() on a function (only on formulas). -thomas
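As a rough illustration of the "[ or [[ in programming" advice with sapply, without touching any environments: the function name col_values and the sample data below are assumptions made for the example.

```r
x <- list(data.frame(a = 1, b = 2), data.frame(a = 3, b = 4))

# Hypothetical helper: "[[" extracts the column itself, so sapply()
# can simplify the per-data-frame results into a plain vector.
col_values <- function(x, n) sapply(x, "[[", n)

vals <- col_values(x, "b")
print(vals)     # 2 4

# "[" keeps one-column data frames instead.
frames <- lapply(x, "[", "b")
print(frames)
```

Because both "[" and "[[" receive the value of n as a normal argument, the result does not depend on where n happens to be defined.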
Just one simple shortening of DR's solution:
tt <- function (n) {
x <- list(data.frame(a=1,b=2), data.frame(a=3,b=4))
print(sapply(x, function(...) subset(...), select = n))
}
n <- "b"
tt("a")