
Write a function that allows access to columns of a passed dataframe.

21 messages · John Sorkin, Rui Barradas, David Winsemius +2 more

#
I am trying to write a function that, when passed a dataframe and the name of one of its columns, lets me work with that column. I cannot get my code to work; please see the code below. Any help in getting the function to work would be appreciated.




mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
mydf
class(mydf)


myfun <- function(frame,var){
  call <- match.call()
  print(call)


  indx <- match(c("frame","var"),names(call),nomatch=0)
  print(indx)
  if(indx[1]==0) stop("Function called without sufficient arguments!")


  cat("I can get the name of the dataframe as a text string!\n")
  xx <- deparse(substitute(frame))
  print(xx)


  cat("I can get the name of the column as a text string!\n")
  yy <- deparse(substitute(var))
  print(yy)


  # This does not work.
  col <- xx[,"yy"]


  # Nor does this work.
  col <- xx[,yy]
  print(col)
}


myfun(mydf,age)

John David Sorkin M.D., Ph.D.
Professor of Medicine
Chief, Biostatistics and Informatics
University of Maryland School of Medicine Division of Gerontology and Geriatric Medicine
Baltimore VA Medical Center
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
(Phone) 410-605-7119
(Fax) 410-605-7913 (Please call phone number above prior to faxing) 









Confidentiality Statement:
This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
#
Hello,

You don't need xx <- deparse(substitute(...)), since you are passing the 
data.frame to your function. Just use


myfun <- function(frame,var){

   [...]

   # Nor does this work.
   col <- frame[,yy]
   print(col)
}

myfun(mydf,age)
myfun(frame = mydf, var = age)
[1] 2 3
I can get the name of the dataframe as a text string!
[1] "mydf"
I can get the name of the column as a text string!
[1] "age"
[1] 20 34 43 32 21


Hope this helps,

Rui Barradas
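
For reference, the suggestion above can be assembled into a complete function. This is only a minimal sketch using the sample data from the original post; it drops the match.call() checks and the xx variable and keeps just the column lookup. The original xx[, yy] fails because xx holds the character string "mydf", not the data frame itself.

```r
# Sample data from the original post.
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

myfun <- function(frame, var) {
  # substitute() captures the unevaluated argument; deparse() turns it into text.
  yy <- deparse(substitute(var))
  # 'frame' is already the data frame, so it can be indexed directly by name.
  frame[, yy]
}

myfun(mydf, age)
```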



On 05-12-2016 14:44, John Sorkin wrote:
#
I forgot to say that I've commented out the line

# This does not work.
#col <- xx[,"yy"]

Rui Barradas

On 05-12-2016 15:17, Rui Barradas wrote:
#
Rui,
I appreciate your suggestion, but eliminating the deparse statement does not solve my problem. Do you have any other suggestions? See code below.
Thank you,
John


mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
mydf
class(mydf)


myfun <- function(frame,var){
  call <- match.call()
  print(call)


  indx <- match(c("frame","var"),names(call),nomatch=0)
  print(indx)
  if(indx[1]==0) stop("Function called without sufficient arguments!")


  cat("I can get the name of the dataframe as a text string!\n")
  #xx <- deparse(substitute(frame))
  print(xx)


  cat("I can get the name of the column as a text string!\n")
  #yy <- deparse(substitute(var))
  print(yy)


  # This does not work.
  print(frame[,var])


  # This does not work.
  print(frame[,"var"])




  # This does not work.
  col <- xx[,"yy"]


  # Nor does this work.
  col <- xx[,yy]
  print(col)
}


myfun(mydf,age)




myfun()














#
John:

I think you need to re-read about how functions pass arguments and
data frame access works in R.

What Rui meant was to get rid of all the xx stuff and access your column by:

col <- frame[, yy]

That *does* work.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Mon, Dec 5, 2016 at 7:29 AM, John Sorkin <jsorkin at grecc.umaryland.edu> wrote:
#
Bert is right, that's exactly what I meant.

Rui Barradas

On 05-12-2016 15:36, Bert Gunter wrote:
#
Hello,

For some reason I got moderation in reply to Bert's post so I'll retry.
Get rid of 'xx', keep 'yy':

frame[, yy]  # this works

Hope this helps,

Rui Barradas

On 05-12-2016 15:29, John Sorkin wrote:
#
When you use that calling syntax, the system will supply the values of whatever the `age` variable contains. (And if there is no `age`-named object, you get an error at the time of the call to `myfun`.) You need either to call it as:

myfun( mydf , "age")


# Or:

age <- "age"
myfun( mydf, age)

Unless your value of the `age`-named variable was "age" in the calling environment (and you did not give us that value in either of your postings), you would fail.
#
Hello,

Inline.

On 05-12-2016 17:09, David Winsemius wrote:
Actually, no, which was very surprising to me but John's code worked 
(not the function, the call). And with the change I've proposed, it 
worked flawlessly. No errors. Why I don't know.

Rui Barradas

  You need either to call it as:
#
I see. Must be one of those "promise" things. It appears that if you don't actually require the value you can just pass a name with no value?

Thanks for the correction.
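
The behaviour discussed here can be reproduced in a few lines. This is a small sketch with hypothetical helper names (name_only, needs_value) and a deliberately undefined object (no_such_object); it assumes nothing called no_such_object exists in the session.

```r
# Only inspects the expression; the promise for 'x' is never forced.
name_only <- function(x) {
  deparse(substitute(x))
}

# Uses the value, so the promise is forced and the object must exist.
needs_value <- function(x) {
  x
}

# Works even though no_such_object is not defined anywhere.
name_only(no_such_object)

# Fails only at the point where the value is actually needed.
res <- tryCatch(needs_value(no_such_object),
                error = function(e) conditionMessage(e))
res
```

This is why myfun(mydf, age) runs without an `age` object in the calling environment: the function only ever looks at the expression, never at its value.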
#
Inline.

-- Bert


On Mon, Dec 5, 2016 at 9:53 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
#
Sorry, hit "Send" by mistake.

Inline.
On Mon, Dec 5, 2016 at 1:34 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
See ?substitute and in particular the example highlighted there.

The technical details are explained in the R Language Definition
manual. The key here is the use of promises for lay evaluations. In
fact, the expression in the call *is* available within the functions,
as is (a pointer to) the environment in which to evaluate the
expression. That is how substitute() works. Specifically, quoting from
the manual,

*****
It is possible to access the actual (not default) expressions used as
arguments inside the function. The mechanism is implemented via
promises. When a function is being evaluated the actual expression
used as an argument is stored in the promise together with a pointer
to the environment the function was called from. When (if) the
argument is evaluated the stored expression is evaluated in the
environment that the function was called from. Since only a pointer to
the environment is used any changes made to that environment will be
in effect during this evaluation. The resulting value is then also
stored in a separate spot in the promise. Subsequent evaluations
retrieve this stored value (a second evaluation is not carried out).
Access to the unevaluated expression is also available using
substitute.
********

-- Bert
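
The passage quoted from the manual can be illustrated with a short sketch. The function name show_promise and the counter variable are made up for this example; the argument has a side effect so that the number of evaluations is visible.

```r
show_promise <- function(arg) {
  expr   <- substitute(arg)  # the unevaluated expression stored in the promise
  first  <- arg              # forces the promise, evaluating it in the caller's environment
  second <- arg              # reuses the stored value; the expression is not run again
  list(expr = expr, first = first, second = second)
}

counter <- 0
out <- show_promise({ counter <- counter + 1; counter })

out$first    # value from the single evaluation
out$second   # same stored value
counter      # incremented once, in the environment of the call
```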
#
Typo: "lazy evaluation" not "lay evaluation."

-- Bert



On Mon, Dec 5, 2016 at 1:46 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
#
Hello,

Just to say that I wouldn't write the function as John did. I would get 
rid of all the deparse/substitute stuff and instinctively use a quoted 
argument as a column name. Something like the following.

myfun <- function(frame, var){
	[...]
	col <- frame[, var]  # or frame[[var]]
	[...]
}

myfun(mydf, "age")  # much better, simpler, no promises.

Rui Barradas
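
Filled in, the quoted-argument version might look like the following. This is a sketch only: the argument checks are an addition for illustration and were not part of the suggestion above.

```r
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

myfun <- function(frame, var) {
  # 'var' is an ordinary character string naming one column.
  stopifnot(is.data.frame(frame),
            is.character(var), length(var) == 1,
            var %in% names(frame))
  frame[[var]]
}

myfun(mydf, "age")

# Because the column name is just a value, it can be stored and iterated over.
means <- sapply(c("id", "age"), function(nm) mean(myfun(mydf, nm)))
means
```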

On 05-12-2016 21:49, Bert Gunter wrote:
#
Over my almost 50 years programming, I have come to believe that if one wants a program to be useful, one should write the program to do as much work as possible and demand as little as possible from the user of the program. In my opinion, one should not ask the person who uses my function to remember to put the name of the data frame column in quotation marks. The function should be written so that all that needs to be passed is the name of the column; the function should take care of the quotation marks.
John
Confidentiality Statement:
This email message, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
#
Ok, that's a way of seeing it.

Rui Barradas

On 06-12-2016 14:28, John Sorkin wrote:
#
I basically agree with Rui - using substitute will cause trouble.  E.g., how
would the user iterate over the columns, calling your function for each?
     for(column in names(dataFrame)) func(dataFrame, column)
would fail because dataFrame$column does not exist.  You need to provide
an extra argument to handle this case. something like the following:
     func <- function(df,
         columnAsName,
         columnAsString = deparse(substitute(columnAsName))[1]) {
         ...
     }
The default value of columnAsString should also deal with the case that
the user supplied something like log(Conc.) instead of Conc.

I think that using a formula for the lazily evaluated argument
(columnAsName)
works well.  The user then knows exactly how it gets evaluated.


Bill Dunlap
TIBCO Software
wdunlap tibco.com
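
A runnable version of the two-argument idea above could look like this. It is a sketch that assumes the function simply returns the selected column, since the body was elided in the message.

```r
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

func <- function(df, columnAsName,
                 columnAsString = deparse(substitute(columnAsName))[1]) {
  # The default is evaluated lazily inside the function, so it sees the
  # unevaluated columnAsName; an explicit columnAsString bypasses it entirely.
  df[[columnAsString]]
}

# Interactive use: bare column name.
func(mydf, age)

# Programmatic use: pass the name as a string, e.g. when iterating.
cols <- lapply(names(mydf), function(column) func(mydf, columnAsString = column))
```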

On Tue, Dec 6, 2016 at 6:28 AM, John Sorkin <jsorkin at grecc.umaryland.edu>
wrote:
#
Perhaps the best way is the one used by library(), where both 
library(package) and library("package") work. It uses 
as.character/substitute, not deparse/substitute, as follows.

mydf <- 
data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
mydf
class(mydf)
str(mydf)

myfun <- function(frame,var){
	yy <- as.character(substitute(var))
	frame[, yy]
}

myfun(mydf, age)
myfun(mydf, "age")

Rui Barradas

On 06-12-2016 15:03, William Dunlap wrote:
#
Note that library has another argument, character.only=TRUE/FALSE,
to control whether the main argument should be regarded as a variable
or a literal.  I think you need two arguments to handle this.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
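
A sketch of how such a switch might be added to myfun, loosely modeled on the character.only argument of library(). The default of FALSE and the variable name `column` are assumptions made for this illustration.

```r
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

myfun <- function(frame, var, character.only = FALSE) {
  if (!character.only) {
    # Treat 'var' as a literal: myfun(mydf, age) and myfun(mydf, "age") both work.
    var <- as.character(substitute(var))
  }
  # With character.only = TRUE, 'var' is evaluated normally and must
  # yield a column name.
  frame[, var]
}

myfun(mydf, age)
myfun(mydf, "age")

column <- "sex"
myfun(mydf, column, character.only = TRUE)
```

Without the switch, myfun(mydf, column) would look for a column literally named "column".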
On Tue, Dec 6, 2016 at 7:33 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:

            

  
  
#
This would be an implementation that would support a multi-column extraction using a formula object:

mydf <- data.frame(id=c(1,2,3,4,5),sex=c("M","M","M","F","F"),age=c(20,34,43,32,21))
mydf
class(mydf)
str(mydf)

myfun <- function(frame, vars){
	yy <- terms(vars)
	frame[, attr(yy, "term.labels")]
}

myfun(mydf, ~age+sex)
David Winsemius
Alameda, CA, USA
#
Simpler I think: ?all.vars
[1] "A" "B"

Note also:
[1] "A"

Cheers,
Bert
On Tue, Dec 6, 2016 at 10:41 AM, David Winsemius <dwinsemius at comcast.net> wrote:
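
The calls behind the all.vars output in Bert's message were not preserved in the archive, so the following is only an illustrative sketch of the idea applied to the formula-based myfun. The formulas and the drop = FALSE choice are assumptions for this example, not the original calls.

```r
mydf <- data.frame(id  = c(1, 2, 3, 4, 5),
                   sex = c("M", "M", "M", "F", "F"),
                   age = c(20, 34, 43, 32, 21))

myfun <- function(frame, vars) {
  # all.vars() returns the distinct variable names appearing in the formula.
  frame[, all.vars(vars), drop = FALSE]
}

all.vars(~ age + sex)   # both variable names
all.vars(~ log(age))    # the variable inside the call, not the term label

myfun(mydf, ~ age + sex)
```

Unlike the term labels from terms(), all.vars() reduces an expression such as log(age) to the underlying column name.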