Dear R friends
I wanted a function to make a simple percent table that would be easy for
students to use. The goal originally was to have a simple thing people
would call like this
pctable(rowvar, colvar, data)
and the things "rowvar" and "colvar" might be names of variables in data. I
wanted to avoid the usage of "with" (as we now see in the table help).
Then some people wanted more features, and I agreed with the suggestion to
create a formula interface that people can call like so:
pctable(rowvar ~ colvar, data)
I end up with a generic function pctable and methods pctable.default,
pctable.formula, pctable.character.
I got that working, mostly I understand what's going on.
Except the following, which, actually, is a good lesson to me about
promises and method dispatch in R. An S3 generic will not send a call with
a promise in the first argument to pctable.default (as I had mistakenly
hoped). I'll paste in all the code below, but I think you will know the
answer even without running it.
pctable is a generic function. In workspace, I have no objects x and y,
but there are variables inside data.frame dat named x and y. Since y is
not an object, the method dispatch fails thus:
pctable(y, x, dat)
Error in pctable(y, x, dat) (from #3) : object 'y' not found
This direct call on pctable.default works (recall y and x are promises):
pctable.default(y, x, dat)
Count (column %)
x
y 1 2 3 4 Sum
A 5(20%) 3(12%) 5(20%) 6(24%) 19
B 9(36%) 5(20%) 4(16%) 6(24%) 24
C 1(4%) 6(24%) 3(12%) 2(8%) 12
D 4(16%) 4(16%) 6(24%) 5(20%) 19
E 6(24%) 7(28%) 7(28%) 6(24%) 26
Sum 25 25 25 25 100
All the methods work fine when the first argument is a language object.
This works (dispatches to pctable.formula)
pctable(y ~ x, dat)
Count (column %)
x
y 1 2 3 4 Sum
A 5(20%) 3(12%) 5(20%) 6(24%) 19
B 9(36%) 5(20%) 4(16%) 6(24%) 24
C 1(4%) 6(24%) 3(12%) 2(8%) 12
D 4(16%) 4(16%) 6(24%) 5(20%) 19
E 6(24%) 7(28%) 7(28%) 6(24%) 26
Sum 25 25 25 25 100
This works (dispatches to pctable.default)
pctable(dat$y, dat$x)
Count (column %)
dat$x
dat$y 1 2 3 4 Sum
A 5(20%) 3(12%) 5(20%) 6(24%) 19
B 9(36%) 5(20%) 4(16%) 6(24%) 24
C 1(4%) 6(24%) 3(12%) 2(8%) 12
D 4(16%) 4(16%) 6(24%) 5(20%) 19
E 6(24%) 7(28%) 7(28%) 6(24%) 26
Sum 25 25 25 25 100
However, this fails because y is not an object with a type
pctable(y, x, dat)
Error in pctable(y, x, dat) (from #3) : object 'y' not found
Can R be tricked to send that call to pctable.default, where it does work?
Here's the code, I'm working on documentation, will put in package
rockchalk eventually, but hate to leave this problem until I fully
understand it.
pctable <- function(rv, ...)
{
UseMethod("pctable")
}
## rv: row variable, quoted or not
## cv: column variable, quoted or not
pctable.default <- function(rv, cv, data = parent.frame(),
rvlab = NULL, cvlab = NULL,
colpct = TRUE, rowpct = FALSE,
exclude = c(NA, NaN), rounded = FALSE)
{
rvlabel <- if (!missing(rv)) deparse(substitute(rv))
cvlabel <- if (!missing(cv)) deparse(substitute(cv))
rvlab <- if (is.null(rvlab)) rvlabel else rvlab
cvlab <- if (is.null(cvlab)) cvlabel else cvlab
rvin <- eval(substitute(rv), envir = data, enclos = parent.frame())
cvin <- eval(substitute(cv), envir = data, enclos = parent.frame())
t1 <- table(rvin, cvin, dnn = c(rvlab, cvlab), exclude = exclude)
rownames(t1)[is.na(rownames(t1))] <- "NA" ## symbol to letters
colnames(t1)[is.na(colnames(t1))] <- "NA"
if (rounded) t1 <- round(t1, -1)
t2 <- addmargins(t1, c(1,2))
t1colpct <- round(100*prop.table(t1, 2), 1)
t1rowpct <- round(100*prop.table(t1, 1), 1)
t1colpct <- apply(t1colpct, c(1,2), function(x) gsub("NaN", "", x))
t1rowpct <- apply(t1rowpct, c(1,2), function(x) gsub("NaN", "", x))
res <- list("count" = t2, "colpct" = t1colpct, "rowpct" = t1rowpct,
call = match.call())
class(res) <- "pctable"
print(res, colpct = colpct, rowpct = rowpct)
invisible(res)
}
pctable.formula <- function(formula, data = NULL, rvlab = NULL,
cvlab = NULL, colpct = TRUE, rowpct = FALSE,
exclude = c(NA, NaN), rounded = FALSE,
..., subset = NULL)
{
if (missing(data) || !is.data.frame(data)) stop("pctable requires a
data frame")
if (missing(formula) || (length(formula) != 3L))
stop("pctable requires a two sided formula")
mt <- terms(formula, data = data)
if (attr(mt, "response") == 0L) stop("response variable is required")
mf <- match.call(expand.dots = FALSE)
keepers <- match(c("formula", "data", "subset", "na.action"),
names(mf), 0L)
mf <- mf[c(1L, keepers)]
mf$drop.unused.levels <- FALSE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(mf, parent.frame())
## response is column 1
rvlab <- if (missing(rvlab)) colnames(mf)[1] else rvlab
cvlab <- if (missing(cvlab)) colnames(mf)[2] else cvlab
res <- pctable.default(mf[[1L]], mf[[2L]], data = mf,
rvlab = rvlab, cvlab = cvlab,
colpct = colpct, rowpct = rowpct,
exclude = exclude, rounded = rounded)
invisible(res)
}
pctable.character <- function(rowvar, colvar, data = NULL, rvlab = NULL,
cvlab = NULL, colpct = TRUE,
rowpct = FALSE, exclude = c(NA, NaN), rounded =
FALSE,
..., subset = NULL)
{
if (missing(data) || !is.data.frame(data)) stop("pctable requires a
data frame")
## colvar <- if (!is.character(colvar)) deparse(substitute(colvar))
else colvar
colvar <- as.character(substitute(colvar))[1L]
rvlab <- if (missing(rvlab)) rowvar else rvlab
cvlab <- if (missing(cvlab)) colvar else cvlab
t1 <- with(data, table(data[[rowvar]], data[[colvar]], dnn = c(rvlab,
cvlab), exclude = exclude))
rownames(t1)[is.na(rownames(t1))] <- "NA" ## symbol to letters
colnames(t1)[is.na(colnames(t1))] <- "NA"
if (rounded) t1 <- round(t1, -1)
t2 <- addmargins(t1, c(1,2))
t1colpct <- round(100*prop.table(t1, 2), 1)
t1rowpct <- round(100*prop.table(t1, 1), 1)
t1colpct <- apply(t1colpct, c(1,2), function(x) gsub("NaN", "", x))
t1rowpct <- apply(t1rowpct, c(1,2), function(x) gsub("NaN", "", x))
res <- list("count" = t2, "colpct" = t1colpct, "rowpct" = t1rowpct,
call = match.call())
class(res) <- "pctable"
print(res, colpct = colpct, rowpct = rowpct)
invisible(res)
}
## OK, I see now I'm doing the same work over and over, will extract
## a middle chunk out of each of those methods. And finally my cool print
method.
print.pctable <- function(tab, colpct = TRUE, rowpct = FALSE){
count <- tab[["count"]]
t3 <- count
if (colpct && !rowpct) {
cpct <- tab[["colpct"]]
for(j in rownames(cpct)){
for(k in colnames(cpct)){
t3[j, k] <- paste0(count[j, k], "(", cpct[j, k], "%)")
}
}
cat("Count (column %)\n")
print(t3)
return(invisible(t3))
}
## rowpct == TRUE< else would have returned
rpct <- tab[["rowpct"]]
for(j in rownames(rpct)){
for(k in colnames(rpct)){
t3[j, k] <- paste0(count[j, k], "(", rpct[j, k], "%)")
}
}
if (!colpct) {
cat("Count (row %)\n")
print(t3)
return(invisible(t3))
} else {
cpct <- tab[["colpct"]]
t4 <- array("", dim = c(1, 1) + c(2,1)*dim(tab$colpct))
t4[seq(1, NROW(t4), 2), ] <- t3
rownames(t4)[seq(1, NROW(t4), 2)] <- rownames(t3)
rownames(t4)[is.na(rownames(t4))] <- ""
colnames(t4) <- colnames(t3)
for(j in rownames(tab[["colpct"]])) {
for(k in colnames(tab[["colpct"]])){
t4[1 + which(rownames(t4) == j) ,k] <-
paste0(tab[["colpct"]][j, k], "%")
}
}
names(dimnames(t4)) <- names(dimnames(count))
cat("Count (row %)\n")
cat("column %\n")
print(t4, quote = FALSE)
return(invisible(t4))
}
}
And usage examples
dat <- data.frame(x = gl(4, 25),
y = sample(c("A", "B", "C", "D", "E"), 100, replace=
TRUE))
## Here's what I was aiming for, in the beginning
pctable(y ~ x, dat)
pctable(y ~ x, dat, exclude = NULL)
pctable(y ~ x, dat, rvlab = "My Outcome Var", cvlab = "My Columns")
## People who like row percents asked for this
pctable(y ~ x, dat, rowpct = TRUE, colpct = FALSE)
## Some people want both. Tiresome.
pctable(y ~ x, dat, rowpct = TRUE, colpct = TRUE)
pctable(y ~ x, dat, rowpct = TRUE, colpct = TRUE, exclude = NULL)
tab <- pctable(y ~ x, dat, rvlab = "My Outcome Var", cvlab = "My Columns")
print(tab, rowpct = TRUE, colpct = FALSE)
print.pctable(tab, rowpct = TRUE, colpct = TRUE)
## I also wanted an interface that would allow calls like
## pctable(y, x, dat)
## which I was able to do, but not when pctable is a method.
## As long as one writes in an existing variable, this dispatches
## pctable.default and result is OK
pctable(dat$y, dat$x)
pctable(dat$y, dat$x, rowpct = TRUE, colpct = FALSE)
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE)
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = NULL)
tab <- pctable(dat$y, dat$x)
print(tab, rowpct = TRUE, colpct = FALSE)
print(tab, rowpct = TRUE, colpct = TRUE)
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c(NA, "E"))
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c("E"))
## Why do NA's get excluded
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c("B", "2"))
## This succeeds
pctable.default(y, x, dat)
## Next causes error
pctable(y, x, dat)
## Error in pctable(y, x, dat) (from #3) : object 'y' not found
At one point yesterday, I was on the verge of comprehending the parse tree
:)
Paul E. Johnson
Professor, Political Science Acting. Director
1541 Lilac Lane, Room 504 Center for Research Methods
University of Kansas University of Kansas
http://pj.freefaculty.orghttp://crmda.ku.edu
<http://quant.ku.edu>
[[alternative HTML version deleted]]
Dear R friends
I wanted a function to make a simple percent table that would be easy for
students to use. The goal originally was to have a simple thing people
would call like this
pctable(rowvar, colvar, data)
and the things "rowvar" and "colvar" might be names of variables in data. I
wanted to avoid the usage of "with" (as we now see in the table help).
Then some people wanted more features, and I agreed with the suggestion to
create a formula interface that people can call like so:
pctable(rowvar ~ colvar, data)
I end up with a generic function pctable and methods pctable.default,
pctable.formula, pctable.character.
I got that working, mostly I understand what's going on.
Except the following, which, actually, is a good lesson to me about
promises and method dispatch in R. An S3 generic will not send a call with
a promise in the first argument to pctable.default (as I had mistakenly
hoped). I'll paste in all the code below, but I think you will know the
answer even without running it.
pctable is a generic function. In workspace, I have no objects x and y,
but there are variables inside data.frame dat named x and y. Since y is
not an object, the method dispatch fails thus:
pctable(y, x, dat)
Error in pctable(y, x, dat) (from #3) : object 'y' not found
This direct call on pctable.default works (recall y and x are promises):
I think you are using the word "promise" differently than the standard R
usage. In a sense, all arguments are promises, but you seem to mean
something more. I think you mean that you want nonstandard evaluation
for x and y.
pctable.default(y, x, dat)
Count (column %)
x
y 1 2 3 4 Sum
A 5(20%) 3(12%) 5(20%) 6(24%) 19
B 9(36%) 5(20%) 4(16%) 6(24%) 24
C 1(4%) 6(24%) 3(12%) 2(8%) 12
D 4(16%) 4(16%) 6(24%) 5(20%) 19
E 6(24%) 7(28%) 7(28%) 6(24%) 26
Sum 25 25 25 25 100
All the methods work fine when the first argument is a language object.
This works (dispatches to pctable.formula)
pctable(y ~ x, dat)
Count (column %)
x
y 1 2 3 4 Sum
A 5(20%) 3(12%) 5(20%) 6(24%) 19
B 9(36%) 5(20%) 4(16%) 6(24%) 24
C 1(4%) 6(24%) 3(12%) 2(8%) 12
D 4(16%) 4(16%) 6(24%) 5(20%) 19
E 6(24%) 7(28%) 7(28%) 6(24%) 26
Sum 25 25 25 25 100
This works (dispatches to pctable.default)
pctable(dat$y, dat$x)
Count (column %)
dat$x
dat$y 1 2 3 4 Sum
A 5(20%) 3(12%) 5(20%) 6(24%) 19
B 9(36%) 5(20%) 4(16%) 6(24%) 24
C 1(4%) 6(24%) 3(12%) 2(8%) 12
D 4(16%) 4(16%) 6(24%) 5(20%) 19
E 6(24%) 7(28%) 7(28%) 6(24%) 26
Sum 25 25 25 25 100
However, this fails because y is not an object with a type
I would say it fails because object y does not exist in the evaluation
frame of the function, where you are implicitly evaluating it.
pctable(y, x, dat)
Error in pctable(y, x, dat) (from #3) : object 'y' not found
Can R be tricked to send that call to pctable.default, where it does work?
Yes, but it is probably a bad idea. The idea of S3 dispatch is that it
depends on the type of the argument used for dispatching, by default the
first argument. You don't have a variable named y, so you can't do that.
There are a few ways to do the tricking. You could add a data parameter
to the generic function, and construct a new environment from it before
you evaluate the first argument. Then y would be found in the data
parameter, and dispatch could work.
You could use exists() to find if the first argument exists, and make an
explicit call to pctable.default if it doesn't. But this will fail if
you have a global variable named y, because the test will find that one
and use it for dispatch, rather than using dat$y.
So I would conclude: don't do that. If you want to use a data
argument, use the formula method. That's what other functions do, and
so that's what your users would expect. If you don't use a formula,
then the variables should all use standard evaluation, i.e. they should
exist in the frame where you are calling pctable.
One more comment inline below.
Here's the code, I'm working on documentation, will put in package
rockchalk eventually, but hate to leave this problem until I fully
understand it.
pctable <- function(rv, ...)
{
UseMethod("pctable")
}
## rv: row variable, quoted or not
## cv: column variable, quoted or not
pctable.default <- function(rv, cv, data = parent.frame(),
rvlab = NULL, cvlab = NULL,
colpct = TRUE, rowpct = FALSE,
exclude = c(NA, NaN), rounded = FALSE)
{
rvlabel <- if (!missing(rv)) deparse(substitute(rv))
cvlabel <- if (!missing(cv)) deparse(substitute(cv))
rvlab <- if (is.null(rvlab)) rvlabel else rvlab
cvlab <- if (is.null(cvlab)) cvlabel else cvlab
rvin <- eval(substitute(rv), envir = data, enclos = parent.frame())
cvin <- eval(substitute(cv), envir = data, enclos = parent.frame())
t1 <- table(rvin, cvin, dnn = c(rvlab, cvlab), exclude = exclude)
rownames(t1)[is.na(rownames(t1))] <- "NA" ## symbol to letters
colnames(t1)[is.na(colnames(t1))] <- "NA"
if (rounded) t1 <- round(t1, -1)
t2 <- addmargins(t1, c(1,2))
t1colpct <- round(100*prop.table(t1, 2), 1)
t1rowpct <- round(100*prop.table(t1, 1), 1)
t1colpct <- apply(t1colpct, c(1,2), function(x) gsub("NaN", "", x))
t1rowpct <- apply(t1rowpct, c(1,2), function(x) gsub("NaN", "", x))
res <- list("count" = t2, "colpct" = t1colpct, "rowpct" = t1rowpct,
call = match.call())
class(res) <- "pctable"
print(res, colpct = colpct, rowpct = rowpct)
invisible(res)
}
pctable.formula <- function(formula, data = NULL, rvlab = NULL,
cvlab = NULL, colpct = TRUE, rowpct = FALSE,
exclude = c(NA, NaN), rounded = FALSE,
..., subset = NULL)
{
if (missing(data) || !is.data.frame(data)) stop("pctable requires a
data frame")
This test seems too strong. If x and y had been global variables of the
right shape in your example, then pctable(y ~ x) should work. I would
let model.frame (which you call down below) establish the rules for what
is allowed.
Duncan Murdoch
if (missing(formula) || (length(formula) != 3L))
stop("pctable requires a two sided formula")
mt <- terms(formula, data = data)
if (attr(mt, "response") == 0L) stop("response variable is required")
mf <- match.call(expand.dots = FALSE)
keepers <- match(c("formula", "data", "subset", "na.action"),
names(mf), 0L)
mf <- mf[c(1L, keepers)]
mf$drop.unused.levels <- FALSE
mf[[1L]] <- quote(stats::model.frame)
mf <- eval(mf, parent.frame())
## response is column 1
rvlab <- if (missing(rvlab)) colnames(mf)[1] else rvlab
cvlab <- if (missing(cvlab)) colnames(mf)[2] else cvlab
res <- pctable.default(mf[[1L]], mf[[2L]], data = mf,
rvlab = rvlab, cvlab = cvlab,
colpct = colpct, rowpct = rowpct,
exclude = exclude, rounded = rounded)
invisible(res)
}
pctable.character <- function(rowvar, colvar, data = NULL, rvlab = NULL,
cvlab = NULL, colpct = TRUE,
rowpct = FALSE, exclude = c(NA, NaN), rounded =
FALSE,
..., subset = NULL)
{
if (missing(data) || !is.data.frame(data)) stop("pctable requires a
data frame")
## colvar <- if (!is.character(colvar)) deparse(substitute(colvar))
else colvar
colvar <- as.character(substitute(colvar))[1L]
rvlab <- if (missing(rvlab)) rowvar else rvlab
cvlab <- if (missing(cvlab)) colvar else cvlab
t1 <- with(data, table(data[[rowvar]], data[[colvar]], dnn = c(rvlab,
cvlab), exclude = exclude))
rownames(t1)[is.na(rownames(t1))] <- "NA" ## symbol to letters
colnames(t1)[is.na(colnames(t1))] <- "NA"
if (rounded) t1 <- round(t1, -1)
t2 <- addmargins(t1, c(1,2))
t1colpct <- round(100*prop.table(t1, 2), 1)
t1rowpct <- round(100*prop.table(t1, 1), 1)
t1colpct <- apply(t1colpct, c(1,2), function(x) gsub("NaN", "", x))
t1rowpct <- apply(t1rowpct, c(1,2), function(x) gsub("NaN", "", x))
res <- list("count" = t2, "colpct" = t1colpct, "rowpct" = t1rowpct,
call = match.call())
class(res) <- "pctable"
print(res, colpct = colpct, rowpct = rowpct)
invisible(res)
}
## OK, I see now I'm doing the same work over and over, will extract
## a middle chunk out of each of those methods. And finally my cool print
method.
print.pctable <- function(tab, colpct = TRUE, rowpct = FALSE){
count <- tab[["count"]]
t3 <- count
if (colpct && !rowpct) {
cpct <- tab[["colpct"]]
for(j in rownames(cpct)){
for(k in colnames(cpct)){
t3[j, k] <- paste0(count[j, k], "(", cpct[j, k], "%)")
}
}
cat("Count (column %)\n")
print(t3)
return(invisible(t3))
}
## rowpct == TRUE< else would have returned
rpct <- tab[["rowpct"]]
for(j in rownames(rpct)){
for(k in colnames(rpct)){
t3[j, k] <- paste0(count[j, k], "(", rpct[j, k], "%)")
}
}
if (!colpct) {
cat("Count (row %)\n")
print(t3)
return(invisible(t3))
} else {
cpct <- tab[["colpct"]]
t4 <- array("", dim = c(1, 1) + c(2,1)*dim(tab$colpct))
t4[seq(1, NROW(t4), 2), ] <- t3
rownames(t4)[seq(1, NROW(t4), 2)] <- rownames(t3)
rownames(t4)[is.na(rownames(t4))] <- ""
colnames(t4) <- colnames(t3)
for(j in rownames(tab[["colpct"]])) {
for(k in colnames(tab[["colpct"]])){
t4[1 + which(rownames(t4) == j) ,k] <-
paste0(tab[["colpct"]][j, k], "%")
}
}
names(dimnames(t4)) <- names(dimnames(count))
cat("Count (row %)\n")
cat("column %\n")
print(t4, quote = FALSE)
return(invisible(t4))
}
}
And usage examples
dat <- data.frame(x = gl(4, 25),
y = sample(c("A", "B", "C", "D", "E"), 100, replace=
TRUE))
## Here's what I was aiming for, in the beginning
pctable(y ~ x, dat)
pctable(y ~ x, dat, exclude = NULL)
pctable(y ~ x, dat, rvlab = "My Outcome Var", cvlab = "My Columns")
## People who like row percents asked for this
pctable(y ~ x, dat, rowpct = TRUE, colpct = FALSE)
## Some people want both. Tiresome.
pctable(y ~ x, dat, rowpct = TRUE, colpct = TRUE)
pctable(y ~ x, dat, rowpct = TRUE, colpct = TRUE, exclude = NULL)
tab <- pctable(y ~ x, dat, rvlab = "My Outcome Var", cvlab = "My Columns")
print(tab, rowpct = TRUE, colpct = FALSE)
print.pctable(tab, rowpct = TRUE, colpct = TRUE)
## I also wanted an interface that would allow calls like
## pctable(y, x, dat)
## which I was able to do, but not when pctable is a method.
## As long as one writes in an existing variable, this dispatches
## pctable.default and result is OK
pctable(dat$y, dat$x)
pctable(dat$y, dat$x, rowpct = TRUE, colpct = FALSE)
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE)
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = NULL)
tab <- pctable(dat$y, dat$x)
print(tab, rowpct = TRUE, colpct = FALSE)
print(tab, rowpct = TRUE, colpct = TRUE)
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c(NA, "E"))
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c("E"))
## Why do NA's get excluded
pctable(dat$y, dat$x, rowpct = TRUE, colpct = TRUE, exclude = c("B", "2"))
## This succeeds
pctable.default(y, x, dat)
## Next causes error
pctable(y, x, dat)
## Error in pctable(y, x, dat) (from #3) : object 'y' not found
At one point yesterday, I was on the verge of comprehending the parse tree
:)