I would like to ask for advice from R experts about the benefits or
dangers of using attr to return information with an object that is
returned from a function. I have a feeling as though I have cheated by
using attributes, and wonder if I've done something fishy.
Maybe I mean to ask, where is the dividing line between attributes and
instance variables? The separation is not clear in my mind anymore.
Background: I paste below a function that takes in a regression object
and make changes to the data and/or call and then run a
revised regression. In my earlier effort, I was building a return
list, including the new fitted regression object plus some
variables that have information about the changes that a were made.
That creates some inconvenience, however. When the regression is in a
list object, then methods for lm objects don't apply to that result
object. The return is not an lm anymore. I either need to write
custom methods for every function or remember to extract the object
from the list before sending to the generic function.
I *guessed* it would work to write the new information as object
attributes, and it seems to work. There is a generic function
"meanCenter" and a method "meanCenter.default". At the end of
meanCenter.default, here's my use (or abuse) of attributes.
res <- eval(mc)
class(res) <- c("mcreg", class(model))
attr(res, "centeredVars") <- nc
attr(res, "centerCall") <- match.call()
res
I wrote print and summary methods, but other methods that work for lm
objects like plot will also work for these new ones.
meanCenter <- function(model, centerOnlyInteractors=TRUE,
centerDV=FALSE, standardize=FALSE, centerContrasts = F){
UseMethod("meanCenter")
}
meanCenter.default <- function(model, centerOnlyInteractors=TRUE,
centerDV=FALSE, standardize=FALSE, centerContrasts = F){
std <- function(x) {
if( !is.numeric(x) ){
stop("center.lm tried to center a factor variable. No Can Do!")
} else {
scale(x, center = TRUE, scale = standardize)
}
}
rdf <- get_all_vars(formula(model), model$model) #raw data frame
t <- terms(model)
tl <- attr(t, "term.labels")
tmdc <- attr(t, "dataClasses") ##term model data classes
isNumeric <- names(tmdc)[ which(tmdc %in% c("numeric"))]
isFac <- names(tmdc)[ which(tmdc %in% c("factor"))]
if (tmdc[1] != "numeric") stop("Sorry, DV not a single numeric column")
##Build "nc", a vector of variable names that "need centering"
##
if (!centerDV) {
if (centerOnlyInteractors == FALSE){
nc <- isNumeric[-1] #-1 excludes response
unique(nc)
}else{
interactTerms <- tl[grep(":", tl)]
nc <- unique(unlist(strsplit( interactTerms, ":")))
nc <- nc[which(nc %in% isNumeric)]
}
}else{
if (centerOnlyInteractors == FALSE){
nc <- isNumeric
}else{
interactTerms <- tl[grep(":", tl)]
nc <- unique(unlist(strsplit( interactTerms, ":")))
nc <- nc[which(nc %in% isNumeric)]
nc <- c( names(tmdc)[1] , nc)
}
}
mc <- model$call
# run same model call, replacing non centered data with centered data.
## if no need to center factor contrasts:
if (!centerContrasts)
{
stddat <- rdf
for (i in nc) stddat[ , i] <- std( stddat[, i])
mc$data <- quote(stddat)
}else{
##dm: design matrix, only includes intercept and predictors
dm <- model.matrix(model, data=rdf, contrasts.arg =
model$contrasts, xlev = model$xlevels)
##contrastIdx: indexes of contrast variables in dm
contrastIdx <- which(attr(dm, "assign")== match(isFac, tl))
contrastVars <- colnames(dm)[contrastIdx]
nc <- c(nc, contrastVars)
dm <- as.data.frame(dm)
hasIntercept <- attr(t, "intercept")
if (hasIntercept) dm <- dm[ , -1] # removes intercept, column 1
dv <- rdf[ ,names(tmdc)[1]] #tmdc[1] is response variable name
dm <- cbind(dv, dm)
colnames(dm)[1] <- names(tmdc)[1] #put colname for dv
dmnames <- colnames(dm)
hasColon <- dmnames[grep(":", dmnames)]
dm <- dm[ , -match(hasColon, dmnames)] ##remove vars with colons
(lm will recreate)
##Now, standardise the variables that need standardizing
for (i in nc) dm[ , i] <- std( dm[, i])
fmla <- formula(paste(dmnames[1], " ~ ", paste(dmnames[-1],
collapse=" + ")))
cat("This fitted model will use those centered variables\n")
cat("Model-constructed interactions such as \"x1:x3\" are built
from centered variables\n")
mc$formula <- formula(fmla)
mc$data <- quote(dm)
}
cat("These variables", nc, "Are centered in the design matrix \n")
res <- eval(mc)
class(res) <- c("mcreg", class(model))
attr(res, "centeredVars") <- nc
attr(res, "centerCall") <- match.call()
res
}
summary.mcreg <- function(object, ...){
nc <- attr(object, "centeredVars")
cat("The centered variables were: \n")
print(nc)
cat("Even though the variables here have the same names as their
non-centered counterparts, I assure you these are centered.\n")
mc <- attr(object, "centerCall")
cat("These results were produced from: \n")
print(mc)
NextMethod(generic = "summary", object = object, ...)
}
print.mcreg <- function(x, ...){
nc <- attr(x, "centeredVars")
cat("The centered variables were: \n")
print(nc)
cat("Even though the variables here have the same names as their
non-centered counterparts, I assure you these are centered.\n")
mc <- attr(x, "centerCall")
cat("These results were produced from: \n")
print(mc)
NextMethod(generic = "print", object = x, ...)
}
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
I would like to ask for advice from R experts about the benefits or
dangers of using attr to return information with an object that is
returned from a function. I have a feeling as though I have cheated by
using attributes, and wonder if I've done something fishy.
Maybe I mean to ask, where is the dividing line between attributes and
instance variables? The separation is not clear in my mind anymore.
Background: I paste below a function that takes in a regression object
and make changes to the data and/or call and then run a
revised regression. In my earlier effort, I was building a return
list, including the new fitted regression object plus some
variables that have information about the changes that a were made.
That creates some inconvenience, however. When the regression is in a
list object, then methods for lm objects don't apply to that result
object. The return is not an lm anymore.
Why don't you just subclass it? That's the "normal" way of doing things - you simply add additional entries for your subclass (e.g. m$myItem1, m$myItem2, ...), prepend your new subclass name and you're done. You can still dispatch on your subclass before the superclass while superclass methods just work as well..
Cheers,
Simon
I either need to write
custom methods for every function or remember to extract the object
from the list before sending to the generic function.
I *guessed* it would work to write the new information as object
attributes, and it seems to work. There is a generic function
"meanCenter" and a method "meanCenter.default". At the end of
meanCenter.default, here's my use (or abuse) of attributes.
res <- eval(mc)
class(res) <- c("mcreg", class(model))
attr(res, "centeredVars") <- nc
attr(res, "centerCall") <- match.call()
res
I wrote print and summary methods, but other methods that work for lm
objects like plot will also work for these new ones.
meanCenter <- function(model, centerOnlyInteractors=TRUE,
centerDV=FALSE, standardize=FALSE, centerContrasts = F){
UseMethod("meanCenter")
}
meanCenter.default <- function(model, centerOnlyInteractors=TRUE,
centerDV=FALSE, standardize=FALSE, centerContrasts = F){
std <- function(x) {
if( !is.numeric(x) ){
stop("center.lm tried to center a factor variable. No Can Do!")
} else {
scale(x, center = TRUE, scale = standardize)
}
}
rdf <- get_all_vars(formula(model), model$model) #raw data frame
t <- terms(model)
tl <- attr(t, "term.labels")
tmdc <- attr(t, "dataClasses") ##term model data classes
isNumeric <- names(tmdc)[ which(tmdc %in% c("numeric"))]
isFac <- names(tmdc)[ which(tmdc %in% c("factor"))]
if (tmdc[1] != "numeric") stop("Sorry, DV not a single numeric column")
##Build "nc", a vector of variable names that "need centering"
##
if (!centerDV) {
if (centerOnlyInteractors == FALSE){
nc <- isNumeric[-1] #-1 excludes response
unique(nc)
}else{
interactTerms <- tl[grep(":", tl)]
nc <- unique(unlist(strsplit( interactTerms, ":")))
nc <- nc[which(nc %in% isNumeric)]
}
}else{
if (centerOnlyInteractors == FALSE){
nc <- isNumeric
}else{
interactTerms <- tl[grep(":", tl)]
nc <- unique(unlist(strsplit( interactTerms, ":")))
nc <- nc[which(nc %in% isNumeric)]
nc <- c( names(tmdc)[1] , nc)
}
}
mc <- model$call
# run same model call, replacing non centered data with centered data.
## if no need to center factor contrasts:
if (!centerContrasts)
{
stddat <- rdf
for (i in nc) stddat[ , i] <- std( stddat[, i])
mc$data <- quote(stddat)
}else{
##dm: design matrix, only includes intercept and predictors
dm <- model.matrix(model, data=rdf, contrasts.arg =
model$contrasts, xlev = model$xlevels)
##contrastIdx: indexes of contrast variables in dm
contrastIdx <- which(attr(dm, "assign")== match(isFac, tl))
contrastVars <- colnames(dm)[contrastIdx]
nc <- c(nc, contrastVars)
dm <- as.data.frame(dm)
hasIntercept <- attr(t, "intercept")
if (hasIntercept) dm <- dm[ , -1] # removes intercept, column 1
dv <- rdf[ ,names(tmdc)[1]] #tmdc[1] is response variable name
dm <- cbind(dv, dm)
colnames(dm)[1] <- names(tmdc)[1] #put colname for dv
dmnames <- colnames(dm)
hasColon <- dmnames[grep(":", dmnames)]
dm <- dm[ , -match(hasColon, dmnames)] ##remove vars with colons
(lm will recreate)
##Now, standardise the variables that need standardizing
for (i in nc) dm[ , i] <- std( dm[, i])
fmla <- formula(paste(dmnames[1], " ~ ", paste(dmnames[-1],
collapse=" + ")))
cat("This fitted model will use those centered variables\n")
cat("Model-constructed interactions such as \"x1:x3\" are built
from centered variables\n")
mc$formula <- formula(fmla)
mc$data <- quote(dm)
}
cat("These variables", nc, "Are centered in the design matrix \n")
res <- eval(mc)
class(res) <- c("mcreg", class(model))
attr(res, "centeredVars") <- nc
attr(res, "centerCall") <- match.call()
res
}
summary.mcreg <- function(object, ...){
nc <- attr(object, "centeredVars")
cat("The centered variables were: \n")
print(nc)
cat("Even though the variables here have the same names as their
non-centered counterparts, I assure you these are centered.\n")
mc <- attr(object, "centerCall")
cat("These results were produced from: \n")
print(mc)
NextMethod(generic = "summary", object = object, ...)
}
print.mcreg <- function(x, ...){
nc <- attr(x, "centeredVars")
cat("The centered variables were: \n")
print(nc)
cat("Even though the variables here have the same names as their
non-centered counterparts, I assure you these are centered.\n")
mc <- attr(x, "centerCall")
cat("These results were produced from: \n")
print(mc)
NextMethod(generic = "print", object = x, ...)
}
--
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
Paul,
On Jan 3, 2012, at 3:08 PM, Paul Johnson wrote:
I would like to ask for advice from R experts about the benefits or
dangers of using attr to return information with an object that is
returned from a function. I have a feeling as though I have cheated by
using attributes, and wonder if I've done something fishy.
Maybe I mean to ask, where is the dividing line between attributes and
instance variables? ?The separation is not clear in my mind anymore.
Background: I paste below a function that takes in a regression object
and make changes to the data and/or call and then run a
revised regression. ?In my earlier effort, I was building a return
list, including the new fitted regression object plus some
variables that have information about the changes that a were made.
That creates some inconvenience, however. ?When the regression is in a
list object, then methods for lm objects don't apply to that result
object. The return is not an lm anymore.
Why don't you just subclass it? That's the "normal" way of doing things - you simply add additional entries for your subclass (e.g. m$myItem1, m$myItem2, ...), prepend your new subclass name and you're done. You can still dispatch on your subclass before the superclass while superclass methods just work as well..
Cheers,
Simon
Yes. I see that now.
But I'm still wondering about the attribute versus variable question.
To the programmer, is there any difference between returning
information as attributes or variables?
Does R care if I do this:
class(res) <- c("mcreg", "lm")
attr(res, "centeredVars") <- nc
Or this:
class(res) <- c("mcreg", "lm")
res$centeredVars <- nc
Is there some place "down there," in the C foundations of R, where an
attribute is just a variable, as is centeredVars?
pj
Paul E. Johnson
Professor, Political Science
1541 Lilac Lane, Room 504
University of Kansas
On Tue, Jan 3, 2012 at 3:59 PM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
Paul,
On Jan 3, 2012, at 3:08 PM, Paul Johnson wrote:
I would like to ask for advice from R experts about the benefits or
dangers of using attr to return information with an object that is
returned from a function. I have a feeling as though I have cheated by
using attributes, and wonder if I've done something fishy.
Maybe I mean to ask, where is the dividing line between attributes and
instance variables? The separation is not clear in my mind anymore.
Background: I paste below a function that takes in a regression object
and make changes to the data and/or call and then run a
revised regression. In my earlier effort, I was building a return
list, including the new fitted regression object plus some
variables that have information about the changes that a were made.
That creates some inconvenience, however. When the regression is in a
list object, then methods for lm objects don't apply to that result
object. The return is not an lm anymore.
Why don't you just subclass it? That's the "normal" way of doing things - you simply add additional entries for your subclass (e.g. m$myItem1, m$myItem2, ...), prepend your new subclass name and you're done. You can still dispatch on your subclass before the superclass while superclass methods just work as well..
Cheers,
Simon
Yes. I see that now.
But I'm still wondering about the attribute versus variable question.
To the programmer, is there any difference between returning
information as attributes or variables?
Does R care if I do this:
class(res)<- c("mcreg", "lm")
attr(res, "centeredVars")<- nc
Or this:
class(res)<- c("mcreg", "lm")
res$centeredVars<- nc
Is there some place "down there," in the C foundations of R, where an
attribute is just a variable, as is centeredVars?
Yes, the internal implementation is different (res$centeredVars implies
res is a list, but attributes are stored as a pairlist). But the thing
itself (your nc) can't tell where it is stored.
Attributes are slightly harder to work with, so Simon's recommendation
is good advice. But in cases where you want other functions to work
with the result, and the result isn't a named list with a class, then
attributes are a convenient way to go.
The only two snags I can think of are 1, that R uses a few attributes
for its own purposes (e.g. "names", "dim" and "dimnames") and if you use
those attribute names for something incompatible you'll probably run
into all sorts of problems and 2, that S4 objects use attributes
internally. I wouldn't recommend using attributes on an S4 object.
Duncan Murdoch
Attributes are slightly harder to work with, so Simon's recommendation is
good advice. ?But in cases where you want other functions to work with the
result, and the result isn't a named list with a class, then attributes are
a convenient way to go.
The only two snags I can think of are 1, that R uses a few attributes for
its own purposes (e.g. "names", "dim" and "dimnames") and if you use those
attribute names for something incompatible you'll probably run into all
sorts of problems and 2, that S4 objects use attributes internally. ?I
wouldn't recommend using attributes on an S4 object.
Attributes are also useful when you're extending a atomic vector, and
want to be able to store objects of your new class inside data frames
etc.
Hadley
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/