Skip to content

returning information from functions via attributes rather than return list

5 messages · Paul Johnson, Simon Urbanek, Duncan Murdoch +1 more

#
I would like to ask for advice from R experts about the benefits or
dangers of using attr to return information with an object that is
returned from a function. I have a feeling as though I have cheated by
using attributes, and wonder if I've done something fishy.

Maybe I mean to ask, where is the dividing line between attributes and
instance variables?  The separation is not clear in my mind anymore.

Background: I paste below a function that takes in a regression object
and make changes to the data and/or call and then run a
revised regression.  In my earlier effort, I was building a return
list, including the new fitted regression object plus some
variables that have information about the changes that a were made.

That creates some inconvenience, however.  When the regression is in a
list object, then methods for lm objects don't apply to that result
object. The return is not an lm anymore.  I either need to write
custom methods for every function or remember to extract the object
from the list before sending to the generic function.

I *guessed* it would work to write the new information as object
attributes, and it seems to work. There is a generic function
"meanCenter" and a method "meanCenter.default". At the end of
meanCenter.default, here's my use (or abuse) of attributes.

  res <- eval(mc)
  class(res) <- c("mcreg", class(model))
  attr(res, "centeredVars") <- nc
  attr(res, "centerCall") <-  match.call()
  res

I wrote print and summary methods, but other methods that work for lm
objects like plot will also work for these new ones.



meanCenter <- function(model, centerOnlyInteractors=TRUE,
centerDV=FALSE, standardize=FALSE, centerContrasts = F){
  UseMethod("meanCenter")
}

meanCenter.default <- function(model, centerOnlyInteractors=TRUE,
centerDV=FALSE, standardize=FALSE, centerContrasts = F){

  std <- function(x) {
    if( !is.numeric(x) ){
      stop("center.lm tried to center a factor variable. No Can Do!")
    } else {
      scale(x, center = TRUE, scale = standardize)
    }
  }

  rdf <- get_all_vars(formula(model), model$model) #raw data frame
  t <- terms(model)
  tl <- attr(t, "term.labels")
  tmdc <- attr(t, "dataClasses") ##term model data classes

  isNumeric <- names(tmdc)[ which(tmdc %in% c("numeric"))]
  isFac <-  names(tmdc)[ which(tmdc %in% c("factor"))]
  if (tmdc[1] != "numeric") stop("Sorry, DV not a single numeric column")

  ##Build "nc", a vector of variable names that "need centering"
  ##
  if (!centerDV) {
    if (centerOnlyInteractors == FALSE){
      nc <- isNumeric[-1] #-1 excludes response
      unique(nc)
    }else{
      interactTerms <- tl[grep(":", tl)]
      nc <- unique(unlist(strsplit( interactTerms, ":")))
      nc <-  nc[which(nc %in% isNumeric)]
    }
  }else{
    if (centerOnlyInteractors == FALSE){
      nc <- isNumeric
    }else{
      interactTerms <- tl[grep(":", tl)]
      nc <- unique(unlist(strsplit( interactTerms, ":")))
      nc <- nc[which(nc %in% isNumeric)]
      nc <- c( names(tmdc)[1] , nc)
    }
  }


  mc <- model$call
  # run same model call, replacing non centered data with centered data.
  ## if no need to center factor contrasts:
  if (!centerContrasts)
    {
      stddat <- rdf
      for (i in nc) stddat[ , i] <- std( stddat[, i])
      mc$data <- quote(stddat)
    }else{
      ##dm: design matrix, only includes intercept and predictors
      dm <- model.matrix(model, data=rdf, contrasts.arg =
model$contrasts, xlev = model$xlevels)
      ##contrastIdx: indexes of contrast variables in dm
      contrastIdx <- which(attr(dm, "assign")== match(isFac, tl))
      contrastVars <- colnames(dm)[contrastIdx]
      nc <- c(nc, contrastVars)

      dm <- as.data.frame(dm)

      hasIntercept <- attr(t, "intercept")
      if (hasIntercept) dm <- dm[ , -1] # removes intercept, column 1

      dv <- rdf[ ,names(tmdc)[1]] #tmdc[1] is response variable name
      dm <- cbind(dv, dm)
      colnames(dm)[1] <- names(tmdc)[1] #put colname for dv

      dmnames <- colnames(dm)
      hasColon <- dmnames[grep(":", dmnames)]
      dm <- dm[ , -match(hasColon, dmnames)] ##remove vars with colons
(lm will recreate)

      ##Now, standardise the variables that need standardizing
      for (i in nc) dm[ , i] <- std( dm[, i])


      fmla <- formula(paste(dmnames[1], " ~ ",  paste(dmnames[-1],
collapse=" + ")))
      cat("This fitted model will use those centered variables\n")
      cat("Model-constructed interactions such as \"x1:x3\" are built
from centered variables\n")
      mc$formula <- formula(fmla)
      mc$data <-  quote(dm)
    }

  cat("These variables", nc, "Are centered in the design matrix \n")

  res <- eval(mc)
  class(res) <- c("mcreg", class(model))
  attr(res, "centeredVars") <- nc
  attr(res, "centerCall") <-  match.call()
  res
}

summary.mcreg <- function(object, ...){
  nc <- attr(object, "centeredVars")
  cat("The centered variables were: \n")
  print(nc)
  cat("Even though the variables here have the same names as their
non-centered counterparts, I assure you these are centered.\n")
  mc <- attr(object, "centerCall")
  cat("These results were produced from: \n")
  print(mc)
  NextMethod(generic = "summary", object = object, ...)
}


print.mcreg <- function(x, ...){
  nc <- attr(x, "centeredVars")
  cat("The centered variables were: \n")
  print(nc)
  cat("Even though the variables here have the same names as their
non-centered counterparts, I assure you these are centered.\n")
  mc <- attr(x, "centerCall")
  cat("These results were produced from: \n")
  print(mc)
  NextMethod(generic = "print", object = x, ...)
}
#
Paul,
On Jan 3, 2012, at 3:08 PM, Paul Johnson wrote:

            
Why don't you just subclass it? That's the "normal" way of doing things - you simply add additional entries for your subclass (e.g. m$myItem1, m$myItem2, ...), prepend your new subclass name and you're done. You can still dispatch on your subclass before the superclass while superclass methods just work as well..

Cheers,
Simon
#
On Tue, Jan 3, 2012 at 3:59 PM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
Yes. I see that now.

But I'm still wondering about the attribute versus variable question.
To the programmer, is there any difference between returning
information as attributes or variables?

Does R care if I do this:

class(res) <- c("mcreg", "lm")
attr(res, "centeredVars") <- nc

Or this:

class(res) <- c("mcreg", "lm")
res$centeredVars <- nc

Is there some place "down there," in the C foundations of R,  where an
attribute is just a variable, as is centeredVars?

pj
#
On 12-01-04 3:19 PM, Paul Johnson wrote:
Yes, the internal implementation is different (res$centeredVars implies 
res is a list, but attributes are stored as a pairlist).  But the thing 
itself (your nc) can't tell where it is stored.

Attributes are slightly harder to work with, so Simon's recommendation 
is good advice.  But in cases where you want other functions to work 
with the result, and the result isn't a named list with a class, then 
attributes are a convenient way to go.

The only two snags I can think of are 1, that R uses a few attributes 
for its own purposes (e.g. "names", "dim" and "dimnames") and if you use 
those attribute names for something incompatible you'll probably run 
into all sorts of problems and 2, that S4 objects use attributes 
internally.  I wouldn't recommend using attributes on an S4 object.

Duncan Murdoch
#
Attributes are also useful when you're extending a atomic vector, and
want to be able to store objects of your new class inside data frames
etc.

Hadley