Variable labels (was Re: [R] Reading SAS version 8 data into

2 messages · Warnes, Gregory R, Frank E Harrell Jr

#
[Moved from R-help]
I would like to see general support for label attributes in the R plotting
and modeling functions.  One possible way of implementing this is to create
a replacement for the standard "deparse(substitute(blah))" idiom. This
function, getlabel(), checks for a label attribute and returns that if
present. Otherwise it returns the variable's name as a string.

Here's some code I've put together:

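# Accessor: return the "label" attribute, or NULL if none is set.
label  <-  function(x) attr(x,"label")

# Replacement function: rewrite the call as attr<-(x, which="label", value)
# and evaluate it.
"label<-" <-  function(x, value )
  {
    m  <-  match.call()
    m[[1]]  <- as.name("attr<-")
    m$value  <- NULL
    m$which  <- "label"
    m$value  <- value
    eval(m)
  }

# Return the label if a non-empty one is set; otherwise fall back to the
# deparsed expression that the caller supplied for x.
getlabel <- function(x)
  {
    tmp <- attr(x,"label")
    if(is.null(tmp) || tmp=="")
      {
        m  <- match.call()
        m[[1]] <- as.name('substitute')
        tmp <- deparse(eval(m,envir=parent.frame()))
      }
    return(tmp)
  }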

I've done some testing, and getlabel seems to work fine as a substitute for
"deparse(substitute(x))" in the plot commands.

There are a couple of problems.  First, attributes are carried along in
sometimes unexpected ways.  For example, attributes are carried along by all
of the arithmetic operations I tried:
   > x <- rnorm(1)
   > label(x) <- "x label"
   > 
   > sqrt(x)
    [1] 0.8888801
   attr(,"label")
   [1] "x label"
   > x+1
    [1] 1.8888801
   attr(,"label")
   [1] "x label"
Ideally, performing an operation that creates a new variable should mask off
the label attribute (what about other attributes?).  I recognize that this
would require changes to R.  Would this be a big task?
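
Without changes to R, the label can be dropped by hand after an operation. A minimal sketch, assuming a hypothetical helper named unlabel() (it is not part of R or of the code in this message):

```r
# Hypothetical helper: drop the "label" attribute, leaving other attributes alone.
unlabel <- function(x) {
  attr(x, "label") <- NULL
  x
}

x <- 4
attr(x, "label") <- "x label"

y <- sqrt(x)            # sqrt() keeps the attributes of its argument
attr(y, "label")        # still "x label"

y <- unlabel(sqrt(x))   # strip the label from the derived value
attr(y, "label")        # NULL
```

This only removes the one attribute named "label"; whether other attributes should also be masked is the open question raised above.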

Second, unless one bounds the length of the labels, it can get pretty messy
to use them in some places (e.g., the coefficients table reported by
print.summary).  I can see a few possible solutions for this problem.  A)
Truncate labels when necessary.   B) Have 2 attributes: one short 'label'
that has a bounded length (say 30 characters), and one long 'description' that
has no length limit.  C) Continue to use the variable name given in the
call for places where length is a problem, but show a translation between
the variable name and the label somewhere else as part of the output.
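
Option A could be sketched as follows. The helper name shortlabel() and its width argument are hypothetical, and the 30-character default simply mirrors the figure mentioned above:

```r
# Hypothetical helper: return the label cut to at most `width` characters,
# or NULL when no label is set.
shortlabel <- function(x, width = 30) {
  lab <- attr(x, "label")
  if (is.null(lab)) return(NULL)
  substr(lab, 1, width)
}

x <- 1:3
attr(x, "label") <- "Systolic blood pressure at baseline visit (mmHg)"

shortlabel(x)       # the first 30 characters of the label
shortlabel(x, 8)    # "Systolic"
```

Plain truncation can cut a label mid-word; base R's abbreviate() is an alternative when readable short forms matter more than a hard limit.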

Except for the problem of the label attribute getting 'carried along' when
it is not desirable, I think that it would be straightforward and 'backwards
compatible' to add general support for variable labels.  

I am willing to submit patches for functions that I regularly use.  Would
others be willing to contribute?  Would the patches be accepted?

-Greg

#
Dear Greg,

I too would like to see labels be more a part of R.
In Hmisc I allow labels to be any length but plotting
and table making functions have options to abbreviate()
them or use variable names instead of labels.

I think your code is more complex than is really needed.

The problem with defaulting to deparse(...) is that
multiple function pass-throughs return the wrong result:
[1] "z"
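
A minimal sketch of the pass-through problem. The function names inner_name() and wrapper() are hypothetical, chosen for illustration, and are not taken from the original message:

```r
# Hypothetical function: report the expression supplied for its argument.
inner_name <- function(x) deparse(substitute(x))

# Hypothetical wrapper: passes its own argument z straight through.
wrapper <- function(z) inner_name(z)

age <- c(21, 35, 48)

inner_name(age)   # "age": called directly, the user's variable name is seen
wrapper(age)      # "z":   one pass-through later, only the wrapper's formal is seen
```

substitute() only looks one call level up, so each layer of wrapping replaces the user's variable name with the name of an intermediate formal argument.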

So I don't see a large role for the deparse(...) method.

The Hmisc library already defines label<-, so if you
are willing to use another name for your version, that
would prevent confusion among users of Hmisc.

The problem of labels being retained after you do
arithmetic on the variable is a real one, and one
I've put up with for a long time with S-Plus.  It would
be nice if R could prevent that but that is getting tricky.
What I've wanted more generally is the ability for the
user to specify a vector of attribute names in options()
that would be preserved upon subsetting.  That way I
wouldn't have to go to the trouble of writing local versions
of [.factor, etc. that carry the 'label' attribute.
In my usage, 'label's are always logically carried
forward for subsetting.
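
A minimal sketch of such a local subsetting method, assuming a hypothetical class name "labelled_sketch" (this is an illustration, not the Hmisc implementation):

```r
# Default subsetting drops the "label" attribute.
x <- c(1.5, 2.5, 3.5)
attr(x, "label") <- "Dose (mg)"
attr(x[2:3], "label")          # NULL

# Hypothetical class with a "[" method that carries the label forward.
"[.labelled_sketch" <- function(x, ...) {
  out <- NextMethod("[")               # ordinary subsetting
  attr(out, "label") <- attr(x, "label")
  oldClass(out) <- oldClass(x)         # restore the class so later subsets dispatch too
  out
}

oldClass(x) <- "labelled_sketch"
y <- x[2:3]
attr(y, "label")               # "Dose (mg)"
```

A method like this has to be written per class, which is the repetition that a user-settable list of preserved attribute names would avoid.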

Frank