[S] Labels wrong with lrm

5 messages · Brian Ripley, Thomas Lumley, Frank E Harrell Jr

#
Dear Jan,

Thank you very much for your excellent description of the
problem and the self-contained test code.  This is a
problem that I've been meaning to either document better
or solve for some time.  The root of the problem is with
the builtin S-Plus terms.inner function, which for your formula returns:
expression(age, kx, smok)

You can see that terms.inner inappropriately includes kx
as an independent variable as it does not know that
the first argument to pol is the special variable.
When a constant replaces kx, all is well.

As I am relying on the C code called by terms.inner to 
do the job, I don't have a ready solution.  I would
be happy if someone comes up with a solution.  The
all.vars function in the R language has the same
limitation:

all.vars(asthma ~ pol(age,kx) + smok)
[1] "asthma" "age"   "kx"   "smok"
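
As a small illustration of the constant case mentioned above, the following sketch compares the two forms. The object names with.name and with.const are made up for this example; nothing here depends on pol actually being defined, since all.vars only inspects the unevaluated formula.

```r
# all.vars() inspects the unevaluated formula, so pol, age, kx and smok
# do not need to exist for these calls to run.
with.name  <- all.vars(asthma ~ pol(age, kx) + smok)
with.const <- all.vars(asthma ~ pol(age, 2) + smok)

print(with.name)   # includes "kx"
print(with.const)  # the constant 2 is not a name, so it is not reported
```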

Frank Harrell
#
On Sat, 28 Jul 2001 fharrell at virginia.edu wrote:

            
Well,

1) all.vars comes from S, and works the same under R.

2) I don't think it is a limitation.  all.vars is described as

Description:

     Return a character vector containing all the names which occur in
     an expression or call.

and kx is such a name.  Indeed, there is no way to know from just the
formula if it is a length-n vector or a scalar.   I don't know what pol()
is, but guess it is from your library, in which case the interpretation
may well depend on where (if anywhere) that library is in the search path.

If you want special-purpose names in formulae (like strata) you need to
extend the formula-handling code.  Neither S4 nor R makes that easy.

Brian
#
Right Brian, I don't see it as a limitation, 
but an extension would be helpful.  An option to
all.vars to produce a list of vectors containing
names for variables in each "term" would be
especially helpful.  -Frank

P.S.  Thanks for pointing out also that all.vars
is in S - I had missed that.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Sat, 28 Jul 2001 fharrell at virginia.edu wrote:

            
I think this can't be fixed in general.  There is simply no way to know
whether pol(age,kx) contains two variables (like interaction(age,kx)) or
one variable, and if so, which is the variable and which the parameter.

The termplot() function has a function carrier.names() that guesses that
the first argument is the only variable, which is a useful heuristic until
you have log(0.5+x) as a term.
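
A rough sketch of that kind of first-argument heuristic is below. The name first.arg.name is hypothetical and this is not the termplot() source; it only shows why always descending into the first argument goes wrong for a term like log(0.5 + x).

```r
# Hypothetical helper: follow the first argument of each nested call
# until something that is not a call (a name or a constant) is reached.
first.arg.name <- function(term) {
  e <- parse(text = term)[[1]]
  while (is.call(e) && length(e) > 1) e <- e[[2]]
  deparse(e)
}

print(first.arg.name("pol(age, kx)"))  # picks age, as hoped
print(first.arg.name("log(0.5 + x)"))  # picks the constant 0.5, not x
```

The second call descends from log(...) into 0.5 + x and then into its first operand, so the heuristic lands on the constant.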

You can get the results of all.vars broken down by term like this:
formula <- I(log(0.5 + x)) ~ pol(age, kx) + ns(sbp, df) +
    factor(race, labels = races) + sex
av <- all.vars(formula)
sapply(attr(terms(formula),"term.labels"),
       function(term) av[match(all.vars(parse(text=term)),av)])
$"pol(age, kx)"
[1] "age" "kx"

$"ns(sbp, df)"
[1] "sbp" "df"

$"factor(race, labels = races)"
[1] "race"  "races"

$sex
[1] "sex"

(actually you only get the RHS of the formula, but that shouldn't be hard
to fix)
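
One way to include the response as well might look like the sketch below. The function name vars.by.term is made up for illustration, and it calls all.vars on each term directly rather than matching into av, which should give the same names.

```r
# Hypothetical helper: names used in the response and in each RHS term.
vars.by.term <- function(formula) {
  labels <- attr(terms(formula), "term.labels")
  out <- lapply(labels, function(term) all.vars(parse(text = term)))
  names(out) <- labels
  if (length(formula) > 2) {
    # two-sided formula: element 2 is the response expression
    lhs <- formula[[2]]
    out <- c(setNames(list(all.vars(lhs)), deparse(lhs)), out)
  }
  out
}

f <- I(log(0.5 + x)) ~ pol(age, kx) + ns(sbp, df) +
    factor(race, labels = races) + sex
res <- vars.by.term(f)
print(res)
```

None of pol, ns, or the variables need to exist, because terms() and all.vars() work on the unevaluated formula here.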

You might also look at how nlme and Jim Lindsey's nonlinear models
functions solve the question of whether a name refers to a variable or a
parameter.

	-thomas

Thomas Lumley			Asst. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

#
Dear Thomas,

Your code solves the problem.  Thank you!

Users of the Design library, such as Jan Brogger, who want to
use a variable as the second argument to one of the
library's transformation functions (pol, rcs, lsp)
can redefine the var.inner function in the Hmisc
library as follows until the fix is made in the
source code posted on our web page.

var.inner <- function(formula) {
  if(!inherits(formula,"formula")) formula <- attr(formula,"formula")
  if(!length(formula)) stop('no formula object found')
  if(length(formula) > 2)
    formula[[2]] <- NULL  # remove response variable
  av <- all.vars(formula)
  ## Thanks to Thomas Lumley <tlumley at u.washington.edu> 28Jul01 :
  unique(sapply(attr(terms(formula),"term.labels"),
                function(term,av)
                  av[match(all.vars(parse(text=term)),av)][1],
                av=av))
}
 
-Frank Harrell