How to deal with package conflicts

10 messages · Terry Therneau, Michael Friendly, Gabor Grothendieck +2 more

#
The ridge() function was put into the survival package as a simple
example of what a user could do with penalized functions.  It's not a
"serious" function, and I'd be open to any suggestions for change. 

Actually, for any L2 penalty + Cox model one is now better off using
coxme as the maximization process is much better thought out there.  I'd
be happy to remove ridge from survival -- except that there are bound to
be lots of folks using the function and any such changes (even good
ones) to the survival package are fraught with peril.

Duncan: this raises a larger point.  I've often wished that I could have
"namespace" like rules apply to formulas.  Using survival again, when I
implemented gam-like smooths I had to create "pspline" rather than use
the more natural "s()" notation.  In survival, it would be good to do
this for ridge, cluster, pspline, and frailty, all of which depend deeply
on a coxph context.  It would also solve a frailty() problem of long
standing, that when used in survreg only a subset of the frailty options
make sense; this is documented in the help file but catches users again
and again.

Terry Therneau
On Fri, 2011-11-25 at 12:00 +0100, r-devel-request at r-project.org wrote:
#
On 25/11/2011 9:10 AM, Terry Therneau wrote:
I think the general idea in formulas is that it is up to the user to 
define the meaning of functions used in them.  Normally the user has 
attached the package that is working on the formula, so the package 
author can provide useful things like s(), but if a user wanted to 
redefine s() to their own function, that should be possible.  Formulas 
do have environments attached, so both variables and functions should be 
looked up there.
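
A minimal sketch of that lookup rule, using only base R. The names make_formula, d, and the toy s() below are made up for illustration; this s() is not the smoother from any package.

```r
# A user-defined s() that lives only inside this function's environment.
make_formula <- function() {
  s <- function(x) x^2     # hypothetical user function
  y ~ s(x)                 # the formula records the environment it was created in
}

f <- make_formula()
d <- data.frame(y = c(1, 2, 3), x = c(1, 2, 3))

# model.frame() evaluates the terms in `d` first and then in environment(f),
# so it finds the s() defined inside make_formula().
mf <- model.frame(f, data = d)
mf[["s(x)"]]
```

The call succeeds because the s() used is the one stored alongside the formula, not whatever s() happens to be on the search path where model.frame() is called.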

This is not perfectly applied, of course.  It is generally up to the 
function interpreting the formula to define what "+" means, for example.
You could also have the function treat s() and other functions 
specially, but this is likely to be a little risky.  (I'm in the process 
of putting together a small package for displaying tables; it treats +, 
*, and a few other function-like things specially:  Format, .Format, 
Heading and Justify.  I chose capital letters for those to hopefully 
avoid conflicts with a user's own functions.  Perhaps I should have used 
dots on all of them.)

Duncan Murdoch
#
On Fri, 2011-11-25 at 09:50 -0500, Duncan Murdoch wrote:
I don't agree that this is the best way.  A function like coxph could
easily have in its documentation a list of the "formula specials" that
it defines internally.  If the user wants something of their own they can
easily use a different word.  In fact, I would strongly recommend that
they don't use one of these key names.  For things that work across
multiple packages like ns(), what user in his right mind would redefine
it?
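
For reference, base R already has one hook for marking "formula specials" by name: the specials argument of terms(). The sketch below only shows how a modelling function can locate such a term. The formula is a toy one, and nothing here evaluates ridge(), so no package needs to be attached.

```r
# terms() does not evaluate the formula; it only records where the
# named specials occur among the model variables.
tt <- terms(y ~ x + ridge(z), specials = "ridge")

attr(tt, "variables")           # the unevaluated variables: y, x, ridge(z)
idx <- attr(tt, "specials")$ridge
idx                             # position of ridge(z) in that list
```

Knowing the position lets the fitting function treat that term specially, but the call itself is still looked up by name when the model frame is built, which is the problem raised here.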
  So I re-raise the question.  Is there a reasonably simple way to make
the survival ridge() function specific to survival formulas?  It sets up
structures that have no meaning anywhere else, and its global definition
stands in the way of other sensible uses.  Having it be not exported +
obey namespace-type semantics would be a plus all around.

Philosophical aside:
  I have discovered to my dismay that formulas do have environments
attached, and that variables/functions are looked up there.  This made
sensible semantics for predict() within a function impossible for some
of the survival functions, unless I were to change all the routines to a
model=TRUE default.  (And a change of that magnitude to survival, with
its long list of dependencies, is not fun to contemplate.  A very quick
survey reveals several dependent packages will break.) So I don't agree
nearly so fully with the "should" part of your last sentence.  The out
of context evaluations allowed by environments are, I find, always
tricky and often lead to intricate special cases. 
  Thus, moving back and forth between how it seems that a formula should
work, and how it actually does work, sometimes leaves my head
spinning.  

Terry T.


Terry Therneau
#
On 11/25/2011 9:10 AM, Terry Therneau wrote:
Duncan provided one suggestion:  make ridge() an S3 generic, and rename
ridge() to ridge.coxph(), but this won't work, since you use ridge()
inside coxph() and survreg() to add a penalty term in the model formula.
Another idea might be simply to not export ridge(), but I have the
feeling this will break your R CMD checks.

Alternatively, my particular problem (wanting to use car::vif in my
package documentation) would be solved if John Fox considered making
survival a Suggests: package rather than a Depends: one.  This might
work, since survival is only referenced in car by providing Anova()
methods for coxph models.

I think all of this raises a general issue of unintended consequences of
"package bloat," where
(a) Depends: packages are forced to load by require()/library(), whether
    they are really needed or not;
(b) There is nothing like require(car, depends=FALSE) to circumvent this;
(c) Once a require()'d package is loaded, it cannot be unloaded;
(d) AFAIK, there is no way for a package author to override the masking
    of functions or data provided by other packages, except by using
    mypackage::myfun() calls.
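
To illustrate the pkg::fun() workaround from point (d), here is a small sketch using only base packages. The masking sd() is a deliberately silly user definition made up for this example.

```r
# A user-level definition that masks stats::sd on the search path.
sd <- function(x) "user-defined sd"

masked   <- sd(c(1, 2, 3))          # picks up the definition above
explicit <- stats::sd(c(1, 2, 3))   # goes straight to the stats namespace
```

The explicit form is unaffected by whatever has been attached or defined later, at the cost of spelling out the package at every call.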

To me this seems to be a flaw in the namespace mechanism.

best,
-Michael
#
On Fri, 2011-11-25 at 10:42 -0500, Michael Friendly wrote:
The S3 generic idea won't work.  The argument inside ridge(x) is an
ordinary variable, and it's the argument inside that a generic uses for
dispatch.  I want to dispatch based on the context, which is what the
namespace mechanism does for a call to, for instance, coxpenal.fit, a
non-exported survival function.
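
A toy sketch of why dispatch does not help. The generic and methods below are stand-ins written for this illustration and are unrelated to the real ridge() code.

```r
# Hypothetical generic with a "coxph" method and a fallback.
ridge <- function(x, ...) UseMethod("ridge")
ridge.coxph   <- function(x, ...) "coxph method"
ridge.default <- function(x, ...) "default method"

age <- c(50, 60, 70)     # an ordinary numeric variable, as in ridge(age)
result <- ridge(age)     # dispatch looks at class(age), not at the caller
```

Since age is a plain numeric vector, the coxph method is never selected, no matter which modelling function the formula was passed to.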
  
I suspect that not exporting ridge would work for
	coxph(Surv(time, status) ~ ph.ecog + ridge(age), data=lung)
but not for
      myform <-Surv(time, status) ~ ph.ecog + ridge(age)
      coxph(myform, data=lung)

(I haven't tested this.)  This is because formulas are treated rather like
functions, with bindings coming into play when they are first defined,
not when they are first used.
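
One way to probe this without touching survival is a stand-in, sketched below under loud assumptions: fit() and hidden_ridge() are invented names, and local() is used to imitate a function that is visible to the fitting code but not to the user, roughly as a non-exported function would be.

```r
# Stand-in for a package: hidden_ridge() is visible only to fit().
fit <- local({
  hidden_ridge <- function(x) x
  function(formula, data) model.frame(formula, data = data)
})

d <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6))

# Form 1: formula written inline in the call.
inline <- tryCatch(fit(y ~ hidden_ridge(x), d), error = function(e) "error")

# Form 2: formula stored in a variable first.
myform <- y ~ hidden_ridge(x)
stored <- tryCatch(fit(myform, d), error = function(e) "error")
```

In this sketch both forms fail in the same way, because in both cases the formula's environment is the caller's and hidden_ridge() is not reachable from it. Whether coxph itself would behave identically depends on how it evaluates the formula.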
I will say that the long list of "reverse depends" on the survival
package does give me pause when making changes.

Terry T.
#
On Fri, Nov 25, 2011 at 10:37 AM, Terry Therneau <therneau at mayo.edu> wrote:
The dynlm package uses formula functions which are specific to it.
Look at its code.
#
On 25/11/2011 10:37 AM, Terry Therneau wrote:
Yes, that's what I described in the second part of my answer, and you 
can do it too in coxph.  It requires some work to do special processing 
of symbols in a formula, but it is already being done for + and : and *, 
so doing it as well for some other functions would be reasonable.  If 
you don't mind some programming on the formula object, it's not even 
very hard.

As to a user defining their own ns() function:  that seems like it's not 
something we should disallow, especially if it was done in a context 
where natural splines weren't being used.  It might have nothing to do 
with the ns() function in the splines package, but it might mean 
something to the user in terms of his own data.  The splines package is 
a base package so it's not a great idea to re-use the name, but many 
users would not have splines attached, and wouldn't notice that they had 
just masked the splines::ns function.
Yes, there is a way to do what you want.  Don't export the function from 
the package, but preprocess formulas coming into coxph to substitute 
things that look like calls to ridge() with calls to something local.

For example, this does the substitution.  I haven't checked it much, so 
it might mess up something else (and there might be
more elegant ways to write it, using e.g. rapply).  It is definitely 
slightly more elaborate than it needs to be (no need for the separate 
local function), but that's so you can make the outer function do a bit 
more than the recursive part does.

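fixRidge <- function( formula ) {

   # Walk the expression tree, rewriting only calls whose function is ridge
   recurse <- function( e ) {
     if (is.call(e)) {
        # a variable that merely happens to be named "ridge" is left alone
        if (identical(e[[1]], quote(ridge))) e[[1]] <- quote(survival:::ridge)
        for (i in seq_along(e)[-1])
           if (is.call(e[[i]])) e[[i]] <- recurse(e[[i]])
     }
     e
   }

   # Apply the rewrite to the whole formula
   recurse(formula)
}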

This replaces calls to ridge in the formula with calls to survival:::ridge.
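
The same idea can be extended to several names at once. The following is only a sketch: fix_specials and its arguments are hypothetical, and the default list of names is taken from the functions mentioned earlier in this thread. Nothing is evaluated, so it runs without survival attached.

```r
fix_specials <- function(formula,
                         specials = c("ridge", "cluster", "pspline", "frailty"),
                         pkg = "survival") {
  recurse <- function(e) {
    if (!is.call(e)) return(e)
    head <- e[[1]]
    if (is.name(head) && as.character(head) %in% specials) {
      # Build the unevaluated call pkg:::name and use it as the function
      e[[1]] <- call(":::", as.name(pkg), head)
    }
    # Only sub-calls can contain further specials
    for (i in seq_along(e)[-1]) {
      if (is.call(e[[i]])) e[[i]] <- recurse(e[[i]])
    }
    e
  }
  recurse(formula)
}

f <- fix_specials(Surv(time, status) ~ ph.ecog + ridge(age) + cluster(inst))
f[[3]]    # right-hand side, with ridge() and cluster() now qualified
```

The response Surv(time, status) is left untouched because Surv is not in the list of names.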
It all comes down to the question:  who owns the name?  Generally the 
caller owns the name.  So you should look it up in the context of the 
caller.  In R, that means you need to carry along the environment of the 
caller.

Duncan Murdoch
#
Hi Michael,

I'll look into moving survival to suggests (this weekend, if I have time),
but that doesn't address the more general issue.

Best,
 John
#
I like the idea of making the functions local, and will pursue it.
This issue has bothered me for a long time -- I had real misgivings when
I introduced "cluster" to the package, but did not at that time see any
way other than making it global.  
 I might make this change soon in the ridge function, since it's a good
test case -- less likely to cause downstream troubles.

Here is another possible approach:
 Inside coxph, just before calling eval with the formula, create a new
environment "tempenv" which consists of my handful of special functions
(ridge, frailty, cluster, pspline) which have meaning only inside a coxph
call, with the parent environment of tempenv being the current
environment of the formula. Then set the environment of the formula to
tempenv, then eval.  Would this work?
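
A rough sketch of this scheme in base R, to make the moving parts concrete. fit_with_specials is a hypothetical stand-in for the fitting function, and the ridge() inside it is a placeholder that simply returns its argument. It is not what coxph does.

```r
fit_with_specials <- function(formula, data) {
  # New environment holding the specials; its parent is the formula's own
  # environment, so the user's variables and functions stay reachable.
  tempenv <- new.env(parent = environment(formula))
  tempenv$ridge <- function(x) x     # placeholder for the real special
  environment(formula) <- tempenv    # changes only this local copy
  model.frame(formula, data = data)
}

dbl <- function(x) 2 * x             # a user function the formula also uses
d   <- data.frame(y = c(1, 2, 3), x = c(4, 5, 6))
f   <- y ~ ridge(dbl(x))

mf <- fit_with_specials(f, d)
names(mf)
```

Here ridge() is found in tempenv and dbl() is found one level up, in the environment where the user wrote the formula.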

 Two small further questions:
1. Any special rules for the documentation?  We need a page for
"cluster", but want to mark it almost like a method in the sense of
applying only in one context.

2. Does one scheme or another work best for downstream functions like
predict or model.matrix?  Duncan's idea of direct modification might
have an advantage (?) in that the terms object would be permanently
changed.

Terry T.
#
On 25/11/2011 12:12 PM, Terry Therneau wrote:
It should.
I would list those special functions as aliases of the coxph topic, and 
document them there.
As long as you attach your new temporary environment to copies of the 
formula that you pass elsewhere, it should mostly work.  It may confuse 
someone who did  ls(environment(formula)) (because they'd only see your 
functions, not the user's), but I don't think that's a very common thing 
to want to do.

Duncan Murdoch