Skip to content

specials and ::

19 messages · Kevin Coombes, Chris Black, Berwin A Turlach +8 more

#
The survival package makes significant use of the "specials" argument of terms(), before 
calling model.frame; it is part of nearly every modeling function. The reason is that 
strata argments simply have to be handled differently than other things on the right hand 
side. Likewise for tt() and cluster(), though those are much less frequent.

I now get "bug reports" from the growing segment that believes one should put 
packagename:: in front of every single instance.?? For instance
 ????? fit <- survival::survdiff( survival::Surv(time, status) ~ ph.karno + 
survival::strata(inst),? data= survival::lung)

This fails to give the correct answer because it fools terms(formula, specials= 
"strata").??? I've stood firm in my response of "that's your bug, not mine", but I begin 
to believe I am swimming uphill.?? One person responded that it was company policy to 
qualify everything.

I don't see an easy way to fix survival, and even if I did it would be a tremendous amout 
of work.?? What are other's thoughts?

Terry
#
I know I'm a curmudgeon, but it seems to me that if their "company 
policy" is causing a problem while trying to use free software, then the 
company should pay to fix it.

 ? Kevin
On 8/26/2024 10:42 AM, Therneau, Terry M., Ph.D. via R-devel wrote:
#
G'day Terry,

On Mon, 26 Aug 2024 09:42:10 -0500
"Therneau, Terry M., Ph.D. via R-devel" <r-devel at r-project.org> wrote:
[...]
[...]
Not that I want to start a flame war, or say anything controversial,
but IMHO people who want to put packagename:: in front of every
function do not understand S3 method dispatch and are just clutching at
a comfort blanket that sets them up for failure. ;-)

Admittedly, it is now quite a while back that I had the joy of
debugging the following situation (but it happened at least twice in
this millennium if memory serves correctly):

  * Author of package X relies on foo(bar) to be dispatched to
    foo.default().
  * Author of package Y loads a zillion packages, one of which defines
    (and registers) a method foo.bar()
  * User first attaches package X and then package Y (well, order does
    not really matter).
    There is no warning about masked functions (why would there be?) or
    anything else.
  * User calls function in package X that relies on foo(bar) to be
    dispatched to foo.default(), but it is now dispatched to foo.bar().
  * foo.bar() returns something different to foo.default() on an object
    of class bar and, hence, all hell breaks lose in the function in
    package X that used the call foo(bar).

And you can put packagename:: in front of every function call in
package X (and Y and Z &c) until the cows come home and it would not
avoid this problem.

Sorry, but except if the "put package:: in front of every function"
crowd does not also start advocating that generics are not be used but
rather the desired method be called directly, I don't think they can be
taken seriously.

Just my 0.02 (Australian of course).

Cheers,

	Berwin
#
It?s completely reasonable to decline to do extra work to support it, but at the same time: Qualified calls are widely used and recommended, and users are also being completely reasonable when they try to use them (probably without checking the manual!) and expect them to work.

Would there be a tolerably easy way to make the fit fail loudly on `survival::strata(?)` rather than return the wrong result?
#
G'day Terry,

On Mon, 26 Aug 2024 09:42:10 -0500
"Therneau, Terry M., Ph.D. via R-devel" <r-devel at r-project.org> wrote:
[...]
[...]
Not that I want to start a flame war, or say anything controversial,
but IMHO people who want to put packagename:: in front of every
function do not understand S3 method dispatch and are just clutching at
a comfort blanket that sets them up for failure. ;-)

Admittedly, it is now quite a while back that I had the joy of
debugging the following situation (but it happened at least twice in
this millennium if memory serves correctly):

  * Author of package X relies on foo(bar) to be dispatched to
    foo.default().
  * Author of package Y loads a zillion packages, one of which defines
    (and registers) a method foo.bar()
  * User first attaches package X and then package Y (well, order does
    not.
    There is no warning about masked functions (why would there be?) or
    anything else.
  * User calls function in package X that relies on foo(bar) to be
    dispatched to foo.default(), but it is now dispatched to foo.bar().
  * foo.bar() returns something different to foo.default() on an object
    of class bar and, hence, all hell breaks lose in the function in
    package X that used the call foo(bar).

And you can put packagename:: in front of every function call in
package X (and Y and Z &c) until the cows come home and it would not
avoid this problem.

Sorry, but except if the "put package:: in front of every function"
crowd does not also start advocating that generics are not be used but
rather the desired method be called directly, I don't think they can be
taken seriously.

Just my 0.02 (Australian of course).

Cheers,

        Berwin
#
On 2024-08-26 10:42 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
I received a similar complaint about the tables package, which had 
assumed during argument processing that it was on the search list in 
order to find a function (see 
https://github.com/dmurdoch/tables/issues/30 if you want the details). 
In my case there's only one function exported by tables that wasn't 
being found, "labelSubset".

I don't know any of the details of the survival problems.  When I try 
your example code above without attaching survival, it appears to work. 
So my solution might be irrelevant to you.

The way I found to work around this was to use this code early in the 
processing, when it is trying to turn the data argument into an environment:

     parent <- if (is.environment(data)) data else environment(table)
     if (!exists("labelSubset", envir = parent)) {
       withTableFns <- new.env(parent = parent)
       withTableFns$labelSubset <- labelSubset
     } else
       withTableFns <- parent

     if (is.null(data))
     	data <- withTableFns
     else if (is.list(data))
     	data <- list2env(data, parent = withTableFns)
     else if (!is.environment(data))
     	stop("'data' must be a dataframe, list or environment")
     	
This inserts a new environment containing just that one tables function.

One issue is if a user has "labelSubset" already in the environment; I 
decided to use that one on the assumption that the user did it 
intentionally.  It would have been better to use a name that was less 
likely to show up in another package, but it's old code.

This isn't on CRAN yet, so I'd be interested in hearing about problems 
with this approach, or better solutions.

Duncan Murdoch
#
One could define a function that removes all instances of 'survival::' from
an expression, returning the fixed up expression, and applying it to all
formulae given as arguments to your survival functions.  E.g.,

removeDoubleColonSurvival <- function (formula)
{
  doubleColon <- as.name("::")
  survival <- as.name("survival")
  fix <- function(expr) {
    if (is.call(expr) && identical(expr[[1]], doubleColon) &&
identical(expr[[2]], survival)){
      expr <- expr[[3]]
    } else if (is.call(expr)) {
      for(i in seq_along(expr)){
        expr[[i]] <- fix(expr[[i]])
      }
    }
    expr
  }
  fix(formula)
}

identical(f(y ~ f(x) + survival::g(x,10) + z),
                  y ~ f(x) + g(x,10) + z)
# [1] TRUE

-Bill

On Mon, Aug 26, 2024 at 7:42?AM Therneau, Terry M., Ph.D. via R-devel <
r-devel at r-project.org> wrote:

            

  
  
#
On 2024-08-26 12:26 p.m., Chris Black wrote:
If the issues in survival are the same as the issues I saw in the tables 
package, the way to issue such a message would be to put code like

  if (! ("package:survival" %in% search()))
    stop("'survival' needs to be attached using library() or require()")

in functions that could trigger the problems.

Duncan Murdoch
#
? Mon, 26 Aug 2024 09:42:10 -0500
"Therneau, Terry M., Ph.D. via R-devel" <r-devel at r-project.org> ?????:
Apologies if the following has no chance to work for reasons obvious to
everyone else, but *currently*, terms(formula, specials= c('strata',
'survival::strata')) seems to recognise `survival::strata`. Would it be
possible to then post-process the terms object and retain only one kind
of 'strata' special?

Having said that, if https://bugs.r-project.org/show_bug.cgi?id=18568
is merged, this will probably break and will instead require
recognising `::` as a special and then manually figuring out which
function is being imported from which package.
#
On 2024-08-26 12:34 p.m., Duncan Murdoch wrote:
Of course, posting this meant I discovered a bug in it:  if 
is.environment(data) was TRUE, the modification was ignored.  Line 8 
should be

        if (is.null(data) || is.environment(data))

to handle that case.

Duncan Murdoch
#
Thanks to all for the responses. A couple notes It is nice to get the overall feedback 
that I'm not nuts to be terribly annoyed by this, and don't need to fix it tomorrow. 
Berwin 's note brings to mind the old adage that "The reason it is so hard to make things 
foolproof is that fools are so ingeneous." 1. Using survival::strata(inst) in the rhs of 
the survdiff call does not generate an error message. Because the stata function is not 
recognized as special one instead gets the wrong answer. (Or I should say, "the correct 
answer to a different question".) Ditto for most of the rest of the package functions. The 
very worst kind of bug. 2. Using specials =c("strata", "survival::strata") could work. I 
always process the result with a small "untangle.specials" function, a leftover from when 
R and Splus returned slightly different formula structures. I could put post-processing 
there. I'll think on this some more. But Ivan's follow-up was not encouraging. 3. Bill's 
suggestion to pre-fix the formula. Not a bad idea. If I followed the Call <- match.call() 
that lives at the top of my code by an immediate fix of the formula portion of the list, 
then all else would flow. And as perhaps a bit of a snark, the user would see the 
corrected form in their printout. The nuts who want to call a survival routine without 
attaching the name space will be out of luck though. Terry
#
I wouldn't go so far as to call people who don't want to wholesale attach
namespaces as "nuts."

{survival} is provided via the {censored} R package to integrate into the
{tidymodels} ecosystem.
And the reverse imports of the package is massive! Assuming that each and
every one of them
should attach the entire namespace is a bit presumptuous!

I am one of those nuts that likes to namespace everything. It just takes
one conflicting method to
taint it :)



On Mon, Aug 26, 2024 at 12:44?PM Therneau, Terry M., Ph.D. via R-devel <
r-devel at r-project.org> wrote:

            

  
  
#
The survival package itself has a tiny list of reverse imports, so there is no savings from avoiding the survival namespace.   (I don?t have a choice: since it is on the recommended list I can only depend on base and recommended.  The vignettes in particular would be nicer in knitr than Sweave?).   So yes, I think that avoiding this particular namespace is being a bit excessive.

The penchant of tidy users to ?import the universe? is a separate issue.   I have colleagues with a dozen ?standard?s that they place at the top of every analysis file whether they need them or not.   I work in medical research and it is important to be able to exactly recreate an analysis later, e.g., when a paper review or more seriously an FDA auditor appears, adding lots of dependencies just makes that harder.  But that is a within-group debate that I will lose (?old fogey?).

But perhaps most pertinent: I understand using survival::coxph() on the outer call, but futher adding survival::Surv or survival::strata or ? WITHIN  said call is completely unnecessary, since the routine looks in its own namespace first.   (Unless you start mucking about with the language model, with posix rather likes to do?)

Terry T

From: Josiah Parry <josiah.parry at gmail.com>
Date: Monday, August 26, 2024 at 5:09?PM
To: Therneau, Terry M., Ph.D. <therneau at mayo.edu>
Cc: r-devel at r-project.org <r-devel at r-project.org>
Subject: [EXTERNAL] Re: [Rd] specials and ::
I wouldn't go so far as to call people who don't want to wholesale attach namespaces as "nuts."

{survival} is provided via the {censored} R package to integrate into the {tidymodels} ecosystem.
And the reverse imports of the package is massive! Assuming that each and every one of them
should attach the entire namespace is a bit presumptuous!

I am one of those nuts that likes to namespace everything. It just takes one conflicting method to
taint it :)
On Mon, Aug 26, 2024 at 12:44?PM Therneau, Terry M., Ph.D. via R-devel <r-devel at r-project.org<mailto:r-devel at r-project.org>> wrote:
Thanks to all for the responses. A couple notes It is nice to get the overall feedback
that I'm not nuts to be terribly annoyed by this, and don't need to fix it tomorrow.
Berwin 's note brings to mind the old adage that "The reason it is so hard to make things
foolproof is that fools are so ingeneous." 1. Using survival::strata(inst) in the rhs of
the survdiff call does not generate an error message. Because the stata function is not
recognized as special one instead gets the wrong answer. (Or I should say, "the correct
answer to a different question".) Ditto for most of the rest of the package functions. The
very worst kind of bug. 2. Using specials =c("strata", "survival::strata") could work. I
always process the result with a small "untangle.specials" function, a leftover from when
R and Splus returned slightly different formula structures. I could put post-processing
there. I'll think on this some more. But Ivan's follow-up was not encouraging. 3. Bill's
suggestion to pre-fix the formula. Not a bad idea. If I followed the Call <- match.call()
that lives at the top of my code by an immediate fix of the formula portion of the list,
then all else would flow. And as perhaps a bit of a snark, the user would see the
corrected form in their printout. The nuts who want to call a survival routine without
attaching the name space will be out of luck though. Terry

______________________________________________
R-devel at r-project.org<mailto:R-devel at r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
#
In my view, that's just plain wrong, because strata() is not a function but a special operator in a model formula. Wouldn't it also blow up on stats::offset()?

Oh, yes it would:
Call:
lm(formula = y ~ x + offset(z))

Coefficients:
(Intercept)            x  
     0.7350       0.0719
Call:
lm(formula = y ~ x + stats::offset(z))

Coefficients:
     (Intercept)                 x  stats::offset(z)  
          0.6457            0.1078            0.8521  


Or, to be facetious:
Call:
lm(formula = y ~ base::"+"(x, z))

Coefficients:
    (Intercept)  base::"+"(x, z)  
         0.4516           0.4383  



-pd

  
    
#
You are right of course, Peter, but I can see where some will get confused.?? In a formula 
some symbols and functions are special operators, and others are simple functions.?? That 
is the reason one needs I(events/time) to put a rate in as a variable.??? Someone who 
types 'offset' at the command line will see that there actually IS a function behind the 
scenes.

Does anyone see a downside to Bill Dunlap's suggestion where the first step of my formula 
processing would be to "clean off" any survival:: modifiers???? That is, something that 
will break? After all, the code already has a lot of? "if (....) "? lines for other common 
user errors.?? I could view it as just saving me the time to deal with the 'we found an 
error' emails.?? I would output the corrected version as the "call" component.

Terry
On 8/27/24 03:38, peter dalgaard wrote:

  
  
#
On 2024-08-27 9:43 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:
I don't know if you have any data vectors that someone might use in a 
fit, but conceivably

   survdiff( Surv(time, status) ~ survival::datavector +
             strata(inst),  data=lung)

would mean something different than

   survdiff( Surv(time, status) ~ datavector +
             strata(inst),  data=lung)

if a user had a vector named datavector.

Duncan Murdoch
#
I don't see a big downside, but I will say that there's always a bit 
of a tradeoff between "train the users to do it right" (by writing clear 
documentation and informative error messages) and "make things easy for 
the user" (by making the code more complicated to handle things for them 
automatically).

    For example, part of me wishes that (1) there were only one way to 
provide a response variable for a binomial variable with N>1 (preferably 
by specifying proportions and a weights argument) and (2) grouping 
variables in lme4/nlme/et al always had to be specified as factors 
(rather than automatically being coerced to factors). Making those 
decisions would avoid so much code complexity ... (and eliminate one 
class of errors, i.e. people including a continuous covariate as a 
random-effect grouping variable because they think of 'random effect' 
and 'nuisance variable' as synonyms ...)

   But taking the "train the users to do it right" path does also 
involve more discussion with users ("if your software knows what I 
should be doing why can't it just do it for me?")

   cheers
    Ben Bolker
On 2024-08-27 9:43 a.m., Therneau, Terry M., Ph.D. via R-devel wrote:

  
    
#
How about a hybrid of ?fix it up for the user? and ?train the user to do it right? by checking for ?survival::?, generating an informative warning message, removing it and proceeding with execution.

Two birds, one stone ?
-G

--  
Change your thoughts and you change the world.
--Dr. Norman Vincent Peale

  
  
#
removeDoubleColonSurvival (or 'f', as it was shown in the example) could be
changed to only remove the 'survival::' from expressions that involve the
names to be used in specials.  E.g., change

if (is.call(expr) && identical(expr[[1]], doubleColon) &&
identical(expr[[2]], survival))

to
specialNames <- c("strata", "whatever")
...
if (is.call(expr)
    && identical(expr[[1]], doubleColon)
    && identical(expr[[2]], survival)
    && is.name(expr[[3]]) && is.element(as.character(expr[[3]]),
specialNames))

-Bill

On Tue, Aug 27, 2024 at 7:01?AM Duncan Murdoch <murdoch.duncan at gmail.com>
wrote: