In glm explicitly search stats for family functions

7 messages · Tim Taylor, Martin Maechler, Serguei Sokol +2 more

#
I appreciate there are likely many places where calling a stats function via `::` and without the stats package being loaded could be problematic but would R core have any interest in adapting functions to accommodate this where possible?

The example I ran into today can be seen below:

ex <- function() {
    counts <- c(18,17,15,20,10,20,25,13,12)
    outcome <- gl(3,1,9)
    treatment <- gl(3,3)
    stats::glm(counts ~ outcome + treatment, family = "poisson")
}

tools::R(ex)
#> 
#> Call:  stats::glm(formula = counts ~ outcome + treatment, family = "poisson")
#> 
#> Coefficients:
#> (Intercept)     outcome2     outcome3   treatment2   treatment3  
#>   3.045e+00   -4.543e-01   -2.930e-01    6.972e-16    8.237e-16  
#> 
#> Degrees of Freedom: 8 Total (i.e. Null);  4 Residual
#> Null Deviance:	    10.58 
#> Residual Deviance: 5.129 	AIC: 56.76

tools::R(ex, env=c("R_DEFAULT_PACKAGES=NULL"))
#> Error: error in inferior call:
#>   object 'poisson' of mode 'function' was not found

The second call fails due to the following line in glm:

if (is.character(family)) 
        family <- get(family, mode = "function", envir = parent.frame())

A non-breaking patch (AFAICT) could add a branch that explicitly looks the function up in the stats package if the above call to `get` fails.
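
A minimal sketch of what such a fallback might look like, written as a standalone helper rather than as a patch to glm itself. The name `resolve_family` and its error message are hypothetical and not part of R; the sketch only assumes `get0()` and `asNamespace()` from base.

```r
# Hypothetical helper, not the actual glm source: resolve a family given
# as a string, trying the caller's scope first and the stats namespace
# only if that lookup finds nothing.
resolve_family <- function(family, envir = parent.frame()) {
    if (!is.character(family))
        return(family)
    # Same lookup glm performs today, but returning NULL instead of failing.
    f <- get0(family, envir = envir, mode = "function", inherits = TRUE)
    if (is.null(f)) {
        # Fallback: look in the stats namespace itself.
        f <- get0(family, envir = asNamespace("stats"),
                  mode = "function", inherits = FALSE)
    }
    if (is.null(f))
        stop("family '", family, "' not found", call. = FALSE)
    f
}

# Only base is visible from this environment, so the first lookup fails
# and the stats fallback is used.
restricted <- new.env(parent = baseenv())
fam <- resolve_family("poisson", envir = restricted)
```

Because the caller's scope is still searched first, a user-defined function with the same name would keep taking precedence, which is what would make the change non-breaking under these assumptions.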

Again I understand this could very much be a case of, "don't do that", but ...

Regards

Tim
#
> I appreciate there are likely many places where calling a stats function via `::` and without the stats package being loaded could be problematic but would R core have any interest in adapting functions to accommodate this where possible?

For my part (just one R corer): No, not at all.
Using R without 'stats' is like  "<your favorite> without
<your_other_fav>", e.g., like doing math without the (greek
Sigma) summation sign:  possible, but rarely a good idea and just
complicating things unnecessarily.

Martin



    > ______________________________________________
    > R-devel at r-project.org mailing list
    > https://stat.ethz.ch/mailman/listinfo/r-devel
#
On 09/03/2026 at 12:53, Tim Taylor wrote:
You can do it on your own side by modifying the function 'ex' like this:

ex <- function() {
     counts <- c(18,17,15,20,10,20,25,13,12)
     outcome <- gl(3,1,9)
     treatment <- gl(3,3)
     require(stats)
     stats::glm(counts ~ outcome + treatment, family = "poisson")
}

Note the "require(stats)" before the glm() call. In a standard R session this is almost a no-cost operation because stats is already attached, and in the R_DEFAULT_PACKAGES=NULL case it does what you need.

Best,
Serguei.
#
I believe you can pass the family function itself instead of its name:

ex3 <- function() {
  counts <- c(18,17,15,20,10,20,25,13,12)
  outcome <- gl(3,1,9)
  treatment <- gl(3,3)
  stats::glm(counts ~ outcome + treatment, family = stats::poisson)
}

I think if family is character it's searching in parent.frame() rather
than package namespace.
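
A small sketch of the difference, assuming a hypothetical environment `restricted` whose parent is `baseenv()`, so that nothing from stats is visible from it. The data are the ones from the original example.

```r
# Hypothetical restricted environment: only base is reachable from here.
restricted <- new.env(parent = baseenv())
restricted$counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
restricted$outcome <- gl(3, 1, 9)
restricted$treatment <- gl(3, 3)

# Family given as a string: glm looks the name up starting from its
# caller's frame, which here cannot see stats, so the call errors.
by_name <- tryCatch(
    eval(quote(stats::glm(counts ~ outcome + treatment, family = "poisson")),
         restricted),
    error = function(e) conditionMessage(e)
)

# Family given as a function object: no name lookup is needed.
by_object <- eval(
    quote(stats::glm(counts ~ outcome + treatment, family = stats::poisson)),
    restricted
)
```

Here `by_name` should end up holding the error message, while `by_object` should be a fitted glm object.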


~Michal


On Mon, Mar 9, 2026 at 12:54 PM Tim Taylor
<tim.taylor at hiddenelephants.co.uk> wrote:
#
On 2026-03-09 11:39 a.m., Serguei Sokol via R-devel wrote:
Calling require(stats) is a relatively harmless thing to do in your own
scripts, but you shouldn't do that in a package that you expect others to
use.  There may be a reason they don't have stats on the search list, and
require() puts it there.

Duncan Murdoch
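
For illustration, a sketch of the distinction between attaching and merely loading a package. This is an aside under stated assumptions, not something proposed in the thread: `requireNamespace()` loads a namespace without changing the search list, so on its own it would not make a string such as "poisson" resolvable from the caller's frame.

```r
# Record the search list, load the stats namespace without attaching it,
# and check that the search list is unchanged.
before <- search()
loaded <- requireNamespace("stats", quietly = TRUE)
after <- search()
unchanged <- identical(before, after)

# Functions remain reachable through `::` without touching the search list.
fam <- stats::poisson()
```

By contrast, require(stats) or library(stats) adds "package:stats" to the search list when it is not already there, which is the side effect described above.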
#
On 2026-03-09 7:53 a.m., Tim Taylor wrote:
There are several other cases in stats where functions assume that a 
function name passed as a string can be found on the search list.  They 
aren't always consistent about the search order:

C() searches for contr in the stats namespace first, as does 
make.tables.aovproj().

contrasts() behaves as glm() does for family: it looks in the parent frame.
ks.test.default() does likewise.

make.tables.aovprojlist() looks locally first, but doesn't restrict the 
search to functions.

I'm not sure I spotted all cases of this, and I didn't look in any other 
base packages besides stats.

So I think a patch to fix this should determine a consistent approach, 
and apply it everywhere.  Maybe too big a can of worms?

Duncan Murdoch
#
Thank you Duncan

Yes, given Martin's comment it is probably too big a can for the reward.

To other commentators - the example was purely to illustrate what Duncan more clearly articulated (i.e. stats looking for strings on the search path).  In my own code I was evaluating things in an environment whose parent was `baseenv()`. I'm now restructuring this, as it was clearly a little too restrictive/fragile.

Best

Tim
On Mon, 9 Mar 2026, at 3:49 PM, Duncan Murdoch wrote: