
Suggestion: Modify common hypothesis tests and models to work better with pipes

1 message · Måns Thulin

Dear R-devel,

Some months ago I posted to this list proposing that common statistical
functions such as t.test, wilcox.test, lm, glm, and aov could benefit from
data.frame S3 methods, so that a data frame could be piped into them
directly, without the data = _ placeholder:

    penguins |>
      subset(species != "Gentoo") |>
      t.test(bill_len ~ species)
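
For comparison, this is roughly what the same pipeline needs today with the
native placeholder (R 4.2.0 or later). It is a small sketch that uses mtcars
instead of penguins, because penguins only ships with R 4.5.0 and later; the
subset() step is there only to mirror the shape of the example.

```r
# Today: the piped data frame has to be routed to the 'data' argument with
# the native-pipe placeholder '_', which must be passed by name.
res <- mtcars |>
  subset(cyl != 6) |>
  t.test(mpg ~ am, data = _)
```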

That post went largely unnoticed. Rather than assume the idea had no merit,
I decided to implement it and see how it worked in practice.

The result is pipedreams, a small CRAN-ready package that adds data.frame
methods for t.test, wilcox.test, lm, glm, aov, and nls from base R, as well
as survfit, coxph, and survreg from survival, lmer and glmer from lme4, and
polr, rlm, lda, and qda from MASS. The implementation is straightforward in
every case: the data.frame method simply reorders the arguments and
delegates to the existing formula method. No existing behaviour is changed.

The package is available at: https://github.com/mthulin/pipedreams

Having lived with this for a while, I am more convinced than before that
these changes belong in the packages themselves rather than in a wrapper
package. The case for including them in base R (and in survival, lme4, and
MASS, if the package maintainers are interested) is:

1. These functions predate the pipe. Had they been written today, the data
argument would almost certainly have come first. The data.frame methods
correct an accidental asymmetry rather than introduce a new design.

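2. The implementation is small and carries little risk of breaking
existing code, since almost no currently-valid call passes a data frame as
the first positional argument to these functions. One exception is a bare
call such as lm(cars), which today regresses the first column on the
remaining ones; a data.frame method for lm would need to preserve that case.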

3. Keeping this in a third-party package means that users must know to
install it, that documentation is scattered, and that the fix stays
invisible to the very people who run into the friction.

For the base R functions that are already S3 generics (t.test and
wilcox.test), the change amounts to registering a method of the form:

    t.test.data.frame <- function(x, formula, ...) {
      if (!inherits(formula, "formula"))
        stop("'formula' must be a formula object")
      t.test(formula, data = x, ...)
    }
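
A quick usage check of this method, sketched here with mtcars (my choice of
data, not part of the proposal). It compares the piped call with the existing
formula interface and confirms that the default x/y interface is untouched,
which is what item 2 relies on.

```r
# The data.frame method as proposed, defined at top level so that S3
# dispatch from the global environment finds it.
t.test.data.frame <- function(x, formula, ...) {
  if (!inherits(formula, "formula"))
    stop("'formula' must be a formula object")
  t.test(formula, data = x, ...)
}

# New: a data frame piped into the first argument dispatches to the method
# above, which hands off to the existing formula method.
piped <- mtcars |> t.test(mpg ~ am)

# Existing formula interface: the first argument is a formula, so dispatch
# goes to t.test.formula exactly as before.
classic <- t.test(mpg ~ am, data = mtcars)

# Existing default interface: numeric vectors still reach t.test.default.
by_vectors <- t.test(mtcars$mpg[mtcars$am == 0], mtcars$mpg[mtcars$am == 1])
```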

For lm, glm, aov, and nls, which are not currently S3 generics, the change
would additionally require promoting them to generics. The pipedreams
package shows that this is workable, with default methods that pass
through to the originals.
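
A minimal sketch of what that promotion could look like for lm, assuming the
generic simply masks stats::lm at top level. It illustrates the idea and is
not the pipedreams code; the method bodies are my own assumptions. Both
methods rebuild the incoming call with match.call() and evaluate it as a call
to stats::lm in the caller's frame, so arguments that lm evaluates
non-standardly, such as weights given as a column name, keep working.

```r
# Hypothetical sketch, not the pipedreams implementation: an S3 generic
# that masks stats::lm at top level. Keeping 'formula' as the first formal
# means existing calls that name it, e.g. lm(formula = y ~ x, data = d),
# still match.
lm <- function(formula, ...) UseMethod("lm")

# Default method: turn the call into stats::lm(...) and evaluate it in the
# caller's frame, so existing formula-first calls run exactly as before.
lm.default <- function(formula, ...) {
  cl <- match.call()
  cl[[1L]] <- quote(stats::lm)
  eval(cl, parent.frame())
}

# data.frame method: under the pipe the data frame lands in the first slot,
# which match.call() labels 'formula'. If an unnamed model formula follows,
# relabel the data frame as 'data' so the formula matches by position.
# A bare lm(df) has no such argument and keeps its current meaning.
lm.data.frame <- function(formula, ...) {
  cl <- match.call()
  cl[[1L]] <- quote(stats::lm)
  if (sum(names(cl) == "") > 1L)
    names(cl)[names(cl) == "formula"] <- "data"
  eval(cl, parent.frame())
}

# Piped and classic forms should give the same fit.
fit_piped   <- mtcars |> subset(cyl != 6) |> lm(mpg ~ wt)
fit_classic <- lm(mpg ~ wt, data = subset(mtcars, cyl != 6))

# A column-name argument (weights) still works in both forms.
fit_w_piped <- mtcars |> lm(mpg ~ wt, weights = cyl)
fit_w       <- lm(mpg ~ wt, data = mtcars, weights = cyl)

# The existing bare data-frame call is left alone.
fit_bare <- lm(cars)
```

One cost of this version is that a piped data expression is evaluated twice,
once to pick the method and once inside stats::lm, and the printed Call now
begins with stats::lm; a packaged implementation may want to handle both
differently.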

I recognise that changes to base R go through Bugzilla and require
consensus among the core team. I am happy to file a formal feature request,
provide a patch, or do whatever is most useful. I would also welcome any
feedback on why this approach might be problematic. The lack of response
to my earlier post left me uncertain whether the idea was uncontroversial,
unwanted, or simply missed.

Thank you for your time.
Måns

On Tue, Jun 10, 2025 at 8:46 AM Måns Thulin <gausseliminering at gmail.com>
wrote: