Dropping LHS of a formula using NULL assignment

4 messages · Blackwell, Matthew, Duncan Murdoch, Brian Ripley +1 more

#
Hello all,

In attempting to create a one-sided formula from a two-sided formula,
I discovered that the following syntax will successfully complete this
operation:

 > f <- y ~ x + z
 > f[2] <- NULL
 > f
~x + z
 > str(f)
Class 'formula'  language ~x + z
  ..- attr(*, ".Environment")=<environment: R_GlobalEnv>

In searching through the formula documentation, I couldn't find this
technique documented, and I wondered whether it is expected behavior
and whether it makes sense to develop a package that relies on it. I'm
using R 4.1.0, but I see the same on R-devel (r81303). I asked on
Twitter, but someone thought this list might be a better venue.

Apologies if I missed some documentation and thanks in advance.

Cheers,
Matt

~~~~~~~~~~
Matthew Blackwell
Associate Professor of Government
Harvard University
https://www.mattblackwell.org
#
On 14/12/2021 3:26 p.m., Blackwell, Matthew wrote:
The source "y ~ x + z" parses to a call to the `~` function with 
arguments y and x + z.  Calls have the function as the first element, 
and arguments follow:  so f[[1]] would be `~`, f[[2]] would be y, and 
f[[3]] would be x + z.

You can see this if you pass f through as.list():

 > as.list(f)
[[1]]
`~`

[[2]]
y

[[3]]
x + z

Setting element 2 to NULL removes it, so you see

 > f[2] <- NULL
 > as.list(f)
[[1]]
`~`

[[2]]
x + z

I think it's safe to make use of this even if it's undocumented.  It's a 
pretty basic aspect of formulas.  I'd guess there are lots of packages 
already using it, but I can't point to any particular examples.

(I've ignored the difference between an unevaluated formula and an 
evaluated one, but they're almost the same; the only important 
difference is in the attributes:  evaluating it gives it a class and an 
environment.)
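
The parenthetical above can be checked directly. This is only a small sketch; the names f_eval and f_call are made up for illustration, and quote() is used here as one way to obtain the unevaluated call.

```r
# Evaluated formula: running `~` attaches a class and an environment.
f_eval <- y ~ x + z
# Unevaluated formula: quote() returns the bare call, with no attributes.
f_call <- quote(y ~ x + z)

class(f_eval)                                  # "formula"
class(f_call)                                  # "call"
is.environment(attr(f_eval, ".Environment"))   # TRUE
is.null(attr(f_call, ".Environment"))          # TRUE

# Dropping element 2 behaves the same way on both objects.
f_eval[2] <- NULL
f_call[2] <- NULL
deparse(f_eval)                                # "~x + z"
deparse(f_call)                                # "~x + z"
```

The evaluated version keeps its class and environment after the assignment, as the str() output in the original post shows.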

Duncan Murdoch
#
On 14/12/2021 20:26, Blackwell, Matthew wrote:
See ?"~", which says

      A formula has mode ‘call’.  It can be subsetted by ‘[[’: the
      components are ‘~’, the left-hand side (if present) and the
      right-hand side _in that order_.

That would suggest that

f <- y ~ x + z
f[[2]] <- NULL

was the documented way (and the one I would have used).  However, ?"[" says

      ‘[’ and ‘[[’ are sometimes applied to other recursive objects such
      as calls and expressions.  Pairlists are coerced to lists for
      extraction by ‘[’, but all three operators can be used for
      replacement.
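
As a quick check of the two help-page excerpts, the sketch below applies both replacement forms to the same two-sided formula and compares the results. The names f1 and f2 are illustrative only.

```r
f1 <- y ~ x + z
f2 <- y ~ x + z

f1[2]   <- NULL   # the form used in the original post
f2[[2]] <- NULL   # the form suggested by ?"~"

# Both are now one-sided formulas with two elements: `~` and x + z.
length(f1)        # 2
length(f2)        # 2
deparse(f1)       # "~x + z"
deparse(f2)       # "~x + z"
inherits(f1, "formula") && inherits(f2, "formula")   # TRUE
```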

  
    
#
On 14.12.21 at 21:57, Prof Brian Ripley wrote:
I'd also mention delete.response() here. It takes a "terms" object (a 
formula with attributes) and uses the same technique internally to 
remove the response -- if there is one... I.e., be sure that 
length(f)==3 before dropping the second element.
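
A minimal sketch of that guard follows. The helper name drop_lhs is hypothetical and not part of any package; the second half shows delete.response() from stats applied to a terms object.

```r
# Hypothetical helper: remove the left-hand side only if there is one.
drop_lhs <- function(f) {
  stopifnot(inherits(f, "formula"))
  # A two-sided formula has 3 elements: `~`, the LHS, and the RHS.
  if (length(f) == 3L) f[[2]] <- NULL
  f
}

deparse(drop_lhs(y ~ x + z))   # "~x + z"
deparse(drop_lhs(~ x + z))     # "~x + z" (already one-sided, left unchanged)

# delete.response() does the same job for a "terms" object.
tt  <- terms(y ~ x + z)
rhs <- delete.response(tt)
deparse(formula(rhs))          # "~x + z"
attr(rhs, "response")          # 0
```

Without the length check, assigning NULL to element 2 of an already one-sided formula would remove its right-hand side, since that is the element in position 2.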

Best regards,

	Sebastian Meyer