
Pipe operator status, placeholders?

12 messages · Benjamin Redelings, Peter Dalgaard, Lionel Henry +4 more

#
Hi,

I see that R 4.2 adds the underscore _ as a placeholder for the new 
forward pipe operator |> , but only for named arguments. The reason why 
a placeholder for positional arguments was NOT added isn't clear to me, so 
I've been looking for the discussion around the introduction of the 
placeholder.

By searching subject lines in the r-devel mailing list archive, I've found

https://stat.ethz.ch/pipermail/r-devel/2021-April/080646.html

https://stat.ethz.ch/pipermail/r-devel/2021-January/080396.html

https://stat.ethz.ch/pipermail/r-devel/2020-December/080173.html and 
following messages

but not much else.

1. Am I looking in the wrong place?

2. What is the reasoning behind allowing _ as a placeholder only for 
named arguments?

take care,

-BenRI
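
For readers following along, here is a minimal sketch of the behaviour being asked about, assuming R >= 4.2. The helper `root_of` is hypothetical and exists only for illustration; `parse()` is used so that the parser's rejection of a positional placeholder can be captured instead of stopping the script.

```r
# Hypothetical helper: the n-th root of x (not part of base R).
root_of <- function(n, x) x^(1 / n)

# Without a placeholder, the left-hand side becomes the first argument.
r1 <- 16 |> sqrt()

# With a named placeholder, the left-hand side goes to the named argument.
r2 <- 16 |> root_of(n = 2, x = _)

# A positional placeholder is rejected when the code is parsed.
positional <- tryCatch(
  {
    parse(text = "16 |> root_of(2, _)")
    "parsed"
  },
  error = function(e) conditionMessage(e)
)
```

Here `positional` should hold the parser's error message rather than "parsed".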
2 days later
#
You probably want Luke Tierney for the full story, but what I gather from the deliberations (on the private R-core list), there are issues with how non-funcall syntax like lm(....) |> _$coef[2] should work. This, in turn, has to do with wanting to have the placeholder occur only as a toplevel substitution (i.e. "["("$"(_, coef), 2) is a no-go). And the reason for that has to do with the way the pipe works in the absence of a placeholder, e.g. the parser gets confused by
Error in f(x, g(x = "_")) : invalid use of pipe placeholder

-pd
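
A small sketch of the top-level rule described above, assuming R >= 4.2. The function names `f` and `g` are never called; the code is only parsed. `try_parse` is a hypothetical helper that returns "parsed" on success and the parser's message otherwise.

```r
# Hypothetical helper: report whether a piece of code gets past the parser.
try_parse <- function(code) {
  tryCatch(
    {
      parse(text = code)
      "parsed"
    },
    error = function(e) conditionMessage(e)
  )
}

# Placeholder as a named argument of the top-level call: accepted.
top_level <- try_parse("x |> f(y, z = _)")

# Placeholder inside a nested call: rejected by the parser.
nested <- try_parse("x |> f(y, g(z = _))")
```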

  
    
#
I think usage of the placeholder in nested calls is another question.
The placeholder requires a named argument to improve readability
because it's a single character that is easy to miss.

Best,
Lionel
On 4/19/22, peter dalgaard <pdalgd at gmail.com> wrote:
#
Thanks to you and Lionel for the info! I wasn't sure if there was a 
private core developers list, or if I was just looking in the wrong place.

1. It's good to know that the only reason not to allow _ in positional 
arguments is that it's easy to miss. Personally, I would like to be able 
to write f(x, _), but it's not a big deal.

Is the idea that when you see

    x |> f(x, y, _, z, w)

it's hard for the eye to scan the RHS and find the _?

Hmm.... I notice that a lot of languages (e.g. Haskell) use _ as a 
wildcard pattern, and I don't recall any complaints about it being hard 
to see.

2. I can see how there would be issues with placeholders that aren't at 
the top level... although I'm not sure precisely what they are. Any 
hints? :-) I did briefly look at the parser/grammar file...

Thanks again for the info!

-BenRI
On 4/19/22 3:24 AM, peter dalgaard wrote:
#
Ben,

I think you considered only part of Peter's response. Placeholders can safely only work for the first call, hence at the top level. Anything below may not do what you think as you'd have to skip frames and suddenly things can have entirely different meaning since you're not evaluating in the scope of the preceding call. That is also the reason why only named arguments are allowed, because if it was not the case then you might be tempted to write x |> _$foo[1] which looks legit at first glance, but is no longer at the top level (since it is `[`(`$`(_, foo), 1)) and thus not valid.

Cheers,
Simon
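
One way to get the effect of x |> _$foo[1] without a nested placeholder is to pipe into an anonymous function, which needs only the \(x) lambda syntax from R 4.1. This is a sketch of a workaround, not something proposed in the message above; the list `x` and its element `foo` are made-up data.

```r
# Made-up data for illustration.
x <- list(foo = c(10, 20, 30))

# The anonymous function receives the piped value as `d`, so the
# extraction happens inside an ordinary function body.
first_foo <- x |> (\(d) d$foo[1])()
```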
#
On 19/04/2022 6:55 p.m., Simon Urbanek wrote:
The R pipe is purely syntactic sugar, it just transforms expressions.  I 
think the real reason not to allow _ to be deeply nested in an 
expression is that it would make parsing really hard.  If you have

  x |> { some really huge expression }

then the parser would have to parse the huge expression and search it 
for underscores to see what to do with x.  With the current rule, the 
search is much easier, it's just at the top level.

There are probably cases where deeply nested underscores would be 
ambiguous, e.g. if that huge expression contained a pipe operator 
itself, who gets the substitution?

The other limitation of the transformation approach is that _ can only 
occur once.  magrittr evaluates x and puts the value in where it sees a 
dot, so this works to print 2 once and give a value of 4:

   print(2) %>% `+`(., .)

It's equivalent to

   *tmp* <- print(2)
   *tmp* + *tmp*

However, you'd have the print executed twice in

   print(2) |> `+`(_, _)

(if such was allowed), because it would be equivalent to

   print(2) + print(2)

Duncan Murdoch
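
The two points above can be checked with a short sketch, assuming R >= 4.1. `noisy_two` and the counter `calls` are hypothetical names; the function stands in for print(2) and counts how often it runs.

```r
# The parser rewrites the pipe, so quote() already returns a plain call.
rewritten <- identical(quote(x |> f(y)), quote(f(x, y)))

# Hypothetical stand-in for print(2): returns 2 and counts its own calls.
calls <- 0
noisy_two <- function() {
  calls <<- calls + 1
  2
}

# Evaluate once and reuse the value (what magrittr's dot amounts to).
calls <- 0
tmp <- noisy_two()
res_once <- tmp + tmp
calls_once <- calls

# Substitute the expression into both places (what a purely syntactic
# `_` appearing twice would amount to).
calls <- 0
res_subst <- noisy_two() + noisy_two()
calls_subst <- calls
```

Both results are 4, but `calls_once` is 1 while `calls_subst` is 2, which is the duplication of side effects described above.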
#
I vaguely remember that some package versions of piping once simply rewrote your code to make a period (a valid identifier in R) be the recipient of part of a calculation.
So the code generated looked like:
. <- calculation
Then the next line was another:
. <- calculation
So if the calculation included a period where a variable name might fit, it simply worked as in:
. <- lm(formula, data = . )
But when the pipe is implemented very differently, other techniques may be needed whether using a period or underscore or anything. Syntactic sugar is only sweet when it works consistently and reliably without unintended side effects.
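
A sketch of the rewriting style described above, written out by hand. No particular package is being reproduced here; the only thing relied on is that `.` is an ordinary variable name in R.

```r
# Each step stores its result in `.`, and the next step reads it back.
. <- c(1, 4, 9)
. <- sqrt(.)
. <- sum(.)
result <- .
```

Because each step is evaluated once and stored, `.` can appear several times on a right-hand side without repeating the earlier calculation.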


______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
#
Hi Simon,

1. Peter said that the parser "gets confused" by placeholders that 
don't occur at the top-level. But this raises the question of why the 
parser gets confused, and whether you could fix it.

It sounds like you are saying something else -- that if E |> f(g(_)) was 
parsed as f(g(E)), then E is evaluated in an unexpected context. Is 
this situation similar to LISP, where (I think) the s-exp E would yield 
different results if evaluated in a different "environment"?

2. With regards to named arguments, it seems like you could (in theory) 
require that `_` only occurs as an entry in a standard function call by 
making it a production of the grammar node "sub" in this file:

    https://github.com/wch/r-source/blob/trunk/src/main/gram.y

take care,

-BenRI
On 4/19/22 6:55 PM, Simon Urbanek wrote:
#
Generally all arguments to a function are named and you are free to spell out the names. So unless a function had a named argument called just "_", it sounds like you could use the new underscore with just about any function as long as you fully spelled out all arguments as named.
Is that too high a price to pay for using that function in the new pipeline?
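
As a sketch of what spelling out the names looks like in practice, assuming R >= 4.2. Both calls use base or stats functions and the built-in mtcars data set; the particular examples are chosen for illustration only.

```r
# sub() takes the string as its third argument, so name it to pipe into it.
greeting <- "hello world" |>
  sub(pattern = "world", replacement = "R", x = _)

# lm() takes the data as its second argument; naming it lets the
# data frame come from the left-hand side.
fit <- mtcars |> lm(formula = mpg ~ wt, data = _)
fit_terms <- names(coef(fit))
```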



#
On 4/19/22 8:22 PM, Duncan Murdoch wrote:
As long as the search for placeholders is linear in the size of the 
total expression, I suspect that this would be quite fast. It 
might be a bit tricky to make sure that each sub-expression is searched 
only once, if there are nested pipes.

If you have something like x |> f(_, y |> g(_)), then you could make a 
rule, such as: when the top-level pipe searches for a placeholder in its 
RHS, it ignores the RHS of any nested pipe operator that it finds.

Interestingly, this happens automatically. When the top-level pipe 
operator searches its RHS for the placeholder, it would see `f(_ , 
g(y))`. Any nested pipe in the RHS would already have consumed its own 
placeholder and been transformed into a pipeless expression.

However, the top-level pipe would search the expression `g(y)` for a 
placeholder, which means that the expression gets searched twice.

Yeah, this makes sense. If you allow the placeholder to occur twice, 
you can't just substitute the expression, because then you could 
evaluate it twice. Then you have to implement lazy evaluation, which 
the lambda function syntax `x |> (\(d) ...)()` already does.

take care,

-BenRI
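
The nested-pipe case can be tried with named placeholders, which is the form R 4.2 accepts. This is a sketch assuming R >= 4.2; `f`, `g`, `x` and `y` are never evaluated, the expressions are only parsed or quoted, and the argument names `a`, `b` and `z` are invented for the example.

```r
# Each pipe consumes its own placeholder: the inner pipe is rewritten
# first, so the outer pipe sees an RHS with a single top-level `_`.
nested <- quote(x |> f(a = _, b = y |> g(z = _)))
as_expected <- identical(nested, quote(f(a = x, b = g(z = y))))

# Two placeholders belonging to the same pipe are rejected by the parser.
twice <- tryCatch(
  {
    parse(text = "x |> f(a = _, b = _)")
    "parsed"
  },
  error = function(e) conditionMessage(e)
)
```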
1 day later
#
At some point there will probably be a blog post about the design of
the forward pipe operator in base, but that is not something I will
think about until after the current semester is over and my backlog of
other things is cleared.

Best,

luke
On Tue, 19 Apr 2022, peter dalgaard wrote:

            

  
    
1 day later
#
Completely understandable. I'll look forward to hearing more about it 
later.

-BenRI
On 4/21/22 11:57 AM, luke-tierney at uiowa.edu wrote: