Dear R-devel,
The most popular piping operator lives in the `magrittr` package; it is used
by a huge number of users and imported/re-exported by more and more
packages too.
Many workflows hardly make sense without pipes nowadays, so the examples
in package documentation use pipes, as do READMEs, vignettes etc. I
believe base R could have a piping operator so packages can use a pipe in
their code or documentation and stay dependency free.
I don't suggest an operator based on complex heuristics; instead I suggest
a very simple and fast one (>10 times faster than magrittr in my tests):
```
`%.%` <- function(e1, e2) {
  eval(substitute(e2), envir = list(. = e1), enclos = parent.frame())
}
iris %.% head(.) %.% dim(.)
#> [1] 6 5
```
The difference from magrittr is that the dots must all be explicit (which
fits the choice of the name), and that special magrittr features such
as assignment in place and building functions with `. %>% head() %>% dim()`
are not supported.
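To illustrate the explicit-dot rule, a couple of small examples (illustrative only, reusing the definition above; the dot may appear anywhere in the call, including nested positions):

```r
`%.%` <- function(e1, e2) {
  eval(substitute(e2), envir = list(. = e1), enclos = parent.frame())
}

# the value bound to `.` can be used wherever the call needs it
c(4, 9, 16) %.% sqrt(.) %.% sum(.)
#> [1] 9
letters %.% paste(., collapse = "") %.% substr(., 1, 3)
#> [1] "abc"
```

Unlike magrittr, a stage written without a dot (e.g. `iris %.% head()`) would not receive the value implicitly.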
Edge cases are not surprising:
```
x <- "a"
x %.% quote(.)
#> .
x %.% substitute(.)
#> [1] "a"
f1 <- function(y) function() eval(quote(y))
f2 <- x %.% f1(.)
f2()
#> [1] "a"
```
Looking forward to your thoughts on this,
Antoine
Should base R have a piping operator?
33 messages · Jeff Ryan, David Hugh-Jones, Ant F +10 more
Is there some concrete example of your "many workflows don't even make much sense without pipes nowadays" comment? I don't think I'm opposed to pipes in the absolute, but as I am now deep into my second decade of using R I've done just fine without them. As I would guess have the vast majority of users and code that is used throughout the world. Jeff
On Sat, Oct 5, 2019 at 09:34 Ant F <antoine.fabri at gmail.com> wrote:
Actually, base R already has a pipe fairly close to the one you describe: `->.;`

```
iris ->.; head(.) ->.; dim(.)
# [1] 6 5
```

I've called it the Bizarro pipe ( http://www.win-vector.com/blog/2016/12/magrittrs-doppelganger/ ), and for some reason we chickened out and didn't spend time on it in the dot pipe paper ( https://journal.r-project.org/archive/2018/RJ-2018-042/index.html ). For documentation, the Bizarro pipe has the advantage that one can work out how it works from the application itself, without reference to a defining function.
--------------- John Mount http://www.win-vector.com/ Our book: Practical Data Science with R https://www.manning.com/books/practical-data-science-with-r-second-edition
How is your argument different to, say, "Should dplyr or data.table be part of base R as they are the most popular data science packages and they are used by a large number of users?" Kind regards
I +1 this idea, without judging the implementation details. The pipe operator has proven vastly popular. Adding it would be relatively easy (I think). Having it as part of the core would be a strong guarantee of the future stability of this syntax.
Hi John,
Thanks, but the Bizarro pipe comes with many flaws:
* It's not a single operator
* It has a different precedence
* It cannot be used in a subcall
* The variable assigned to must be on the right
* It doesn't trigger indentation when continuing on a new line
* It creates/overwrites a `.` variable in the workspace
And it doesn't deal gracefully with some lazy-evaluation edge cases, such as:

```
compose <- function(f, g) { function(x) g(f(x)) }
plus1 <- function(x) x + 1
plus2 <- plus1 %.% compose(., plus1)
plus2(5)
#> [1] 7

plus1 ->.; compose(., plus1) -> .; . -> plus2
plus2(5)
#> Error: C stack usage 15923776 is too close to the limit
```
What I propose, on the other hand, can always substitute for any existing
proper pipe in its standard features, as long as the dot is made explicit.
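The Bizarro failure above stems from lazy evaluation: `compose()`'s arguments are promises that still reference `.` when the inner function is finally called, and by then `.` has been rebound to the composed function itself, hence the infinite recursion. As a sketch (not part of the proposal), forcing the promises inside `compose()` sidesteps this:

```r
compose <- function(f, g) {
  force(f); force(g)  # evaluate the promises now, before `.` is reassigned
  function(x) g(f(x))
}
plus1 <- function(x) x + 1
plus1 ->.; compose(., plus1) -> .; . -> plus2
plus2(5)
#> [1] 7
```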
Best regards,
Antoine
Many of those issues can be dealt with by introducing curly braces:
```
compose <- function(f, g) { function(x) g(f(x)) }
plus1 <- function(x) x + 1
plus2 <- { plus1 ->.; compose(., plus1) }
plus2(5)
# [1] 7
```
And a lot of that is a point to note: we may not all agree on which cases are corner cases, nor on how each should be handled.
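A sketch of one such corner (illustrative only): the curly-brace version still leans on lazy evaluation, since `compose()`'s promises read the workspace `.` only when `plus2` is first called, so rebinding `.` before that first call silently changes the result:

```r
compose <- function(f, g) { function(x) g(f(x)) }
plus1 <- function(x) x + 1
plus2 <- { plus1 ->.; compose(., plus1) }
. <- sqrt           # rebind the workspace dot before plus2 is ever called
plus2(5)            # gives plus1(sqrt(5)), i.e. sqrt(5) + 1, not 7
```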
Hello,

R is a functional language, pipes are not. There are even higher-order functions such as [1] and [2]. Besides, packages are part of R; R couldn't live without them. I find pipes a good idea but I also find it better not to have them as part of base R. If you want to use them, load a package; if you don't, don't. It's that simple.

As for your example, compose, there is a StackOverflow question on it. See this answer [3].

[1] https://stat.ethz.ch/R-manual/R-devel/library/base/html/funprog.html
[2] https://stat.ethz.ch/R-manual/R-devel/library/base/html/Recall.html
[3] https://stackoverflow.com/a/52465956/8245406

Hope this helps,
Rui Barradas
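To make the funprog pointer concrete, base R's higher-order `Reduce` can already express a left-to-right pipeline as a fold over a list of functions (a sketch; the helper name `pipeline` is illustrative, not an established API):

```r
# Fold a value through functions left to right, using only base R (see ?funprog).
pipeline <- function(x, ...) Reduce(function(acc, f) f(acc), list(...), x)

pipeline(iris, head, dim)
#> [1] 6 5
```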
On Sat, 5 Oct 2019 at 17:15, Hugh Marera <hugh.marera at gmail.com> wrote:
How is your argument different to, say, "Should dplyr or data.table be part of base R as they are the most popular data science packages and they are used by a large number of users?"
Two packages with many features, dozens of functions, and under heavy development to fix bugs, add new features and improve performance, vs. a single operator with limited and well-defined functionality, and a reference implementation that hasn't changed in years (certainly hackish, but in a way that probably could only be improved from within R itself). Can't you really spot the difference? Iñaki
On Sat, 5 Oct 2019 at 18:10, Rui Barradas <ruipbarradas at sapo.pt> wrote:
R is a functional language, pipes are not.
How would you classify them? Pipes are analogous to function composition. In that sense, they are more functional than classes, and R does have classes. Anyway, I don't see "purity" as a valid argument either in favour of or against any given feature. Language classification may be useful for theorists, but certainly not for practitioners. Iñaki
Hi Rui,
R is a functional language, pipes are not. There are even higher order
functions such as [1] and [2].

If they can be built in R, then either R is not a functional language, or pipes can be part of a functional language. Could you elaborate? What point would you make against pipes that you couldn't make about other operators or functions leveraging lazy evaluation? I don't understand the references to Negate() and Recall() either. I actually think that there are few things more "functional" than using pipes :).
Besides, packages are part of R, R couldn't live without them. I find pipes a good idea but I also find it better not to have them as part of base R. If you want to use them, load a package, if you don't, don't. This simple.
Simple enough, but more complicated than necessary. I believe it's fair to point out that pipes are different: they don't DO anything, and yet they are used by thousands of packages. They are largely viewed as part of R, and some users are surprised not to find them in base R; among the remaining users, many don't even know that they come from magrittr but think they come from dplyr.

The fact that it feels odd to attach a package only to use the pipe is highlighted by the thousands of packages that re-export it, see https://github.com/search?q=filename%3Autils-pipe.R+magrittr . I don't think that can be said of any other function. Package developers could still use `usethis::use_pipe()` if they want to associate their package with magrittr's pipe, but they wouldn't "need" it to suggest to users that piping is the recommended way of using their functions in sequence, and this would mean less red ink saying that this `%>%` pipe was masked by that other `%>%` pipe.

I like the design of magrittr's pipe a lot and I'd continue to use it either way, but I would write documentation and even functions with `%.%` if it was available, without worrying about dependencies, and less worried about suboptimal performance.

Best regards,
Antoine
I exaggerated the comparison for effect. However, it is not very difficult to find functions in dplyr or data.table or indeed other packages that one may wish to be in base R. Examples, for me, could include data.table::fread, dplyr::group_by & dplyr::summari[sZ]e combo, etc. Also, the "popularity" of magrittr::`%>%` is mostly attributable to the tidyverse (an advanced superset of R). Many R users don't even know that they are installing the magrittr package.
On Sat, 5 Oct 2019 at 19:54, Hugh Marera <hugh.marera at gmail.com> wrote:
[...] it is not very difficult to find functions in dplyr or data.table or indeed other packages that one may wish to be in base R. Examples, for me, could include data.table::fread
You have utils::read.table and the like.
dplyr::group_by & dplyr::summari[sZ]e combo
base::tapply, base::by, stats::aggregate.
[...] Many R users don't even know that they are installing the magrittr package.
And that's one of the reasons why the proposal makes sense. Another one is that the pipe plays well with many base R functions, such as subset, transform, merge, aggregate and reshape. Iñaki
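For illustration (a sketch reusing the `%.%` definition from the original post), a pipeline built entirely from such base functions:

```r
`%.%` <- function(e1, e2) {
  eval(substitute(e2), envir = list(. = e1), enclos = parent.frame())
}

# average fuel economy per gear among 4-cylinder cars, base functions only
mtcars %.%
  subset(., cyl == 4) %.%
  transform(., kml = mpg * 0.425) %.%   # miles/gallon -> km/litre (approx.)
  aggregate(kml ~ gear, data = ., FUN = mean)
```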
Yes, but this exaggeration precisely misses the point. Concerning your examples:

* I love fread, but I think it makes a lot of subjective choices that are best associated with a package. It has changed a lot over time and can still change, and we have great developers willing to maintain it and be responsive to feature requests and bug reports.
* group_by() adds a class that works only (or mostly) with tidyverse verbs, so it's easy to dismiss as a candidate for inclusion in base R.
* summarize is an alternative to aggregate; it would be very confusing to have both.

Now, to be fair to your argument, we could think of other functions such as data.table::rleid(), which I believe base R misses deeply, and there is nothing wrong with packaged functions making their way to base R. Maybe there's an existing list of criteria for inclusion in base R, but if not I can make one up for the sake of this discussion :) :

* 1) the functionality should not already exist
* 2) the function should be general enough
* 3) the function should have a large number of potential users
* 4) the function should be robust, and not require extensive maintenance
* 5) the function should be stable; we shouldn't expect new features every 2 months
* 6) the function should have an intuitive interface in the context of the rest of base R

I guess 1 and 6 could be held against my proposal, because:

(1) everything can be done without pipes
(6) they are somewhat surprising (though with explicit dots not that much, and not more surprising than, say, `bquote()`)

In my opinion the + offset the -. I wouldn't advise taking magrittr's pipe (provided the license allows it), for instance, because it makes a lot of design choices and has a complex behavior; what I propose is 2 lines of code very unlikely to evolve or require maintenance.

Antoine

PS: I just receive the digest once a day, so if you don't "reply all" I can only react later.
Hi all,
I think there's some nuance here that makes me agree partially with each
"side".
The pipe is inarguably extremely popular. Many probably think of it as a
core feature of R, along with the tidyverse that (as was pointed out)
largely surrounds it and drives its popularity. Whether it's a good or bad
thing that they think that doesn't change the fact that, by my estimation,
Ant is correct that they do. BUT I don't agree that this, by
itself, is a reason to put it in base R in the form that it exists now. For
the current form, there aren't really any major downsides that I see to
having people just use the package version.
Sure, it may be a little weird, but it doesn't ever really stop
people from using it or present a significant barrier. Another major point
is that many (most?) base R functions are not necessarily tooled to be
endomorphic, which in my personal opinion is *largely* the only place where
pipes are really compelling.
That was for pipes as they exist in package space, though. There is another way the pipe could go into base R that could not be done in package space and has the potential to mitigate some pretty serious downsides to pipes relating to debugging, which would be to implement them in the parser.
If

iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
  filter(mean_sl > 5)

were *parsed* as, for example,

local({
  . = group_by(iris, Species)
  . = summarize(., mean_sl = mean(Sepal.Length))
  filter(., mean_sl > 5)
})

then debugging (once you knew that) would be much easier, but behavior would be the same as it is now. There could even be some sort of step-through-pipe debugger added at that point for additional convenience.
There is some minor precedent for that type of transformative parsing:

expr = parse(text = "5 -> x")
expr
#> expression(5 -> x)
expr[[1]]
#> x <- 5

Though that's a much more minor transformation. All of that said, I believe Jim Hester (cc'ed) suggested something along these lines at the RSummit a couple of years ago, and thus far R-core has not shown much appetite for changing things in the parser. Without that changing, I'd have to say that my vote, for whatever it's worth, comes down on the side of pipes being fine in packages. A summary of my reasoning being that it only makes sense for them to go into R itself if doing so fixes an issue that can't be fixed with them in package space.

Best,
~G
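The local() rewriting above can be tried by hand today; here is a small sketch with base functions standing in for the dplyr verbs, so it runs without any extra packages:

```r
# Hand-written version of the rewriting Gabriel describes, using base
# functions in place of the dplyr verbs so the snippet is dependency-free.
result <- local({
  . <- subset(iris, Species == "setosa")  # stand-in for a dplyr verb
  . <- head(., 10)
  dim(.)
})
result
#> [1] 10  5
```

Each stage rebinds `.`, so stepping through with a debugger would show the intermediate values one assignment at a time.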
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
I'm largely with Gabriel Becker on this one: if pipes enter base R, they should be a well thought out and integrated part of the language.

I do see merit though in providing a pipe in base R. The reason is mainly that right now there's not a single pipe: a pipe function exists in different packages, and it's not impossible that at one point piping operators might behave slightly differently depending on the package you load.

So I hope someone from RStudio is reading this thread and decides to do the heavy lifting for R core. After all, it really is mainly their packages that would benefit from it. I can't think of a non-tidyverse package that's easier to use with pipes than without.

Best
Joris
Joris Meys
Statistical consultant
Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
On Sun, 6 Oct 2019 at 10:30, Joris Meys <jorismeys at gmail.com> wrote:
I'm largely with Gabriel Becker on this one: if pipes enter base R, they should be a well thought out and integrated part of the language. I do see merit though in providing a pipe in base R. Reason is mainly that right now there's not a single pipe. A pipe function exists in different packages, and it's not impossible that at one point piping operators might behave slightly different depending on the package you load. So I hope someone from RStudio is reading this thread and decides to do the heavy lifting for R core. After all, it really is mainly their packages that would benefit from it.
Completely agree with Gabriel and Joris.
I can't think of a non-tidyverse package that's easier to use with pipes than without.
I can give you one (disclaimer: it's one of my packages): simmer, which is specifically designed to work with pipes and has nothing to do with the tidyverse. Iñaki
Except for the isolation of local(), R pretty much already has the parsing transformation you mention.

as.list(parse(text = "
  iris ->.; group_by(., Species) ->.;
  summarize(., mean_sl = mean(Sepal.Length)) ->.;
  filter(., mean_sl > 5)
"))
#> [[1]]
#> . <- iris
#>
#> [[2]]
#> . <- group_by(., Species)
#>
#> [[3]]
#> . <- summarize(., mean_sl = mean(Sepal.Length))
#>
#> [[4]]
#> filter(., mean_sl > 5)

Created on 2019-10-06 by the reprex package (https://reprex.tidyverse.org) (v0.3.0)
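The `->.;` idiom above runs as-is today; a quick check with base functions so no extra packages are needed:

```r
# The `->.;` idiom in action: each `->.` rebinds `.`, so the chain
# reads left to right like a pipe. The parser turns `5 -> x` into
# `x <- 5`, which is the transformation being pointed out.
head(iris) ->.; dim(.)
#> [1] 6 5
```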
On 05/10/2019 7:50 p.m., Gabriel Becker wrote:
Actually, that could be done in package space too: just write a
function to do the transformation. That is, something like
transformPipe( a %>% b %>% c )
could convert the original expression into one like yours below. This
could be done by a smart IDE like RStudio without the user typing anything.
A really strong argument for doing this in a package instead of Bison/C
code in the parser is the help page ?magrittr::"%>%". There are so many
special cases there that it's certainly hard and possibly impossible for
the parser to do the transformation: I think some parts of the
transformation depend on run-time values, not syntax.
Of course, a simpler operator like Antoine's would be easier, but that
would break code that uses magrittr pipes, and I think those are the
most commonly accepted ones.
So a workable plan would be for all the pipe authors to agree on syntax
for transformPipe(), and then for IDE authors to support it. R Core
doesn't need to be involved at all unless they want to update Rgui or
R.app or command line R.
Duncan Murdoch
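Duncan's transformPipe() is hypothetical, but a minimal sketch of such a package-space transformation is easy to write, assuming only the simple "insert the left-hand side as first argument" rule (none of magrittr's special cases):

```r
# Hypothetical sketch of a transformPipe()-style function: capture an
# unevaluated `a %>% f %>% g` chain and evaluate it stage by stage,
# inserting the running value as the first argument of each call.
transformPipe <- function(expr) {
  flatten <- function(e) {
    # Recursively collect the stages of a %>% chain, left to right.
    if (is.call(e) && identical(e[[1]], as.name("%>%"))) {
      c(flatten(e[[2]]), list(e[[3]]))
    } else {
      list(e)
    }
  }
  stages <- flatten(substitute(expr))
  value <- eval(stages[[1]], parent.frame())
  for (stage in stages[-1]) {
    if (!is.call(stage)) stage <- as.call(list(stage))  # bare `head` -> `head()`
    stage <- as.call(append(as.list(stage), quote(.), after = 1))
    value <- eval(stage, list(. = value), parent.frame())
  }
  value
}

transformPipe(iris %>% head %>% dim)
#> [1] 6 5
```

Note that `%>%` never needs to be defined here: the chain is captured unevaluated by substitute(), which is exactly why this can be done in package space.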
As a matter of fact, I played a few days ago with this idea of transforming the pipe chain into a sequence of calls such as the one Gabriel proposed. My proposed debugging method was to use a debugging pipe: calling iris %>% head %B>% dim %>% length will place you right at the browser() call below:

#> Called from: (function (.)
#> {
#>     on.exit(rm(.))
#>     . <- head(.)
#>     browser()
#>     . <- dim(.)
#>     . <- length(.)
#>     .
#> })(iris)

https://github.com/moodymudskipper/pipe/blob/master/README.md
Regarding breaking code, it would break existing code only if the new pipe were named the same.
To be clear, I like the fact that magrittr exists as an external package
and that it can evolve with the thought and input of the tidyverse crew and
I wouldn't want a base pipe to replace it.
I think package developers would code and document using the base pipe
(unless they have a strong preference for a packaged pipe), and that users
would use interactively the pipe they prefer, which is usually magrittr's
pipe among current choices.
Thanks all for the good points,
Antoine
Le dim. 6 oct. 2019 à 22:56, Duncan Murdoch <murdoch.duncan at gmail.com> a écrit :
Hi Gabe,
There is another way the pipe could go into base R that could not be done in package space and has the potential to mitigate some pretty serious downsides to the pipes relating to debugging
I assume you're thinking about the large stack trace of the magrittr pipe? You don't need a parser transformation to solve this problem, though; the pipe could be implemented as a regular function with a very limited impact on the stack. And if implemented as a SPECIALSXP, it would be completely invisible. We've been planning to rewrite %>% to fix the performance and the stack print, it's just low priority.

About the semantics of local evaluation that were proposed in this thread, I think that wouldn't be right. A native pipe should be consistent with other control-flow constructs like `if` and `for` and evaluate in the current environment. In that case, the `.` binding, if any, would be restored to its original value in `on.exit()` (or through unwind protection if implemented in C).

Best,
Lionel
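A minimal sketch of the semantics Lionel describes, assuming a hypothetical operator name `%p%` (this is not the magrittr implementation): the right-hand side is evaluated in the caller's environment with `.` bound there, and any pre-existing `.` is restored via on.exit().

```r
# Sketch of a pipe evaluating in the *current* environment, like `if`
# and `for` do, restoring any previous `.` binding on exit.
`%p%` <- function(lhs, rhs) {
  env <- parent.frame()
  had_dot <- exists(".", envir = env, inherits = FALSE)
  old_dot <- if (had_dot) get(".", envir = env, inherits = FALSE)
  on.exit({
    if (had_dot) assign(".", old_dot, envir = env)
    else if (exists(".", envir = env, inherits = FALSE)) rm(".", envir = env)
  })
  assign(".", lhs, envir = env)          # forces lhs (the chain so far)
  eval(substitute(rhs), env)             # rhs sees `.` in the caller's env
}

iris %p% head(.) %p% dim(.)
#> [1] 6 5
```

Because evaluation happens in the caller's environment rather than a child environment, assignments made inside a pipe stage would be visible afterwards, which is the consistency point being made.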
On 6 Oct 2019, at 01:50, Gabriel Becker <gabembecker at gmail.com> wrote:
Hi all,
I think there's some nuance here that makes makes me agree partially with
each "side".
The pipe is inarguably extremely popular. Many probably think of it as a
core feature of R, along with the tidyverse that (as was pointed out)
largely surrounds it and drives its popularity. Whether its a good or bad
thing that they think that doesn't change the fact that by my estimation
that Ant is correct that they do. BUT, I don't agree with him that that, by
itself, is a reason to put it in base R in the form that it exists now. For
the current form, there aren't really any major downsides that I see to
having people just use the package version.
Sure it may be a little weird, but it doesn't ever really stop
people from using it or present a significant barrier. Another major point
is that many (most?) base R functions are not necessarily tooled to be
endomorphic, which in my personal opinion is *largely* the only place that
the pipes are really compelling.
That was for pipes as they exist in package space, though. There is another
way the pipe could go into base R that could not be done in package space
and has the potential to mitigate some pretty serious downsides to the
pipes relating to debugging, which would be to implement them in the parser.
If
iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length)) %>%
filter(mean_sl > 5)
were *parsed* as, for example, into
local({
. = group_by(iris, Species)
. = summarize(., mean_sl = mean(Sepal.Length))
filter(., mean_sl > 5)
})
Then debugging (once you knew that) would be much easier but behavior
would be the same as it is now. There could even be some sort of
step-through-pipe debugger added at that point as well for additional
convenience.
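Even without parser changes, the first step of such a tool could be sketched in plain R by walking the call tree of a `%>%` expression into its successive stages (the `unpipe()` helper below is hypothetical, not an existing function):

```r
# Hypothetical helper: split a `%>%` chain into its successive stages,
# so a debugger could evaluate and inspect them one at a time.
unpipe <- function(expr) {
  stages <- list()
  # `%>%` parses left-associatively, so the outermost call holds the
  # last stage; peel stages off from the right.
  while (is.call(expr) && identical(expr[[1]], as.name("%>%"))) {
    stages <- c(list(expr[[3]]), stages)  # right-hand side of this pipe
    expr <- expr[[2]]                     # descend into the left-hand side
  }
  c(list(expr), stages)
}

unpipe(quote(iris %>% group_by(Species) %>% summarize(mean_sl = mean(Sepal.Length))))
```

This returns the initial value expression followed by each stage call (`iris`, `group_by(Species)`, `summarize(...)`), which is essentially the sequence of statements the parser transformation above would produce.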
There is some minor precedent for that type of transformative parsing:
expr = parse(text = "5 -> x")
expr
expression(5 -> x)
expr[[1]]
x <- 5

Though that's a much more minor transformation.

All of that said, I believe Jim Hester (cc'ed) suggested something along these lines at the RSummit a couple of years ago, and thus far R-core has not shown much appetite for changing things in the parser. Without that changing, I'd have to say that my vote, for whatever it's worth, comes down on the side of pipes being fine in packages. A summary of my reasoning being that it only makes sense for them to go into R itself if doing so fixes an issue that can't be fixed with them in package space.

Best,
~G

On Sun, Oct 6, 2019 at 5:26 AM Ant F <antoine.fabri at gmail.com> wrote:
Yes but this exaggeration precisely misses the point. Concerning your examples:

* I love fread but I think it makes a lot of subjective choices that are best associated with a package. I think it changed a lot with time and can still change, and we have great developers willing to maintain it and be reactive regarding feature requests or bug reports.
* group_by() adds a class that works only (or mostly) with tidyverse verbs, so it's easy to dismiss as an inclusion in base R.
* summarize is an alternative to aggregate; it would be very confusing to have both.

Now to be fair to your argument we could think of other functions such as data.table::rleid(), which I believe base R misses deeply, and there is nothing wrong with packaged functions making their way to base R.

Maybe there's an existing list of criteria for inclusion in base R, but if not I can make one up for the sake of this discussion :) :

* 1) the functionality should not already exist
* 2) the function should be general enough
* 3) the function should have a large amount of potential users
* 4) the function should be robust, and not require extensive maintenance
* 5) the function should be stable; we shouldn't expect new features every 2 months
* 6) the function should have an intuitive interface in the context of the rest of base R

I guess 1 and 6 could be held against my proposal, because: (1) everything can be done without pipes, and (6) they are somewhat surprising (though with explicit dots not that much, and not more surprising than say `bquote()`). In my opinion the + offset the -.

I wouldn't advise taking magrittr's pipe (providing the license allows so) for instance, because it makes a lot of design choices and has a complex behavior; what I propose is 2 lines of code very unlikely to evolve or require maintenance.

Antoine

PS: I just receive the digest once a day, so if you don't "reply all" I can only react later.

On Sat, Oct 5, 2019 at 19:54, Hugh Marera <hugh.marera at gmail.com> wrote:
I exaggerated the comparison for effect. However, it is not very difficult to find functions in dplyr or data.table or indeed other packages that one may wish to be in base R. Examples, for me, could include data.table::fread, dplyr::group_by & dplyr::summari[sz]e combo, etc. Also, the "popularity" of magrittr::`%>%` is mostly attributable to the tidyverse (an advanced superset of R). Many R users don't even know that they are installing the magrittr package.

On Sat, Oct 5, 2019 at 6:30 PM Iñaki Ucar <iucar at fedoraproject.org> wrote:
On Sat, 5 Oct 2019 at 17:15, Hugh Marera <hugh.marera at gmail.com> wrote:
How is your argument different to, say, "Should dplyr or data.table be part of base R as they are the most popular data science packages and they are used by a large number of users?"

Two packages with many features, dozens of functions and under heavy development to fix bugs, add new features and improve performance, vs. a single operator with a limited and well-defined functionality, and a reference implementation that hasn't changed in years (but certainly hackish in a way that probably could only be improved from R itself). Can't you really spot the difference?

Iñaki
On 07/10/2019 4:22 a.m., Lionel Henry wrote:
Hi Gabe,
There is another way the pipe could go into base R that could not be done in package space and has the potential to mitigate some pretty serious downsides to the pipes relating to debugging
I assume you're thinking about the large stack trace of the magrittr pipe? You don't need a parser transformation to solve this problem though, the pipe could be implemented as a regular function with a very limited impact on the stack. And if implemented as a SPECIALSXP, it would be completely invisible. We've been planning to rewrite %>% to fix the performance and the stack print, it's just low priority.
I don't know what Gabe had in mind, but the downside to pipes that I see is that they are single statements. I'd like the debugger to be able to single step through one stage at a time. I'd like to be able to set a breakpoint on line 3 in a %>% b %>% c %>% d and be able to examine the intermediate result of evaluating b before piping it into c. (Or maybe that's off by one: maybe I'd prefer to examine the inputs to d if I put a breakpoint there. I'd have to try it to find out which feels more natural.)
About the semantics of local evaluation that were proposed in this thread, I think that wouldn't be right. A native pipe should be consistent with other control flow constructs like `if` and `for` and evaluate in the current environment. In that case, the `.` binding, if any, would be restored to its original value in `on.exit()` (or through unwind-protection if implemented in C).
That makes sense. Duncan Murdoch
On 7 Oct 2019, at 13:47, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
I don't know what Gabe had in mind, but the downside to pipes that I see is that they are single statements. I'd like the debugger to be able to single step through one stage at a time. I'd like to be able to set a breakpoint on line 3 in a %>% b %>% c %>% d and be able to examine the intermediate result of evaluating b before piping it into c. (Or maybe that's off by one: maybe I'd prefer to examine the inputs to d if I put a breakpoint there. I'd have to try it to find out which feels more natural.)
In order to place a breakpoint on line 3, I think you'll need to wrap
`c()` in curly braces and insert a `browser()` call. And at that point
you're changing the semantics of `c()` and you'll need to manually
write the placeholder for the input:
a() |>
b() |>
{ browser(); c(.) } |>
d()
I don't see any way around this. I guess it could be done behind the
scenes by the IDE when a breakpoint is set though. Note that this
doesn't require any changes to the parser and already works with the
magrittr pipe.
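As Lionel notes, this already works with the magrittr pipe, whose braced right-hand sides are evaluated with `.` bound to the incoming value; a small illustration (interactive only, since `browser()` pauses just in an interactive session):

```r
library(magrittr)

letters %>%
  toupper() %>%
  { browser(); head(., 3) } %>%  # pause here; `.` holds toupper(letters)
  paste(collapse = "")
```

Inside the browser, `.` can be inspected before the pipeline continues with the value of the braced block.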
Then there's the issue of continuing to step-debug through the
pipeline. This could be achieved by parsing `a |> b()` as `{a} |>
{b()}`, so that each sub-expression carries source references. In
general, there are metaprogramming patterns that would be made easier
if calls to `function` or `if` always had a body wrapped in `{`. It is
too late to change historical operators but maybe it makes sense for
newer ones?
Lionel
On 07/10/2019 8:38 a.m., Lionel Henry wrote:
Yes, I was hoping this would happen behind the scenes. I agree that the parser doesn't need to be changed, but the IDE would need to break up the statement into 3 or more equivalent statements for this to work with no changes to core R. I think that could be done after parsing at run-time, as described in my earlier message.

Duncan Murdoch

P.S. Were you just using |> to save typing, or is there a proposal to add a new operator to the language? That would need parser changes.
On 7 Oct 2019, at 15:36, Duncan Murdoch <murdoch.duncan at gmail.com> wrote: I think that could be done after parsing at run-time, as described in my earlier message.
Good point.
P.S. Were you just using |> to save typing, or is there a proposal to add a new operator to the language? That would need parser changes.
Just a hypothetical native pipe for which the parser would automatically wrap the arguments in srcref-carrying braces. Then we get step-debugging of pipelines in all editors.

Best,
Lionel