Dear R Core,

working on my dynamic factor modelling package, which requires several
subroutines to create and update several system matrices, I come back to
the issue of being annoyed by R not supporting multiple assignment out of
the box like Matlab, Python and Julia, e.g. something like

A, C, Q, R = init_matrices(X, Y, Z)

would be a great addition to the language. I know there are several
workarounds such as the %<-% operator in the zeallot package or my own %=%
operator in collapse, but these don't work well for package development as
R CMD check warns about missing global bindings for the created variables,
e.g. I would have to use

A <- C <- Q <- R <- NULL
.c(A, C, Q, R) %=% init_matrices(X, Y, Z)

in a package, which is simply annoying. Of course the standard way of

init <- init_matrices(X, Y, Z)
A <- init$A; C <- init$C; Q <- init$Q; R <- init$R
rm(init)

is also super cumbersome compared to Python or Julia. Another reason is of
course performance: even my %=% operator written in C has a non-negligible
performance cost for very tight loops, compared to a solution at the
interpreter level or in a primitive function such as `=`.

So my conclusion at this point is that it is just significantly easier to
implement such code in Julia, in addition to the greater performance it
offers. There are obvious reasons why I am still coding in R and C, thanks
to the robust API and great ecosystem of packages, but adding this could be
a presumably low-hanging fruit to make my life a bit easier. Several
questions about this have been asked on Stack Overflow, the most popular one (
https://stackoverflow.com/questions/7519790/assign-multiple-new-variables-on-lhs-in-a-single-line)
has been viewed 77 thousand times.

But maybe this has already been discussed here and already decided against.
In that case, a way to browse R-devel archives to find out would be nice.

Best regards,

Sebastian
Multiple Assignment built into the R Interpreter?
22 messages · Duncan Murdoch, Ivan Krylov, Sebastian Martin Krantz +6 more
I think the standard way to do this in R is given by list2env(), as
described in a couple of answers on the SO page you linked.
The syntax you proposed would be likely to be confusing in complex
expressions, e.g.
f(A, C, Q, R = init_matrices(X, Y, Z))
would obviously not work but wouldn't trigger a syntax error, and
f((A, C, Q, R = init_matrices(X, Y, Z)))
could work, but looks too much like the previous one. So I think R
would want JavaScript-like
[A, C, Q, R] <- init_matrices(X, Y, Z)
instead. But then the question would come up about how to handle the
RHS. Does the function have to return a list? What if the length of
the list is not 4? Or is it just guaranteed to be equivalent to
temp <- init_matrices(X, Y, Z)
A <- temp[[1]]
C <- temp[[2]]
Q <- temp[[3]]
R <- temp[[4]]
which would work for other vector types besides lists?
BTW, here's a little hack that almost works:

`vals<-` <- function(x, ..., value) {
  others <- substitute(list(...))
  if (length(others) > 1)
    for (i in seq_along(others)[-1])
      assign(as.character(others[[i]]), value[[i]], envir = parent.frame())
  value[[1]]
}

You call it as
vals(a, b, c) <- 1:3
and it assigns 1 to a, 2 to b, and 3 to c. It doesn't quite do what you
want because it requires that a exists already, but b and c don't have to.
Duncan Murdoch
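
For reference, both suggestions in the message above can be tried in current R. This is only a sketch: `init_matrices()` here is a hypothetical stand-in returning a named list (the real function is not shown in the thread), and list2env() is pointed at an explicit environment so the effect is easy to inspect.

```r
# Hypothetical stand-in for init_matrices(); the real one is not in the thread.
init_matrices <- function() {
  list(A = diag(2), C = matrix(0, 2, 2), Q = diag(2) * 0.5, R = diag(2) * 2)
}

# list2env() copies each named list element into an environment.
env <- new.env()
list2env(init_matrices(), envir = env)
ls(env)   # "A" "C" "Q" "R"

# The replacement-function hack from the message, wrapped line rejoined.
`vals<-` <- function(x, ..., value) {
  others <- substitute(list(...))
  if (length(others) > 1)
    for (i in seq_along(others)[-1])
      assign(as.character(others[[i]]), value[[i]], envir = parent.frame())
  value[[1]]
}

a <- NULL              # the first target must already exist
vals(a, b, c) <- 1:3   # a is 1, b is 2, c is 3
```

Note that list2env() assigns by name, so the function has to return a named list, whereas the hack assigns by position.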
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Thanks Duncan,
I know about list2env(), in fact a previous version of collapse::`%=%` was
coded as

"%=%" <- function(lhs, rhs) {
  if(!is.character(lhs)) stop("lhs needs to be character")
  if(!is.list(rhs)) rhs <- as.vector(rhs, "list")
  if(length(lhs) != length(rhs)) stop("length(lhs) not equal to length(rhs)")
  list2env(`names<-`(rhs, lhs), envir = parent.frame())
  invisible()
}

but as you say, the input needs to be converted to a list, and it calls
several R functions, which led me to end up writing `%=%` in C:
https://github.com/SebKrantz/collapse/blob/master/src/small_helper.c#L162.
This implementation works in the way you describe, i.e. it has separate
methods for all the standard vector types, and coerces to list otherwise.
That being said, all implementations in packages fall short of being very
useful, because R CMD check will still require global bindings for the
variables, unless this becomes a standard feature of the language. So I
cannot use this in packages, and there is still a performance cost to it,
in my case a call to .Call() and parent.frame(), which is quite low, but
still high compared to the cost of `<-` or `=`.
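
A quick illustration of how the R-level operator quoted above behaves. The definition is repeated so the snippet runs by itself; `init_matrices()` and `x` are hypothetical stand-ins, and this is the earlier R version from the message, not the C implementation now in collapse.

```r
# R-level %=% as quoted in the message (error string rejoined onto one line).
"%=%" <- function(lhs, rhs) {
  if (!is.character(lhs)) stop("lhs needs to be character")
  if (!is.list(rhs)) rhs <- as.vector(rhs, "list")
  if (length(lhs) != length(rhs)) stop("length(lhs) not equal to length(rhs)")
  list2env(`names<-`(rhs, lhs), envir = parent.frame())
  invisible()
}

# Hypothetical stand-in returning an unnamed list of two matrices.
init_matrices <- function() list(diag(2), matrix(1, 2, 2))

c("A", "C") %=% init_matrices()   # creates A and C in the calling environment

x <- matrix(0, nrow = 3, ncol = 4)
c("nr", "nc") %=% dim(x)          # an integer vector is coerced to a list first
```

Because the targets only exist as strings on the left-hand side, static code checks cannot see that `A`, `C`, `nr` and `nc` are created, which is the R CMD check complaint described above.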
So what I am requesting is indeed nothing less than to consider making this
a permanent feature of the language itself.
Given that the other 3 major scientific computing languages (Matlab, Python
and Julia) have implemented it very successfully,
I don't think the general practicality of it should be an issue. Regarding
implementation in other languages, Julia works as follows:

function init_matrices()
    A = 1; C = 2; Q = 3; R = 4
    return A, C, Q, R
end

res = init_matrices()          # gives a Julia Tuple (A, C, Q, R)
A, C = init_matrices()         # Works, A is 1, C is 2, the others are dropped
A, C, Q, R = init_matrices()   # Standard

I think as far as R is concerned multiple return values are not really
necessary given that one can always,
return(list(A, C, Q, R)), although of course there is also a cost to
list(). I also wouldn't mind being strict about it and
not allowing A, C = init_matrices(), but others might disagree.
Best regards,
Sebastian
On 11/03/2023 9:54 a.m., Sebastian Martin Krantz wrote:
> That being said, all implementations in packages fall short of being
> very useful, because R CMD check will still require global bindings
> for the variables, unless this becomes a standard feature of the
> language. So I cannot use this in packages, and there is still a
> performance cost to it, in my case a call to .Call() and
> parent.frame(), which is quite low, but still high compared to the
> cost of `<-` or `=`.

Another R way to do what you're doing would be to stay within a list the
whole time, i.e. code it as

  mats <- init_matrices(X, Y, Z)
  with(mats, ... do things with A, C, Q, and R ... )

This won't give warnings about globals, and it makes very clear that
those 4 matrices are all closely related, and it allows you to work with
multiple 4-tuples of matrices, etc.

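
A minimal sketch of the stay-in-a-list style, assuming a hypothetical `init_matrices()` that returns a named list (the real function and its arguments are not shown in the thread) and an arbitrary matrix expression chosen only for illustration:

```r
# Hypothetical stand-in returning the four system matrices as a named list.
init_matrices <- function() {
  list(A = diag(2), C = matrix(1, 2, 2), Q = diag(2) * 0.5, R = diag(2) * 2)
}

mats <- init_matrices()

# Inside with(), A, Q and R resolve to the list elements; no variables
# named A, C, Q or R are created in the calling environment.
pred_cov <- with(mats, A %*% Q %*% t(A) + R)

# Updates stay inside the list as well.
mats$Q <- mats$Q * 2
```

Several such lists can coexist (for example one per model), which is the "multiple 4-tuples" point made above.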
> So what I am requesting is indeed nothing less than to consider making
> this a permanent feature of the language itself.

That's clear, but your proposal violates a very basic property of the
language, i.e. that all statements are expressions and have a value.
What's the value of
1 + (A, C = init_matrices())
? I think you would disallow the above (though you didn't address it
when I raised it the first time), which means there would now be two
kinds of statements: ones that are expressions and therefore can be
used as function arguments, and ones that aren't.

> Given that the other 3 major scientific computing languages (Matlab,
> Python and Julia) have implemented it very successfully, I don't think
> the general practicality of it should be an issue. Regarding
> implementation in other languages, Julia works as follows:
>
> function init_matrices()
>     A = 1; C = 2; Q = 3; R = 4
>     return A, C, Q, R
> end
>
> res = init_matrices()    # gives a Julia Tuple (A, C, Q, R)
> A, C = init_matrices()   # Works, A is 1, C is 2, the others are dropped

That's pretty ugly having a singular LHS handled so much differently
from a plural LHS.

> A, C, Q, R = init_matrices()   # Standard
>
> I think as far as R is concerned multiple return values are not really
> necessary given that one can always return(list(A, C, Q, R)), although
> of course there is also a cost to list(). I also wouldn't mind being
> strict about it and not allowing A, C = init_matrices(), but others
> might disagree.

Another ambiguity: suppose f() returns list(A = 1, B = 2) and I do

  B, A <- f()

Should assignment be by position or by name?

Honestly, given that this is simply syntactic sugar, I don't think I
would support it.

Duncan Murdoch
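
The position-versus-name ambiguity can be spelled out with ordinary assignments. This is just an illustration of the two possible readings of `B, A <- f()`; the variables `by_pos` and `by_name` are introduced here for comparison only.

```r
f <- function() list(A = 1, B = 2)
res <- f()

# Reading 1: by position. Targets take elements in order; names are ignored.
B <- res[[1]]
A <- res[[2]]
by_pos <- c(A = A, B = B)     # A is 2, B is 1

# Reading 2: by name. Each target takes the element with the matching name.
B <- res[["B"]]
A <- res[["A"]]
by_name <- c(A = A, B = B)    # A is 1, B is 2
```

The two readings give different results for the same statement, so any built-in syntax would have to pick one and document it.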
On Sat, 11 Mar 2023 11:11:06 -0500
Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> That's clear, but your proposal violates a very basic property of the
> language, i.e. that all statements are expressions and have a value.

How about reframing this feature request from multiple assignment
(which does go contrary to "everything has only one value, even if it's
sometimes invisible(NULL)") to "structured binding" / "destructuring
assignment" [*], which takes this single value returned by the
expression and subsets it subject to certain rules? It may be easier to
make a decision on the semantics for destructuring assignment (e.g.
languages which have this feature typically allow throwing unneeded
parts of the return value away), and it doesn't seem to break as much
of the rest of the language if implemented.
I see you've already mentioned it ("JavaScript-like"). I think it would
fulfil Sebastian's requirements too, as long as it is considered "true
assignment" by the rest of the language.
The hard part is to propose the actual grammar of the new feature (in
terms of src/main/gram.y, preferably without introducing conflicts) and
its semantics (including the corner cases, some of which you have
already mentioned). I'm not sure I'm up to the task.
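
To make the "single value, then subset it" framing concrete, here is a rough sketch as an ordinary function. `unpack()` is a hypothetical helper, not part of base R or any package mentioned in the thread; it assigns by position and silently drops unneeded trailing elements, which is only one of the possible rule sets.

```r
# Hypothetical helper: assign the leading elements of `value` to the names in
# `targets`, by position, dropping any extra elements.
unpack <- function(targets, value, envir = parent.frame()) {
  force(envir)
  stopifnot(is.character(targets), length(targets) <= length(value))
  for (i in seq_along(targets))
    assign(targets[i], value[[i]], envir = envir)
  invisible(value)   # the call still has one value: the whole right-hand side
}

x <- matrix(0, nrow = 3, ncol = 4)
unpack(c("nr", "nc"), dim(x))             # nr is 3, nc is 4
unpack("values", eigen(diag(c(2, 1))))    # the eigenvectors are dropped
```

Returning the right-hand side invisibly keeps the call usable inside larger expressions, in the same way that `y <- (x <- 1)` works today.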
On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
> The hard part is to propose the actual grammar of the new feature (in
> terms of src/main/gram.y, preferably without introducing conflicts) and
> its semantics (including the corner cases, some of which you have
> already mentioned). I'm not sure I'm up to the task.

If I were doing it, here's what I'd propose:

  '[' formlist ']' LEFT_ASSIGN expr
  '[' formlist ']' EQ_ASSIGN expr
  expr RIGHT_ASSIGN '[' formlist ']'

where `formlist` has the syntax of the formals list for a function
definition. This would have the following semantics:

   {
     *tmp* <- expr

     # For arguments with no "default" expression,

     argname1 <- *tmp*[[1]]
     argname2 <- *tmp*[[2]]
     ...

     # For arguments with a default listed

     argname3 <- with(*tmp*, default3)
   }

The value of the whole thing would therefore be (invisibly) the value of
the last item in the assignment.

Two examples:

  [A, B, C] <- expr          # assign the first three elements of expr
                             # to A, B, and C

  [A, B, C = a + b] <- expr  # assign the first two elements of expr
                             # to A and B,
                             # assign with(expr, a + b) to C.

Unfortunately, I don't think this could be done entirely by transforming
the expression (which is the way |> was done), and that makes it a lot
harder to write and to reason about. E.g. what does this do?

  A <- 0
  [A, B = A + 10] <- list(1, A = 2)

According to the recipe above, I think it sets A to 1 and B to 12, but
maybe a user would expect B to be 10 or 11. And according to that
recipe this is an error:

  [A, B = A + 10] <- c(1, A = 2)

which probably isn't what a user would expect, given that this is fine:

  [A, B] <- c(1, 2)

Duncan Murdoch
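
The recipe above can be approximated today with an ordinary function, which makes the corner case easy to try. This is only a sketch under stated assumptions: `destructure()` is a hypothetical name, the targets are passed as arguments instead of inside `[ ]`, and `eval()` with the right-hand side as data stands in for `with(*tmp*, default)`.

```r
# Hypothetical emulation of the proposed semantics as a plain function.
# Unnamed targets are filled by position; a named target is set to its
# "default" expression evaluated with the right-hand side as data.
destructure <- function(value, ..., envir = parent.frame()) {
  force(envir)
  targets <- as.list(substitute(list(...)))[-1]
  nms <- names(targets)
  if (is.null(nms)) nms <- rep("", length(targets))
  result <- NULL
  for (i in seq_along(targets)) {
    if (nzchar(nms[i])) {
      result <- eval(targets[[i]], value, enclos = envir)
      assign(nms[i], result, envir = envir)
    } else {
      result <- value[[i]]
      assign(as.character(targets[[i]]), result, envir = envir)
    }
  }
  invisible(result)   # value of the last assignment, as in the recipe
}

A <- 0
destructure(list(1, A = 2), A, B = A + 10)
c(A, B)   # 1 12
```

Under these assumptions `B` becomes 12 because the `A` inside the right-hand list (2) shadows both the old `A` (0) and the newly assigned one (1), which is the surprise described in the message.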
Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
follow all aspects you raised, but to give my limited take on a few:

> your proposal violates a very basic property of the language, i.e.
> that all statements are expressions and have a value. What's the
> value of 1 + (A, C = init_matrices()).

I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars);
nr = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error, as
the above expression should. `%=%` assigns to environments, so
1 + (c("A", "C") %=% init_matrices()) returns numeric(0), with A and C
having their values assigned.

> suppose f() returns list(A = 1, B = 2) and I do B, A <- f()
> Should assignment be by position or by name?

In other languages this is by position. The feature is not meant to
replace list2env(), and being able to rename objects in the assignment
is a vital feature of code using multi-input and multi-output functions,
e.g. in Matlab or Julia.

> Honestly, given that this is simply syntactic sugar, I don't think I
> would support it.

You can call it that, but it would be used by almost every R user
almost every day. Simple things like nr, nc = dim(x); values, vectors =
eigen(x) etc., where the creation of intermediate objects is cumbersome
and redundant.

> I see you've already mentioned it ("JavaScript-like"). I think it
> would fulfil Sebastian's requirements too, as long as it is considered
> "true assignment" by the rest of the language.

I don't have strong opinions about how the issue is phrased or
implemented. Something like [t, n] = dim(x) might even be more clear.
It's important though that assignment remains by position, so even if
some output gets thrown away that should also be positional.

> A <- 0
> [A, B = A + 10] <- list(1, A = 2)

I also fail to see the use of allowing this. Something like this is an
error:

A = 2
(B = A + 1) <- 1
Error in (B = A + 1) <- 1 : could not find function "(<-"

Regarding the practical implementation, I think `collapse::%=%` is a
good starting point. It could be introduced in R as a separate
function, or `=` could be modified to accommodate its capability. It
should be clear that with more than one LHS variable the assignment is
an environment-level operation and the results can only be used in
computations once assigned to the environment, e.g. in
1 + (c("A", "C") %=% init_matrices()), A and C are not available for
the addition in this statement. The interpreter then needs to be
modified to read something like nr, nc = dim(x) or [nr, nc] = dim(x)
as an environment-level multiple assignment operation with no
immediate value. This appears very feasible to my limited
understanding, but I guess there are other things to consider still.
I definitely appreciate the responses so far though.

Best regards,
Sebastian
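
The two evaluations described in the message above can be reproduced without collapse. The sketch below uses only base R; `no_value()` is a hypothetical stand-in for an operator that, like the R-level `%=%`, finishes with `invisible()`.

```r
# A parenthesised sequence of statements is not one expression: parsing fails.
bad <- try(parse(text = "1 + (d = dim(x); nr = d[1])"), silent = TRUE)
inherits(bad, "try-error")    # TRUE

# Braces group statements into a single expression whose value is that of
# the last statement, so it can take part in arithmetic.
x <- matrix(0, nrow = 3, ncol = 4)
val <- 1 + { d <- dim(x); nr <- d[1]; nc <- d[2]; nc }
val                           # 5

# A function ending in invisible() returns NULL, and 1 + NULL is numeric(0).
no_value <- function() invisible()
res <- 1 + no_value()
res                           # numeric(0)
```

So an assignment form "with no immediate value" would in practice still need to return something (for example NULL), since every R expression evaluates to a value.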
On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
> Regarding the practical implementation, I think `collapse::%=%` is a
> good starting point. It could be introduced in R as a separate
> function, or `=` could be modified to accommodate its capability. It
> should be clear that with more than one LHS variable the assignment is
> an environment-level operation and the results can only be used in
> computations once assigned to the environment, e.g. in
> 1 + (c("A", "C") %=% init_matrices()), A and C are not available for
> the addition in this statement. The interpreter then needs to be
> modified to read something like nr, nc = dim(x) or [nr, nc] = dim(x)
> as an environment-level multiple assignment operation with no
> immediate value. This appears very feasible to my limited
> understanding, but I guess there are other things to consider still.
> I definitely appreciate the responses so far though.

Show me.

Duncan Murdoch
Best regards,
Sebastian
On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch <murdoch.duncan at gmail.com
<mailto:murdoch.duncan at gmail.com>> wrote:
On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
> On Sat, 11 Mar 2023 11:11:06 -0500
> Duncan Murdoch <murdoch.duncan at gmail.com
<mailto:murdoch.duncan at gmail.com>> wrote:
>
>> That's clear, but your proposal violates a very basic property
of the
>> language, i.e. that all statements are expressions and have a value.
>
> How about reframing this feature request from multiple assignment
> (which does go contrary to "everything has only one value, even
if it's
> sometimes invisible(NULL)") to "structured binding" / "destructuring
> assignment" [*], which takes this single single value returned by the
> expression and subsets it subject to certain rules? It may be
easier to
> make a decision on the semantics for destructuring assignment (e.g.
> languages which have this feature typically allow throwing unneeded
> parts of the return value away), and it doesn't seem to break as much
> of the rest of the language if implemented.
>
> I see you've already mentioned it ("JavaScript-like"). I think it
would
> fulfil Sebastian's requirements too, as long as it is considered
"true
> assignment" by the rest of the language.
>
> The hard part is to propose the actual grammar of the new feature (in
> terms of src/main/gram.y, preferably without introducing
conflicts) and
> its semantics (including the corner cases, some of which you have
> already mentioned). I'm not sure I'm up to the task.
>
If I were doing it, here's what I'd propose:

    '[' formlist ']' LEFT_ASSIGN expr
    '[' formlist ']' EQ_ASSIGN expr
    expr RIGHT_ASSIGN '[' formlist ']'

where `formlist` has the syntax of the formals list for a function
definition. This would have the following semantics:

    {
      *tmp* <- expr
      # For arguments with no "default" expression,
      argname1 <- *tmp*[[1]]
      argname2 <- *tmp*[[2]]
      ...
      # For arguments with a default listed
      argname3 <- with(*tmp*, default3)
    }

The value of the whole thing would therefore be (invisibly) the value of
the last item in the assignment.

Two examples:

    [A, B, C] <- expr          # assign the first three elements of expr
                               # to A, B, and C

    [A, B, C = a + b] <- expr  # assign the first two elements of expr
                               # to A and B,
                               # assign with(expr, a + b) to C.

Unfortunately, I don't think this could be done entirely by transforming
the expression (which is the way |> was done), and that makes it a lot
harder to write and to reason about. E.g. what does this do?

    A <- 0
    [A, B = A + 10] <- list(1, A = 2)

According to the recipe above, I think it sets A to 1 and B to 12, but
maybe a user would expect B to be 10 or 11. And according to that
recipe this is an error:

    [A, B = A + 10] <- c(1, A = 2)

which probably isn't what a user would expect, given that this is fine:

    [A, B] <- c(1, 2)
Duncan Murdoch
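The recipe in the message above can be followed by hand in current R to check the two corner cases it mentions. This is only a sketch of the proposed semantics written out as ordinary statements; the variable `tmp` stands in for `*tmp*`, and the `[A, B = ...] <- expr` syntax itself does not exist in R.

```r
# Case 1: [A, B = A + 10] <- list(1, A = 2), expanded per the recipe
A <- 0
tmp <- list(1, A = 2)       # *tmp* <- expr
A <- tmp[[1]]               # no default: take element 1 by position
B <- with(tmp, A + 10)      # default: evaluated with tmp's named elements in scope

# Case 2: the same expansion with an atomic vector on the right-hand side.
# with() cannot use a length-2 numeric vector as its data argument,
# so the "default" step fails.
res <- tryCatch(
  with(c(1, A = 2), A + 10),
  error = function(e) conditionMessage(e)
)
```

In the first case the `A` seen by the default expression is the named element of the list (2), not the freshly assigned variable (1) or the old value (0), which is why B comes out as 12. In the second case `res` holds an error message rather than a number.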
There are some other considerations too (apologies if these were mentioned
above and I missed them). Also below are initial thoughts, so apologies for
any mistakes or oversights.
For example, if

    [a, b] <- my2valuefun()

works the same as

    local({
      tmp <- my2valuefun()
      stopifnot(is.list(tmp) && length(tmp) == 2)
      a <<- tmp[[1]]
      b <<- tmp[[2]]
    })

Do we expect

    [a[1], b[3]] <- my2valuefun()

to also work? That doesn't sound very fun to me, personally, but obviously
the "single value return" versions of these do work and have for a long
time, i.e.

    a[1] <- my2valuefun()[[1]]
    b[3] <- my2valuefun()[[2]]

is perfectly valid R code (though it does call the function twice which is
"silly" in some sense).
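For reference, the emulation above runs as-is once a two-value function exists. The sketch below assumes a hypothetical `my2valuefun()` that returns a list of length 2, and shows the element-target case written so that the function is called only once.

```r
# Hypothetical function returning two values as a list
my2valuefun <- function() list(10, 20)

# Emulation of `[a, b] <- my2valuefun()` from the message above
local({
  tmp <- my2valuefun()
  stopifnot(is.list(tmp) && length(tmp) == 2)
  a <<- tmp[[1]]
  b <<- tmp[[2]]
})

# What `[x[1], y[3]] <- my2valuefun()` would have to expand to
# if element targets were allowed: one call, two sub-assignments.
x <- numeric(1)
y <- numeric(3)
tmp <- my2valuefun()
x[1] <- tmp[[1]]
y[3] <- tmp[[2]]
rm(tmp)
```

Note that `<<-` only behaves like a plain assignment here because the code runs at top level; inside a function it would assign to the nearest enclosing binding, or to the global environment if none exists.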
Another thing that arises from the Julia API specifically, and which I think
is problematic, is the ambiguity caused by R's atomic "types" being vectors.
Consider the following:

    coolest_function <- function() c(a = 15, b = 65, c = 275)
    a <- coolest_function()

That obviously makes a vector of length 3. Anything else would break *like
all the R code*.

But now, what does

    [a] <- coolest_function()

do? Does it assign 15 to a, because b and c aren't being assigned to?
Does this mean variables being assigned to actually need to *match the
names within the return object*? I don't think that would work at all in
general...
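The candidate meanings can be spelled out with ordinary subsetting. The sketch below only uses existing R; the variable names `whole`, `by_position` and `by_name` are illustrative.

```r
coolest_function <- function() c(a = 15, b = 65, c = 275)
res <- coolest_function()

whole       <- res          # plain assignment: the full length-3 vector
by_position <- res[[1]]     # positional destructuring: first element
by_name     <- res[["b"]]   # name-based destructuring for a target called `b`
```

For a target called `b`, positional matching would give 15 while name matching would give 65, so the two readings genuinely diverge as soon as the target names and the returned names are not in the same order.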
Alternatively, is the second one an error, because the function isn't
returning a list? This doesn't really fix the problem either, though,
because a single list of length > 1 *is a valid thing to return from an R
function*. I think, like in Julia, you'd need to declare the set of things
being returned, and perhaps map them to the variables you want assigned:

    crazy_notworking_fun <- function() {
      return(a = 5, b = 65, c = 275)
    }
    [a_val = a, b_val = b] <- crazy_notworking_fun()

Or even,

    [a_val <- a, b_val <- b] <- crazy_notworking_fun()

In that case, however, it becomes somewhat unclear (to me at least) what

    only_val <- crazy_notworking_fun()

would do. Throw an error because multivalued functions are fundamentally
different and we can't pretend they aren't? This would disallow all of the
things you think "most R users would use every day" (a claim I'm somewhat
skeptical of, to be honest). If that's not it, though, what? I don't think
it can/should return the full list of results, because that introduces the
ambiguity this is trying to avoid right back in. Perhaps just the first
thing returned? That is internally consistent, but somewhat strange
behavior...
Best,
~G
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:
https://github.com/kevinushey/dotty
The package exports an object called `.`, with a special `[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:
    .[nr, nc] <- dim(mtcars)
and that will define 'nr' and 'nc' as you expect.
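To show roughly how such an S3 method can work, here is a minimal sketch in base R. It is an illustration written for this thread, not the code of the dotty package, and it assumes only plain variable names inside the brackets and positional matching.

```r
# A placeholder object whose only job is to carry the class "dot"
. <- structure(list(), class = "dot")

# Sub-assignment method: the "indices" are never evaluated; they are
# captured as names and used as assignment targets in the caller.
`[<-.dot` <- function(x, ..., value) {
  targets <- as.list(substitute(list(...)))[-1L]
  stopifnot(length(targets) <= length(value))
  env <- parent.frame()
  for (i in seq_along(targets)) {
    stopifnot(is.name(targets[[i]]))
    assign(as.character(targets[[i]]), value[[i]], envir = env)
  }
  x  # `.` is handed back unchanged
}

.[nr, nc] <- dim(mtcars)
```

The trick relies on R rewriting `.[nr, nc] <- v` into a call to `[<-` on `.`, which dispatches on the class before `nr` and `nc` are ever evaluated, so the method can read them as symbols.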
As for R CMD check warnings, you can suppress those through the use of
globalVariables(), and that can also be automated within the package.
The 'dotty' package includes a function 'dotify()' which automates
looking for such usages in your package, and calling globalVariables()
so that R CMD check doesn't warn. In theory, a similar technique would
be applicable to other packages defining similar operators (zeallot,
collapse).
Obviously, globalVariables() is a very heavy hammer to swing for this
issue, but you might consider the benefits worth the tradeoffs.
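For the manual route, the declaration is a single call to `utils::globalVariables()` placed in one of the package's R source files. The variable names below are the ones from the original post and the file location is only a convention; this is a sketch, not what `dotify()` generates.

```r
# Typically placed in a file such as R/globals.R of the package.
# Declares names that are created by a destructuring operator so that
# R CMD check does not report "no visible binding for global variable".
declared <- utils::globalVariables(c("A", "C", "Q", "R"))
```

The call returns the current set of declared names, so `declared` can be inspected to confirm what has been registered. The declaration applies to the whole package, which is why it is a blunt tool: a genuine typo using one of those names elsewhere in the package will no longer be flagged.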
Best,
Kevin
I am not personally for or against changes to the R main language but do find that too many people keep wanting to change R so it should be like some other language. Many features would be nice, especially if they do not break existing code, but the time and effort and other overheads need to be a consideration.
R has long had a concept of returning a data structure with inner parts such as from calling lm() or ggplot() that can be used to store lots of info, often including saved copies of the data and parameters that generated it, and you can often update the object or query it for specific fields or pass it intact to some next step.
Anyone wanting to return multiple results has classically done something like return a list containing named items and your code had the option of unpacking the parts you want and ignoring others. This may not be modern or elegant but so what?
I could go through a long list of nice things I see in other languages and ask if R has that ability now (perhaps in a package) or should have it.
Do we need a swap like:
a, b <- b, a
Do we need a dual comparative like:
If ( -5 < X < 5) ...
The two examples work fine in Python as do many other things but Python is not R and cannot trivially do many things R does either.
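For comparison, a minimal sketch of how the same two operations are usually written in base R today, using a temporary variable for the swap and two comparisons joined by `&&` for the range test:

```r
# Swap via a temporary variable
a <- 1
b <- 2
tmp <- a
a <- b
b <- tmp
rm(tmp)

# Range test written as two comparisons
X <- 3
in_range <- -5 < X && X < 5
```

Both forms are longer than the Python one-liners, but neither needs anything beyond the existing language.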
Many programming languages have been converging in some ways. Scala 3 simplified many parts of the language by borrowing its own version of significant indentation from languages like Haskell and Python. Personally, I like it, but it causes some headaches for older code and you have to either change the code or disable the feature. So, does anyone think R should follow suit and allow the many places that now group code with curly braces or other methods to use indentation level instead?
I am NOT saying any additions are impossible, but we need to keep the base language working well on existing code or perhaps make a clean break and make a new version of R (I assume logically Q, as in S went to S-- (AKA R), so R-- would be Q, LOL!).
There is a community of users of a language who are partially here based on existing R packages. Some can probably find more and more functionality along the same lines as Python modules and elsewhere with some work and continue in a language that may fit their personal preferences. But the people responsible for maintaining and developing R are not just casual users and would have to do serious amounts of work choosing what it would look like and how to implement it, and deal with edge cases and complaints.
The new R pipe is a case in point. Why was it added? I mean I have been using various pipes including in the tidyverse for years quite happily. I did not need it added except perhaps for performance reasons. But when it came out, it had a rather glaring incompatibility in some ways, in that it did not provide a fairly trivial way to pass the piped value to anything other than the first positional argument. Sure, they added a kludge using a horrible-to-read anonymous function syntax that I suspect many will not use and will rather find their own ways around.
But am I glad it was added? Sure. Could they have improved something else instead that was missing? I mean I use the glue package but would love something like an f-string built in that did pretty much stuff like that as can be found in other languages.
This may sound stupid, but if someone wants features from another language, maybe they should consider using the two languages together and alternating which one diddles with your data as can be done multiple ways already.
Will that solve what the OP wants? Nope. They want whatever tool they are using to become a Swiss Army Knife.
But there is something to be said for a sparse language that does a few things well and does not grow just to grow and be like everyone else.
-----Original Message-----
From: R-devel <r-devel-bounces at r-project.org> On Behalf Of Gabriel Becker
Sent: Saturday, March 11, 2023 5:54 PM
To: Sebastian Martin Krantz <sebastian.krantz at graduateinstitute.ch>
Cc: r-devel <r-devel at r-project.org>
Subject: Re: [Rd] Multiple Assignment built into the R Interpreter?
There are some other considerations too (apologies if these were mentioned
above and I missed them). Also below are initial thoughts, so apologies for
any mistakes or oversights.
For example, if
[a, b] <- my2valuefun()
works the same as
local({
tmp <- my2valuefun()
stopifnot(is.list(tmp) && length(tmp) == 2)
a <<- tmp[[1]]
b <<- tmp[[2]]
})
Do we expect
[a[1], b[3]] <- my2valuefun()
to also work? That doesn't sound very fun to me, personally, but obviously
the "single value return" versions of these do work and have for a long
time, i.e.
a[1] <- my2valuefun()[[1]]
b[3] <- my2valuefun()[[2]]
is perfectly valid R code (though it does call the function twice which is
"silly" in some sense).
Another thing which arises from the Julia API specifically which I think is
problematic is the ambiguity of's atomic "types" being vectors. Consider
the following
coolest_function <- function() c(a = 15, b = 65, c = 275)
a <- coolest_function()
That obviously makes a vector of length 3. Anything else would break *like
all the R code*
But now, what does
[a] <- coolest_function()
do? Does it assign 15 to a, because b and c arent' being assigned to?
Does this mean variables being assigned to actually need to *match the
names within the return object*? I don't think that would work at all in
general...
Alternatively, is the second one an error, because the function isn't
returning a list? This doesn't really fix the problem either though
Because a single list of length > 1 *is a valid thing to return from an R
function*. I think, like in Julia, you'd need to declare the set of things
being returned, and perhaps map them to the variables you want assigned
crazy_notworking_fun <- function() {
return(a = 5, b = 65, c = 275)
}
[a_val = a, b_val = b] <- crazy_notworking_fun()
Or even,
[a_val <- a, b_val <-b] <- crazy_notworking_fun()
In that case, however, it becomes somewhat unclear (to me at least) what
only_val <- crazy_notworking_fun()
would do. Throw an error because multivalued functions are fundamentally
different and we can't pretend they aren't? This would disallow all of the
things you think "most r users would use every day" (a claim I'm somewhat
skeptical of, to be honest). If thats not it, though, what? I don't think
it can/should return the full list of results, because that introduces the
ambiguity this is trying to avoid right back in. Perhaps just the first
thing returned? That is internally consistent, but somewhat strange
behavior...
Best,
~G
On Sat, Mar 11, 2023 at 2:15?PM Sebastian Martin Krantz <
sebastian.krantz at graduateinstitute.ch> wrote:
Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can follow all aspects you raised, but to give my limited take on a few:
your proposal violates a very basic property of the language, i.e. that
all statements are expressions and have a value.
What's the value of 1 + (A, C = init_matrices()).
I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars);
nr = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error, as
the above expression should. `%=%` assigns to
environments, so 1 + (c("A", "C") %=% init_matrices()) returns
numeric(0), with A and C having their values assigned.
suppose f() returns list(A = 1, B = 2) and I do B, A <- f() Should assignment be by position or by name?
In other languages this is by position. The feature is not meant to replace list2env(), and being able to rename objects in the assignment is a vital feature of codes using multi input and output functions e.g. in Matlab or Julia.
Honestly, given that this is simply syntactic sugar, I don't think I
would support it. You can call it that, but it would be used by almost every R user almost every day. Simple things like nr, nc = dim(x); values, vectors = eigen(x) etc. where the creation of intermediate objects is cumbersome and redundant.
I see you've already mentioned it ("JavaScript-like"). I think it would
fulfil Sebastian's requirements too, as long as it is considered "true assignment" by the rest of the language. I don't have strong opinions about how the issue is phrased or implemented. Something like [t, n] = dim(x) might even be more clear. It's important though that assignment remains by position, so even if some output gets thrown away that should also be positional.
A <- 0 [A, B = A + 10] <- list(1, A = 2)
I also fail to see the use of allowing this. something like this is an error.
A = 2 (B = A + 1) <- 1
Error in (B = A + 1) <- 1 : could not find function "(<-"
Regarding the practical implementation, I think `collapse::%=%` is a
good starting point. It could be introduced in R as a separate
function, or `=` could be modified to accommodate its capability. It
should be clear that
with more than one LHS variables the assignment is an environment
level operation and the results can only be used in computations once
assigned to the environment, e.g. as in 1 + (c("A", "C") %=%
init_matrices()),
A and C are not available for the addition in this statement. The
interpretor then needs to be modified to read something like nr, nc =
dim(x) or [nr, nc] = dim(x). as an environment-level multiple
assignment operation with no
immediate value. Appears very feasible to my limited understanding,
but I guess there are other things to consider still. Definitely
appreciate the responses so far though.
Best regards,
Sebastian
On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch <murdoch.duncan at gmail.com>
wrote:
On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
On Sat, 11 Mar 2023 11:11:06 -0500 Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
That's clear, but your proposal violates a very basic property of the language, i.e. that all statements are expressions and have a value.
How about reframing this feature request from multiple assignment
(which does go contrary to "everything has only one value, even if it's
sometimes invisible(NULL)") to "structured binding" / "destructuring
assignment" [*], which takes this single value returned by the
expression and subsets it subject to certain rules? It may be easier to
make a decision on the semantics for destructuring assignment (e.g.
languages which have this feature typically allow throwing unneeded
parts of the return value away), and it doesn't seem to break as much
of the rest of the language if implemented.
I see you've already mentioned it ("JavaScript-like"). I think it would
fulfil Sebastian's requirements too, as long as it is considered "true
assignment" by the rest of the language.
The hard part is to propose the actual grammar of the new feature (in
terms of src/main/gram.y, preferably without introducing conflicts) and
its semantics (including the corner cases, some of which you have
already mentioned). I'm not sure I'm up to the task.
If I were doing it, here's what I'd propose:
'[' formlist ']' LEFT_ASSIGN expr
'[' formlist ']' EQ_ASSIGN expr
expr RIGHT_ASSIGN '[' formlist ']'
where `formlist` has the syntax of the formals list for a function
definition. This would have the following semantics:
{
*tmp* <- expr
# For arguments with no "default" expression,
argname1 <- *tmp*[[1]]
argname2 <- *tmp*[[2]]
...
# For arguments with a default listed
argname3 <- with(*tmp*, default3)
}
The value of the whole thing would therefore be (invisibly) the value of
the last item in the assignment.
Two examples:
[A, B, C] <- expr          # assign the first three elements of expr
                           # to A, B, and C
[A, B, C = a + b] <- expr  # assign the first two elements of expr
                           # to A and B,
                           # assign with(expr, a + b) to C.
Unfortunately, I don't think this could be done entirely by transforming
the expression (which is the way |> was done), and that makes it a lot
harder to write and to reason about. E.g. what does this do?
A <- 0
[A, B = A + 10] <- list(1, A = 2)
According to the recipe above, I think it sets A to 1 and B to 12, but
maybe a user would expect B to be 10 or 11. And according to that
recipe this is an error:
[A, B = A + 10] <- c(1, A = 2)
which probably isn't what a user would expect, given that this is fine:
[A, B] <- c(1, 2)
Duncan Murdoch
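The recipe above can be tried out today with an ordinary function. The sketch below is only an emulation under stated assumptions: destructure() and its arguments are hypothetical, the bracket syntax itself is not valid R, and positional targets are taken to be filled from value[[1]], value[[2]], ... in the order given.

```r
# Hypothetical helper, not part of R: emulates the recipe above.
#   positional: character vector of names filled from value[[1]], value[[2]], ...
#   defaults:   named list of quoted expressions, evaluated like with(value, expr)
destructure <- function(positional, defaults = list(), value,
                        envir = parent.frame()) {
  tmp <- value
  last <- NULL
  for (i in seq_along(positional)) {
    last <- tmp[[i]]
    assign(positional[i], last, envir = envir)
  }
  for (nm in names(defaults)) {
    # Same lookup rule as with(tmp, expr): names in tmp first, then envir
    last <- eval(defaults[[nm]], tmp, envir)
    assign(nm, last, envir = envir)
  }
  invisible(last)  # value of the last item assigned
}

A <- 0
# Stands in for: [A, B = A + 10] <- list(1, A = 2)
destructure("A", list(B = quote(A + 10)), list(1, A = 2))
# A is now 1; B is 12, because A in the default resolves to the element named A

# Stands in for: [A, B = A + 10] <- c(1, A = 2)
res <- try(destructure("A", list(B = quote(A + 10)), c(1, A = 2)),
           silent = TRUE)
# res is a "try-error": a numeric vector of length 2 is not a valid
# evaluation environment
```

This reproduces both corner cases mentioned above: B ends up as 12 rather than 10 or 11, and the atomic-vector version fails even though the purely positional one would work.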
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Thanks Gabriel and Kevin for your inputs.
Regarding your points, Gabriel: I think Python and Julia do allow multiple sub-assignment, but in line with my earlier suggestion in response to Duncan to make multiple assignment an environment-level operation (as collapse::%=% currently works), this would not be possible in R.
Regarding the [a] <- coolest_function() syntax: yes, it would mean doing multiple assignment and setting a equal to the first element, dropping all other elements. Multiple assignment should be positional like in other languages, enabling flexible renaming of objects on the fly. So it should be irrelevant whether the function returns a named or unnamed list or vector.
Thanks also Kevin for this contribution. I think it's a remarkable effort, and I wouldn't mind such semantics, e.g. making it a function call to '.[' or any other one-letter function, as long as it's coded in C and recognized by the interpreter as an assignment operation.
Best regards,
Sebastian
On Sun 12. Mar 2023 at 01:00, Kevin Ushey <kevinushey at gmail.com> wrote:
FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:
https://github.com/kevinushey/dotty
The package exports an object called `.`, with a special `[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:
.[nr, nc] <- dim(mtcars)
and that will define 'nr' and 'nc' as you expect.
As for R CMD check warnings, you can suppress those through the use of
globalVariables(), and that can also be automated within the package.
The 'dotty' package includes a function 'dotify()' which automates
looking for such usages in your package, and calling globalVariables()
so that R CMD check doesn't warn. In theory, a similar technique would
be applicable to other packages defining similar operators (zeallot,
collapse).
Obviously, globalVariables() is a very heavy hammer to swing for this
issue, but you might consider the benefits worth the tradeoffs.
Best,
Kevin
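For readers wondering how an S3 replacement method can do this, here is a stripped-down sketch of the mechanism. It is not dotty's actual source: the class name dot_demo is made up, and only plain positional targets are handled.

```r
# Dummy object whose only job is to carry a class for S3 dispatch
. <- structure(list(), class = "dot_demo")

# Replacement method: called for .[a, b] <- value
`[<-.dot_demo` <- function(x, ..., value) {
  # Capture the index arguments unevaluated, e.g. the symbols nr and nc
  targets <- as.list(substitute(list(...)))[-1L]
  env <- parent.frame()
  for (i in seq_along(targets)) {
    assign(as.character(targets[[i]]), value[[i]], envir = env)
  }
  x  # hand back the unchanged dummy object
}

x <- matrix(0, nrow = 3, ncol = 2)
.[nr, nc] <- dim(x)
# nr is 3, nc is 2

# In package code, the R CMD check note about undefined globals can be
# silenced with utils::globalVariables(c("nr", "nc"))
```

Because the targets are never evaluated, nr and nc do not need to exist beforehand; they are created in the calling environment as a side effect of the replacement call.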
On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
> Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can follow all aspects you raised, but to give my limited take on a few:
>> your proposal violates a very basic property of the language, i.e. that all statements are expressions and have a value.
>> What's the value of 1 + (A, C = init_matrices()).
> I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,
d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
is not a statement, it is a sequence of 4 statements.
Duncan Murdoch
> as the above expression should. `%=%` assigns to environments, so 1 + (c("A", "C") %=% init_matrices()) returns numeric(0), with A and C having their values assigned.
>> suppose f() returns list(A = 1, B = 2) and I do
>> B, A <- f()
>> Should assignment be by position or by name?
> In other languages this is by position. The feature is not meant to replace list2env(), and being able to rename objects in the assignment is a vital feature of code that uses functions with multiple inputs and outputs, e.g. in Matlab or Julia.
>> Honestly, given that this is simply syntactic sugar, I don't think I would support it.
> You can call it that, but it would be used by almost every R user almost every day. Simple things like nr, nc = dim(x); values, vectors = eigen(x) etc., where the creation of intermediate objects is cumbersome and redundant.
>> I see you've already mentioned it ("JavaScript-like"). I think it would fulfil Sebastian's requirements too, as long as it is considered "true assignment" by the rest of the language.
> I don't have strong opinions about how the issue is phrased or implemented. Something like [t, n] = dim(x) might even be clearer. It's important though that assignment remains by position, so even if some output gets thrown away, that should also be positional.
>> A <- 0
>> [A, B = A + 10] <- list(1, A = 2)
> I also fail to see the use of allowing this. Something like this is an error:
> A = 2
> (B = A + 1) <- 1
> Error in (B = A + 1) <- 1 : could not find function "(<-"
> Regarding the practical implementation, I think `collapse::%=%` is a good starting point. It could be introduced in R as a separate function, or `=` could be modified to accommodate its capability. It should be clear that with more than one LHS variable the assignment is an environment-level operation, and the results can only be used in computations once assigned to the environment, e.g. in 1 + (c("A", "C") %=% init_matrices()), A and C are not available for the addition in this statement. The interpreter then needs to be modified to read something like nr, nc = dim(x) or [nr, nc] = dim(x) as an environment-level multiple assignment operation with no immediate value. This appears very feasible to my limited understanding, but I guess there are other things to consider still. I definitely appreciate the responses so far though.
> Best regards,
> Sebastian
I really like it! Nicely done. Duncan Murdoch
On 11/03/2023 6:00 p.m., Kevin Ushey wrote:
FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:
https://github.com/kevinushey/dotty
The package exports an object called `.`, with a special `[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:
.[nr, nc] <- dim(mtcars)
and that will define 'nr' and 'nc' as you expect.
As for R CMD check warnings, you can suppress those through the use of
globalVariables(), and that can also be automated within the package.
The 'dotty' package includes a function 'dotify()' which automates
looking for such usages in your package, and calling globalVariables()
so that R CMD check doesn't warn. In theory, a similar technique would
be applicable to other packages defining similar operators (zeallot,
collapse).
Obviously, globalVariables() is a very heavy hammer to swing for this
issue, but you might consider the benefits worth the tradeoffs.
Best,
Kevin
Thinking more about this, and seeing Kevin's examples at https://github.com/kevinushey/dotty, I think this is the most R-like way of doing it, with an additional benefit as it would allow introducing the useful data.table semantics DT[, .(a = b, c, d)] to more general R. So I would propose to introduce a new primitive function . <- function(...) .Primitive(".") in R with an assignment method and the following features:
- Positional assignment, e.g. .[nr, nc] <- dim(x), and named assignment, e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All the functionality proposed by Kevin at https://github.com/kevinushey/dotty is useful, unambiguous and feasible.
- Silent dropping of RHS values, e.g. .[mpg_new, cyl_new] <- mtcars.
- Mixing of positional and named assignment, e.g. .[mpg_new, carb_new = carb, cyl_new] <- mtcars. The inputs not assigned by name are simply the elements of RHS in the order they occur, regardless of whether they have been used previously, e.g. .[mpg_new, cyl_new = cyl, log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could be any named vector type.
- Conventional use of the function as a lazy version of list(), as in data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D). This would also be useful, allowing more parsimonious code and avoiding the need to assign names to all return values in a function return, e.g. if I already have matrices A, C, Q and R as internal objects in my function, I can simply end with return(.(A, C, Q, R)) instead of return(list(A = A, C = C, Q = Q, R = R)) if I want the list to be named with the object names.
The implementation of this in R and C should be pretty straightforward. It would just require a modification to R CMD check to recognize .[<- as assignment.
Best regards,
Sebastian
On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz <sebastian.krantz at graduateinstitute.ch> wrote:
Thanks Gabriel and Kevin for your inputs.
Regarding your points, Gabriel: I think Python and Julia do allow multiple sub-assignment, but in line with my earlier suggestion in response to Duncan to make multiple assignment an environment-level operation (as collapse::%=% currently works), this would not be possible in R.
Regarding the [a] <- coolest_function() syntax: yes, it would mean doing multiple assignment and setting a equal to the first element, dropping all other elements. Multiple assignment should be positional like in other languages, enabling flexible renaming of objects on the fly. So it should be irrelevant whether the function returns a named or unnamed list or vector.
Thanks also Kevin for this contribution. I think it's a remarkable effort, and I wouldn't mind such semantics, e.g. making it a function call to '.[' or any other one-letter function, as long as it's coded in C and recognized by the interpreter as an assignment operation.
Best regards,
Sebastian
On Sun 12. Mar 2023 at 01:00, Kevin Ushey <kevinushey at gmail.com> wrote:
FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:
https://github.com/kevinushey/dotty
The package exports an object called `.`, with a special `[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:
.[nr, nc] <- dim(mtcars)
and that will define 'nr' and 'nc' as you expect.
As for R CMD check warnings, you can suppress those through the use of
globalVariables(), and that can also be automated within the package.
The 'dotty' package includes a function 'dotify()' which automates
looking for such usages in your package, and calling globalVariables()
so that R CMD check doesn't warn. In theory, a similar technique would
be applicable to other packages defining similar operators (zeallot,
collapse).
Obviously, globalVariables() is a very heavy hammer to swing for this
issue, but you might consider the benefits worth the tradeoffs.
Best,
Kevin
On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:
Thinking more about this, and seeing Kevin's examples at https://github.com/kevinushey/dotty, I think this is the most R-like way of doing it, with an additional benefit as it would allow introducing the useful data.table semantics DT[, .(a = b, c, d)] to more general R. So I would propose to introduce a new primitive function . <- function(...) .Primitive(".") in R with an assignment method and the following features:
I think that proposal is very unlikely to be accepted. If it was a primitive function, it could only be maintained by R Core. They are justifiably very reluctant to take on extra work for themselves. Kevin's package demonstrates that this can be done entirely in a contributed package, which means there's no need for R Core to be involved. I don't know if he has plans to turn his prototype into a CRAN package. If he doesn't, then it will be up to some other interested maintainer to step up and take on the task, or it will just fade away. I haven't checked whether your proposals below represent changes from the current version of dotty, but if they do, the way to proceed is to fork that project, implement your changes, and offer to contribute them back to the main branch. Duncan Murdoch
* Positional assignment, e.g. .[nr, nc] <- dim(x), and named assignment,
e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All
the functionality proposed by Kevin at
https://github.com/kevinushey/dotty is useful, unambiguous and
feasible.
* Silent dropping of RHS values, e.g. .[mpg_new, cyl_new] <- mtcars.
* Mixing of positional and named assignment, e.g. .[mpg_new, carb_new =
carb, cyl_new] <- mtcars. The inputs not assigned by name are simply
the elements of RHS in the order they occur, regardless of whether
they have been used previously, e.g. .[mpg_new, cyl_new = cyl,
log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could
be any named vector type.
* Conventional use of the function as a lazy version of list(), as in
data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D).
This would also be useful, allowing more parsimonious code and
avoiding the need to assign names to all return values in a function
return, e.g. if I already have matrices A, C, Q and R as internal
objects in my function, I can simply end with return(.(A, C, Q, R))
instead of return(list(A = A, C = C, Q = Q, R = R)) if I want the
list to be named with the object names.
The implementation of this in R and C should be pretty straightforward.
It would just require a modification to R CMD check to recognize .[<- as
assignment.
Best regards,
Sebastian
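The auto-naming behaviour described in the last bullet can be approximated in plain R without a new primitive. The sketch below uses the made-up name named_list() rather than `.` so that it does not clash with other uses of the dot; unnamed arguments are named after the expression that produced them.

```r
# Hypothetical helper: like list(), but unnamed arguments are named after
# the expression supplied for them
named_list <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1L]  # unevaluated arguments
  vals <- list(...)                             # evaluated arguments
  nms <- names(vals)
  if (is.null(nms)) nms <- character(length(vals))
  unnamed <- !nzchar(nms)
  nms[unnamed] <- vapply(
    exprs[unnamed],
    function(e) paste(deparse(e), collapse = ""),
    character(1),
    USE.NAMES = FALSE
  )
  names(vals) <- nms
  vals
}

B <- 1; C <- 2; D <- 3
out <- named_list(A = B, C, D)
# out has the same names and values as list(A = B, C = C, D = D)
```

A function could then end with return(named_list(A, C, Q, R)) to get a list named after its internal objects.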
On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz
<sebastian.krantz at graduateinstitute.ch
<mailto:sebastian.krantz at graduateinstitute.ch>> wrote:
Thanks Gabriel and Kevin for your inputs,
regarding your points Gabriel, I think Python and Julia do allow
multiple sub-assignment, but in-line with my earlier suggestion in
response to Duncan to make multiple assignment an environment-level
operation (like collapse::%=% currently works), ?this would not be
possible in R.
Regarding the [a] <- coolest_function()?syntax, yeah it would mean
do multiple assignment and set a equal to the first element dropping
all other elements. Multiple assignment should be positional loke in
other languages, enabling flexible renaming of objects on the fly.
So it should be irrelevant whether the function returns a named or
unnamed list or vector.
Thanks also Kevin for this contribution. I think it?s a remarkable
effort, and I wouldn?t mind such semantics e.g. making it a function
call to ?.[? or any other one-letter function, as long as it?s coded
in C and recognized by the interpreter as an assignment operation.
Best regards,
Sebastian
On Sun 12. Mar 2023 at 01:00, Kevin Ushey <kevinushey at gmail.com
<mailto:kevinushey at gmail.com>> wrote:
FWIW, it's possible to get fairly close to your proposed semantics
using the existing metaprogramming facilities in R. I put together a
prototype package here to demonstrate:
https://github.com/kevinushey/dotty
The package exports an object called `.`, with a special `[<-.dot` S3
method which enables destructuring assignments. This means you can
write code like:
    .[nr, nc] <- dim(mtcars)
and that will define 'nr' and 'nc' as you expect.
As for R CMD check warnings, you can suppress those through the use of
globalVariables(), and that can also be automated within the package.
The 'dotty' package includes a function 'dotify()' which automates
looking for such usages in your package, and calling globalVariables()
so that R CMD check doesn't warn. In theory, a similar technique would
be applicable to other packages defining similar operators (zeallot,
collapse).
Obviously, globalVariables() is a very heavy hammer to swing for this
issue, but you might consider the benefits worth the tradeoffs.
Best,
Kevin
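For readers unfamiliar with the trick Kevin describes, the following is a minimal sketch of how a `[<-` S3 method on a dummy object can perform positional destructuring. It is an illustration only and not the code of the dotty package: the class name "dot_sketch" is made up here, and the sketch handles bare symbols only (no named or nested targets).

```r
# Dummy object whose only job is to dispatch on `[<-`.
. <- structure(list(), class = "dot_sketch")

# Illustrative method (assumption: not dotty's actual implementation).
`[<-.dot_sketch` <- function(x, ..., value) {
  # The index arguments arrive unevaluated, so their names can be captured.
  targets <- as.list(substitute(list(...)))[-1L]
  env <- parent.frame()
  for (i in seq_along(targets)) {
    # Positional assignment: i-th target gets the i-th element of value.
    assign(as.character(targets[[i]]), value[[i]], envir = env)
  }
  invisible(x)
}

.[nr, nc] <- dim(mtcars)
```

Inside a package, names created this way (here nr and nc) are invisible to the static checks, which is why a utils::globalVariables() declaration comes up as the workaround.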
On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch
<murdoch.duncan at gmail.com <mailto:murdoch.duncan at gmail.com>> wrote:
>
> On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
> > Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
> > follow all aspects you raised, but to give my limited take on a few:
> >
> >> your proposal violates a very basic property of the language,
> >> i.e. that all statements are expressions and have a value.
> >> What's the value of 1 + (A, C = init_matrices()).
> >
> > I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars); nr
> > = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,
>
>
>    d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
>
> is not a statement, it is a sequence of 4 statements.
>
> Duncan Murdoch
>
> > as the
> > above expression should. `%=%` assigns to
> > environments, so 1 + (c("A", "C") %=% init_matrices()) returns
> > numeric(0), with A and C having their values assigned.
> >
> >> suppose f() returns list(A = 1, B = 2) and I do
> >> B, A <- f()
> >> Should assignment be by position or by name?
> >
> > In other languages this is by position. The feature is not meant to
> > replace list2env(), and being able to rename objects in the assignment
> > is a vital feature of codes
> > using multi input and output functions e.g. in Matlab or Julia.
> >
> >> Honestly, given that this is simply syntactic sugar, I
> >> don't think I would support it.
> >
> > You can call it that, but it would be used by almost every R user almost
> > every day. Simple things like nr, nc = dim(x); values, vectors =
> > eigen(x) etc. where the creation of intermediate objects
> > is cumbersome and redundant.
> >
> >> I see you've already mentioned it ("JavaScript-like"). I
> >> think it would fulfil Sebastian's requirements too, as long as
> >> it is considered "true assignment" by the rest of the language.
> >
> > I don't have strong opinions about how the issue is phrased or
> > implemented. Something like [t, n] = dim(x) might even be more clear.
> > It's important though that assignment remains by position,
> > so even if some output gets thrown away that should also be positional.
> >
> >> A <- 0
> >> [A, B = A + 10] <- list(1, A = 2)
> >
> > I also fail to see the use of allowing this. Something like this is an
> > error.
> >
> >> A = 2
> >> (B = A + 1) <- 1
> > Error in (B = A + 1) <- 1 : could not find function "(<-"
> >
> > Regarding the practical implementation, I think `collapse::%=%` is a
> > good starting point. It could be introduced in R as a separate function,
> > or `=` could be modified to accommodate its capability. It should be
> > clear that
> > with more than one LHS variable the assignment is an environment-level
> > operation and the results can only be used in computations once assigned
> > to the environment, e.g. as in 1 + (c("A", "C") %=% init_matrices()),
> > A and C are not available for the addition in this statement. The
> > interpreter then needs to be modified to read something like nr, nc =
> > dim(x) or [nr, nc] = dim(x) as an environment-level multiple assignment
> > operation with no
> > immediate value. Appears very feasible to my limited understanding, but
> > I guess there are other things to consider still. Definitely appreciate
> > the responses so far though.
> >
> > Best regards,
> >
> > Sebastian
> >
> >
> >
> >
> >
> > On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch
> > <murdoch.duncan at gmail.com <mailto:murdoch.duncan at gmail.com>> wrote:
> >
> >     On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
> >      > On Sat, 11 Mar 2023 11:11:06 -0500
> >      > Duncan Murdoch <murdoch.duncan at gmail.com
> >      > <mailto:murdoch.duncan at gmail.com>> wrote:
> >      >
> >      >> That's clear, but your proposal violates a very basic property
> >      >> of the language, i.e. that all statements are expressions
> >      >> and have a value.
> >      >
> >      > How about reframing this feature request from multiple assignment
> >      > (which does go contrary to "everything has only one value, even
> >      > if it's sometimes invisible(NULL)") to "structured binding" /
> >      > "destructuring assignment" [*], which takes this single
> >      > value returned by the expression and subsets it subject to
> >      > certain rules? It may be easier to
> >      > make a decision on the semantics for destructuring assignment (e.g.
> >      > languages which have this feature typically allow throwing unneeded
> >      > parts of the return value away), and it doesn't seem to break as much
> >      > of the rest of the language if implemented.
> >      >
> >      > I see you've already mentioned it ("JavaScript-like"). I think it
> >      > would fulfil Sebastian's requirements too, as long as it is
> >      > considered "true assignment" by the rest of the language.
> >      >
> >      > The hard part is to propose the actual grammar of the new feature (in
> >      > terms of src/main/gram.y, preferably without introducing
> >      > conflicts) and its semantics (including the corner cases, some of
> >      > which you have already mentioned). I'm not sure I'm up to the task.
> >      >
> >
> >     If I were doing it, here's what I'd propose:
> >
> >         '[' formlist ']' LEFT_ASSIGN expr
> >         '[' formlist ']' EQ_ASSIGN expr
> >         expr RIGHT_ASSIGN '[' formlist ']'
> >
> >     where `formlist` has the syntax of the formals list for a function
> >     definition.  This would have the following semantics:
> >
> >          {
> >            *tmp* <- expr
> >
> >            # For arguments with no "default" expression,
> >
> >            argname1 <- *tmp*[[1]]
> >            argname2 <- *tmp*[[2]]
> >            ...
> >
> >            # For arguments with a default listed
> >
> >            argname3 <- with(*tmp*, default3)
> >          }
> >
> >
> >     The value of the whole thing would therefore be (invisibly) the
> >     value of the last item in the assignment.
> >
> >     Two examples:
> >
> >         [A, B, C] <- expr   # assign the first three elements of expr
> >                             # to A, B, and C
> >
> >         [A, B, C = a + b] <- expr  # assign the first two elements of expr
> >                                    # to A and B,
> >                                    # assign with(expr, a + b) to C.
> >
> >     Unfortunately, I don't think this could be done entirely by
> >     transforming the expression (which is the way |> was done), and that
> >     makes it a lot harder to write and to reason about.  E.g. what does
> >     this do?
> >
> >         A <- 0
> >         [A, B = A + 10] <- list(1, A = 2)
> >
> >     According to the recipe above, I think it sets A to 1 and B to 12, but
> >     maybe a user would expect B to be 10 or 11.  And according to that
> >     recipe this is an error:
> >
> >         [A, B = A + 10] <- c(1, A = 2)
> >
> >     which probably isn't what a user would expect, given that this is fine:
> >
> >         [A, B] <- c(1, 2)
> >
> >     Duncan Murdoch
> >
>
> ______________________________________________
> R-devel at r-project.org <mailto:R-devel at r-project.org> mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
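Duncan's recipe in the quoted message above can be tried out today with an ordinary function, which makes the corner cases he mentions easy to reproduce. The sketch below is hypothetical: `destructure()` and its `.value` and `.envir` arguments are names invented for this illustration, and it assumes that positional targets consume the elements of the value in order while targets with a "default" are evaluated inside the value, as with() would do.

```r
# Hypothetical emulation of  [A, B, C = a + b] <- value  as
# destructure(value, A, B, C = a + b). Not an existing R function.
destructure <- function(.value, ..., .envir = parent.frame()) {
  force(.envir)
  targets <- as.list(substitute(list(...)))[-1L]
  nms <- names(targets)
  if (is.null(nms)) nms <- character(length(targets))
  pos <- 0L
  result <- NULL
  for (i in seq_along(targets)) {
    if (nzchar(nms[i])) {
      # Target with a "default": evaluate it inside the value, like with().
      result <- eval(targets[[i]], .value, .envir)
      assign(nms[i], result, envir = .envir)
    } else {
      # Plain target: take the next element by position.
      pos <- pos + 1L
      result <- .value[[pos]]
      assign(as.character(targets[[i]]), result, envir = .envir)
    }
  }
  invisible(result)  # value of the last item assigned
}

A <- 0
destructure(list(1, A = 2), A, B = A + 10)
# A is now 1 and B is 12: the A inside the list shadows the new A.
```

Under the same recipe, passing c(1, A = 2) instead of the list fails, because a numeric vector of length two is not a valid evaluation environment. That is the asymmetry pointed out in the message.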
Kevin's package is very nice as a proof of concept, no doubt about that, but it is not at the level of performance or convenience that a native R implementation would offer. I would probably not use it to translate Matlab routines into R packages placed on CRAN, because it's an additional dependency, I have a performance burden in every iteration, and utils::globalVariables() is anything but elegant. From that perspective it would be more convenient for me right now to stick with collapse::%=%, which is already written in C, and also call utils::globalVariables().
But again my hope in starting this was that R Core might see that the addition of multiple assignment would be a significant enhancement to the language, of the same order as the base pipe |> in my opinion. I think the discussion so far has at least brought forth a way to implement this that does not violate fundamental principles of the language, which could form a basis for thinking about an actual addition to the language.
Best regards,
Sebastian
On Sun 12. Mar 2023 at 13:18, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:
Thinking more about this, and seeing Kevin's examples at https://github.com/kevinushey/dotty, I think this is the most R-like way of doing it, with an additional benefit as it would allow introducing the useful data.table semantics DT[, .(a = b, c, d)] to more general R. So I would propose to introduce a new primitive function . <- function(...) .Primitive(".") in R with an assignment method and the following features:
I think that proposal is very unlikely to be accepted. If it was a primitive function, it could only be maintained by R Core. They are justifiably very reluctant to take on extra work for themselves. Kevin's package demonstrates that this can be done entirely in a contributed package, which means there's no need for R Core to be involved. I don't know if he has plans to turn his prototype into a CRAN package. If he doesn't, then it will be up to some other interested maintainer to step up and take on the task, or it will just fade away. I haven't checked whether your proposals below represent changes from the current version of dotty, but if they do, the way to proceed is to fork that project, implement your changes, and offer to contribute them back to the main branch. Duncan Murdoch
* Positional assignment, e.g. .[nr, nc] <- dim(x), and named assignment,
e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All
the functionality proposed by Kevin at
https://github.com/kevinushey/dotty is useful, unambiguous and
feasible.
* Silent dropping of RHS values, e.g. .[mpg_new, cyl_new] <- mtcars.
* Mixing of positional and named assignment, e.g. .[mpg_new, carb_new =
carb, cyl_new] <- mtcars. The inputs not assigned by name are simply
the elements of RHS in the order they occur, regardless of whether
they have been used previously, e.g. .[mpg_new, cyl_new = cyl,
log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could
be any named vector type.
* Conventional use of the function as a lazy version of list(), as in
data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D).
This would also be useful, allowing more parsimonious code and
avoiding the need to assign names to all return values in a function
return, e.g. if I already have matrices A, C, Q and R as internal
objects in my function, I can simply end with return(.(A, C, Q, R))
instead of return(list(A = A, C = C, Q = Q, R = R)) if I want the
list to be named with the object names.
The implementation of this in R and C should be pretty straightforward.
It would just require a modification to R CMD check to recognize .[<- as
assignment.
Best regards,
Sebastian
Yes, this is really a problem with the checks, not with the language. A simpler approach than your alternativeAssignment function would be simply to allow globalVariables() to be limited to a single function as the note in its help page says. This might be tedious to write by hand, but could be automated using methods like "dotify" in dotty. Duncan Murdoch
On 12/03/2023 10:36 p.m., Pavel Krivitsky wrote:
Dear All,
As a maintainer of large, complex packages, I can think of many places
in which deconstructing assignment would simplify the code, as well as
facilitate readability by breaking up larger functions into helpers, so
I would be very glad to see this incorporated somehow.
I think the crux of the matter is that while there are a number of ways
to implement deconstructing assignment within R, there is no mechanism
to tell R CMD check about it without also suppressing checks for every
other instance of that variable name. This is particularly problematic
because those variable names are likely to be used elsewhere in the
package.
Workarounds that have been suggested all defeat the conciseness and
clarity of the deconstructing assignment and introduce potential for
subtle bugs.
The check warnings are something that can only be addressed in
'codetools', with a finer API than what utils::globalVariables()
provides. Perhaps this would have a lower hurdle than modifying the R
language itself?
From skimming through the relevant 'codetools' code, one idea for such
an API would be a function, along the lines of
utils::alternativeAssignment(op, assigned)
that sets up a callback assigned = function(op, e) that given the
operator (as string) and the expression it's embedded in, returns a
list of three elements:
* a character vector containing a list of variables assigned to that
might not otherwise be detected
* a character vector containing a list of variables referenced that
might not otherwise be detected
* expression e with potentially "offending" elements removed, which
will then be processed by the rest of the checking code
Then, say, 'zeallot' could implement zeallot::zeallot_assign_detect(),
and a package developer using it could put
utils::alternativeAssignment("%<-%", zeallot::zeallot_assign_detect)
in their .onLoad() function. Similarly, users of 'dotty' could set up
callbacks for all standard assignment operators to inform the code
about the nonstandard assignment.
Best Regards,
Pavel
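To make the callback idea more concrete, here is a rough sketch of the first piece such a detector would need: walking an expression and collecting the variables that appear on the left-hand side of a non-standard assignment operator. Everything here is an assumption for illustration. `find_assigned()` is a made-up name, it is not part of 'codetools' or 'zeallot', and utils::alternativeAssignment() itself exists only as a proposal in this thread. The sketch covers only the "variables assigned" element of the proposed three-element return value.

```r
# Hypothetical detector: names assigned via an operator such as %<-%.
find_assigned <- function(e, op = "%<-%") {
  if (!is.call(e)) return(character())
  found <- character()
  # Is this call the assignment operator we are looking for?
  if (is.symbol(e[[1L]]) && identical(as.character(e[[1L]]), op)) {
    found <- all.vars(e[[2L]])  # symbols on the left-hand side
  }
  # Recurse into nested calls only.
  for (i in seq_along(e)) {
    if (is.call(e[[i]])) {
      found <- c(found, find_assigned(e[[i]], op))
    }
  }
  unique(found)
}

body_expr <- quote({
  c(nr, nc) %<-% dim(x)
  nr * nc
})
find_assigned(body_expr)  # the names "nr" and "nc"
```

A checker could treat the returned names as locally defined within the enclosing function only, which is the finer granularity that a package-wide utils::globalVariables() declaration lacks.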
On Sun, 2023-03-12 at 14:05 +0200, Sebastian Martin Krantz wrote:
Kevins package is very nice as a proof of concept, no doubt about that, but it is not at the level of performance or convenience that a native R implementation would offer. I would probably not use it to translate matlab routines into R packages placed on CRAN, because it?s an additional dependency, I have a performance burden in every iteration, and utils::globalVariables() is everything but elegant. From that perspective it would be more convenient for me right now to stick with collapse::%=%, which is already written in C, and also call utils::globalVariables(). But again my hope in starting this was that R Core might see that the addition of multiple assignment would be a significant enhancement to the language, of the same order as the base pipe |> in my opinion. I think the discussion so far has at least brought forth a way to implement this in a way that does not violate fundamental principles of the language. Which could form a basis for thinking about an actual addition to the language. Best regards, Sebastian On Sun 12. Mar 2023 at 13:18, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:
Thinking more about this, and seeing Kevins examples at https://github.com/kevinushey/dotty <https://github.com/kevinushey/dotty>, I think this is the most R-like way of doing it, with an additional benefit as it would allow to introduce the useful data.table semantics DT[, .(a = b, c, d)] to more general R. So I would propose to introduce a new primitive function . <- function(...) .Primitive(".") in R with an assignment method and the following features:
I think that proposal is very unlikely to be accepted.? If it was a primitive function, it could only be maintained by R Core.? They are justifiably very reluctant to take on extra work for themselves. Kevin's package demonstrates that this can be done entirely in a contributed package, which means there's no need for R Core to be involved.? I don't know if he has plans to turn his prototype into a CRAN package.? If he doesn't, then it will be up to some other interested maintainer to step up and take on the task, or it will just fade away. I haven't checked whether your proposals below represent changes from the current version of dotty, but if they do, the way to proceed is to fork that project, implement your changes, and offer to contribute them back to the main branch. Duncan Murdoch
? * Positional assignment e.g. .[nr, nc] <- dim(x), and named assignment ??? e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All ??? the functionality proposed by Kevin at ??? https://github.com/kevinushey/dotty ??? <https://github.com/kevinushey/dotty> is useful, unambiguous and ??? feasible. ? * Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <- mtcars. ? * Mixing of positional and named assignment e.g .[mpg_new, carb_new = ??? carb, cyl_new] <- mtcars. The inputs not assigned by name are simply ??? the elements of RHS in the order they occur, regardless of whether ??? they have been used previously e.g. .[mpg_new, cyl_new = cyl, ??? log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could ??? be any named vector type. ? * Conventional use of the function as lazy version of of list(), as in ??? data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D). ??? This would also be useful, allowing more parsimonious code, and ??? avoid the need to assign names to all return values in a function ??? return, e.g. if I already have matrices A, C, Q and R as internal ??? objects in my function, I can simply end by return(.(A, C, Q, R)) ??? instead of return(list(A = A, C = C, Q = Q, R = R)) if I wanted the ??? list to be named with the object names. The implementation of this in R and C should be pretty straightforward. It would just require a modification to R CMD Check to recognize .[<- as assignment. Best regards, Sebastian - 2.) On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz <sebastian.krantz at graduateinstitute.ch <mailto:sebastian.krantz at graduateinstitute.ch>> wrote: ??? Thanks Gabriel and Kevin for your inputs, ??? regarding your points Gabriel, I think Python and Julia do allow ??? multiple sub-assignment, but in-line with my earlier suggestion in ??? response to Duncan to make multiple assignment an environment-level ??? operation (like collapse::%=% currently works),? this would not be ??? possible in R. ??? 
Regarding the [a] <- coolest_function() syntax, yeah it would mean ??? do multiple assignment and set a equal to the first element dropping ??? all other elements. Multiple assignment should be positional loke in ??? other languages, enabling flexible renaming of objects on the fly. ??? So it should be irrelevant whether the function returns a named or ??? unnamed list or vector. ??? Thanks also Kevin for this contribution. I think it?s a remarkable ??? effort, and I wouldn?t mind such semantics e.g. making it a function ??? call to ?.[? or any other one-letter function, as long as it?s coded ??? in C and recognized by the interpreter as an assignment operation. ??? Best regards, ??? Sebastian ??? On Sun 12. Mar 2023 at 01:00, Kevin Ushey <kevinushey at gmail.com ??? <mailto:kevinushey at gmail.com>> wrote: ??????? FWIW, it's possible to get fairly close to your proposed
semantics
??????? using the existing metaprogramming facilities in R. I put
together a
??????? prototype package here to demonstrate: ??????? https://github.com/kevinushey/dotty ??????? <https://github.com/kevinushey/dotty> ??????? The package exports an object called `.`, with a special ??????? `[<-.dot` S3 ??????? method which enables destructuring assignments. This means you
can
??????? write code like: ???????????? .[nr, nc] <- dim(mtcars) ??????? and that will define 'nr' and 'nc' as you expect. ??????? As for R CMD check warnings, you can suppress those through the ??????? use of ??????? globalVariables(), and that can also be automated within the ??????? package. ??????? The 'dotty' package includes a function 'dotify()' which
automates
??????? looking for such usages in your package, and calling ??????? globalVariables() ??????? so that R CMD check doesn't warn. In theory, a similar technique ??????? would ??????? be applicable to other packages defining similar operators
(zeallot,
??????? collapse).
??????? Obviously, globalVariables() is a very heavy hammer to
swing for
??????? this
??????? issue, but you might consider the benefits worth the
tradeoffs.
??????? Best,
??????? Kevin
> On Sat, Mar 11, 2023 at 2:53 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> > On 11/03/2023 4:42 p.m., Sebastian Martin Krantz wrote:
> > > Thanks Duncan and Ivan for the careful thoughts. I'm not sure I can
> > > follow all aspects you raised, but to give my limited take on a few:
> > >
> > >> your proposal violates a very basic property of the language, i.e.
> > >> that all statements are expressions and have a value.
> > >> What's the value of 1 + (A, C = init_matrices()).
> > >
> > > I'm not sure I see the point here. I evaluated 1 + (d = dim(mtcars); nr
> > > = d[1]; nc = d[2]; rm(d)), which simply gives a syntax error,
> >
> >     d = dim(mtcars); nr = d[1]; nc = d[2]; rm(d)
> >
> > is not a statement, it is a sequence of 4 statements.
> >
> > Duncan Murdoch
> >
> > > as the above expression should. `%=%` assigns to environments, so
> > > 1 + (c("A", "C") %=% init_matrices()) returns numeric(0), with A and C
> > > having their values assigned.
> > >
> > >> suppose f() returns list(A = 1, B = 2) and I do
> > >> B, A <- f()
> > >> Should assignment be by position or by name?
> > >
> > > In other languages this is by position. The feature is not meant to
> > > replace list2env(), and being able to rename objects in the assignment
> > > is a vital feature of codes using multi input and output functions
> > > e.g. in Matlab or Julia.
> > >
> > >> Honestly, given that this is simply syntactic sugar, I don't think
> > >> I would support it.
> > >
> > > You can call it that, but it would be used by almost every R user
> > > almost every day. Simple things like nr, nc = dim(x); values, vectors =
> > > eigen(x) etc., where the creation of intermediate objects is
> > > cumbersome and redundant.
> > >
> > >> I see you've already mentioned it ("JavaScript-like"). I think it
> > >> would fulfil Sebastian's requirements too, as long as it is
> > >> considered "true assignment" by the rest of the language.
> > >
> > > I don't have strong opinions about how the issue is phrased or
> > > implemented. Something like [t, n] = dim(x) might even be more clear.
> > > It's important though that assignment remains by position, so even if
> > > some output gets thrown away that should also be positional.
> > >
> > >> A <- 0
> > >> [A, B = A + 10] <- list(1, A = 2)
> > >
> > > I also fail to see the use of allowing this. Something like this is an
> > > error.
> > >
> > >> A = 2
> > >> (B = A + 1) <- 1
> > > Error in (B = A + 1) <- 1 : could not find function "(<-"
> > >
> > > Regarding the practical implementation, I think `collapse::%=%` is a
> > > good starting point. It could be introduced in R as a separate
> > > function, or `=` could be modified to accommodate its capability. It
> > > should be clear that with more than one LHS variable the assignment is
> > > an environment-level operation and the results can only be used in
> > > computations once assigned to the environment, e.g. as in
> > > 1 + (c("A", "C") %=% init_matrices()), A and C are not available for
> > > the addition in this statement. The interpreter then needs to be
> > > modified to read something like nr, nc = dim(x) or [nr, nc] = dim(x)
> > > as an environment-level multiple assignment operation with no
> > > immediate value. Appears very feasible to my limited understanding,
> > > but I guess there are other things to consider still. Definitely
> > > appreciate the responses so far though.
> > >
> > > Best regards,
> > >
> > > Sebastian
> > >
> > > On Sat, 11 Mar 2023 at 20:38, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> > > > On 11/03/2023 11:57 a.m., Ivan Krylov wrote:
> > > > > On Sat, 11 Mar 2023 11:11:06 -0500
> > > > > Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> > > > >
> > > > >> That's clear, but your proposal violates a very basic property
> > > > >> of the language, i.e. that all statements are expressions and
> > > > >> have a value.
> > > > >
> > > > > How about reframing this feature request from multiple assignment
> > > > > (which does go contrary to "everything has only one value, even
> > > > > if it's sometimes invisible(NULL)") to "structured binding" /
> > > > > "destructuring assignment" [*], which takes this single value
> > > > > returned by the expression and subsets it subject to certain
> > > > > rules? It may be easier to make a decision on the semantics for
> > > > > destructuring assignment (e.g. languages which have this feature
> > > > > typically allow throwing unneeded parts of the return value
> > > > > away), and it doesn't seem to break as much of the rest of the
> > > > > language if implemented.
> > > > >
> > > > > I see you've already mentioned it ("JavaScript-like"). I think it
> > > > > would fulfil Sebastian's requirements too, as long as it is
> > > > > considered "true assignment" by the rest of the language.
> > > > >
> > > > > The hard part is to propose the actual grammar of the new feature
> > > > > (in terms of src/main/gram.y, preferably without introducing
> > > > > conflicts) and its semantics (including the corner cases, some of
> > > > > which you have already mentioned). I'm not sure I'm up to the
> > > > > task.
> > > >
> > > > If I were doing it, here's what I'd propose:
> > > >
> > > >     '[' formlist ']' LEFT_ASSIGN expr
> > > >     '[' formlist ']' EQ_ASSIGN expr
> > > >     expr RIGHT_ASSIGN '[' formlist ']'
> > > >
> > > > where `formlist` has the syntax of the formals list for a function
> > > > definition. This would have the following semantics:
> > > >
> > > >     {
> > > >       *tmp* <- expr
> > > >
> > > >       # For arguments with no "default" expression,
> > > >
> > > >       argname1 <- *tmp*[[1]]
> > > >       argname2 <- *tmp*[[2]]
> > > >       ...
> > > >
> > > >       # For arguments with a default listed
> > > >
> > > >       argname3 <- with(*tmp*, default3)
> > > >     }
> > > >
> > > > The value of the whole thing would therefore be (invisibly) the
> > > > value of the last item in the assignment.
> > > >
> > > > Two examples:
> > > >
> > > >     [A, B, C] <- expr          # assign the first three elements of
> > > >                                # expr to A, B, and C
> > > >
> > > >     [A, B, C = a + b] <- expr  # assign the first two elements of expr
> > > >                                # to A and B,
> > > >                                # assign with(expr, a + b) to C.
> > > >
> > > > Unfortunately, I don't think this could be done entirely by
> > > > transforming the expression (which is the way |> was done), and
> > > > that makes it a lot harder to write and to reason about. E.g. what
> > > > does this do?
> > > >
> > > >     A <- 0
> > > >     [A, B = A + 10] <- list(1, A = 2)
> > > >
> > > > According to the recipe above, I think it sets A to 1 and B to 12,
> > > > but maybe a user would expect B to be 10 or 11. And according to
> > > > that recipe this is an error:
> > > >
> > > >     [A, B = A + 10] <- c(1, A = 2)
> > > >
> > > > which probably isn't what a user would expect, given that this is
> > > > fine:
> > > >
> > > >     [A, B] <- c(1, 2)
> > > >
> > > > Duncan Murdoch
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
The gsubfn package can do that.
library(gsubfn)
# swap a and b without explicitly creating a temporary
a <- 1; b <- 2
list[a,b] <- list(b,a)
# get eigenvectors and eigenvalues
list[eval, evec] <- eigen(cbind(1,1:3,3:1))
# get today's month, day, year
require(chron)
list[Month, Day, Year] <- month.day.year(unclass(Sys.Date()))
# get first two components of linear model ignoring rest
list[Coef, Resid] <- lm(rnorm(10) ~ seq(10))
# assign Green and Blue (but not Red) components
list[,Green,Blue] <- col2rgb("aquamarine")
# Assign QR and QRaux but not other components
list[QR,,QRaux] <- qr(c(1,1:3,3:1))
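
The list[...] <- form above can be built on an S3 replacement method for `[<-`, which is also how Kevin Ushey describes dotty earlier in the thread. Below is a minimal sketch of that general mechanism only. It is not the source of gsubfn or dotty, and the object name `unpack` and the class name "unpacker" are made up for illustration.

```r
# Hypothetical placeholder object; its only job is to dispatch `[<-`.
unpack <- structure(list(), class = "unpacker")

`[<-.unpacker` <- function(x, ..., value) {
  # Names written between the brackets, captured without evaluating them.
  targets <- as.character(substitute(list(...)))[-1L]
  env <- parent.frame()
  for (i in seq_along(targets)) {
    # An empty position, as in unpack[, b], is skipped.
    if (nzchar(targets[i])) assign(targets[i], value[[i]], envir = env)
  }
  x  # the placeholder itself is returned unchanged
}

# Positional assignment into the calling environment.
unpack[nr, nc] <- dim(mtcars)
```

Because the targets are only discovered at run time, static checkers see `nr` and `nc` as undefined globals, which is the R CMD check problem discussed in this thread. The sketch only handles plain names between the brackets.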
On Sat, Mar 11, 2023 at 7:47 AM Sebastian Martin Krantz
<sebastian.krantz at graduateinstitute.ch> wrote:
> [............]
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
1 day later
Sebastian Martin Krantz
on Sat, 11 Mar 2023 11:04:54 +0200 writes:
[............]
> But maybe this has already been discussed here and already
> decided against. In that case, a way to browse R-devel
> archives to find out would be nice.
> Best regards,
> Sebastian
....
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
As every mailing list message contains a footer like this,
and if you know about 'site:' in Google or say, https://duckduckgo.com/
you can use search terms such as
site:stat.ethz.ch '[R-devel]' "Multiple Assignment"
or site:stat.ethz.ch/pipermail '[R-devel]' "Multiple Assignment"
or site:stat.ethz.ch/pipermail/r-devel 'Multiple Assignment'
giving results (but for me, currently,
much much better ones on duckduckgo.com than Google.com).
"Search" from the sidebar of www.r-project.org
is https://www.r-project.org/search.html
which mentions
- https://search.r-project.org/
- Rseek
where Rseek is very widely searching, and (the R foundation's)
search.r-project.org is mainly searching on CRAN
and both give interesting hits on 'multiple assignment'
Martin
On 13/03/2023 6:01 a.m., Duncan Murdoch wrote:
> Yes, this is really a problem with the checks, not with the language.
> A simpler approach than your alternativeAssignment function would be
> simply to allow globalVariables() to be limited to a single function as
> the note in its help page says.

I just took a look, and this would be quite easy to do. It would require changes to codetools and to utils, but probably just a few dozen lines.

Duncan Murdoch

> This might be tedious to write by hand, but could be automated using
> methods like "dotify" in dotty.
>
> Duncan Murdoch

On 12/03/2023 10:36 p.m., Pavel Krivitsky wrote:
Dear All,
As a maintainer of large, complex packages, I can think of many places
in which deconstructing assignment would simplify the code, as well as
facilitate readability by breaking up larger functions into helpers, so
I would be very glad to see this incorporated somehow.
I think the crux of the matter is that while there is a number of ways
to implement deconstructing assignment within R, there is no mechanism
to tell R CMD check about it without also suppressing checks for every
other instance of that variable name. This is particularly problematic
because those variable names are likely to be used elsewhere in the
package.
Workarounds that have been suggested all defeat the conciseness and
clarity of the deconstructing assignment and introduce potential for
subtle bugs.
The check warnings are something that can only be addressed in
'codetools', with a finer API than what utils::globalVariables()
provides. Perhaps this would have a lower hurdle than modifying R
language itself?
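
For reference, the coarse workaround available today looks roughly like the sketch below. utils::globalVariables() is the existing function. The names A, C, Q and R come from the example at the start of this thread, and placing the call in a package source file is an assumption about typical use.

```r
# Typically placed at top level in one of the package's R/ files.
# The declaration applies to the whole package, so a genuinely undefined
# use of A, C, Q or R in any other function would also go unreported.
declared <- utils::globalVariables(c("A", "C", "Q", "R"))
```

This package-wide scope is the lack of granularity described above.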
From skimming through the relevant 'codetools' code, one idea for such
an API would be a function, along the lines of
utils::alternativeAssignment(op, assigned)
that sets up a callback assigned = function(op, e) that given the
operator (as string) and the expression it's embedded in, returns a
list of three elements:
* a character vector containing a list of variables assigned to that
might not otherwise be detected
* a character vector containing a list of variables referenced that
might not otherwise be detected
* expression e with potentially "offending" elements removed, which
will then be processed by the rest of the checking code
Then, say, 'zeallot' could implement zeallot::zeallot_assign_detect(),
and a package developer using it could put
utils::alternativeAssignment("%<-%", zeallot::zeallot_assign_detect)
in their .onLoad() function. Similarly, users of 'dotty' could set up
callbacks for all standard assignment operators to inform the code
about the nonstandard assignment.
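
To make the idea concrete, here is a rough sketch of what such a callback could look like for calls of the form c(a, b) %<-% rhs. Everything in it is hypothetical: utils::alternativeAssignment() does not exist, detect_multi_assign() is a made-up name, and returning the right-hand side as the "cleaned" expression is a simplification.

```r
# Hypothetical detector with the (op, e) signature described above.
detect_multi_assign <- function(op, e) {
  stopifnot(is.call(e), identical(as.character(e[[1L]]), op))
  lhs <- e[[2L]]
  rhs <- e[[3L]]
  list(
    assigned   = all.vars(lhs),  # names the operator would create
    referenced = all.vars(rhs),  # names read on the right-hand side
    remaining  = rhs             # what ordinary checking should still see
  )
}

res <- detect_multi_assign("%<-%", quote(c(nr, nc) %<-% dim(x)))
res$assigned    # "nr" "nc"
res$referenced  # "x"
```

A real implementation would also need to handle nested or non-symbol targets.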
Best Regards,
Pavel
On Sun, 2023-03-12 at 14:05 +0200, Sebastian Martin Krantz wrote:
Kevin's package is very nice as a proof of concept, no doubt about that, but it is not at the level of performance or convenience that a native R implementation would offer. I would probably not use it to translate Matlab routines into R packages placed on CRAN, because it's an additional dependency, I have a performance burden in every iteration, and utils::globalVariables() is anything but elegant. From that perspective it would be more convenient for me right now to stick with collapse::%=%, which is already written in C, and also call utils::globalVariables().

But again my hope in starting this was that R Core might see that the addition of multiple assignment would be a significant enhancement to the language, of the same order as the base pipe |> in my opinion. I think the discussion so far has at least brought forth a way to implement this that does not violate fundamental principles of the language, which could form a basis for thinking about an actual addition to the language.

Best regards,
Sebastian

On Sun 12. Mar 2023 at 13:18, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 12/03/2023 6:07 a.m., Sebastian Martin Krantz wrote:
Thinking more about this, and seeing Kevin's examples at https://github.com/kevinushey/dotty, I think this is the most R-like way of doing it, with an additional benefit as it would allow introducing the useful data.table semantics DT[, .(a = b, c, d)] to more general R. So I would propose to introduce a new primitive function . <- function(...) .Primitive(".") in R with an assignment method and the following features:

I think that proposal is very unlikely to be accepted. If it was a primitive function, it could only be maintained by R Core. They are justifiably very reluctant to take on extra work for themselves.

Kevin's package demonstrates that this can be done entirely in a contributed package, which means there's no need for R Core to be involved. I don't know if he has plans to turn his prototype into a CRAN package. If he doesn't, then it will be up to some other interested maintainer to step up and take on the task, or it will just fade away.

I haven't checked whether your proposals below represent changes from the current version of dotty, but if they do, the way to proceed is to fork that project, implement your changes, and offer to contribute them back to the main branch.

Duncan Murdoch
  * Positional assignment e.g. .[nr, nc] <- dim(x), and named assignment
    e.g. .[new = carb] <- mtcars or .[new = log(carb)] <- mtcars. All
    the functionality proposed by Kevin at
    https://github.com/kevinushey/dotty is useful, unambiguous and
    feasible.
  * Silent dropping of RHS values e.g. .[mpg_new, cyl_new] <- mtcars.
  * Mixing of positional and named assignment e.g. .[mpg_new, carb_new =
    carb, cyl_new] <- mtcars. The inputs not assigned by name are simply
    the elements of RHS in the order they occur, regardless of whether
    they have been used previously e.g. .[mpg_new, cyl_new = cyl,
    log_cyl = log(cyl), cyl_new2] <- mtcars is feasible. RHS here could
    be any named vector type.
  * Conventional use of the function as a lazy version of list(), as in
    data.table: .(A = B, C, D) is the same as list(A = B, C = C, D = D).
    This would also be useful, allowing more parsimonious code, and
    avoid the need to assign names to all return values in a function
    return, e.g. if I already have matrices A, C, Q and R as internal
    objects in my function, I can simply end by return(.(A, C, Q, R))
    instead of return(list(A = A, C = C, Q = Q, R = R)) if I wanted the
    list to be named with the object names.

The implementation of this in R and C should be pretty straightforward. It would just require a modification to R CMD check to recognize .[<- as assignment.

Best regards,
Sebastian

On Sun, 12 Mar 2023 at 09:42, Sebastian Martin Krantz <sebastian.krantz at graduateinstitute.ch> wrote:

Thanks Gabriel and Kevin for your inputs,

regarding your points Gabriel, I think Python and Julia do allow multiple sub-assignment, but in line with my earlier suggestion in response to Duncan to make multiple assignment an environment-level operation (like collapse::%=% currently works), this would not be possible in R.