Hi,
I would really like to have a way to split long string literals across
multiple lines in R.
Currently, if a string literal spans multiple lines, there is no way to
inhibit the introduction of newline characters:
> "aaa
+ bbb"
[1] "aaa\nbbb"
If a line ends with a backslash, the backslash is simply ignored:
> "aaa\
+ bbb"
[1] "aaa\nbbb"
We could use this fact to implement string splitting in a fairly
backward-compatible way, since such trailing backslashes currently have
no effect and should therefore hardly ever be used. The attached patch
makes the parser ignore a newline character directly following a backslash:
> "aaa\
+ bbb"
[1] "aaabbb"
I personally would also prefer if leading blanks (spaces and tabs) in
the second line are ignored to allow for proper indentation:
> "aaa \
+ bbb"
[1] "aaa bbb"
> "aaa\
+ \ bbb"
[1] "aaa bbb"
This is also implemented by this patch.
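The proposed rules can be approximated without touching the parser, which may help when reviewing the patch. The sketch below is an emulation, not the patch itself: `join_continuations()`, `emulate_patched()` and `current_value()` are hypothetical helper names, and the sketch assumes its input is the raw source text between the quotes of a single double-quoted literal. It drops each backslash-newline pair together with the spaces and tabs that follow it, copies every other escape sequence verbatim, and then hands the result to R's unmodified parser so the remaining escapes are processed as usual.

```r
# Hypothetical helper (not part of base R or of the patch): apply the
# proposed continuation rule to the raw text between the quotes.
join_continuations <- function(src) {
  chars <- strsplit(src, "")[[1]]
  n <- length(chars)
  out <- character(0)
  i <- 1L
  while (i <= n) {
    if (chars[i] == "\\" && i < n) {
      if (chars[i + 1L] == "\n") {
        # Backslash-newline: drop both characters, then skip the
        # leading spaces and tabs of the continuation line.
        i <- i + 2L
        while (i <= n && chars[i] %in% c(" ", "\t")) i <- i + 1L
      } else {
        # Any other escape (including an escaped backslash) is copied
        # verbatim, so an escaped backslash never starts a continuation.
        out <- c(out, chars[i], chars[i + 1L])
        i <- i + 2L
      }
    } else {
      out <- c(out, chars[i])
      i <- i + 1L
    }
  }
  paste(out, collapse = "")
}

# Hypothetical helper: value the literal would have under the proposal.
emulate_patched <- function(src) {
  # R's own parser handles all remaining escape sequences.
  parse(text = paste0('"', join_continuations(src), '"'),
        keep.source = FALSE)[[1]]
}

# Value of the same literal in current (unpatched) R, for comparison.
current_value <- function(src) {
  parse(text = paste0('"', src, '"'), keep.source = FALSE)[[1]]
}

src <- "aaa\\\nbbb"                  # source text: aaa\<newline>bbb
current_value(src)                   # "aaa\nbbb" (today's behaviour)
emulate_patched(src)                 # "aaabbb"
emulate_patched("aaa \\\n    bbb")   # "aaa bbb": indentation dropped
emulate_patched("aaa\\\\\\\nbbb")    # "aaa\\bbb": odd number of backslashes
emulate_patched("aaa\\\\\nbbb")      # "aaa\\\nbbb": even number, unchanged
```

Going through R's own parser for the final step keeps the sketch honest about escapes it does not model; only the continuation rule itself is reimplemented.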
An alternative approach could be to have something like
("aaa "
"bbb")
or
("aaa ",
"bbb")
be interpreted as "aaa bbb".
I don't know the ins and outs of the parser of R (hence: please very
carefully review the attached patch), but I guess this would be more
work to implement!?
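From a compatibility point of view, one thing in favour of the alternative is that both forms are currently syntax errors, so giving them a meaning could not change how any code that parses today behaves. A quick check with the unmodified parser; `is_syntax_error()` is a hypothetical helper written only for this illustration:

```r
# Hypothetical helper: TRUE if `src` cannot be parsed by the current R parser.
is_syntax_error <- function(src) {
  res <- tryCatch(parse(text = src, keep.source = FALSE),
                  error = function(e) e)
  inherits(res, "error")
}

is_syntax_error('("aaa "\n "bbb")')         # TRUE: adjacent string constants
is_syntax_error('("aaa ",\n "bbb")')        # TRUE: comma inside plain parentheses
is_syntax_error('paste0("aaa ",\n "bbb")')  # FALSE: ordinary function call
```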
What do you think? Is there anybody else who is missing this feature in
the first place?
Regards,
Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-patch
Size: 2598 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20170614/7986e543/attachment.bin>
[WISH / PATCH] possibility to split string literals across multiple lines
18 messages · Duncan Murdoch, Gábor Csárdi, Mark van der Loo +7 more
On 14/06/2017 5:58 AM, Andreas Kersting wrote:
Hi, I would really like to have a way to split long string literals across multiple lines in R.
I don't understand why you require the string to be a literal. Why not
construct the long string in an expression like
paste0("aaa",
"bbb")
? Surely the execution time of the paste0 call is negligible.
Duncan Murdoch
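To make the difference concrete: with the `paste0()` version the source can be split and indented freely without the line break ending up in the value, whereas a literal that simply spans two lines keeps the newline (and would keep any indentation of the second line).

```r
# The line break between the arguments is only source layout.
s1 <- paste0("aaa",
             "bbb")
# A literal spanning two lines keeps the newline character.
s2 <- "aaa
bbb"
s1  # "aaabbb"
s2  # "aaa\nbbb"
```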
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
On Wed, Jun 14, 2017 at 12:12 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
On 14/06/2017 5:58 AM, Andreas Kersting wrote:
Hi, I would really like to have a way to split long string literals across multiple lines in R.
You can also look at the glue package; it supports continuation and a lot more:
library(glue)
glue("
A formatted string \\
can also be on a \\
single line
")
#> A formatted string can also be on a single line
Gabor
[...]
On Wed, 14 Jun 2017 06:12:09 -0500, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 14/06/2017 5:58 AM, Andreas Kersting wrote:
Hi, I would really like to have a way to split long string literals across multiple lines in R.
I don't understand why you require the string to be a literal. Why not
construct the long string in an expression like
paste0("aaa",
"bbb")
? Surely the execution time of the paste0 call is negligible.
Duncan Murdoch
Actually, "execution time" is precisely one of the reasons why I would like to see this feature: depending on the context (e.g. in a tight loop), the execution time of paste0 (or probably also glue, thanks Gabor) is not necessarily negligible. The other reason is style: I think it is cleaner if we can construct such a long string literal without the need for a function call.
Andreas
Having some line-breaking character for string literals would have benefits, as string literals could then be constructed at parse time rather than at run time. I have run into this myself a few times as well. One way to at least emulate something like that is the following:
`%+%` <- function(x,y) paste0(x,y)
"hello" %+% " pretty" %+% " world"
-Mark
On Wed 14 Jun 2017 at 13:53, Andreas Kersting <r-devel at akersting.de> wrote:
Actually "execution time" is precisely one of the reasons why I would like to see this feature as - depending on the context (e.g. in a tight loop) - the execution time of paste0 (or probably also glue, thanks Gabor) is not necessarily insignificant. The other reason is style: I think it is cleaner if we can construct such a long string literal without the need for a function call. Andreas
Mark, that's actually a fair statement, although your extra operator
doesn't cause construction at parse time. You still call paste0(), but just
add an extra layer on top of it.
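This can be checked by looking at what the parser returns: the `%+%` expression is still a call that runs `paste0()` every time it is evaluated, while a single literal is already a character constant once parsing is done.

```r
`%+%` <- function(x, y) paste0(x, y)   # Mark's operator

e_op  <- parse(text = '"hello" %+% " pretty" %+% " world"',
               keep.source = FALSE)[[1]]
e_lit <- parse(text = '"hello pretty world"', keep.source = FALSE)[[1]]

is.call(e_op)                  # TRUE: paste0() runs at evaluation time
is.character(e_lit)            # TRUE: the value exists right after parsing
identical(eval(e_op), e_lit)   # TRUE: same value either way
```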
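I also doubt that even in gigantic loops the benefit is going to be
significant. Take the following example:
library(compiler)    # provides cmpfun()
library(rbenchmark)  # provides benchmark(); install from CRAN if needed
# Builds the string with paste0() on every call
atestfun <- function(x){
  y <- paste0("a very long",
              "string for testing")
  grep(x, y)
}
# Uses a single literal; note that it contains a newline, so it is
# not identical to the paste0() result above
atestfun2 <- function(x){
  y <- "a very long
string for testing"
  grep(x, y)
}
# Byte-compiled versions of both functions
cfun <- cmpfun(atestfun)
cfun2 <- cmpfun(atestfun2)
benchmark(atestfun("a"),
          atestfun2("a"),
          cfun("a"),
          cfun2("a"),
          replications = 100000)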
Which gives after 100,000 replications:
            test replications elapsed relative
1  atestfun("a")       100000    0.83    1.339
2 atestfun2("a")       100000    0.62    1.000
3      cfun("a")       100000    0.81    1.306
4     cfun2("a")       100000    0.62    1.000
The patch can in principle make similar code marginally faster, but I'm not
convinced the patch is going to make any real difference except in some
very specific and exotic cases. Moreover, calling a function like the ones
in the example inside a loop is the only situation I can come up with where
this might be a problem. If you just construct the string inside the loop,
there are two possibilities:
- the string does not need to change, in which case you had better construct it
outside of the loop
- the string does need to change, in which case you need paste() or paste0()
anyway
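The first case, a string that does not change, can simply be built once before the loop, so that only the varying part is pasted on each iteration. A minimal sketch with made-up data (`words` and `prefix` are illustrative names):

```r
words <- c("alpha", "beta", "gamma")   # made-up input
# Constant part: constructed once, outside the loop.
prefix <- paste0("a very long ",
                 "string for testing: ")
res <- character(length(words))
for (i in seq_along(words)) {
  # Only the part that changes is pasted inside the loop.
  res[i] <- paste0(prefix, words[i])
}
res
```

In idiomatic R the loop itself could be replaced by the vectorised `paste0(prefix, words)`.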
I'm not against incorporating the patch, as it would eliminate a few
keystrokes. It's a neat idea, but I don't expect any other noticeable
advantage from it.
My humble 2 cents
Cheers
Joris
On Wed, Jun 14, 2017 at 2:00 PM, Mark van der Loo <mark.vanderloo at gmail.com>
wrote:
Having some line-breaking character for string literals would have benefits as string literals can then be constructed parse-time rather than run-time. I have run into this myself a few times as well. One way to at least emulate something like that is the following.
`%+%` <- function(x,y) paste0(x,y)
"hello" %+% " pretty" %+% " world"
-Mark
Joris Meys
Statistical consultant
Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics
tel : +32 (0)9 264 61 79
Joris.Meys at Ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
I know it doesn't cause construction at parse time, and that was also not what I said. What I meant was that it at least makes the syntax look a little as if you had a line-breaking character within string literals.
On Wed 14 Jun 2017 at 14:18, Joris Meys <jorismeys at gmail.com> wrote:
Mark, that's actually a fair statement, although your extra operator
doesn't cause construction at parse time. You still call paste0(), but just
add an extra layer on top of it.
Hi Mark,
I got you. I just pointed out the obvious to illustrate why your emulation doesn't eliminate the need for the real thing. I didn't mean to imply you weren't aware of this, even though it may have seemed so. Sometimes I'm not 100% aware of the subtleties of the English language. This seems to be one of those cases.
Kind regards
Joris
On Wed, Jun 14, 2017 at 2:23 PM, Mark van der Loo <mark.vanderloo at gmail.com> wrote:
I know it doesn't cause construction at parse time, and it was also not what I said. What I meant was that it makes the syntax at least look a little as if you have a line-breaking character within string literals.
On 14/06/2017 6:45 AM, Andreas Kersting wrote:
Actually "execution time" is precisely one of the reasons why I would like to see this feature as - depending on the context (e.g. in a tight loop) - the execution time of paste0 (or probably also glue, thanks Gabor) is not necessarily insignificant.
You also need to consider implementation time. This is not just changes to R itself; trailing backslashes *are* used in some packages (e.g. geoparser), so those packages would need to be identified and modified and resubmitted to CRAN. Core changes to existing behaviour need really strong arguments, and I'm just not seeing those here. Duncan Murdoch
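Identifying affected code is the first step mentioned here. A rough heuristic, not a tool used by CRAN: flag source lines that end in an odd number of backslashes, since (going by the examples in this thread) only those would change meaning under the patch, while a line ending in an escaped backslash (an even count) would not. `ends_in_odd_backslashes()` is a hypothetical helper; it also flags backslashes in comments or outside strings, so every hit needs manual review. For a package, the input could come from `readLines()` on each file under its `R/` directory.

```r
# Hypothetical helper: TRUE for lines ending in an odd number of backslashes.
ends_in_odd_backslashes <- function(lines) {
  # Length of the run of backslashes at the end of each line.
  n_trailing <- nchar(lines) - nchar(sub("\\\\+$", "", lines))
  n_trailing %% 2L == 1L
}

lines <- c('x <- "aaa\\',        # ends in one backslash: affected
           'bbb"',
           'y <- "aaa\\\\',      # two backslashes: escaped, unaffected
           'z <- "aaa\\\\\\')    # three backslashes: affected
ends_in_odd_backslashes(lines)    # TRUE FALSE FALSE TRUE
```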
As I recall, this has been discussed at least a few times (unfortunately I'm traveling so can't check the references), but the justification was never satisfactory. Personally, I wouldn't mind string continuation being supported, since it makes for more readable code (I had one of my packages raise a NOTE in examples because there is no way in R to split a long hash across multiple lines), but I would be strongly against random removal of whitespace, as it's counter-intuitive, misleading and makes it impossible to continue spaces on the next line. None of the languages that I can think of with multiline strings do that, as that's way too dangerous.
Cheers,
Simon
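The whitespace concern can be illustrated with a simplified regex version of the proposed rule. `strip_continuation()` is a hypothetical helper and, unlike the patch, it ignores escaped backslashes: once leading blanks are dropped, blanks written at the start of the continuation line can no longer be part of the string, whatever the author intended.

```r
# Simplified, hypothetical emulation of the proposed rule: remove each
# backslash-newline together with the spaces/tabs that follow it.
strip_continuation <- function(src) gsub("\\\\\n[ \t]*", "", src)

# Three blanks written at the start of the second line are lost:
strip_continuation("aaa\\\n   bbb")    # "aaabbb"
# Keeping them requires putting the blanks before the backslash instead:
strip_continuation("aaa   \\\n bbb")   # "aaa   bbb"
```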
On Wed, Jun 14, 2017 at 8:48 AM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
As I recall this has been discussed at least a few times (unfortunately I'm traveling so can't check the references), but the justification was never satisfactory. Personally, I wouldn't mind string continuation supported since it makes for more readable code (I had one of my packages raise a NOTE in examples because there is no way in R to split a long hash into multiple lines), but I would be strongly against random removal of whitespaces as it's counter-intuitive, misleading and makes it impossible to continue spaces on the next line. None of the languages that I can think of with multiline strings do that as that's way too dangerous.
Julia does, but uses triple quotes: https://docs.julialang.org/en/stable/manual/strings/#triple-quoted-string-literals
Hadley
-------- Original Message --------
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
Sent: Wednesday, Jun 14, 2017 1:36 PM GMT
To: Andreas Kersting
Cc: r-devel
Subject: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines
On 14/06/2017 6:45 AM, Andreas Kersting wrote:
Actually "execution time" is precisely one of the reasons why I would like to see this feature as - depending on the context (e.g. in a tight loop) - the execution time of paste0 (or probably also glue, thanks Gabor) is not necessarily insignificant.
You also need to consider implementation time. This is not just changes to R itself; trailing backslashes *are* used in some packages (e.g. geoparser), so those packages would need to be identified and modified and resubmitted to CRAN.
I am totally with you on this "runtime vs. implementation time" issue. That is why I proposed the patch as I did: it seemed to require only minor changes to base R, and I didn't see how it could be incompatible with existing code. Actually, I still cannot see how a package could have potentially *used* backslashes immediately followed by newlines up to now, since those backslashes were just ignored by the parser (and changes to the function StringValue only affect the parser, don't they?). Of course I cannot rule out the possibility that there is code like
var <- "aaa\
bbb"
around, but this would rely on the undocumented(?) features that "backslash newline" is a valid escape sequence and that it is treated as "newline".
Maybe it's a good idea to show some more examples of how the patched parser behaves. There should only be a difference from the current implementation if a string literal spans multiple lines and a line ends in an odd number of backslashes (see the last example):
> "aaa\\
+ bbb"
[1] "aaa\\\nbbb"
> "aaa\\nbbb"
[1] "aaa\\nbbb"
> "aaa\\\nbbb"
[1] "aaa\\\nbbb"
> "aaa\\"
[1] "aaa\\"
> "aaa\\\n"
[1] "aaa\\\n"
> "aaa\\\\"
[1] "aaa\\\\"
> "aaa\\\\\n"
[1] "aaa\\\\\n"
> "aaa\\\\
+ bbb"
[1] "aaa\\\\\nbbb"
> "aaa\\\
+ bbb"
[1] "aaa\\bbb"
Andreas
Core changes to existing behaviour need really strong arguments, and I'm just not seeing those here. Duncan Murdoch
The other reason is style: I think it is cleaner if we can construct such a long string literal without the need for a function call. Andreas
Currently, if a string literal spans multiple lines, there is no way to inhibit the introduction of newline characters:
> "aaa
+ bbb"
[1] "aaa\nbbb"
If a line ends with a backslash, it is just ignored:
> "aaa\
+ bbb"
[1] "aaa\nbbb"
We could use this fact to implement string splitting in a fairly backward-compatible way, since currently such trailing backslashes should hardly be used as they do not have any effect. The attached patch makes the parser ignore a newline character directly following a backslash:
> "aaa\
+ bbb"
[1] "aaabbb"
I personally would also prefer if leading blanks (spaces and tabs) in the second line are ignored to allow for proper indentation:
> "aaa \
+ bbb"
[1] "aaa bbb"
> "aaa\
+ \ bbb"
[1] "aaa bbb"
This is also implemented by this patch.
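A similar effect can be approximated at run time in current R without any parser change. The sketch below uses `fold_lines()`, a hypothetical helper. You write an ordinary multi-line literal and replace each newline, together with the indentation that follows it, by a single space. Unlike the proposal, this costs a function call, and it cannot keep an intentional newline.

```r
# Hypothetical helper: replace each newline plus the following
# spaces/tabs with a single space.
fold_lines <- function(x) gsub("\n[ \t]*", " ", x)

msg <- fold_lines("The quick brown fox
                   jumps over the lazy dog")
msg
```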
An alternative approach could be to have something like
("aaa "
"bbb")
or
("aaa ",
"bbb")
be interpreted as "aaa bbb".
I don't know the ins and outs of the parser of R (hence: please very
carefully review the attached patch), but I guess this would be more
work to implement!?
What do you think? Is there anybody else who is missing this feature in
the first place?
Regards,
Andreas
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
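The parenthesised alternative proposed above has one property that can be checked in current R. Both forms are syntax errors today, so giving them a meaning would not change existing valid code written in exactly that form. The snippet is illustrative, and `is_syntax_error()` is a hypothetical helper.

```r
# Hypothetical helper: TRUE if `code` fails to parse in the running R.
is_syntax_error <- function(code) {
  inherits(tryCatch(parse(text = code), error = function(e) e), "error")
}

is_syntax_error('("aaa "\n "bbb")')   # adjacent literals in parentheses
is_syntax_error('("aaa ", "bbb")')    # comma-separated literals in parentheses
is_syntax_error('paste0("aaa ", "bbb")')
```

The first two calls return TRUE and the last returns FALSE.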
On 14/06/2017 at 12:58, Andreas Kersting wrote:
Hi,
I would really like to have a way to split long string literals across multiple lines in R.
...
An alternative approach could be to have something like
("aaa "
"bbb")
This is C-style, and if the core team decides to implement it, it could be useful and intuitive.
If you are changing the parser (which is a major change) you
might consider treating strings in the C/C++ way:
char *s = "A"
"B";
means the same as
char *s = "AB";
I am not a big fan of that syntax but it is widely used.
A backslash at the end of the line leads to errors when you accidentally
put a space after the backslash and the editor doesn't flag it.
Bill Dunlap
TIBCO Software
wdunlap tibco.com
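The pitfall Bill describes can be caught with a simple line-based check. The sketch below is an assumption, not an existing linter rule, and `backslash_then_blanks()` is a hypothetical helper. It flags lines where a backslash is followed only by trailing spaces or tabs, so the line looks as if it ends in a backslash but does not.

```r
# Hypothetical helper: TRUE where a backslash is followed only by
# trailing spaces/tabs up to the end of the line.
backslash_then_blanks <- function(lines) grepl("\\\\[ \t]+$", lines)

backslash_then_blanks(c('"aaa\\',    # backslash ends the line: not flagged
                        '"aaa\\ ',   # backslash, then a space: flagged
                        '"aaa"'))    # no backslash: not flagged
```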
I don't think it is reasonable to change the parser this way. This is currently valid R code:
a <- "foo"
"bar"
and with the new syntax, it is also valid, but with a different meaning. Or you can even consider
a <- "foo"
bar %>% func() %>% print()
etc. I like the idea of string literals, but the C/C++ way clearly does not work. The Python/Julia way might, i.e.:
"""this is a multi-line literal"""
Gabor
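Gabor's first example can be checked directly in current R with the illustrative snippet below. The two-line input parses into two separate top-level expressions: an assignment, then the constant "bar". A C-style rule that glued adjacent literals together would silently turn it into a single assignment of "foobar".

```r
# Two adjacent string literals on separate lines, as in Gabor's example.
exprs <- parse(text = 'a <- "foo"\n"bar"')
length(exprs)   # 2: the assignment, then a separate "bar" expression
exprs[[2]]      # the character constant "bar"
```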
-------- Original Message --------
From: Hadley Wickham [mailto:h.wickham at gmail.com]
Sent: Wednesday, Jun 14, 2017 2:51 PM GMT
To: Simon Urbanek
Cc: Andreas Kersting; r-devel at r-project.org
Subject: [Rd] [WISH / PATCH] possibility to split string literals across multiple lines
On Wed, Jun 14, 2017 at 8:48 AM, Simon Urbanek <simon.urbanek at r-project.org> wrote:
As I recall this has been discussed at least a few times (unfortunately I'm traveling so can't check the references), but the justification was never satisfactory. Personally, I wouldn't mind string continuation supported since it makes for more readable code (I had one of my packages raise a NOTE in examples because there is no way in R to split a long hash into multiple lines), but I would be strongly against random removal of whitespaces as it's counter-intuitive, misleading and makes it impossible to continue spaces on the next line. None of the languages that I can think of with multiline strings do that as that's way too dangerous.
Julia does, but uses triple quotes: https://docs.julialang.org/en/stable/manual/strings/#triple-quoted-string-literals
Hadley
If we consider bash a programming language: Here documents (http://tldp.org/LDP/abs/html/here-docs.html) can have leading tabs be removed (see Example 19-4).
On Wed, 14 Jun 2017, Gábor Csárdi wrote:
I don't think it is reasonable to change the parser this way. This is currently valid R code:
a <- "foo"
"bar"
and with the new syntax, it is also valid, but with a different meaning. Or you can even consider
a <- "foo"
bar %>% func() %>% print()
etc. I like the idea of string literals, but the C/C++ way clearly does not work. The Python/Julia way might, i.e.:
"""this is a multi-line literal"""
This does look like a promising option; some more careful checking would be needed to make sure there aren't cases where currently working code would be broken. Another Python idea worth considering is the raw string notation r"xyx" that does not process escape sequences -- this would make writing things like regular expressions easier. Best, luke
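For readers of this thread today: R 4.0.0, released in 2020, did add raw string literals. They require a delimiter pair, as in r"(...)" (r"[...]" and r"{...}" also work), rather than the plain r"xyx" form sketched here. The regex example below is a minimal illustration and needs R 4.0.0 or later to parse.

```r
# Backslashes inside r"(...)" are taken literally (R >= 4.0.0).
pattern_raw     <- r"(\d+\.\d+)"
pattern_escaped <- "\\d+\\.\\d+"
identical(pattern_raw, pattern_escaped)   # TRUE

x <- "pi is about 3.14"
regmatches(x, regexpr(pattern_raw, x))    # "3.14"
```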
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney at uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
This does look like a promising option; some more careful checking would be needed to make sure there aren't cases where currently working code would be broken. Another Python idea worth considering is the raw string notation r"xyx" that does not process escape sequences -- this would make writing things like regular expressions easier.
If this is something you would consider, we'd be happy to put together a patch for review. Hadley