
[WISH / PATCH] possibility to split string literals across multiple lines

18 messages · Duncan Murdoch, Gábor Csárdi, Mark van der Loo +7 more

#
Hi,

I would really like to have a way to split long string literals across 
multiple lines in R.

Currently, if a string literal spans multiple lines, there is no way to 
inhibit the introduction of newline characters:

 > "aaa
+ bbb"
[1] "aaa\nbbb"


If a line ends with a backslash, it is just ignored:

 > "aaa\
+ bbb"
[1] "aaa\nbbb"
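The claim above can also be checked non-interactively by handing the source text to parse(). This is a minimal sketch, assuming a current R release still treats a backslash followed by a newline inside a string as an escaped newline; `src` and `val` are just illustrative names:

```r
# Source text of a two-line literal whose first line ends in a backslash:
#   "aaa\
#   bbb"
src <- "\"aaa\\\nbbb\""

# Parse and evaluate it exactly as the console would.
val <- eval(parse(text = src)[[1]])

# The backslash disappears, but the newline is kept.
identical(val, "aaa\nbbb")
```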


We could use this fact to implement string splitting in a fairly 
backward-compatible way, since such trailing backslashes are unlikely 
to appear in existing code, as they currently have no effect. The attached 
patch makes the parser ignore a newline character directly following a backslash:

 > "aaa\
+ bbb"
[1] "aaabbb"


I personally would also prefer if leading blanks (spaces and tabs) in 
the second line are ignored to allow for proper indentation:

 >   "aaa \
+    bbb"
[1] "aaa bbb"

 >   "aaa\
+    \ bbb"
[1] "aaa bbb"

This is also implemented by this patch.


An alternative approach could be to have something like

("aaa "
"bbb")

or

("aaa ",
"bbb")

be interpreted as "aaa bbb".

I don't know the ins and outs of R's parser (so please review the 
attached patch very carefully), but I suspect this would be more 
work to implement.


What do you think? Is there anybody else who is missing this feature in 
the first place?

Regards,
Andreas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.diff
Type: text/x-patch
Size: 2598 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20170614/7986e543/attachment.bin>
#
On 14/06/2017 5:58 AM, Andreas Kersting wrote:
I don't understand why you require the string to be a literal.  Why not 
construct the long string in an expression like

  paste0("aaa",
         "bbb")

?  Surely the execution time of the paste0 call is negligible.

Duncan Murdoch
#
On Wed, Jun 14, 2017 at 12:12 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
You can also look at the glue package; it supports continuation and a lot more:

library(glue)

glue("
    A formatted string \\
    can also be on a \\
    single line
    ")
#> A formatted string can also be on a single line

Gabor

[...]
#
On Wed, 14 Jun 2017 06:12:09 -0500, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:

            
Actually, execution time is precisely one of the reasons I would like to see this feature: depending on the context (e.g. in a tight loop), the execution time of paste0 (and probably also of glue, thanks Gabor) is not necessarily negligible.

The other reason is style: I think it is cleaner if we can construct such a long string literal without the need for a function call.

Andreas
#
Having some line-breaking character for string literals would have benefits,
as string literals could then be constructed at parse time rather than at run time.
I have run into this myself a few times as well. One way to at least
emulate something like that is the following.

`%+%` <- function(x,y) paste0(x,y)

"hello" %+%
  " pretty" %+%
  " world"


-Mark



On Wed, 14 Jun 2017 at 13:53, Andreas Kersting <r-devel at akersting.de> wrote:

  
  
#
Mark, that's actually a fair statement, although your extra operator
doesn't cause construction at parse time. You still call paste0(), but just
add an extra layer on top of it.

I also doubt that the benefit is going to be significant, even in
gigantic loops. Take the following example:

atestfun <- function(x){
  y <- paste0("a very long",
         "string for testing")
  grep(x, y)
}
atestfun2 <- function(x){
  y <- "a very long
string for testing"
  grep(x,y)
}
library(compiler)    # provides cmpfun()
cfun <- cmpfun(atestfun)
cfun2 <- cmpfun(atestfun2)

library(rbenchmark)  # provides benchmark()
benchmark(atestfun("a"),
          atestfun2("a"),
          cfun("a"),
          cfun2("a"),
          replications = 100000)

Which gives after 100,000 replications:

            test replications elapsed relative
1  atestfun("a")       100000    0.83    1.339
2 atestfun2("a")       100000    0.62    1.000
3      cfun("a")       100000    0.81    1.306
4     cfun2("a")       100000    0.62    1.000

The patch can in principle make similar code marginally faster, but I'm not
convinced it is going to make any real difference except in some very
specific and exotic cases. Moreover, calling a function like the ones in
the example inside a loop is the only situation I can come up with where
this might be a problem. If you just construct the string inside the loop,
there are two possibilities:

- the string does not need to change, and then you had better construct it
outside of the loop
- the string does need to change, and then you need paste() or paste0()
anyway
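The two cases above can be sketched as follows; the strings, patterns and variable names are made up for illustration:

```r
# Case 1: the string never changes, so build it once, outside the loop.
target <- paste0("a very long ", "string for testing")
patterns <- c("long", "short", "testing")
hits <- vapply(patterns, function(p) grepl(p, target), logical(1))

# Case 2: the string depends on the loop variable, so paste0() is needed anyway.
labels <- vapply(1:3, function(i) paste0("item ", i), character(1))
```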

I'm not against incorporating the patch, as it would eliminate a few
keystrokes. It's a neat idea, but I don't expect any other noticeable
advantage from it.

my humble 2 cents
Cheers
Joris

On Wed, Jun 14, 2017 at 2:00 PM, Mark van der Loo <mark.vanderloo at gmail.com>
wrote:

  
    
#
I know it doesn't cause construction at parse time, and it was also not
what I said. What I meant was that it makes the syntax at least look a
little as if you have a line-breaking character within string literals.

On Wed, 14 Jun 2017 at 14:18, Joris Meys <jorismeys at gmail.com> wrote:

  
  
#
Hi Mark,

I got you. I just pointed out the obvious to illustrate why your emulation
didn't eliminate the need for the real thing. I didn't mean to imply you
weren't aware of this, even though it may seem so. Sometimes I'm not 100%
aware of the subtleties of the English language. This seems to be one of
those cases.

Kind regards
Joris

On Wed, Jun 14, 2017 at 2:23 PM, Mark van der Loo <mark.vanderloo at gmail.com>
wrote:

  
    
#
On 14/06/2017 6:45 AM, Andreas Kersting wrote:
You also need to consider implementation time.  This is not just changes 
to R itself; trailing backslashes *are* used in some packages (e.g. 
geoparser), so those packages would need to be identified and modified 
and resubmitted to CRAN.

Core changes to existing behaviour need really strong arguments, and I'm 
just not seeing those here.

Duncan Murdoch
#
As I recall this has been discussed at least a few times (unfortunately I'm traveling so can't check the references), but the justification was never satisfactory.

Personally, I wouldn't mind string continuation being supported, since it makes for more readable code (I had one of my packages raise a NOTE in examples because there is no way in R to split a long hash across multiple lines), but I would be strongly against arbitrary removal of whitespace: it's counter-intuitive and misleading, and it makes it impossible to continue a run of spaces onto the next line. None of the languages with multiline strings that I can think of do that, as it's way too dangerous.

Cheers,
Simon
#
On Wed, Jun 14, 2017 at 8:48 AM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
Julia does, but uses triple quotes:
https://docs.julialang.org/en/stable/manual/strings/#triple-quoted-string-literals

Hadley
#
-------- Original Message --------
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
Sent: Wednesday, Jun 14, 2017 1:36 PM GMT
To: Andreas Kersting
Cc: r-devel
Subject: [Rd] [WISH / PATCH] possibility to split string literals across 
multiple lines
I am totally with you on this "runtime vs. implementation-time"-issue. 
That is why I proposed the patch as I did: It seemed to require only 
minor changes to base R and I didn't see how it could be incompatible 
with existing code.

Actually, I still cannot see how a package could potentially have *used* 
backslashes immediately followed by newlines up to now, since those 
backslashes were just ignored by the parser (And changes to the function 
StringValue are just about the parser, aren't they?). Of course I cannot 
rule out the possibility that there is code like
var <- "aaa\
bbb"
around, but this would be based on the undocumented(?) features that 
"backslash newline" is a valid escape sequence and that it is treated as 
"newline".

Maybe it's a good idea to show some more examples of how the patched parser 
behaves. It should only differ from the current implementation 
if a string literal spans multiple lines and a line ends in an odd 
number of backslashes (see last example):

 > "aaa\\
+ bbb"
[1] "aaa\\\nbbb"

 > "aaa\\nbbb"
[1] "aaa\\nbbb"

 > "aaa\\\nbbb"
[1] "aaa\\\nbbb"

 > "aaa\\"
[1] "aaa\\"

 > "aaa\\\n"
[1] "aaa\\\n"

 > "aaa\\\\"
[1] "aaa\\\\"

 > "aaa\\\\\n"
[1] "aaa\\\\\n"

 > "aaa\\\\
+ bbb"
[1] "aaa\\\\\nbbb"

 > "aaa\\\
+ bbb"
[1] "aaa\\bbb"
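The rule these examples follow can be emulated in plain R by rewriting the source text before parsing. This is a sketch, not the attached patch (which changes the C function StringValue): `join_continuations()` and `parse_literal()` are hypothetical helpers that, whenever a line ends in an odd-length run of backslashes, drop the last backslash together with the newline and any leading spaces or tabs on the next line, and then leave all other escape processing to R's own parser.

```r
# Hypothetical helper, not part of R. `body` is the raw text between the
# quotes of a double-quoted literal, before escape processing; it is assumed
# to contain no unescaped double quote.
join_continuations <- function(body) {
  # (?<!\\)        the run of backslashes starts here (not mid-run)
  # ((?:\\\\)*)    any number of escaped backslash pairs, kept as-is
  # \\<newline>    one unescaped backslash directly before the newline
  # [ \t]*         leading blanks on the continuation line
  pattern <- "(?<!\\\\)((?:\\\\\\\\)*)\\\\\n[ \t]*"
  gsub(pattern, "\\1", body, perl = TRUE)
}

# Hypothetical helper: apply the rule, then let parse() handle the remaining
# escapes exactly as the unpatched parser would.
parse_literal <- function(body) {
  eval(parse(text = paste0('"', join_continuations(body), '"'))[[1]])
}

parse_literal("aaa\\\nbbb")        # aaa\<newline>bbb
#> [1] "aaabbb"
parse_literal("aaa \\\n   bbb")    # continuation line is indented
#> [1] "aaa bbb"
parse_literal("aaa\\\\\\\nbbb")    # three backslashes: continuation applies
#> [1] "aaa\\bbb"
parse_literal("aaa\\\\\nbbb")      # two backslashes: newline is kept
#> [1] "aaa\\\nbbb"
```

Running the substitution before parse() keeps the sketch short, but it only handles the body of a single literal; a real implementation has to live inside the tokenizer, as the patch does.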

Andreas
#
On 14/06/2017 at 12:58, Andreas Kersting wrote:
This is C-style and if the core-team decides to implement it,
it could be useful and intuitive.
#
If you are changing the parser (which is a major change) you
might consider treating strings in the C/C++ way:
   char *s = "A"
                   "B";
means the same as
   char *s = "AB";

I am not a big fan of that syntax but it is widely used.

A backslash at the end of the line leads to errors when you accidentally
put a space after the backslash and the editor doesn't flag it.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jun 14, 2017 at 3:58 AM, Andreas Kersting <r-devel at akersting.de>
wrote:

  
  
#
I don't think it is reasonable to change the parser this way. This is
currently valid R code:

a <- "foo"
"bar"

and with the new syntax, it is also valid, but with a different
meaning. Or you can even consider

a <- "foo"
bar %>% func() %>% print()

etc.
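Gabor's first example can be checked with parse(): today the two lines are two separate top-level expressions, so `a` ends up as "foo", whereas C-style concatenation of adjacent literals would turn them into a single assignment of "foobar". A minimal sketch (`src`, `exprs` and `env` are illustrative names):

```r
src <- 'a <- "foo"\n"bar"'
exprs <- parse(text = src)

# Two top-level expressions: the assignment, then the lone literal "bar".
length(exprs)

# Evaluate them in a scratch environment; `a` only ever receives "foo".
env <- new.env()
for (i in seq_along(exprs)) eval(exprs[[i]], envir = env)
get("a", envir = env)
```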

I like the idea of string literals, but the C/C++ way clearly does not
work. The Python/Julia way might, i.e.:

"""this is a
multi-line
literal"""

Gabor

On Wed, Jun 14, 2017 at 4:12 PM, William Dunlap via R-devel
<r-devel at r-project.org> wrote:
#
-------- Original Message --------
From: Hadley Wickham [mailto:h.wickham at gmail.com]
Sent: Wednesday, Jun 14, 2017 2:51 PM GMT
To: Simon Urbanek
Cc: Andreas Kersting; r-devel at r-project.org
Subject: [Rd] [WISH / PATCH] possibility to split string literals across 
multiple lines
If we consider bash a programming language: here documents 
(http://tldp.org/LDP/abs/html/here-docs.html) can have leading tabs 
removed (see Example 19-4).
1 day later
#
On Wed, 14 Jun 2017, Gábor Csárdi wrote:

            
This does look like a promising option; some more careful checking
would be needed to make sure there aren't cases where currently
working code would be broken.

Another Python idea worth considering is the raw string notation
r"xyx" that does not process escape sequences -- this would make
writing things like regular expressions easier.
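For illustration only: R 4.0.0, released after this thread, did add raw strings, though with mandatory delimiters such as r"(...)" rather than Python's bare r"...". A short sketch of the regular-expression use case, which needs R >= 4.0.0 (the variable names are made up):

```r
# Ordinary literal: every backslash meant for the regex engine must be doubled.
re_escaped <- "\\d+\\.\\d+"

# Raw literal (R >= 4.0.0): the text between r"( and )" is taken verbatim.
re_raw <- r"(\d+\.\d+)"

identical(re_escaped, re_raw)

# Both patterns extract the same number.
regmatches("pi is about 3.14", regexpr(re_raw, "pi is about 3.14"))
```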

Best,

luke

  
    
#
If this is something you would consider, we'd be happy to put together
a patch for review.

Hadley