
"+" operator on characters revisited

13 messages · Hadley Wickham, Peter Dalgaard, Duncan Murdoch +4 more

#
Hello everyone!

Motivated by the recent post on SO
http://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r

I wonder what the current state of the argument is on making "+"
concatenate character vectors. The "+" method is still sealed for
signature("character", "character") in the current version of R.
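
For reference, a minimal sketch of what base R does when "+" is applied to two character vectors. The tryCatch wrapper is only there so the snippet runs to completion, and the exact wording of the message may vary with locale:

```r
# Base R's arithmetic "+" does not accept character operands.
res <- tryCatch("a" + "b", error = function(e) conditionMessage(e))

# res now holds the text of the error raised by "a" + "b"
print(res)
```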

The four-year-old R-devel thread
https://www.stat.math.ethz.ch/pipermail/r-devel/2006-August/038991.html
on the same topic stopped without reaching any definite conclusion.

The only definite argument raised in that thread against the "+" operator
was the lack of commutativity (as if one had to prove algebraic
theorems in R).

Another useful suggestion, introducing cat0() and paste0() for
the common use of cat and paste with sep="", was not adopted by
core R either.

Thanks,
Vitalie
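
A minimal sketch of what such helpers could look like, assuming they are nothing more than thin wrappers that fix sep = "". The names paste0_sketch and cat0_sketch are hypothetical and chosen so that nothing existing is masked:

```r
# Hypothetical wrappers, for illustration only: paste() and cat() with sep = "".
paste0_sketch <- function(..., collapse = NULL) {
  paste(..., sep = "", collapse = collapse)
}
cat0_sketch <- function(...) {
  cat(..., sep = "")
}

label <- paste0_sketch("pi = ", 3.14)
print(label)                  # [1] "pi = 3.14"
cat0_sketch("a", "b", "\n")   # writes ab followed by a newline
```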
#
On Sat, Jan 22, 2011 at 3:08 PM, Vitalie S. <spinuvit.list at gmail.com> wrote:
The gsubfn package has always had a paste0 function and I would be
happy to remove it if the core adds it.

Also, gsubfn supports quasi-Perl-style string interpolation that
can sometimes be used to avoid paste in the first place.
Just preface the function in question with fn$ like this:

library(gsubfn)
fn$cat("pi = $pi\n")
#
Gabor Grothendieck <ggrothendieck at gmail.com> writes:
Thanks for the tip. Not bad indeed.
Almost as readable as

cat("pi = " + pi + "\n")
#
On Sun, Jan 23, 2011 at 6:56 AM, Vitalie S. <spinuvit.list at gmail.com> wrote:
To me the + can be substantially less readable.  The need to
repeatedly quote everything makes it just as bad as paste.  Compare
the following and try to figure out if there is an error in quoting in
the + and paste solutions.  Trying to distinguish the single and
double quotes is pretty difficult but simple in the fn$ and sprintf
solutions.  Even if there were no quotes the constant need to
interpose quotes makes it hard to read.

library(sqldf) # also pulls in gsubfn which has fn$ and paste0
plant <- "Qn1"
treatment <- "nonchilled"

# using +
# sqldf("select * from CO2 where Plant = '" + plant + "' and Treatment
# = '" + treatment + "' limit 10")

# using paste0, also from gsubfn
sqldf(paste0("select * from CO2 where Plant = '", plant, "' and
Treatment = '", treatment, "' limit 10"))

# using paste, almost the same as last one
sqldf(paste("select * from CO2 where Plant = '", plant, "' and
Treatment = '", treatment, "' limit 10", sep = ""))

# With the perl-like interpolation you don't need the repeated quoting
# in the first place, so it's much clearer.

# using perl-like interpolation from gsubfn
fn$sqldf("select * from CO2 where Plant = '$plant' and Treatment =
'$treatment' limit 10")

# sprintf is nearly as good as the perl-like interpolation except you
# have to match up % codes and arguments, which is a bit of a nuisance,
# and there are more parentheses.  On the other hand it does have the
# advantage that there is the facility for fancier formatting codes
# (though this example does not illustrate that aspect):

# using sprintf
sqldf(sprintf("select * from CO2 where Plant = '%s' and Treatment =
'%s' limit 10", plant, treatment))
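
As a rough illustration of the fancier formatting codes mentioned above, here is a base-R-only sketch. It only builds and prints strings (no sqldf call), and the variable name query is just for this example:

```r
plant <- "Qn1"

# %s inserts a string, %d an integer
query <- sprintf("select * from CO2 where Plant = '%s' limit %d", plant, 10L)
print(query)                     # [1] "select * from CO2 where Plant = 'Qn1' limit 10"

# %.3f rounds to three decimals; %05d pads an integer with zeros to width 5
print(sprintf("pi = %.3f", pi))  # [1] "pi = 3.142"
print(sprintf("%05d", 42L))      # [1] "00042"
```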
#
stringr has str_c which is a replacement for paste with sep = "" and
automatic removal of length 0 inputs.

Hadley
#
On Jan 22, 2011, at 21:08 , Vitalie S. wrote:

            
I think the real killer was associativity, combined with coercion rules: 

Is "x"+1+2 supposed to be equal to "x12" or "x3"?
#
Consider the following from Python 2.6.5:


 >>> 'abc'+ 2

Traceback (most recent call last):
   File "<pyshell#0>", line 1, in <module>
     'abc'+ 2
TypeError: cannot concatenate 'str' and 'int' objects
 >>> 'abc'+'2'
'abc2'
 >>>


       Spencer
On 1/23/2011 8:09 AM, Hadley Wickham wrote:

  
    
#
On 1/23/2011 8:50 AM, peter dalgaard wrote:
Excellent:  This seems like a good reason to follow Python:  
Allow "a+b" with a character vector "a" only if "b" is also a character 
vector (or factor?).


       This example raises another question:  If we allow "a+b" for "a" 
and "b" both character vectors (and give an error if one is numeric), 
what do we do with factors?  If "a" is a factor, return a factor?


       Spencer
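
A minimal sketch of the Python-like rule described above, assuming a strict operator that concatenates only when both operands are character vectors. The operator name %s+% is hypothetical, and the factor question is left open (a factor is simply rejected here):

```r
# Hypothetical strict concatenation: error unless both sides are character.
`%s+%` <- function(a, b) {
  if (!is.character(a) || !is.character(b)) {
    stop("cannot concatenate non-character operands")
  }
  paste(a, b, sep = "")
}

ok <- "abc" %s+% "2"
print(ok)                                          # [1] "abc2"

# Mixing character and numeric is an error, as in the Python session above
bad <- try("abc" %s+% 2, silent = TRUE)
print(inherits(bad, "try-error"))                  # [1] TRUE
```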
#
On 23/01/2011 11:50 AM, peter dalgaard wrote:
As I pointed out at the time, we don't even have associativity for 
integer addition.  For example in

-1L + .Machine$integer.max + 1L

the two possibilities

(-1L + .Machine$integer.max) + 1L

and

-1L + (.Machine$integer.max + 1L)

give different results.  When I try it now without parentheses, I get 
the same answer as the first one, but I don't believe we guarantee that 
that will always be so.

Duncan Murdoch
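
The two groupings above can be checked directly. This is a small sketch in which suppressWarnings only hides the integer-overflow warning so the output stays short:

```r
imax <- .Machine$integer.max

# Left grouping stays within the integer range
a <- (-1L + imax) + 1L

# Right grouping overflows in the inner sum, which yields NA (with a warning)
b <- suppressWarnings(-1L + (imax + 1L))

print(identical(a, imax))  # [1] TRUE
print(is.na(b))            # [1] TRUE
```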
#
Spencer Graves <spencer.graves at structuremonitoring.com> writes:
If we define custom %+% as:

    `%+%` <- function(a, b){
        if(is.character(a) || is.character(b))
            paste(as.character(a), as.character(b), sep="")
        else
            a + b
    }

because of higher precedence of %any% operators over binary + we have:

    "a" %+% 1 %+% 2
    ## [1] "a12"

and

   str("a" %+% factor(1:2))
   ## chr [1:2] "a1" "a2"

so if + on characters behaved "as if" it had slightly higher priority than
other + operators, that might reasonably solve the problem.

Vitalie.
#
On 1/23/2011 12:15 PM, Vitalie S. wrote:
No:  'a' %+% (1 %+% 2)  != ('a' %+% 1) %+% 2, as Peter Dalgaard noted:
'a3' != 'a12'.
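
A short sketch that makes the two groupings explicit, reusing the %+% definition proposed earlier in the thread (repeated here so the snippet runs on its own):

```r
`%+%` <- function(a, b) {
  if (is.character(a) || is.character(b))
    paste(as.character(a), as.character(b), sep = "")
  else
    a + b
}

left  <- ("a" %+% 1) %+% 2   # concatenate first, then concatenate again
right <- "a" %+% (1 %+% 2)   # numeric addition first, then concatenate

print(left)    # [1] "a12"
print(right)   # [1] "a3"
```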
#
On 2011-01-23, at 4:34 AM, Gabor Grothendieck wrote:

            
That may be a matter of taste, but FWIW it seems that shell-style string interpolation (using the dollar prefix) has been going out of style in recent scripting languages. Ruby uses the expression substitution construct ("#{expr}"), while Python has "str.format", both allowing arbitrary expressions.

And most editors have syntax highlighting that distinguishes strings from other program elements. This makes quoting errors pretty obvious.

Davor
#
On Mon, Jan 24, 2011 at 2:15 PM, Davor Cubranic <cubranic at stat.ubc.ca> wrote:
fn$ supports that too using `...`
  Time demand
1    3     19
2    4     16
That only makes it slightly easier to handle the mess.  It's better to
get rid of the quotes in the first place.