Hello everyone!

Motivated by the recent post on SO, http://stackoverflow.com/questions/4730551/making-a-string-concatenation-operator-in-r, I wonder what the current state of the argument is on making "+" concatenate character vectors. The "+" method is still sealed for signature("character", "character") in the current version of R.

The four-year-old R-devel thread on the same topic, https://www.stat.math.ethz.ch/pipermail/r-devel/2006-August/038991.html, stopped without reaching any definite conclusion. The only definite argument raised in that thread against the "+" operator was the lack of commutativity (as if one had to prove algebraic theorems in R).

Another useful suggestion, introducing cat0() and paste0() for the common use of cat and paste with sep="", was not absorbed into core R either.

Thanks,
Vitalie
"+" operator on characters revisited
13 messages · Hadley Wickham, Peter Dalgaard, Duncan Murdoch +4 more
On Sat, Jan 22, 2011 at 3:08 PM, Vitalie S. <spinuvit.list at gmail.com> wrote:
Yet another useful suggestion of introducing cat0() and paste0(), for the common use of cat and paste with sep="" was not absorbed by the core R either.
The gsubfn package has always had a paste0 function and I would be
happy to remove it if the core adds it.
Also the gsubfn supports quasi perl style string interpolation that
can sometimes be used to avoid the use of paste in the first place.
Just preface the function in question by fn$ like this:
library(gsubfn)
fn$cat("pi = $pi\n")
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
Gabor Grothendieck <ggrothendieck at gmail.com> writes:
Also the gsubfn supports quasi perl style string interpolation that
can sometimes be used to avoid the use of paste in the first place.
Just preface the function in question by fn$ like this:
library(gsubfn)
fn$cat("pi = $pi\n")
Thanks for the tip. Not bad indeed.
Almost as readable as
cat("pi = " + pi + "\n")
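For comparison, here is a minimal sketch of what happens today and how the same line is written with base R alone. It assumes only base R; the variable name `msg` is introduced here for illustration.

```r
# In current R, "+" is not defined for character operands, so the
# expression fails; tryCatch() captures the error message instead of stopping.
msg <- tryCatch("pi = " + pi, error = function(e) conditionMessage(e))
print(msg)

# The base-R way to get the same output: cat() with an empty separator.
cat("pi = ", pi, "\n", sep = "")
```

The second call prints the pieces back to back, which is the behaviour the proposed cat0() would provide by default.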
On Sun, Jan 23, 2011 at 6:56 AM, Vitalie S. <spinuvit.list at gmail.com> wrote:
Thanks for the tip. Not bad indeed.
Almost as readable as
cat("pi = " + pi + "\n")
To me the + can be substantially less readable. The need to
repeatedly quote everything makes it just as bad as paste. Compare
the following and try to figure out if there is an error in quoting in
the + and paste solutions. Trying to distinguish the single and
double quotes is pretty difficult but simple in the fn$ and sprintf
solutions. Even if there were no quotes the constant need to
interpose quotes makes it hard to read.
library(sqldf) # also pulls in gsubfn which has fn$ and paste0
plant <- "Qn1"
treatment <- "nonchilled"
# using +
# sqldf("select * from CO2 where Plant = '" + plant + "' and Treatment = '" + treatment + "' limit 10")
# using paste0, also from gsubfn
sqldf(paste0("select * from CO2 where Plant = '", plant, "' and Treatment = '", treatment, "' limit 10"))
# using paste, almost the same as last one
sqldf(paste("select * from CO2 where Plant = '", plant, "' and Treatment = '", treatment, "' limit 10", sep = ""))
# With the perl-like interpolation you don't need the repeated quoting
# in the first place so it's much clearer.
# using perl-like interpolation from gsubfn
fn$sqldf("select * from CO2 where Plant = '$plant' and Treatment = '$treatment' limit 10")
# sprintf is nearly as good as the perl-like interpolation except you
# have to match up % codes and arguments, which is a bit of a nuisance,
# and there are more parentheses. On the other hand it does have the
# advantage that there is the facility for fancier formatting codes
# (though this example does not illustrate that aspect):
# using sprintf
sqldf(sprintf("select * from CO2 where Plant = '%s' and Treatment = '%s' limit 10", plant, treatment))
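The message above mentions that sprintf offers fancier formatting codes without showing one. A small sketch of that aspect follows, building the query string only (no database call). The threshold `uptake_min` and the `uptake > ...` condition are assumptions added for illustration and do not come from the thread.

```r
plant <- "Qn1"
uptake_min <- 30.5   # hypothetical threshold, chosen only for this example
row_limit <- 10L

# %s inserts a string, %.1f a number with one decimal place, %d an integer.
query <- sprintf(
  "select * from CO2 where Plant = '%s' and uptake > %.1f limit %d",
  plant, uptake_min, row_limit
)
print(query)
```

The resulting string could then be passed to sqldf() in the same way as the other variants.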
Yet another useful suggestion of introducing cat0() and paste0(), for the common use of cat and paste with sep="" was not absorbed by the core R either.
stringr has str_c which is a replacement for paste with sep = "" and automatic removal of length 0 inputs.
Hadley
Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
On Jan 22, 2011, at 21:08 , Vitalie S. wrote:
The only definite argument occurred in the thread against "+" operator was the lack of commutativity (as if one have to prove algebraic theorems in R).
I think the real killer was associativity, combined with coercion rules: Is "x"+1+2 supposed to be equal to "x12" or "x3"?
Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Consider the following from Python 2.6.5:
>>> 'abc'+ 2
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    'abc'+ 2
TypeError: cannot concatenate 'str' and 'int' objects
>>> 'abc'+'2'
'abc2'
>>>
Spencer
Spencer Graves, PE, PhD President and Chief Operating Officer Structure Inspection and Monitoring, Inc. 751 Emerson Ct. San José, CA 95126 ph: 408-655-4567
On 1/23/2011 8:50 AM, peter dalgaard wrote:
I think the real killer was associativity, combined with coercion rules: Is "x"+1+2 supposed to be equal to "x12" or "x3"?
Excellent: This seems like a good reason to follow Python:
Allow "a+b" with a character vector "a" only if "b" is also a character
vector (or factor?).
This example raises another question: If we allow "a+b" for "a"
and "b" both character vectors (and give an error if one is numeric),
what do we do with factors? If "a" is a factor, return a factor?
Spencer
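A minimal sketch of the Python-style rule in R, assuming a user-defined operator rather than a change to "+". The name `%++%` is hypothetical, and rejecting factors outright is just one possible answer to the factor question, chosen here for simplicity.

```r
# Hypothetical strict concatenation operator: both operands must be
# character vectors, otherwise it raises an error (as Python does).
`%++%` <- function(a, b) {
  if (!is.character(a) || !is.character(b)) {
    stop("cannot concatenate '", class(a)[1], "' and '", class(b)[1], "' objects")
  }
  paste(a, b, sep = "")
}

# With no coercion involved, grouping no longer matters.
left  <- ("x" %++% "1") %++% "2"
right <- "x" %++% ("1" %++% "2")
print(identical(left, right))

# Mixing character and numeric is rejected instead of being coerced.
err <- tryCatch("abc" %++% 2, error = function(e) conditionMessage(e))
print(err)
```

Under this rule a factor operand would also raise an error, so the caller has to convert with as.character() explicitly.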
On 23/01/2011 11:50 AM, peter dalgaard wrote:
I think the real killer was associativity, combined with coercion rules: Is "x"+1+2 supposed to be equal to "x12" or "x3"?
As I pointed out at the time, we don't even have associativity for integer addition. For example, in

-1L + .Machine$integer.max + 1L

the two possibilities

(-1L + .Machine$integer.max) + 1L

and

-1L + (.Machine$integer.max + 1L)

give different results. When I try it now without parentheses, I get the same answer as the first one, but I don't believe we guarantee that that will always be so.

Duncan Murdoch
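The integer example can be checked directly. This is a small sketch using base R only; the names `imax`, `left` and `right` are introduced here for illustration.

```r
imax <- .Machine$integer.max

# Grouping to the left stays within the integer range and returns imax.
left <- (-1L + imax) + 1L

# Grouping to the right overflows first: imax + 1L yields NA (with a
# warning, suppressed here), and NA then propagates through the addition.
right <- suppressWarnings(-1L + (imax + 1L))

print(identical(left, imax))
print(is.na(right))
```

Both checks print TRUE, so the two groupings do give different results for integer operands.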
Spencer Graves <spencer.graves at structuremonitoring.com> writes:
Excellent: This seems like a good reason to follow Python: Allow "a+b" with a character vector "a" only if
"b" is also a character vector (or factor?).
This example raises another question: If we allow "a+b" for "a" and "b" both character vectors (and give an
error if one is numeric), what do we do with factors? If "a" is a factor,
return a factor?
If we define custom %+% as:
`%+%` <- function(a, b){
    if(is.character(a) || is.character(b))
        paste(as.character(a), as.character(b), sep="")
    else
        a + b
}
because of higher precedence of %any% operators over binary + we have:
"a" %+% 1 %+% 2
## [1] "a12"
and
str("a" %+% factor(1:2))
## chr [1:2] "a1" "a2"
so if + on characters behaved "as if" it had slightly higher priority than
other + operators, that might reasonably solve the problem.
Vitalie.
On 1/23/2011 12:15 PM, Vitalie S. wrote:
If we define custom %+% as:
`%+%`<- function(a, b){
if(is.character(a) || is.character(b))
paste(as.character(a), as.character(b), sep="")
else
a + b
}
because of higher precedence of %any% operators over binary + we have:
"a" %+% 1 %+% 2
## [1] "a12"
and
str("a" %+% factor(1:2))
## chr [1:2] "a1" "a2"
so if + on characters would behave "as if" having slightly higher priority than
other + operators that might solve reasonably the problem.
Vitalie.
No: 'a' %+% (1 %+%2) != ('a' %+% 1) %+% 2, as Peter Dalgaard noted:
'a3' != 'a12'.
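The two groupings can be compared with the %+% definition quoted in the thread. This sketch copies that definition and only adds the variable names `left` and `right` for illustration.

```r
# %+% as defined earlier in the thread: concatenate if either operand is
# a character vector, otherwise fall back to numeric addition.
`%+%` <- function(a, b) {
  if (is.character(a) || is.character(b))
    paste(as.character(a), as.character(b), sep = "")
  else
    a + b
}

left  <- ("a" %+% 1) %+% 2   # concatenates twice
right <- "a" %+% (1 %+% 2)   # adds 1 + 2 first, then concatenates

print(left)
print(right)
```

The first grouping gives "a12" and the second gives "a3", so the result depends on how the expression is parenthesized.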
On 2011-01-23, at 4:34 AM, Gabor Grothendieck wrote:
To me the + can be substantially less readable. The need to repeatedly quote everything makes it just as bad as paste. Compare the following and try to figure out if there is an error in quoting in the + and paste solutions. Trying to distinguish the single and double quotes is pretty difficult but simple in the fn$ and sprintf solutions. Even if there were no quotes the constant need to interpose quotes makes it hard to read.
That may be a matter of taste, but FWIW it seems that shell-style string interpolation (using the dollar prefix) has been going out of style in recent scripting languages. Ruby uses the expression substitution construct ("#{expr}"), while Python has "str.format", both allowing arbitrary expressions.
And most editors have syntax highlighting that distinguishes strings from other program elements. This makes quoting errors pretty obvious.
Davor
On Mon, Jan 24, 2011 at 2:15 PM, Davor Cubranic <cubranic at stat.ubc.ca> wrote:
That may be a matter of taste, but FWIW it seems that shell-style string interpolation (using the dollar prefix) has been going out of style in recent scripting languages. Ruby uses the expression substitution construct ("#{expr}"), while Python has "str.format", both allowing arbitrary expressions.
fn$ supports that too using `...`
library(sqldf)
fn$sqldf("select * from BOD where demand > `mean(BOD$demand)` limit 2")
  Time demand
1    3     19
2    4     16
And most editors have syntax highlighting that distinguishes strings from other program elements. This makes quoting errors pretty obvious.
That only makes it slightly easier to handle the mess. It's better to get rid of the quotes in the first place.