Thanks a lot
Jannis
----- Original Message -----
From: Sarah Goslee <sarah.goslee at gmail.com>
To: Duncan Murdoch <murdoch.duncan at gmail.com>
Cc: Jannis <bt_jannis at yahoo.de>; "r-help at r-project.org" <r-help at r-project.org>
Sent: Friday, 9 December 2011, 15:37
Subject: Re: [R] unexpected behaviour of sub() / usage of regexp
But I do get the incorrect result on R 2.14.0 on Linux:
sub('[[:digit:]]{1,2}', '', '9ewww')
sub('[[:digit:]]{1,2}', '', 'ewww9')
sub('\\d{1,2}', '', 'ewww9')
[1] "ewww"
So it seems to be something about the way the curly braces are
handled, but only with certain groups:
sub('e{1,2}', '', '9ewww')
sub('9{1,2}', '', '9ewww')
[1] "ewww"
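A quick way to check whether a given R build is affected is to run the cases above through both the default regex engine and PCRE (perl = TRUE) and compare the results. This is a minimal sketch. The helper name check_engines() is hypothetical and not part of base R. Only the perl column should be expected to match across platforms, because the default-engine results are exactly what varied between the builds reported in this thread.

```r
# Hypothetical helper (not part of base R): run each pattern/string pair
# through sub() with the default regex engine and with PCRE, and flag
# any pair where the two engines disagree.
check_engines <- function(patterns, strings) {
  default_res <- mapply(function(p, s) sub(p, "", s),
                        patterns, strings, USE.NAMES = FALSE)
  perl_res <- mapply(function(p, s) sub(p, "", s, perl = TRUE),
                     patterns, strings, USE.NAMES = FALSE)
  data.frame(pattern = patterns, string = strings,
             default = default_res, perl = perl_res,
             agree = default_res == perl_res,
             stringsAsFactors = FALSE)
}

# The pattern/string combinations tried in this thread
patterns <- c("[[:digit:]]{1,2}", "[[:digit:]]{1,2}", "\\d{1,2}",
              "e{1,2}", "9{1,2}")
strings  <- c("9ewww", "ewww9", "ewww9", "9ewww", "9ewww")

res <- check_engines(patterns, strings)
print(res)
```

Any FALSE in the agree column marks a case that the default engine on that build handles differently from PCRE.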
But, as Prof. Ripley's email suggests, perl=TRUE solves the problem.
(I was trying out various combinations when it appeared in my inbox.)
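Here is the perl = TRUE workaround applied to a few test strings. The second call is an alternative I am assuming here; it does not come from the thread. It writes the bound out as one digit followed by an optional digit, which avoids the {1,2} syntax entirely. Whether that sidesteps the problem on the affected 2.14.0 Linux builds has not been verified.

```r
x <- c("9ewww", "ewww9", "12ewww", "ewww")

# Workaround from the thread: use the PCRE engine instead of the default one
y_perl <- sub("[[:digit:]]{1,2}", "", x, perl = TRUE)
print(y_perl)  # "ewww" for every element

# Assumed alternative: one required digit plus one optional digit,
# equivalent to [[:digit:]]{1,2} but written without the {m,n} bound
y_alt <- sub("[[:digit:]][[:digit:]]?", "", x)
print(y_alt)
```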
R version 2.14.0 (2011-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
On Fri, Dec 9, 2011 at 9:25 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 09/12/2011 9:20 AM, Jannis wrote:
Dear R users,
As I understand the documentation of sub() and regexp, the following
code:
sub('[[:digit:]]{1,2}', '', '9ewww')
... should yield:
'ewww'
It returns, however:
'www'
Why is this the case? My code should only substitute at least 1 and at
most 2 digits, i.e. numbers, and not the 'e' in the string. Am I
misinterpreting something here?
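For reference, here is the intended meaning of the pattern. {1,2} makes the preceding class match at least once and at most twice, greedily. sub() replaces only the first (leftmost) match, while gsub() replaces every match, so the 'e' is never a candidate. The sketch uses perl = TRUE so that its results do not depend on the default-engine problem discussed in this thread. The test strings are illustrative.

```r
x <- c("9ewww", "12ewww", "123ewww", "ewww", "e1w2")

# sub(): only the leftmost match is removed; {1,2} takes up to two digits
first_only <- sub("[[:digit:]]{1,2}", "", x, perl = TRUE)
print(first_only)   # "ewww" "ewww" "3ewww" "ewww" "ew2"

# gsub(): every match is removed, so "123" goes as "12" and then "3"
all_matches <- gsub("[[:digit:]]{1,2}", "", x, perl = TRUE)
print(all_matches)  # "ewww" "ewww" "ewww" "ewww" "ew"
```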
I get your expected output of "ewww" running 2.14.0 or 2.14.0-patched on
Windows. So it's not a universal problem...
Duncan Murdoch
Thanks for any ideas
Jannis
R version 2.14.0 (2011-10-31)
Platform: i686-pc-linux-gnu (32-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base