unexpected behaviour of sub() / usage of regexp
But I do get the incorrect result on R 2.14.0 on Linux:
sub('[[:digit:]]{1,2}', '', '9ewww')
[1] "www"
And also:
sub('[[:digit:]]{1,2}', '', '9ewww')
[1] "www"
sub('[[:digit:]]{1,2}', '', 'ewww9')
[1] "ww9"
sub('\\d{1,2}', '', 'ewww9')
[1] "ww9"
But:
sub('\\d', '', 'ewww9')
[1] "ewww"
sub('\\d*', '', '9ewww')
[1] "ewww"
So it seems to be something about the way the curly braces are handled, but only with certain groups:
sub('e{1,2}', '', '9ewww')
[1] "9www"
sub('9{1,2}', '', '9ewww')
[1] "ewww"
But, as Prof. Ripley's email suggests, perl=TRUE solves the problem. (I was trying out various combinations when it appeared in my inbox.)
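For anyone hitting the same thing, here is a minimal sketch of the perl=TRUE workaround applied to the inputs from this thread. The variable names (x, fixed_posix, fixed_d) are only for illustration, and this does not explain why the default engine misbehaves on some platforms; it only sidesteps it by switching to PCRE.

```r
# Inputs taken from the examples in this thread.
x <- c("9ewww", "ewww9")

# perl = TRUE makes sub() use PCRE instead of the default regex engine.
# sub() is vectorised over x, so both strings are handled in one call.
fixed_posix <- sub("[[:digit:]]{1,2}", "", x, perl = TRUE)
fixed_d     <- sub("\\d{1,2}", "", x, perl = TRUE)

print(fixed_posix)
print(fixed_d)
```

With PCRE the quantifier applies only to the digit class, so both calls should strip just the "9" and leave "ewww" for each input.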
sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-redhat-linux-gnu (64-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
On Fri, Dec 9, 2011 at 9:25 AM, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
On 09/12/2011 9:20 AM, Jannis wrote:
Dear R users,
the way I understand the documentation of sub() and regexp the following
code:
sub('[[:digit:]]{1,2}', '', '9ewww')
... should yield:
'ewww'
It returns, however:
'www'
Why is this the case? My code should just substitute 1 (minimum) or up to
2 (maximum) digits, i.e. numbers and not the 'e' in the string. Do I
misinterpret something here?
I get your expected output of "ewww" running 2.14.0 or 2.14.0-patched on Windows. So it's not a universal problem...
Duncan Murdoch
Thanks for any ideas
Jannis
sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: i686-pc-linux-gnu (32-bit)
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Sarah Goslee http://www.functionaldiversity.org