regex find anything which is not a number

6 messages · John McKown, Steve Taylor, Adrian Dusa

#
Hi everyone,

I need a regular expression to find those positions in a character
vector which contain something which is not a number (either positive
or negative, having decimals or not).

myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

In this vector, only positions 3 and 4 are numbers, the rest should be captured.
So far I am able to detect anything which is not a number, excluding - and .
[1] 1 2
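
The call that produced the output above is not shown in the message. As a rough sketch only, one pattern that behaves as described (flagging any character other than a digit, "-" or ".") could look like the following; the pattern and the name `partial` are assumptions for illustration, not the original code.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# Hypothetical pattern: match any element containing a character
# that is not a digit, a minus sign or a period.
partial <- grep("[^-0-9.]", myvector)
print(partial)
```

A character-class test like this cannot see where "-" or "." occur, which is why "3-2" and "2." slip through.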

I still need to capture positions 5 and 6, which in human language
would mean to detect anything which contains a "-" or a "." anywhere
else except at the beginning of a number.

Thanks very much in advance,
Adrian
#
See if the following will work for you:

grep('^-?[0-9]+([.]?[0-9]+)?$', myvector, perl=TRUE, invert=TRUE)
[1] 1 2 5 6

The key is to match a number, and then invert the match (invert=TRUE),
so that grep() returns the positions which do not match.
^ == start of string
-? == 0 or 1 minus signs
[0-9]+ == one or more digits

optionally followed by this group, made optional by wrapping it in (...)?
[.]? == an optional actual period. I tried to escape this, but it failed
[0-9]+ == followed by one or more digits

$ == followed by the end of the string.

so: optional minus, followed by one or more digits, optionally
followed by (a period with one or more ending digits).
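
The failed escape attempt mentioned above is not shown, so its cause is a guess. A common stumbling block is that R rejects a single backslash before a period in a string literal, so the regex \. has to be written as "\\.". The sketch below assumes that was the issue and compares the escaped form with the bracketed form; the names `escaped` and `bracketed` are illustrative.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# In an R string literal the backslash must itself be escaped,
# so a literal period in the regex is written "\\." (or "[.]").
escaped   <- grep("^-?[0-9]+(\\.[0-9]+)?$", myvector, invert = TRUE)
bracketed <- grep("^-?[0-9]+([.][0-9]+)?$", myvector, invert = TRUE)

print(escaped)
print(bracketed)
```

Both forms should select the same positions; which one to use is a matter of readability.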

On Wed, Mar 11, 2015 at 2:27 PM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
#
Perfect, perfect, perfect.
Thanks very much, John.
Adrian

On Wed, Mar 11, 2015 at 10:00 PM, John McKown
<john.archie.mckown at gmail.com> wrote:
#
How about letting a standard function decide which are numbers:

which(!is.na(suppressWarnings(as.numeric(myvector))))

Also works with numbers in scientific notation and (presumably) different decimal characters, e.g. comma if that's what the locale uses.
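
As a small sketch of this approach on the example vector: the call above returns the positions that do parse as numbers, and dropping the "!" gives the positions that do not. The variable names here are only for illustration.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# as.numeric() yields NA (with a warning) for strings it cannot parse
converted <- suppressWarnings(as.numeric(myvector))

numbers     <- which(!is.na(converted))  # positions that parse as numbers
not_numbers <- which(is.na(converted))   # positions that do not

print(numbers)
print(not_numbers)

# Scientific notation is accepted as well
print(as.numeric("1e3"))
```

Note that a genuine missing value in the input would also be reported as not a number by this test.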


#
On Thu, Mar 12, 2015 at 2:43 PM, Steve Taylor <steve.taylor at aut.ac.nz> wrote:

One problem is that Adrian wanted, for some reason, to exclude numbers
such as "2." but accept "2.0", that is, no unnecessary trailing
decimal point. as.numeric() will not fail on "2." since that is a
number. The example grep() specifically excludes this by requiring at
least one digit after any decimal point.
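
The difference between the two approaches can be seen on just these two strings. This is a minimal sketch; `x`, `regex_ok` and `numeric_ok` are illustrative names.

```r
x <- c("2.", "2.0")

# The regex requires at least one digit after a decimal point
regex_ok <- grepl("^-?[0-9]+([.]?[0-9]+)?$", x)

# as.numeric() accepts a trailing decimal point
numeric_ok <- !is.na(suppressWarnings(as.numeric(x)))

print(regex_ok)
print(numeric_ok)
```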
#
On Thu, Mar 12, 2015 at 9:52 PM, John McKown
<john.archie.mckown at gmail.com> wrote:

Indeed, but that was completely unintentional since I knew ".2" should
be a number and I somehow thought "2." was not supposed to be a
number.

I learned a lot about regular expressions (thanks again John), and
both solutions work (thanks very much Steve as well).

Best,
Adrian
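
If both ".2" and "2." are to count as numbers, as the last message suggests, the regular expression needs a second alternative. The pattern below is only one possible sketch and was not posted in the thread; `permissive` and `candidates` are hypothetical names.

```r
# Hypothetical pattern: optional minus, then either
#   digits with an optional period and optional further digits ("2", "2.", "1.2"),
#   or a period followed by at least one digit (".2")
permissive <- "^-?([0-9]+[.]?[0-9]*|[.][0-9]+)$"

candidates <- c(".2", "2.", "1.2", "-3", "3-2", "a3", "N.A", ".", "-")
is_number  <- grepl(permissive, candidates)

print(candidates[!is_number])
```

Unlike as.numeric(), this sketch still rejects scientific notation, so the choice depends on which inputs should be treated as numbers.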