See if the following works for you:
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
grep('^-?[0-9]+([.][0-9]+)?$',myvector,perl=TRUE,invert=TRUE)
The key is to match a number and then invert the match: with invert=TRUE, grep returns the positions of the elements that do not match.
^ == start of string
-? == 0 or 1 minus signs
[0-9]+ == one or more digits
(...)? == optionally followed by the group in parentheses, which is:
[.] == a literal period. I tried to escape it instead, but that failed (in an R string the backslash must be doubled: "\\.")
[0-9]+ == one or more digits
$ == end of string
So: an optional minus, followed by one or more digits, optionally
followed by (a period with one or more trailing digits).
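As a quick check of the breakdown above, here is a minimal sketch that runs the pattern against the example vector. It also tries an escaped-period variant and a negated grepl() call. The variable names (num_class, num_escaped, not_num, not_num_alt) are mine, chosen only for illustration.

```r
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")

# Pattern from this message: the period sits inside a character class.
num_class <- "^-?[0-9]+([.][0-9]+)?$"

# Equivalent pattern with an escaped period. The backslash is doubled
# because R consumes one backslash when it parses the string literal.
num_escaped <- "^-?[0-9]+(\\.[0-9]+)?$"

# Positions that are NOT numbers, using invert = TRUE.
not_num <- grep(num_class, myvector, perl = TRUE, invert = TRUE)

# The same positions, obtained by negating a logical match instead.
not_num_alt <- which(!grepl(num_escaped, myvector, perl = TRUE))

print(not_num)      # 1 2 5 6
print(not_num_alt)  # 1 2 5 6
```

Both forms should report positions 1, 2, 5 and 6, leaving "1.2" and "-3" as the only numbers.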
On Wed, Mar 11, 2015 at 2:27 PM, Adrian Dușa <dusa.adrian at unibuc.ro> wrote:
Hi everyone,
I need a regular expression to find those positions in a character
vector which contain something which is not a number (either positive
or negative, having decimals or not).
myvector <- c("a3", "N.A", "1.2", "-3", "3-2", "2.")
In this vector, only positions 3 and 4 are numbers, the rest should be captured.
So far I am able to detect anything which is not a number, excluding - and .
grep("[^-0-9.]", myvector)
[1] 1 2
I still need to capture positions 5 and 6, which in human language
would mean to detect anything which contains a "-" or a "." anywhere
else except at the beginning of a number.
Thanks very much in advance,
Adrian
--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania