
Numbers in a string

14 messages · Luis Felipe Parra, Dieter Menne, Rainer Schuermann +7 more

#
Luis Felipe Parra wrote:
What should the result be for AA3213Be45C02? A site search would have turned
up a few hundred hits, for example

http://r-project.markmail.org/thread/3u6gxyzbnm5x3ksp

Dieter
#
If your OS is Linux, you might want to look at sed or gawk. Both are very good and efficient for such tasks.
Do you need this once, or as part of a program?
Some sample strings would be helpful...
Rgds,
Rainer


#
Hi Felipe,

gsub("[^0123456789]", "", "AB15E9SDF654VKBN?dvb.65")
results in "15965465".
Would that be what you are looking for?


Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove




#
On Wed, Dec 15, 2010 at 11:08:06AM -0200, Henrique Dallazuanna wrote:
Consider also

  strsplit("AB15E9SDF654VKBN?dvb.65", "[^.0-9][^.0-9]*")
  [[1]]
  [1] ""    "15"  "9"   "654" ".65"

PS.
#
On Dec 15, 2010, at 6:01 AM, Nick Sabbe wrote:

            
I tried figuring out how to do this from a more positive perspective,  
meaning finding a regular expression function that did not require  
negating the desired elements, but the best I could do was make a  
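function that accepted a pattern and then hid the underlying negation:

 > pullchar <- function(txt, patt){
     # negate patt: turn "[...]" into "[^...]", or wrap bare characters in "[^...]"
     if(grepl("\\[", patt)){
       pattn <- sub("\\[", "\\[\\^", patt)
     } else {
       pattn <- paste("[^", patt, "]", sep="")
     }
     gsub(pattn, "", txt)  # return txt with everything not matching patt removed
   }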

 > pullchar("AB15E9SDF654VKBN?dvb.65", "ABD")
[1] "ABDB"
 > pullchar("AB15E9SDF654VKBN?dvb.65", "[A-Z]")
[1] "ABESDFVKBN"
 > pullchar("AB15E9SDF654VKBN?dvb.65", "[0-9]")
[1] "15965465"

Still learning regex so if there is a "positive" strategy I'm all  
ears. ...er, eyes?
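
One possible "positive" variant, offered only as a sketch: split the string into single characters and keep those that match the wanted class, so no negated pattern is needed. The function name keepchar is made up for illustration, and it assumes a single input string and a pattern that describes one character.

```r
# Hypothetical helper: keep only the characters of 'txt' that match 'patt'.
# Assumes 'txt' is a single string and 'patt' matches one character at a time.
keepchar <- function(txt, patt) {
  chars <- strsplit(txt, "")[[1]]      # split into single characters
  keep  <- grepl(patt, chars)          # TRUE where a character matches the wanted class
  paste(chars[keep], collapse = "")    # glue the kept characters back together
}

keepchar("AB15E9SDF654VKBN?dvb.65", "[0-9]")
keepchar("AB15E9SDF654VKBN?dvb.65", "[A-Z]")
```

The two calls should give "15965465" and "ABESDFVKBN", the same as pullchar() above. The price is one grepl() test per character, so this is probably slower than gsub() on long strings.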
#
On Wed, Dec 15, 2010 at 01:29:16PM -0500, David Winsemius wrote:
One of the suggestions in this thread was to use an external program.
A possible solution without negation in Perl is

  @a = ("AB15E9SDF654VKBN?dvb.65" =~ m/[0-9]/g);
  print @a, "\n";
  15965465

or

  @a = ("AB15E9SDF654VKBN?dvb.65" =~ m/[.0-9]+/g);
  print join(" ", @a), "\n";
  15 9 654 .65

Do you mean something in this direction?

Petr Savicky.
#
Petr Savicky wrote:
Which is

 gsub("[^0-9]", "", "AB15E9SDF654VKBN?dvb.65")

as Henrique suggested.

Dieter
#
On Thu, Dec 16, 2010 at 06:17:45AM -0800, Dieter Menne wrote:
I agree. The Perl code was a reply to the question of whether the same can be
done by describing the required elements rather than the ones to be removed.
This could be useful if we want to extract elements described by a more
complex regular expression. A more accurate, although not complete and
definitely not the best, extraction of nonnegative numbers in Perl may be
done as follows

  @a = ("abcde. 11 abc 5.31e+34, (1.45)" =~ m/[0-9]+\.[0-9]+e[+-][0-9]+|[0-9]+\.[0-9]+|[0-9]+/g);
  print join(" ", @a), "\n";
  11 5.31e+34 1.45

Can something similar be done in R either specifically for numbers or
for a general regular expression?
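
For what it is worth, a sketch of the same extraction in base R, assuming a version of R that provides regmatches() (it is not in every release; I believe it arrived in 2.14.0). The variable names are only for illustration.

```r
s <- "abcde. 11 abc 5.31e+34, (1.45)"

# Same alternation as in the Perl example; backslashes are doubled inside an R string
num_patt <- "[0-9]+\\.[0-9]+e[+-][0-9]+|[0-9]+\\.[0-9]+|[0-9]+"

# gregexpr() locates every match, regmatches() returns the matched text
found <- regmatches(s, gregexpr(num_patt, s))[[1]]
found

# The extracted strings can then be converted to numbers
as.numeric(found)
```

Here found should hold "11", "5.31e+34" and "1.45", as in the Perl output. Like the Perl pattern, this is not a complete description of numbers (no signs, no "1e5" without a decimal point).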

Going back to the original question, the answer depends on the complexity of
extracting numbers in a concrete situation. If possible, using functions
within R is suggested (gsub(), strsplit(), ...). On the other hand, there
are cases where an external tool can be helpful. See also R-intro,
Chapter 7, "Reading data from files", which says

  There is a clear presumption by the designers of R that you will be
  able to modify your input files using other tools, such as file editors
  or Perl to fit in with the requirements of R.

Petr Savicky.
#
In S+ strsplit() has a keep=TRUE/FALSE argument to
specify whether to return the substrings that match
the pattern or to return the substrings between
matches to the pattern (the default).  E.g.,
[...] "AB15E9SDF654VKBN?dvb.65")
[[1]]:
[1] "11"       "5.31e+34" "1.45"    

[[2]]:
[1] "15"  "9"   "654" "65"

[...]
[[1]]:
[1] "abcde. " " abc "   ", ("     ")"      

[[2]]:
[1] "AB"        "E"         "SDF"       "VKBN?dvb."

In R and S+ gregexpr can tell you the start points
and lengths of each match, but it is a pain to
pass this information to substring() to get the
matches themselves.  Should [g]regexpr() have a
value= argument like grep has?
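
To make the "pain" concrete, here is a rough sketch of the manual route in R: take the start positions and the "match.length" attribute that gregexpr() returns and feed them to substring(). The helper name matched is hypothetical, and the plain "[0-9]+" pattern is only a stand-in.

```r
x <- c("abcde. 11 abc 5.31e+34, (1.45)", "AB15E9SDF654VKBN?dvb.65")
m <- gregexpr("[0-9]+", x)   # one element per string: start positions plus attributes

# Hypothetical helper: turn one element of gregexpr()'s result into the matched text
matched <- function(string, pos) {
  start <- as.integer(pos)                 # start points of the matches
  if (start[1] == -1) return(character(0)) # gregexpr() reports "no match" as -1
  len <- attr(pos, "match.length")         # length of each match
  substring(string, start, start + len - 1)
}

res <- mapply(matched, x, m, SIMPLIFY = FALSE, USE.NAMES = FALSE)
res
```

For the second string this should return "15" "9" "654" "65". The bookkeeping with -1 and the attribute is exactly the part one would like a value= style argument to hide.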

In R the gsubfn package can do this sort of thing.
I don't know if it is worth adding more to base R's
strsplit().

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
> On Wed, Dec 15, 2010 at 11:08:06AM -0200, Henrique Dallazuanna wrote:
>> Try this:
>>
>> gsub("[^0-9]", "", "AB15E9SDF654VKBN?dvb.65")

> Consider also
>
>   strsplit("AB15E9SDF654VKBN?dvb.65", "[^.0-9][^.0-9]*")
>   [[1]]
>   [1] ""    "15"  "9"   "654" ".65"

which can be abbreviated to 

       strsplit("AB15E9SDF654VKBN?dvb.65", "[^.0-9]+")

Note: 
 R's regular expression matching capability is really very close to Perl's,
 and in those cases where it is not, these functions have an argument
 'perl' (default FALSE) that you can switch on.
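
A small sketch of the 'perl' switch, using a lookahead, which is a Perl-style construct: the pattern drops everything that is neither a digit nor a dot, and also drops a dot that is not followed by a digit. The second test string is invented for illustration, and I have not checked how the default engine treats this pattern, so perl = TRUE is given explicitly.

```r
x <- c("AB15E9SDF654VKBN?dvb.65", "total: 3.5.")

# "[^.0-9]"       any character that is neither a digit nor a dot
# "\\.(?![0-9])"  a dot NOT followed by a digit (negative lookahead, Perl syntax)
cleaned <- gsub("[^.0-9]|\\.(?![0-9])", "", x, perl = TRUE)
cleaned
```

This should give "159654.65" and "3.5": the decimal points survive, while the sentence-ending dot in the second string is removed.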

Martin


>> On Wed, Dec 15, 2010 at 6:55 AM, Luis Felipe Parra <felipe.parra at quantil.com.co> wrote:
>>
>> > Hello, I have strings which have all sorts of characters (numbers, letters,
>> > punctuation marks, etc.). I would like to keep only the numbers in them.
>> > Does somebody know how to do this?
>> >
>> > Thank you
>> >
>> > Felipe Parra
#
On Thu, Dec 16, 2010 at 11:42 AM, Petr Savicky <savicky at cs.cas.cz> wrote:
Dieter's first post in this thread already answered that question.
#
On Fri, Dec 17, 2010 at 07:39:46AM -0500, Gabor Grothendieck wrote:
[...]
I am sorry for overlooking this solution using package gsubfn, although
it was pointed out repeatedly in this thread. The following solves
exactly the example I was asking for.

  library(gsubfn) 
  s <- "abcde. 11 abc 5.31e+34, (1.45)"
  strapply(s, "[0-9]+\\.[0-9]+e[+-][0-9]+|[0-9]+\\.[0-9]+|[0-9]+")[[1]]
  [1] "11"       "5.31e+34" "1.45"

Thank you for this information.

Petr Savicky.