Numbers in a string
14 messages · Luis Felipe Parra, Dieter Menne, Rainer Schuermann +7 more
Luis Felipe Parra wrote:
Hello, I have strings which contain all sorts of characters (numbers, letters, punctuation marks, etc.). I would like to keep only the numbers in them. Does somebody know how to do this?
What should happen for AA3213Be45C02? A site search would have given a few hundred hits, for example http://r-project.markmail.org/thread/3u6gxyzbnm5x3ksp
Dieter
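Dieter's question matters: for "AA3213Be45C02", "keep only the numbers" can mean one concatenated digit string or the separate numbers 3213, 45 and 02. Here is a minimal sketch of both readings in base R. It assumes an R version that has regmatches(), which base R gained after this thread was written (December 2010). The variable names are illustrative only.

```r
x <- "AA3213Be45C02"

# Reading 1: delete every non-digit, so all digits run together
digits_only <- gsub("[^0-9]", "", x)
print(digits_only)

# Reading 2: extract each maximal run of digits as its own string
runs <- regmatches(x, gregexpr("[0-9]+", x))[[1]]
print(runs)

# Converting the runs to numbers drops the leading zero of "02"
print(as.numeric(runs))
```

Which reading is right depends on the data. Converting to numeric only makes sense for the second one.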
View this message in context: http://r.789695.n4.nabble.com/Numbers-in-a-string-tp3088623p3088719.html Sent from the R help mailing list archive at Nabble.com.
If your OS is Linux, you might want to look at sed or gawk. They are very good and efficient for such tasks. Do you need it once or as part of a program? Some samples would be helpful...
Rgds,
Rainer
-------- Original Message --------
Date: Wed, 15 Dec 2010 16:55:26 +0800
From: Luis Felipe Parra <felipe.parra at quantil.com.co>
To: r-help <r-help at r-project.org>
Subject: [R] Numbers in a string
Hello, I have strings which contain all sorts of characters (numbers, letters, punctuation marks, etc.). I would like to keep only the numbers in them. Does somebody know how to do this?
Thank you
Felipe Parra
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
------- Windows: Just say No.
Hi Felipe,
gsub("[^0123456789]", "", "AB15E9SDF654VKBN?dvb.65")
results in "15965465".
Would that be what you are looking for?
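Two properties of Nick's gsub() call are worth knowing. First, it is vectorised, so one call cleans a whole character vector. Second, the bracket expression can be spelled in several equivalent ways. The following is a small sketch; the second and third input strings are illustrative (one is taken from Dieter's question, the other is made up).

```r
strings <- c("AB15E9SDF654VKBN?dvb.65", "AA3213Be45C02", "no digits here")

# Three spellings of "any character that is not a decimal digit"
by_list  <- gsub("[^0123456789]", "", strings)
by_range <- gsub("[^0-9]", "", strings)
by_class <- gsub("[^[:digit:]]", "", strings)

# gsub() works element-wise; a string without digits becomes "", not NA
print(by_list)
```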
Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36
-- Do Not Disapprove
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rainer Schuermann
Sent: Wednesday, 15 December 2010 11:19
To: r-help at r-project.org
Subject: Re: [R] Numbers in a string
On Wed, Dec 15, 2010 at 11:08:06AM -0200, Henrique Dallazuanna wrote:
Try this:
gsub("[^0-9]", "", "AB15E9SDF654VKBN?dvb.65")
Consider also
strsplit("AB15E9SDF654VKBN?dvb.65", "[^.0-9][^.0-9]*")
[[1]]
[1] "" "15" "9" "654" ".65"
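The leading "" appears because the string starts with a separator ("AB"). The sketch below drops empty pieces and converts the rest with as.numeric(). It assumes every remaining piece is a well-formed number; a piece such as "1.2.3" would become NA with a warning.

```r
x <- "AB15E9SDF654VKBN?dvb.65"

# Split on runs of characters that are neither digits nor "."
pieces <- strsplit(x, "[^.0-9][^.0-9]*")[[1]]

# Remove the empty string produced by the separator at the start of x
pieces <- pieces[pieces != ""]
print(pieces)

# ".65" is read as 0.65
nums <- as.numeric(pieces)
print(nums)
```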
PS.
On Dec 15, 2010, at 6:01 AM, Nick Sabbe wrote:
Hi Felipe,
gsub("[^0123456789]", "", "AB15E9SDF654VKBN?dvb.65")
results in "15965465".
Would that be what you are looking for?
I tried figuring out how to do this from a more positive perspective,
meaning finding a regular expression function that did not require
negating the desired elements, but the best I could do was make a
function that accepted a pattern and then hid the underlying negation:
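pullchar <- function(txt, patt) {
  # A bracket expression such as "[0-9]" is negated by turning
  # its first "[" into "[^"
  if (grepl("\\[", patt)) {
    pattn <- sub("\\[", "\\[\\^", patt)
  } else {
    # Otherwise treat patt as a list of literal characters to keep
    pattn <- paste("[^", patt, "]", sep = "")
  }
  # Delete every character that is NOT described by patt
  gsub(pattn, "", txt)
}
> pullchar("AB15E9SDF654VKBN?dvb.65", "ABD")
[1] "ABDB"
> pullchar("AB15E9SDF654VKBN?dvb.65", "[A-Z]")
[1] "ABESDFVKBN"
> pullchar("AB15E9SDF654VKBN?dvb.65", "[0-9]")
[1] "15965465"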
Still learning regex so if there is a "positive" strategy I'm all
ears. ...er, eyes?
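A "positive" strategy is possible in base R if you match the wanted characters and glue the matches back together. The following is a sketch under two assumptions: R has regmatches(), which was added to base R after this thread, and keepchar() is a hypothetical helper name. Unlike pullchar(), patt is always treated as a regular expression, so literal characters need brackets ("[ABD]" rather than "ABD").

```r
# keepchar() is a hypothetical helper, not from the thread: it keeps the
# characters matched by patt instead of deleting everything else
keepchar <- function(txt, patt) {
  # One character vector of matches per element of txt
  matches <- regmatches(txt, gregexpr(patt, txt))
  # Glue each element's matches back into a single string ("" if none)
  vapply(matches, paste, character(1), collapse = "")
}

s <- "AB15E9SDF654VKBN?dvb.65"
print(keepchar(s, "[0-9]"))
print(keepchar(s, "[ABD]"))
```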
David.
On Wed, Dec 15, 2010 at 01:29:16PM -0500, David Winsemius wrote:
Still learning regex so if there is a "positive" strategy I'm all ears. ...er, eyes?
One of the suggestions in this thread was to use an external program.
A possible solution without negation in Perl is
@a = ("AB15E9SDF654VKBN?dvb.65" =~ m/[0-9]/g);
print @a, "\n";
15965465
or
@a = ("AB15E9SDF654VKBN?dvb.65" =~ m/[.0-9]+/g);
print join(" ", @a), "\n";
15 9 654 .65
Do you mean something in this direction?
Petr Savicky.
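For comparison, the second Perl one-liner has a close analogue inside R, provided regmatches() is available (it was added to base R after this thread). This sketch mirrors the Perl code line by line; it is not taken from the thread.

```r
s <- "AB15E9SDF654VKBN?dvb.65"

# Analogue of Perl's m/[.0-9]+/g in list context: all matches, in order
tokens <- regmatches(s, gregexpr("[.0-9]+", s))[[1]]

# Analogue of print join(" ", @a), "\n";
cat(paste(tokens, collapse = " "), "\n", sep = "")
```

This prints the same line as the Perl version, 15 9 654 .65.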
Petr Savicky wrote:
One of the suggestions in this thread was to use an external program.
A possible solution without negation in Perl is
@a = ("AB15E9SDF654VKBN?dvb.65" =~ m/[0-9]/g);
print @a, "\n";
15965465
Which is
gsub("[^0-9]", "", "AB15E9SDF654VKBN?dvb.65")
as Henrique suggested.
Dieter
On Thu, Dec 16, 2010 at 06:17:45AM -0800, Dieter Menne wrote:
Petr Savicky wrote:
One of the suggestions in this thread was to use an external program.
A possible solution without negation in Perl is
@a = ("AB15E9SDF654VKBN?dvb.65" =~ m/[0-9]/g);
print @a, "\n";
15965465
Which is
gsub("[^0-9]", "", "AB15E9SDF654VKBN?dvb.65")
as Henrique suggested.
I agree. The Perl code was a reply to the question of whether the same can be
done by describing the required elements rather than the ones to be
removed. This could be useful if we want to extract elements described
by a more complex regular expression. A more accurate, although not
complete and definitely not the best, extraction of nonnegative numbers
in Perl may be done as follows:
@a = ("abcde. 11 abc 5.31e+34, (1.45)" =~ m/[0-9]+\.[0-9]+e[+-][0-9]+|[0-9]+\.[0-9]+|[0-9]+/g);
print join(" ", @a), "\n";
11 5.31e+34 1.45
Can something similar be done in R either specifically for numbers or
for a general regular expression?
Going back to the original question, the answer depends on how complex
the extraction of numbers is in a concrete situation. If possible, using functions
within R is suggested (gsub(), strsplit(), ...). On the other hand, there
are cases where an external tool can be helpful. See also the R-intro manual,
Chapter 7 "Reading data from files", which says
There is a clear presumption by the designers of R that you will be
able to modify your input files using other tools, such as file editors
or Perl to fit in with the requirements of R.
Petr Savicky.
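Petr's Perl example can also be reproduced in base R with the same regular expression. This sketch assumes regmatches(), which base R added after this thread. The perl = TRUE argument is optional here because the pattern uses only POSIX features, and both engines return the same matches for it.

```r
s <- c("abcde. 11 abc 5.31e+34, (1.45)", "AB15E9SDF654VKBN?dvb.65")

# Petr's pattern: mantissa with exponent | decimal | integer
number_pattern <- "[0-9]+\\.[0-9]+e[+-][0-9]+|[0-9]+\\.[0-9]+|[0-9]+"

# perl = TRUE selects Perl-style matching; not required for this pattern
found <- regmatches(s, gregexpr(number_pattern, s, perl = TRUE))
print(found)

# Convert each element's matches to numbers
values <- lapply(found, as.numeric)
print(values)
```

In the second string, ".65" comes back as "65" because the pattern requires digits before a decimal point.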
In S+ strsplit() has a keep=TRUE/FALSE argument to specify whether to return the substrings that match the pattern or to return the substrings between matches to the pattern (the default). E.g.,
strings <- c("abcde. 11 abc 5.31e+34, (1.45)",
             "AB15E9SDF654VKBN?dvb.65")
number.pattern <- "[0-9]+\\.[0-9]+e[+-][0-9]+|[0-9]+\\.[0-9]+|[0-9]+"
strsplit(strings, number.pattern, keep=TRUE)
[[1]]:
[1] "11" "5.31e+34" "1.45"
[[2]]:
[1] "15" "9" "654" "65"
strsplit(strings, number.pattern, keep=FALSE)
[[1]]:
[1] "abcde. " " abc " ", (" ")"
[[2]]:
[1] "AB" "E" "SDF" "VKBN?dvb."
In R and S+ gregexpr can tell you the start points
and lengths of each match, but it is a pain to
pass this information to substring() to get the
matches themselves. Should [g]regexpr() have a
value= argument like grep has?
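The gregexpr()/substring() bookkeeping described above only has to be written once if it is wrapped in a helper. The following is a sketch that uses only gregexpr(), substring() and mapply(), all of which were available in base R at the time. The name match_substrings() is hypothetical and not part of any package.

```r
# match_substrings() is a hypothetical helper: it turns the start
# positions and lengths reported by gregexpr() into the matched text
match_substrings <- function(x, pattern) {
  starts <- gregexpr(pattern, x)
  mapply(function(str, m) {
    if (m[1] == -1) return(character(0))  # gregexpr() reports "no match" as -1
    len <- attr(m, "match.length")
    substring(str, m, m + len - 1)         # one substring per match
  }, x, starts, SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

strings <- c("abcde. 11 abc 5.31e+34, (1.45)",
             "AB15E9SDF654VKBN?dvb.65",
             "no numbers")
number.pattern <- "[0-9]+\\.[0-9]+e[+-][0-9]+|[0-9]+\\.[0-9]+|[0-9]+"
res <- match_substrings(strings, number.pattern)
print(res)
```

The helper returns a list with one character vector per input string, like strsplit(). Strings without a match yield character(0).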
In R the gsubfn package can do this sort of thing.
I don't know if it is worth adding more to base R's
strsplit().
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Petr Savicky
Sent: Thursday, December 16, 2010 8:42 AM
To: r-help at r-project.org
Subject: Re: [R] Numbers in a string
Petr Savicky <savicky at cs.cas.cz>
on Wed, 15 Dec 2010 14:21:37 +0100 writes:
> On Wed, Dec 15, 2010 at 11:08:06AM -0200, Henrique
> Dallazuanna wrote:
>> Try this:
>>
>> gsub("[^0-9]", "", "AB15E9SDF654VKBN?dvb.65")
> Consider also
> strsplit("AB15E9SDF654VKBN?dvb.65", "[^.0-9][^.0-9]*")
> [[1]]
> [1] "" "15" "9" "654" ".65"
which can be abbreviated to
strsplit("AB15E9SDF654VKBN?dvb.65", "[^.0-9]+")
Note:
R's regular expression matching capability is really very close to Perl's,
and in those cases where it is not, these functions have an argument
'perl' (default FALSE) that you can switch on.
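The sketch below checks that the abbreviated pattern gives the same split as the original one, with and without perl = TRUE. It then shows a construct that does need the Perl engine: a lookahead (?=...). The lookahead example is an illustration added here, not from the thread, and it assumes regmatches(), which base R gained after this thread.

```r
x <- "AB15E9SDF654VKBN?dvb.65"

long  <- strsplit(x, "[^.0-9][^.0-9]*")        # pattern from Petr's post
short <- strsplit(x, "[^.0-9]+")               # the abbreviation
pcre  <- strsplit(x, "[^.0-9]+", perl = TRUE)  # same pattern, Perl engine
print(short)

# Lookahead (?=...) is a Perl extension, so it needs perl = TRUE:
# digit runs that are immediately followed by an upper-case letter
before_upper <- regmatches(x, gregexpr("[0-9]+(?=[A-Z])", x, perl = TRUE))[[1]]
print(before_upper)
```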
Martin
On Thu, Dec 16, 2010 at 11:42 AM, Petr Savicky <savicky at cs.cas.cz> wrote:
Can something similar be done in R either specifically for numbers or
for a general regular expression?
Dieter's first post in this thread already answered that question.
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
On Fri, Dec 17, 2010 at 07:39:46AM -0500, Gabor Grothendieck wrote:
On Thu, Dec 16, 2010 at 11:42 AM, Petr Savicky <savicky at cs.cas.cz> wrote:
[...]
Can something similar be done in R either specifically for numbers or for a general regular expression?
Dieter's first post in this thread already answered that question.
I am sorry for overlooking this solution using package gsubfn, although it was pointed out repeatedly in this thread. The following solves exactly the example I was asking for.
library(gsubfn)
s <- "abcde. 11 abc 5.31e+34, (1.45)"
strapply(s, "[0-9]+\\.[0-9]+e[+-][0-9]+|[0-9]+\\.[0-9]+|[0-9]+")[[1]]
[1] "11" "5.31e+34" "1.45"
Thank you for this information.
Petr Savicky.