Dear experts in regexpr.
I have this
dput(test[500:510])
c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
"pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
"RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")
and I want something like this
gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", test[500:510])
[1] "9,36" "9,36" "9,66" "9,66" "9,66" "0,04" "0,04" "0,04" "6,13" "6,13"
[11] "6,13"
but with 10,04 values instead of 0,04.
I tried
gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])
or other variations but without any success.
Please help.
Regards
Petr
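This is a minimal reproduction of the problem. It uses the eleven strings from the dput() output above, with `x` as a local name for test[500:510]. The leading ^.* is greedy: it takes as much of the string as it can while still letting the rest of the pattern match. It therefore swallows the "1" of "10,04" and leaves the group with "0,04". Adding + inside the group does not change that, which is why both attempts give the same result.

```r
# The eleven strings from the dput() output in the question
x <- c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
       "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
       "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

# First attempt: exactly one digit before the comma
one_digit <- gsub("^.*([[:digit:]],[[:digit:]]*).*$", "\\1", x)

# Second attempt: one or more digits, but the greedy ^.* still takes the "1"
many_digits <- gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", x)

print(one_digit[6:8])                      # "0,04" for all three pH 10,04 entries
print(identical(one_digit, many_digits))   # the + inside the group changes nothing
```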
regular expression strikes again
On Jul 9, 2013, at 11:45 , PIKAL Petr wrote:
I tried
gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])
or other variations but without any success.
Presumably the ^.* is too greedy. Perhaps add a space? I.e.,
gsub("^.* ([[:di......
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
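Written out in full, as it appears later in this thread, the space suggestion requires a blank immediately before the captured digits, so ^.* can no longer end inside the number. This sketch reuses the eleven strings from the question. The second call uses "pH5,57 1" from Petr's follow-up to show the weak spot he reports: with no space before the number there is no match, and gsub() returns the string unchanged.

```r
# The eleven strings from the dput() output in the question
x <- c("pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2", "pH 9,66 3",
       "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
       "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

# Require a space right before the number so ^.* cannot take a leading digit
space_pat  <- "^.* ([[:digit:]]+,[[:digit:]]*).*$"
with_space <- gsub(space_pat, "\\1", x)
print(with_space)

# No space before the number: no match, so the input comes back unchanged
no_space <- gsub(space_pat, "\\1", "pH5,57 1")
print(no_space)
```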
Thanks, it works to some extent. The test vector comes from a file that was not filled in properly. With your suggestion I get correct values for the numbers with two digits before the ",", but other values, which have no space before the number, come back unchanged.
dput(test[c(1:10,500:510)])
c("Cl Tio2 ph 5,8 1", "Cl Tio2 ph 5,8 2", "Cl Tio2 ph 5,8 3",
"pH5,57 1", "pH5,57 2", "pH5,57 3", "pH4,8 1", "pH4,8 2", "pH4,8 3",
"pH4,12 1", "pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2",
"pH 9,66 3", "pH 10,04 1", "pH 10,04 2", "pH 10,04 3", "RGLP 144006 pH 6,13 1",
"RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")
gsub("^.* ([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[c(1:10,500:510)])
 [1] "5,8"      "5,8"      "5,8"      "pH5,57 1" "pH5,57 2" "pH5,57 3"
 [7] "pH4,8 1"  "pH4,8 2"  "pH4,8 3"  "pH4,12 1" "9,36"     "9,36"
[13] "9,66"     "9,66"     "9,66"     "10,04"    "10,04"    "10,04"
[19] "6,13"     "6,13"     "6,13"
Basically, I would like to get one or two digits before the comma and two digits after it.
Thanks anyway
Petr
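"One or two digits before the comma" can also be written down directly with an interval, [[:digit:]]{1,2}. Below is a minimal sketch that uses base R's regexpr() and regmatches() instead of gsub(). This is an alternative technique, not one proposed in the thread. regexpr() reports the first match in each string, so no leading .* is needed and nothing gets swallowed. The data contain values such as "5,8" and "4,8", so the sketch accepts one or more digits after the comma rather than exactly two. Strings without a match become NA. One caveat: a three-digit integer part such as "100,5" would only be partly captured ("00,5").

```r
# The 21 strings from dput(test[c(1:10,500:510)]) in the follow-up
y <- c("Cl Tio2 ph 5,8 1", "Cl Tio2 ph 5,8 2", "Cl Tio2 ph 5,8 3",
       "pH5,57 1", "pH5,57 2", "pH5,57 3", "pH4,8 1", "pH4,8 2", "pH4,8 3",
       "pH4,12 1", "pH 9,36 2", "pH 9,36 3", "pH 9,66 1", "pH 9,66 2",
       "pH 9,66 3", "pH 10,04 1", "pH 10,04 2", "pH 10,04 3",
       "RGLP 144006 pH 6,13 1", "RGLP 144006 pH 6,13 2", "RGLP 144006 pH 6,13 3")

# First run of 1-2 digits, a comma, then at least one digit
m <- regexpr("[[:digit:]]{1,2},[[:digit:]]+", y)

# regmatches() drops non-matching strings, so fill a full-length result by position
values <- rep(NA_character_, length(y))
values[m != -1] <- regmatches(y, m)
print(values)
```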
-----Original Message-----
From: peter dalgaard [mailto:pdalgd at gmail.com]
Sent: Tuesday, July 09, 2013 11:59 AM
To: PIKAL Petr
Cc: r-help
Subject: Re: [R] regular expression strikes again
On Tue, Jul 09, 2013 at 09:45:55AM +0000, PIKAL Petr wrote:
I tried
gsub("^.*([[:digit:]]+,[[:digit:]]*).*$", "\\1", test[500:510])
or other variations but without any success.
Please help.
The "1" in "10,04" is matched by ".*". In your example, all the floating comma numbers you're trying to extract are preceded by "pH ", so replacing ".*" with ".*pH " should do what you want.
I'd be wary of that variation of having "RGLP 144006" in some cases, though; it might be better to clean up this rubbish earlier on (and ideally never to have it generated in the first place). Regular expressions can be useful to separate the wheat from the chaff, but relying on them too much comes with a risk of extracting something that is valid in some syntactic / technical sense but not semantically correct.
If you can't be 100% certain that the number you want is (1) always preceded by "pH ", (2) always a floating comma number and (3) always contains an integer and a fractional part (i.e. you'll never get ",09" rather than "0,09", or "10" rather than "10,0"), you have to be prepared for more difficulties, and you may want to consider a more systematic approach to parsing your input.
Best regards, Jan
+- Jan T. Kim -------------------------------------------------------+
|             email: jttkim at gmail.com                            |
|             WWW:   http://www.jtkim.dreamhosters.com/              |
*-----=<  hierarchical systems are for files, not for humans  >=-----*
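This sketch applies Jan's ".*pH " anchor to the messier strings from Petr's follow-up. The helper name extract_ph() is hypothetical. Two details are assumptions added here rather than part of Jan's suggestion: ignore.case = TRUE (for "ph 5,8") and an optional space, " ?" (for "pH5,57"). In line with Jan's caution, a string that does not fit the expected shape becomes NA instead of being passed through unchanged, as plain gsub() would do. That covers strings with no pH label, no comma, or no digits on both sides of the comma.

```r
# Hypothetical helper: number after a "pH" label, NA when the shape does not fit
extract_ph <- function(s) {
  pat <- "^.*pH ?([[:digit:]]+,[[:digit:]]+).*$"
  ok  <- grepl(pat, s, ignore.case = TRUE)
  out <- rep(NA_character_, length(s))
  out[ok] <- sub(pat, "\\1", s[ok], ignore.case = TRUE)
  out
}

# "pH 10 1" (no fractional part) and "pH ,09 1" (no integer part) give NA
print(extract_ph(c("Cl Tio2 ph 5,8 1", "pH5,57 1", "pH 10,04 1",
                   "RGLP 144006 pH 6,13 1", "pH 10 1", "pH ,09 1")))
```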
On Jul 9, 2013, at 12:19 , PIKAL Petr wrote:
Basically I would like to get one or two digits before comma and two digits after comma.
Then maybe
gsub("^.*[^[:digit:]]([[:digit:]]+,[[:digit:]]*).*$", "\\1", x)
 [1] "5,8"   "5,8"   "5,8"   "5,57"  "5,57"  "5,57"  "4,8"   "4,8"   "4,8"
[10] "4,12"  "9,36"  "9,36"  "9,66"  "9,66"  "9,66"  "10,04" "10,04" "10,04"
[19] "6,13"  "6,13"  "6,13"
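Dalgaard's pattern works because the captured number must now follow a non-digit, so ^.* cannot stop in the middle of "10,04". It has one blind spot, assuming such input can occur. A string that starts with the number (the hypothetical "5,8 1" below) has no preceding non-digit, so nothing matches and gsub() returns it unchanged. A variant that makes the non-digit prefix optional handles that case. It is sketched with perl = TRUE, i.e. PCRE rather than R's default regex engine, and the number is taken from the second group.

```r
# Dalgaard's pattern: the number must follow a non-digit character
dalgaard <- "^.*[^[:digit:]]([[:digit:]]+,[[:digit:]]*).*$"
# Variant: the prefix ending in a non-digit is optional, so the number may start the string
variant  <- "^(.*[^[:digit:]])?([[:digit:]]+,[[:digit:]]*).*$"

y <- c("pH 10,04 1", "pH5,57 1", "5,8 1")   # "5,8 1" is a hypothetical input

res_dalgaard <- gsub(dalgaard, "\\1", y)
res_variant  <- gsub(variant, "\\2", y, perl = TRUE)
print(res_dalgaard)  # the last string comes back unchanged
print(res_variant)
```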
Hi,
Maybe this helps:
gsub(".*\\w+\\s+(.*)\\s+.*","\\1",test)
#[1] "9,36"  "9,36"  "9,66"  "9,66"  "9,66"  "10,04" "10,04" "10,04" "6,13"
#[10] "6,13"  "6,13"
A.K.
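A.K.'s pattern effectively picks out the text between the last two runs of whitespace. It therefore assumes the number is always the second-to-last whitespace-separated field. That assumption can be made explicit with strsplit(). The sketch below is an alternative written for illustration, not A.K.'s code, and the helper name second_to_last() is hypothetical. The last input, taken from Petr's follow-up, shows what happens when the number is glued to "pH": the whole field "pH5,57" is returned.

```r
# Hypothetical helper: second-to-last whitespace-separated field, NA if fewer than two
second_to_last <- function(s) {
  fields <- strsplit(s, "[[:space:]]+")
  vapply(fields,
         function(f) if (length(f) >= 2) f[length(f) - 1] else NA_character_,
         character(1))
}

print(second_to_last(c("pH 9,36 2", "pH 10,04 1", "RGLP 144006 pH 6,13 1",
                       "pH5,57 1")))
```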
----- Original Message -----
From: PIKAL Petr <petr.pikal at precheza.cz>
To: r-help <r-help at r-project.org>
Cc:
Sent: Tuesday, July 9, 2013 5:45 AM
Subject: [R] regular expression strikes again
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.