Regular Expressions in grep

8 messages · Noia Raindrops, R. Michael Weylandt, arun +2 more

#
Dear r-help members,

I have a number in the form of a string, say:

a<-"-01020.909200"

I'd like to extract "1020." as well as ".9092"

Front<-grep(pattern="[1-9]+[0-9]*\\.", value=TRUE, x=a, fixed=FALSE)
End<-grep(pattern="\\.[0-9]*[1-9]+", value=TRUE, x=a, fixed=FALSE)

However, both strings give "-01020.909200", exactly a.
Could you please point me to what is wrong?

Thanks and best regards
H. van Lishaut
#
grep() returns the elements of x that contain a match (or their indices), not the matched substrings. You want regexpr() and regmatches().

-- Bert

On Tue, Aug 21, 2012 at 12:24 PM, Dr. Holger van Lishaut
<H.v.Lishaut at gmx.de> wrote:

  
    
#
'grep' does not change strings. Use 'gsub' or 'regmatches':

# gsub
Front <- gsub("^.*?([1-9][0-9]*\\.).*?$", "\\1", a)
End <- gsub("^.*?(\\.[0-9]*[1-9]).*?$", "\\1", a)
# regexpr and regmatches (R >= 2.14.0)
Front <- regmatches(a, regexpr("[1-9][0-9]*\\.", a))
End <- regmatches(a, regexpr("\\.[0-9]*[1-9]", a))

Front
## [1] "1020."
End
## [1] ".9092"
#
You're misreading the docs: from grep,

   value: if 'FALSE', a vector containing the ('integer') indices of
          the matches determined by 'grep' is returned, and if 'TRUE',
          a vector containing the matching elements themselves is
          returned.

Since there's a match somewhere in a[1], all of a[1] is returned (it
is a matching element), not just the matching bit: grep(pattern, x,
value = TRUE) is something like x[grepl(pattern, x)] to my mind.

I think you want ?regexpr or possibly just substitute out the
non-match with gsub.
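
To illustrate the difference described above, here is a minimal sketch. The vector x and the name pat are made up for this example; only the first element and the pattern come from the original question.

```r
# Hypothetical vector: only some elements contain a digit run followed by a dot
x <- c("-01020.909200", "abc", "7.5")
pat <- "[1-9][0-9]*\\."

grep(pat, x)                 # indices of the matching elements
## [1] 1 3
grep(pat, x, value = TRUE)   # the matching elements, returned whole
## [1] "-01020.909200" "7.5"
x[grepl(pat, x)]             # same result as value = TRUE
## [1] "-01020.909200" "7.5"

# regexpr() locates the first match inside each element;
# regmatches() extracts it and drops elements without a match
regmatches(x, regexpr(pat, x))
## [1] "1020." "7."
```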

Cheers,
Michael

On Tue, Aug 21, 2012 at 2:24 PM, Dr. Holger van Lishaut
<H.v.Lishaut at gmx.de> wrote:
#
Hi,
Try this (both patterns rely on the fixed digit counts of this example):
gsub("^-\\d(\\d{4}\\.).*","\\1",a)
#[1] "1020."
gsub("^.*(\\.\\d{4}).*","\\1",a)
#[1] ".9092"
A.K.



#
Dear all,

regmatches works.

And, since this has been asked here before:

SignifStellen<-function(x){
     strx=as.character(x)
     nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.[0-9]*[1-9]",strx)))-1
}

returns the number of significant figures of a number. Perhaps this can help someone.

Thanks & best regards
H. van Lishaut
#
...

On Wed, Aug 22, 2012 at 12:46 PM, Dr. Holger van Lishaut
<H.v.Lishaut at gmx.de> wrote:
except that ?signif already does this, no?

-- Bert
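
For comparison, a short sketch of what signif() does: it rounds a numeric value to a requested number of significant digits, while the function quoted above counts digits in a string. The value used here is the thread's example converted to a positive number, which is an assumption made for illustration.

```r
# signif() rounds to the requested number of significant digits;
# it does not report how many significant digits a value has
x <- 1020.9092
signif(x, 3)
## [1] 1020
signif(x, 6)
## [1] 1020.91
```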

  
    
#
On 22.08.2012 at 21:46, Dr. Holger van Lishaut
<H.v.Lishaut at gmx.de> wrote:
Sorry, for it to work, it must read:

SignifStellen<-function(x){
   strx=as.character(x)
   intFront <- nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.", strx)))
   intEnd <- nchar(regmatches(strx, regexpr("\\.[0-9]*[1-9]", strx)))
   intFront+intEnd-2
}

Best regards
H. van Lishaut
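
A short usage sketch of the function above, repeated here so the block runs on its own. The inputs 12.5 and 100 are chosen for illustration and are not from the thread. Note the assumption that the value has a nonzero digit on both sides of the decimal point; otherwise neither pattern matches.

```r
# Function as posted in the thread (second version)
SignifStellen <- function(x) {
  strx <- as.character(x)
  intFront <- nchar(regmatches(strx, regexpr("[1-9][0-9]*\\.", strx)))
  intEnd <- nchar(regmatches(strx, regexpr("\\.[0-9]*[1-9]", strx)))
  intFront + intEnd - 2
}

SignifStellen("-01020.909200")  # "1020." and ".9092": 5 + 5 - 2
## [1] 8
SignifStellen(12.5)
## [1] 3

# as.character(100) is "100", which has no decimal point, so neither
# pattern matches and the result is a zero-length vector, not a count
length(SignifStellen(100))
## [1] 0
```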