Which function to use: grep, replace, substr etc.?

You can use the two-character sequences "\\<" and "\\>" to match
the beginning and end of a "word" (each match is zero-width, i.e., it
consumes no characters):
  > dataset <- c("corn", "cornmeal", "corn on the cob", "popcorn", "this corn is sweet")
  > grep("^corn$|^corn | corn$", dataset)
  [1] 1 3
  > grep("\\<corn\\>", dataset)
  [1] 1 3 5
  > gsub("\\<corn\\>", "CORN", dataset)
  [1] "CORN"              
  [2] "cornmeal"          
  [3] "CORN on the cob"   
  [4] "popcorn"           
  [5] "this CORN is sweet"

If your definition of a "word" is more expansive, it gets complicated.
E.g., if words might include letters, numbers, and periods but not
underscores or anything else, you could use:
  > gsub("(^|[^.[:alpha:][:digit:]])corn($|[^.[:alpha:][:digit:]])",
      "\\1CORN.BY.ITSELF\\2",
      c("corn.1", "corn_2", " corn", "4corn", "1.corn"))
  [1] "corn.1"          
  [2] "CORN.BY.ITSELF_2"
  [3] " CORN.BY.ITSELF" 
  [4] "4corn"           
  [5] "1.corn"

Switching to Perl-compatible regular expressions (perl=TRUE) would
probably make this simpler.
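
As a rough sketch of that Perl-style approach (not part of the original
reply; the names dataset2 and res are made up for illustration), negative
lookbehind and lookahead can assert that no letter, digit, or period
touches "corn". Because lookarounds consume no characters, the
replacement needs no backreferences:

```r
# Hypothetical test vector, same strings as in the gsub() call above
dataset2 <- c("corn.1", "corn_2", " corn", "4corn", "1.corn")

# (?<![[:alnum:].])  not preceded by a letter, digit, or period
# (?![[:alnum:].])   not followed by a letter, digit, or period
res <- gsub("(?<![[:alnum:].])corn(?![[:alnum:].])",
            "CORN.BY.ITSELF", dataset2, perl = TRUE)
print(res)
```

On these five strings this should give the same result as the
backreference version above.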

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com