removing characters from a string
On Tue, 2005-04-12 at 05:54 -0700, Vivek Rao wrote:
Is there a simple way in R to remove all characters from a string other than those in a specified set? For example, I want to keep only the digits 0-9 in a string. In general, I have found the string handling abilities of R a bit limited. (Of course it's great for stats in general). Is there a good reference on this? Or should R programmers dump their output to a text file and use something like Perl or Python for sophisticated text processing? I am familiar with the basic functions such as nchar, substring, as.integer, print, cat, sprintf etc.
Something like the following should work:
x <- paste(sample(c(letters, LETTERS, 0:9), 50, replace = TRUE),
           collapse = "")
x
[1] "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV"
gsub("[^0-9]", "", x)
[1] "8677"

The use of gsub() here replaces any characters NOT in 0:9 with a "", therefore leaving only the digits. See ?gsub for more information.

HTH,

Marc Schwartz
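
The same gsub() idea can be wrapped up for an arbitrary set of characters to keep. The following is only a minimal sketch: keep_only() is a hypothetical helper name (not a base R function), and it assumes its second argument is written as the inside of a regex bracket expression, such as "0-9" or "[:alpha:]", so characters that are special inside brackets would need care.

```r
# keep_only() is a hypothetical helper, not part of base R.
# 'class' is assumed to be the inside of a bracket expression,
# e.g. "0-9", "A-Za-z" or "[:alpha:]".
keep_only <- function(x, class) {
  # Build the negated class "[^...]" and delete everything it matches.
  gsub(paste0("[^", class, "]"), "", x)
}

s <- c("Tel: 555-0199", "abc", "R 2.1.0")

keep_only(s, "0-9")        # "5550199" ""    "210"
keep_only(s, "[:alpha:]")  # "Tel"     "abc" "R"
```

Since gsub() is vectorised over its input, the helper works on a whole character vector at once, and a string with no characters from the set comes back as "".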