removing characters from a string
Martin Maechler wrote:
"Vivek" == Vivek Rao <rvivekrao at yahoo.com> on Tue, 12 Apr 2005 05:54:55 -0700 (PDT) writes:
Vivek> Is there a simple way in R to remove all characters
Vivek> from a string other than those in a specified set? For
Vivek> example, I want to keep only the digits 0-9 in a
Vivek> string.
Vivek> In general, I have found the string handling abilities
Vivek> of R a bit limited. (Of course it's great for stats in
Vivek> general). Is there a good reference on this? Or should
Vivek> R programmers dump their output to a text file and use
Vivek> something like Perl or Python for sophisticated text
Vivek> processing?
Vivek> I am familiar with the basic functions such as nchar,
Vivek> substring, as.integer, print, cat, sprintf etc.
It depends on your "etc":
The above is pretty trivial using gsub(),
but since you sound sophisticated enough to proclaim missing R
abilities, I leave the exercise to you.
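
For readers who want a starting point, one possible gsub() approach is sketched below. It is an illustration only, not necessarily the solution intended above. The input string and the helper name keep_chars are made up for this example. The negated character class [^0-9] matches every character that is not a digit, and each match is replaced by the empty string.

```r
# Keep only the digits 0-9: every other character is replaced by "".
x <- "Phone: (555) 123-4567 ext. 89"   # hypothetical input
digits_only <- gsub("[^0-9]", "", x)
print(digits_only)

# Hypothetical helper: keep only the characters described by `keep`.
# `keep` is assumed to be valid inside a regex bracket expression,
# e.g. "0-9" or "a-c".
keep_chars <- function(s, keep) {
  gsub(paste0("[^", keep, "]"), "", s)
}
print(keep_chars("a1b2c3", "a-c"))
```

Because gsub() is vectorised over its input, the same call works unchanged on a whole character vector.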
Part of the problem here is our help system. gsub is documented within the grep topic, so when you look at the keyword==character topics, you don't see it explicitly. (You do see "pattern matching and replacement", which should have been a hint.) And if you were looking for "string handling" under the programming category, you're completely out of luck.

Another reason some people might see R's string handling as limited is that it is sometimes more cumbersome to manipulate strings in R than in other languages. For example, I vaguely recall that there's a good reason why R doesn't use "+" to concatenate strings, but I can't remember what it is. And sometimes I'd like to strip whitespace or pad things to a given width; I generally need to define my own functions to do that each time.

R is capable of concatenation, stripping and padding, but is sometimes a little obscure in how it does them.

Duncan Murdoch
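
As a rough illustration of the concatenation, stripping and padding mentioned above, here is a minimal base R sketch. The helper name strip_ws and the sample strings are assumptions for this example, not functions or data from the thread. Newer versions of R also ship trimws() for stripping, which may make the helper unnecessary.

```r
# Concatenation: paste() joins strings; sep = "" means no separator.
joined <- paste("string", "handling", sep = "")

# Stripping: hypothetical helper that removes leading and trailing
# whitespace with a regular expression.
strip_ws <- function(s) gsub("^[[:space:]]+|[[:space:]]+$", "", s)
stripped <- strip_ws("  some text  ")

# Padding to a given width with sprintf():
# "%6s" right-justifies in 6 characters, "%-6s" left-justifies.
right_justified <- sprintf("%6s", "abc")
left_justified  <- sprintf("%-6s", "abc")

print(c(joined, stripped, right_justified, left_justified))
```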