Message-ID: <1113311634.6103.9.camel@horizons.localdomain>
Date: 2005-04-12T13:13:54Z
From: Marc Schwartz
Subject: removing characters from a string
In-Reply-To: <20050412125455.86170.qmail@web31301.mail.mud.yahoo.com>

On Tue, 2005-04-12 at 05:54 -0700, Vivek Rao wrote:
> Is there a simple way in R to remove all characters
> from a string other than those in a specified set? For
> example, I want to keep only the digits 0-9 in a
> string.
> 
> In general, I have found the string handling abilities
> of R a bit limited. (Of course it's great for stats in
> general). Is there a good reference on this? Or should
> R programmers dump their output to a text file and use
> something like Perl or Python for sophisticated text
> processing?
> 
> I am familiar with the basic functions such as nchar,
> substring, as.integer, print, cat, sprintf etc.

Something like the following should work:

> x <- paste(sample(c(letters, LETTERS, 0:9), 50, replace = TRUE),
             collapse = "")

> x
[1] "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV"

> gsub("[^0-9]", "", x)
[1] "8677"

Here gsub() replaces every character that is NOT in the range 0-9 with
the empty string "", leaving only the digits. The leading ^ inside the
square brackets is what negates the character class.
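
For the general case in the question (keeping only the characters in some specified set), the same idea can be wrapped in a small function. This is only a sketch: keep_chars is a made-up name, not a base R function, and it assumes the set is written in bracket-expression syntax such as "0-9" or "A-Za-z". Characters that are special inside brackets (for example ], ^ or \) would need extra care.

```r
# Hypothetical helper (not part of base R): drop every character of x
# that is not in the set described by `keep`, e.g. "0-9" or "A-Za-z".
keep_chars <- function(x, keep) {
  # Build a negated bracket expression such as "[^0-9]"
  pattern <- paste0("[^", keep, "]")
  # Replace everything matching the negated class with the empty string
  gsub(pattern, "", x)
}

x <- "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV"

keep_chars(x, "0-9")    # digits only, same as gsub("[^0-9]", "", x)
keep_chars(x, "A-Z")    # upper-case letters only

# gsub() is vectorised, so a character vector works as well
keep_chars(c("a1", "b22"), "0-9")
```

With the string above, the first call should again return "8677".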

See ?gsub for more information.

HTH,

Marc Schwartz