removing characters from a string

8 messages · Vivek Rao, Marc Schwartz, John Fox +5 more

#
Is there a simple way in R to remove all characters
from a string other than those in a specified set? For
example, I want to keep only the digits 0-9 in a
string.

In general, I have found the string handling abilities
of R a bit limited. (Of course it's great for stats in
general). Is there a good reference on this? Or should
R programmers dump their output to a text file and use
something like Perl or Python for sophisticated text
processing?

I am familiar with the basic functions such as nchar,
substring, as.integer, print, cat, sprintf etc.
#
On Tue, 2005-04-12 at 05:54 -0700, Vivek Rao wrote:
Something like the following should work:
collapse = "")
[1] "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV"
[1] "8677"

The use of gsub() here replaces every character that is NOT one of the
digits 0-9 with "", therefore leaving only the digits.
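
The code in this message did not survive in the archive in full, so the following is only a minimal sketch consistent with the output shown above. It assumes the string is held in a variable called x (a hypothetical name) and assigns it literally instead of guessing how it was generated.

```r
# Assumption: x holds the string printed in the message above.
x <- "QvuuAlSJYUFpUpwJomtCir8TfvNQyV6O7W7TlXSXlLHocCdtnV"

# "[^0-9]" matches any single character that is not a digit;
# gsub() replaces every such match with the empty string.
digits <- gsub("[^0-9]", "", x)
digits
```

With this input the result should be the "8677" shown above.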

See ?gsub for more information.

HTH,

Marc Schwartz
#
Dear Vivek,

Actually, I think R has reasonably good facilities for manipulating strings.
See ?gsub etc.; for example:

gsub("[^0-9]", "", "XKa0&*1jk2")
[1] "012"

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------
#
    Vivek> Is there a simple way in R to remove all characters
    Vivek> from a string other than those in a specified set? For
    Vivek> example, I want to keep only the digits 0-9 in a
    Vivek> string.

    Vivek> In general, I have found the string handling abilities
    Vivek> of R a bit limited. (Of course it's great for stats in
    Vivek> general). Is there a good reference on this? Or should
    Vivek> R programmers dump their output to a text file and use
    Vivek> something like Perl or Python for sophisticated text
    Vivek> processing?

    Vivek> I am familiar with the basic functions such as nchar,
    Vivek> substring, as.integer, print, cat, sprintf etc.

It depends on your "etc":

The above is pretty trivial using gsub(),
but since you sound sophisticated enough to proclaim missing R
abilities, I leave the exercise to you.

Martin
#
Look at ?gsub, e.g.,

string <- "ab03def10-523rtf"
string
gsub("[^0-9]", "", string)
gsub("[0-9]", "", string)
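
The two calls above are complements of each other: the first keeps only the digits, the second removes them. For the more general question of keeping only the characters in a specified set, one possible sketch is a small wrapper around gsub(). The name keep_only and its set argument are hypothetical, and set is assumed to be a valid body for a bracket expression (for example "0-9" or "[:alpha:]"), not an arbitrary string of characters.

```r
# Hypothetical helper: drop every character of x that is not in `set`.
# Assumption: `set` is already a valid bracket-expression body,
# e.g. "0-9" or "[:alpha:]"; no escaping is done here.
keep_only <- function(x, set) {
  gsub(paste0("[^", set, "]"), "", x)
}

string <- "ab03def10-523rtf"
only_digits  <- keep_only(string, "0-9")        # "0310523"
only_letters <- keep_only(string, "[:alpha:]")  # "abdefrtf"
no_digits    <- gsub("[0-9]", "", string)       # "abdef-rtf"
```

Sets that contain "]", "^", "-" or a backslash would need extra care, which this sketch does not attempt.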


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm


----- Original Message ----- 
From: "Vivek Rao" <rvivekrao at yahoo.com>
To: <r-help at stat.math.ethz.ch>
Sent: Tuesday, April 12, 2005 2:54 PM
Subject: [R] removing characters from a string
#
Hi

Try


  gsub("[^0-9]","","1111af-456utaDFasswe34534%^&%*$h567890ersdfg")
[1] "111145634534567890"



HTH

rksh
On Apr 12, 2005, at 01:54 pm, Vivek Rao wrote:

            
--
Robin Hankin
Uncertainty Analyst
Southampton Oceanography Centre
European Way, Southampton SO14 3ZH, UK
  tel  023-8059-7743
#
Using help.start() and searching on keyword "character" or using
help.search(keyword="character") will show you what you have missed.

As others have pointed out, you have missed the power of regular 
expressions (despite that being how these things are done in Perl).
Also, strsplit() can be very powerful.
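
As a rough illustration of the strsplit() route (a sketch only; the variable names are made up for the example), the string can be split into single characters and filtered against an allowed set, or split on runs of unwanted characters to recover the separate groups of digits.

```r
x <- "ab03def10-523rtf"

# Split into single characters, keep those in the allowed set, rejoin.
chars   <- strsplit(x, "")[[1]]
allowed <- as.character(0:9)
kept    <- paste(chars[chars %in% allowed], collapse = "")   # "0310523"

# Split on runs of non-digits to get each group of digits separately.
# A match at the start of the string yields a leading "", dropped here.
runs <- strsplit(x, "[^0-9]+")[[1]]
runs <- runs[nzchar(runs)]                                   # "03" "10" "523"
```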
On Tue, 12 Apr 2005, Vivek Rao wrote:

Your exploration of them seems more than a bit limited.
#
Martin Maechler wrote:
Part of the problem here is our help system.  gsub is documented within 
the grep topic, so when you look at the keyword==character topics, you 
don't see it explicitly.  (You do see "pattern matching and 
replacement", which should have been a hint.)  And if you were looking 
for "string handling" under the programming category, you're completely 
out of luck.

Another reason some people might see R's string handling as limited is 
that it is sometimes more cumbersome to manipulate strings in R than in 
other languages.  For example, I vaguely recall that there's a good 
reason why R doesn't use "+" to concatenate strings, but I can't 
remember what it is.  And sometimes I'd like to strip whitespace or pad 
things to a given width; I generally need to define my own functions to 
do that each time.  R is capable of concatenation, stripping and 
padding, but is sometimes a little obscure in how it does them.
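
For what it is worth, here is one way the three operations might be written with base functions. This is a sketch, not the only idiom: strip is a hypothetical helper name, and other approaches (sprintf(), for instance) would do as well for padding.

```r
# Concatenation: paste() with an empty separator stands in for "+".
joined <- paste("abc", "def", sep = "")                   # "abcdef"

# Hypothetical helper: strip leading and trailing whitespace.
strip <- function(x) gsub("^[[:space:]]+|[[:space:]]+$", "", x)
stripped <- strip("  some text \t")                       # "some text"

# Padding to a given width with formatC(); flag = "-" left-justifies.
right <- formatC("abc", width = 6)                        # "   abc"
left  <- formatC("abc", width = 6, flag = "-")            # "abc   "
```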

Duncan Murdoch