
UNIX-like "cut" command in R

10 messages · Mike Miller, Andrew Robinson, Gabor Grothendieck +3 more

#
The R "cut" command is entirely different from the UNIX "cut" command. 
The latter retains selected fields in a line of text.  I can do that kind 
of manipulation using sub() or gsub(), but it is tedious.  I assume there 
is an R function that will do this, but I don't know its name.  Can you 
tell me?

I'm also guessing that there is a web page somewhere that explains how to 
do many of the common GNU/UNIX/Linux "text util" command-line kinds of 
things in R.  By that I mean by using R functions, not by making system 
calls.  Does anyone know of such a web page?

Thanks in advance.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
#
Hi Mike,

try substr()

Cheers

Andrew
On Mon, May 02, 2011 at 03:53:58PM -0500, Mike Miller wrote:

  
    
#
On Tue, 3 May 2011, Andrew Robinson wrote:

            
OK.  Apparently, it allows things like this...
[1] "bcd"

...which is like this:

echo "abcdef" | cut -c2-4
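
The R call that printed "bcd" above did not survive in the archived message. Assuming it was a plain substr() call on the same string used in the shell example, a minimal sketch would be:

```r
# Assumed reconstruction: characters 2 through 4, like `cut -c2-4`
s <- "abcdef"
res <- substr(s, 2, 4)
print(res)
```

substr() takes a single start and stop per string, which is why it cannot express several ranges at once the way "cut -c" can.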

But that doesn't use a delimiter, it only does character-based cutting, 
and it is very limited.  With "cut -c" I can do stuff like this:

echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-

abclmnoqrstuvwxyz

It extracts characters 1 to 3, 12 to 15 and 17 to the end.

That was a great tip, though, because it led me to strsplit, which can do 
what I want, though somewhat awkwardly:
[1] "a b c l m n o q r s t u v w x y z"
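
The command behind this output is also missing from the archive. One call that would print exactly that line, assuming the string was split into single characters and the selected ones pasted back with spaces, is sketched here:

```r
s <- "abcdefghijklmnopqrstuvwxyz"

# strsplit() returns a list with one element per input string;
# [[1]] pulls out the character vector for the single string s
chars <- strsplit(s, "")[[1]]

# keep characters 1-3, 12-15 and 17-26, joined with spaces
res <- paste(chars[c(1:3, 12:15, 17:26)], collapse = " ")
print(res)
```

Using collapse = "" instead of collapse = " " would give the same result as the "cut -c" example.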

That gives me what I want, but it is still a little awkward.  I guess I 
don't quite get what I'm doing with lists.  I'm not clear on how this 
would work with a vector of strings.

Mike
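
On the question of lists and vectors of strings: strsplit() returns a list holding one character vector per input string, so the same selection can be applied to each element in turn. The following is only a sketch; the second input string and the variable names are made up for illustration.

```r
strings <- c("abcdefghijklmnopqrstuvwxyz", "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
pos <- c(1:3, 12:15, 17:26)          # same positions as cut -c-3,12-15,17-

pieces <- strsplit(strings, "")      # list of length 2, one vector per string

# apply the selection to every list element; character(1) declares that
# each call returns exactly one string
res <- vapply(pieces, function(ch) paste(ch[pos], collapse = ""), character(1))
print(res)
```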
#
On Mon, May 2, 2011 at 10:32 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
Try this:
   V1   V3         V5
1 abc lmno qrstuvwxyz
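
The code that produced this table is not in the archive. The reply that follows says it used read.fwf() and textConnection() with widths c(3, 8, 4, 1, 10), so a call of roughly this shape would give the same columns. Selecting columns 1, 3 and 5 by index and setting colClasses are assumptions, not necessarily what was originally posted.

```r
s <- "abcdefghijklmnopqrstuvwxyz"

# split the line into fixed-width fields of 3, 8, 4, 1 and 10 characters,
# then keep only the 1st, 3rd and 5th fields
all_fields <- read.fwf(textConnection(s), widths = c(3, 8, 4, 1, 10),
                       colClasses = "character")
out <- all_fields[c(1, 3, 5)]
print(out)
```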
#
On Mon, 2 May 2011, Gabor Grothendieck wrote:

            
That gives me a few more functions to study.  Of course the new code 
(using read.fwf() and textConnection()) is not doing what was requested 
and it requires some work to compute the widths from the given numbers 
(c(1:3, 12:15, 17:26) has to be converted to c(3, 8, 4, 1, 10)).

Mike
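
The width arithmetic mentioned above can be automated. The sketch below starts from the start and end positions of the wanted ranges and builds a widths vector in which skipped gaps are negative, which read.fwf() documents as "columns to be skipped". The names starts, ends, keeps and gaps are illustrative only.

```r
s <- "abcdefghijklmnopqrstuvwxyz"
starts <- c(1, 12, 17)
ends   <- c(3, 15, 26)

keeps <- ends - starts + 1                    # widths of the wanted fields: 3, 4, 10
gaps  <- starts - c(0, head(ends, -1)) - 1    # characters skipped before each field: 0, 8, 1

# interleave gap and field widths, mark gaps as negative, drop empty gaps
w <- c(rbind(-gaps, keeps))
w <- w[w != 0]
print(w)

out <- read.fwf(textConnection(s), widths = w, colClasses = "character")
print(out)
```

The absolute values of w are the widths c(3, 8, 4, 1, 10) from the message above; the negative entries save the extra step of dropping the unwanted columns afterwards.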
#
Mike Miller wrote:
Use str_sub() in the stringr package:

require(stringr)  # install first if necessary
s <- "abcdefghijklmnopqrstuvwxyz"

str_sub(s, c(1,12,17), c(3,15,-1))
#[1] "abc"        "lmno"       "qrstuvwxyz"


Peter Ehlers
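
To get a single string back, as "cut -c" prints it, the pieces can be pasted together. This sketch swaps in base R's substring(), which is vectorised over start and stop positions like str_sub(), so that it runs without stringr. Unlike str_sub(), substring() does not accept -1 for "to the end", so the last end position is taken from nchar().

```r
s <- "abcdefghijklmnopqrstuvwxyz"

# three (start, stop) pairs; the last stop is the end of the string
pieces <- substring(s, c(1, 12, 17), c(3, 15, nchar(s)))
res <- paste(pieces, collapse = "")
print(res)
```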
#
On Mon, 2 May 2011, P Ehlers wrote:

            
Thanks.  That's very close to what I'm looking for, but it seems to 
correspond to "cut -c", not to "cut -f".  Can it work with delimiters or 
only with character counts?

Mike
#
x <- "this is a string"
unlist(strsplit(x," "))[c(1,4)]

HTH Christian
#
On Tue, 3 May 2011, Christian Schulz wrote:

            
Thanks.  I did figure that one out a couple of messages back, but to get 
it to behave like "cut -d' ' -f1,4", I had to add a paste command to 
reassemble the parts:

paste(unlist(strsplit(x," "))[c(1,4)], collapse=" ")

Then I wasn't sure if I could do this to every element of a vector of 
strings without looping -- I have to think not.

Mike
#
On Tue, May 03, 2011 at 01:39:49AM -0500, Mike Miller wrote:
[...]
Try the following

  x <- c("this is a string", "this is a numeric")

  reassemble <- function(x, ind) paste(x[ind], collapse=" ")

  vapply(strsplit(x, " "), reassemble, character(1), ind = c(1, 4))

  [1] "this string"  "this numeric"

Hope this helps.

Petr Savicky.
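
The pieces from this thread can be wrapped into one small function that behaves roughly like "cut -d -f" on a vector of strings. This is only a sketch: cut_fields is a hypothetical name, not a function in base R or any package, and it does not reproduce every corner case of the UNIX command.

```r
# Hypothetical helper: keep the selected delimiter-separated fields of each string
cut_fields <- function(x, fields, delim = " ") {
  # fixed = TRUE treats delim as a literal string rather than a regular expression
  parts <- strsplit(x, delim, fixed = TRUE)
  vapply(parts, function(p) {
    keep <- fields[fields <= length(p)]   # skip field numbers a string does not have
    paste(p[keep], collapse = delim)
  }, character(1))
}

res1 <- cut_fields(c("this is a string", "this is a numeric"), c(1, 4))
res2 <- cut_fields("a:b:c:d", 2:3, delim = ":")
print(res1)
print(res2)
```

Passing fixed = TRUE matters for delimiters such as "." or "|", which strsplit() would otherwise interpret as regular expression syntax.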