The R "cut" command is entirely different from the UNIX "cut" command. The latter retains selected fields in a line of text. I can do that kind of manipulation using sub() or gsub(), but it is tedious. I assume there is an R function that will do this, but I don't know its name. Can you tell me?

I'm also guessing that there is a web page somewhere that will tell me how to do a lot of common GNU/UNIX/Linux "text util" command-line kinds of things in R. By that I mean by using R functions, not by making system calls. Does anyone know of such a web page?

Thanks in advance.

Mike

--
Michael B. Miller, Ph.D.
Minnesota Center for Twin and Family Research
Department of Psychology
University of Minnesota
UNIX-like "cut" command in R
10 messages · Mike Miller, Andrew Robinson, Gabor Grothendieck +3 more
Hi Mike, try substr() Cheers Andrew
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Andrew Robinson Program Manager, ACERA Department of Mathematics and Statistics Tel: +61-3-8344-6410 University of Melbourne, VIC 3010 Australia (prefer email) http://www.ms.unimelb.edu.au/~andrewpr Fax: +61-3-8344-4599 http://www.acera.unimelb.edu.au/ Forest Analytics with R (Springer, 2011) http://www.ms.unimelb.edu.au/FAwR/ Introduction to Scientific Programming and Simulation using R (CRC, 2009): http://www.ms.unimelb.edu.au/spuRs/
On Tue, 3 May 2011, Andrew Robinson wrote:
try substr()
OK. Apparently, it allows things like this...

  substr("abcdef",2,4)
  [1] "bcd"

...which is like this:

  echo "abcdef" | cut -c2-4

But that doesn't use a delimiter, it only does character-based cutting, and it is very limited. With "cut -c" I can do stuff like this:

  echo "abcdefghijklmnopqrstuvwxyz" | cut -c-3,12-15,17-
  abclmnoqrstuvwxyz

It extracts characters 1 to 3, 12 to 15 and 17 to the end.

That was a great tip, though, because it led me to strsplit, which can do what I want, however somewhat awkwardly:

  delim <- " "
  y <- "a b c d e f g h i j k l m n o p q r s t u v w x y z"
  paste(unlist(strsplit(y, delim))[c(1:3,12:15,17:26)], collapse=delim)
  [1] "a b c l m n o q r s t u v w x y z"

That gives me what I want, but it is still a little awkward. I guess I don't quite get what I'm doing with lists. I'm not clear on how this would work with a vector of strings.

Mike
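For the character-based case, the multi-range behaviour of "cut -c-3,12-15,17-" can be approximated in base R with substring(), which is vectorized over its start and end positions. The following is only a sketch: cut_chars is a hypothetical helper name, and using NA in `ends` to mean "to the end of the string" is an assumption made for this example, not an R convention.

```r
# Hypothetical helper: emulate `cut -c` on a character vector.
# `starts` and `ends` are 1-based and inclusive; an NA end means
# "through the end of the string" (an assumption of this sketch).
cut_chars <- function(x, starts, ends) {
  vapply(x, function(s) {
    e <- ifelse(is.na(ends), nchar(s), ends)       # fill in open-ended ranges
    paste(substring(s, starts, e), collapse = "")  # extract each range, then rejoin
  }, character(1), USE.NAMES = FALSE)
}

res_chars <- cut_chars(c("abcdefghijklmnopqrstuvwxyz",
                         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"),
                       starts = c(1, 12, 17), ends = c(3, 15, NA))
print(res_chars)
```

Because the outer vapply() walks over the input vector, the same call works for one string or many.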
On Mon, May 2, 2011 at 10:32 PM, Mike Miller <mbmiller+l at gmail.com> wrote:
[...]
Try this:
  read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"),
           widths = c(3, 8, 4, 1, 10),
           colClasses = c(NA, "NULL"))
     V1   V3         V5
  1 abc lmno qrstuvwxyz
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com
On Mon, 2 May 2011, Gabor Grothendieck wrote:
[...]
That gives me a few more functions to study. Of course the new code (using read.fwf() and textConnection()) is not doing what was requested and it requires some work to compute the widths from the given numbers (c(1:3, 12:15, 17:26) has to be converted to c(3, 8, 4, 1, 10)). Mike
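The conversion from kept character ranges to read.fwf() arguments can be automated. The sketch below is one possible way to do it, not something proposed in the thread: fwf_spec is a hypothetical helper, and it assumes the ranges are sorted, non-overlapping, and given as explicit 1-based start and end positions.

```r
# Hypothetical helper: build `widths` and `colClasses` for read.fwf()
# from the ranges to keep. Gaps between ranges become columns of
# class "NULL", which read.fwf() reads and then drops.
fwf_spec <- function(starts, ends) {
  widths  <- numeric(0)
  classes <- character(0)
  pos <- 1                                   # next unread character position
  for (i in seq_along(starts)) {
    if (starts[i] > pos) {                   # gap before this range: skip it
      widths  <- c(widths, starts[i] - pos)
      classes <- c(classes, "NULL")
    }
    widths  <- c(widths, ends[i] - starts[i] + 1)
    classes <- c(classes, NA)                # NA lets R pick the column type
    pos <- ends[i] + 1
  }
  list(widths = widths, colClasses = classes)
}

spec <- fwf_spec(starts = c(1, 12, 17), ends = c(3, 15, 26))
print(spec$widths)
kept <- read.fwf(textConnection("abcdefghijklmnopqrstuvwxyz"),
                 widths = spec$widths, colClasses = spec$colClasses)
print(kept)
```

For the ranges 1-3, 12-15 and 17-26 this reproduces the widths c(3, 8, 4, 1, 10) quoted above.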
Mike Miller wrote:
[...]
Use str_sub() in the stringr package:

  require(stringr)  # install first if necessary
  s <- "abcdefghijklmnopqrstuvwxyz"
  str_sub(s, c(1,12,17), c(3,15,-1))
  #[1] "abc" "lmno" "qrstuvwxyz"

Peter Ehlers
On Mon, 2 May 2011, P Ehlers wrote:
[...]
Thanks. That's very close to what I'm looking for, but it seems to correspond to "cut -c", not to "cut -f". Can it work with delimiters or only with character counts? Mike
  x <- "this is a string"
  unlist(strsplit(x," "))[c(1,4)]

HTH
Christian
On Tue, 3 May 2011, Christian Schulz wrote:
  x <- "this is a string"
  unlist(strsplit(x," "))[c(1,4)]
Thanks. I did figure that one out a couple of messages back, but to get it to behave like "cut -d' ' -f1,4", I had to add a paste command to reassemble the parts:

  paste(unlist(strsplit(x," "))[c(1,4)], collapse=" ")

Then I wasn't sure if I could do this to every element of a vector of strings without looping -- I have to think not.

Mike
On Tue, May 03, 2011 at 01:39:49AM -0500, Mike Miller wrote:
On Tue, 3 May 2011, Christian Schulz wrote:
[...]
Try the following
x <- c("this is a string", "this is a numeric")
reassemble <- function(x, ind) paste(x[ind], collapse=" ")
vapply(strsplit(x, " "), reassemble, character(1), ind = c(1, 4))
[1] "this string" "this numeric"
Hope this helps.
Petr Savicky.
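Petr's approach can be wrapped into a small function that takes the delimiter and field numbers as arguments, roughly in the spirit of "cut -d DELIM -f FIELDS". This is a sketch under stated assumptions: cut_fields is a hypothetical name, the delimiter is matched literally rather than as a regular expression, and fields that a given line does not have are silently skipped. Unlike UNIX cut, it returns fields in the order requested, and strsplit() drops a trailing empty field.

```r
# Hypothetical helper: emulate `cut -d DELIM -f FIELDS` on a character vector.
cut_fields <- function(x, fields, delim = " ") {
  # fixed = TRUE treats `delim` as a literal string, not a regex
  parts <- strsplit(x, delim, fixed = TRUE)
  vapply(parts, function(p) {
    keep <- fields[fields <= length(p)]   # ignore fields this line lacks
    paste(p[keep], collapse = delim)      # reassemble with the same delimiter
  }, character(1))
}

res_space <- cut_fields(c("this is a string", "this is a numeric"), c(1, 4))
res_colon <- cut_fields("a:b:c:d", c(2, 3), delim = ":")
print(res_space)
print(res_colon)
```

strsplit() already operates on a whole character vector, so no explicit loop is needed; vapply() only handles the per-line reassembly.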