Parsing variable-length delimited strings into a matrix
3 messages · Benjamin Wright, R. Michael Weylandt, jim holtman
Well, how do you want it to be made into a matrix if the rows are all different lengths? Methinks you are finding this tricky for a reason...
Michael
On Mon, Oct 3, 2011 at 11:40 AM, Benjamin Wright <bjw78 at well.ox.ac.uk> wrote:
I'm struggling to find a way of parsing a vector of data in this sort of form:
A,B,C
B,B
A,AA,C
A,B,BB,BBB,B,B
into a matrix (or data frame). The catch is that I don't know a priori how many entries there will be in each element, nor how many characters each entry will have. strsplit(vec, ",") gets me a list, but I can't find a way of turning the list into a matrix. unlist(lst) destroys the length data and do.call("rbind", lst) fails because of the uneven lengths. It is possible to go through the vector element by element, but that has proved too slow for my purposes.
Is there a reasonably quick method of achieving this in a vector-oriented way?
Cheers,
Ben
[[alternative HTML version deleted]]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
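The two dead ends described in the question can be reproduced with a short sketch. The names vec, lst, flat and recycled are illustrative, and lengths() assumes R 3.2.0 or later. Note that for this particular input rbind() does not stop with an error: it recycles the shorter vectors up to the longest length, so the result has the right shape but the wrong contents.

```r
# Sample input in the form described in the question
vec <- c("A,B,C", "B,B", "A,AA,C", "A,B,BB,BBB,B,B")
lst <- strsplit(vec, ",", fixed = TRUE)

# unlist() flattens everything into one vector, so the row
# boundaries are no longer recoverable from the result alone
flat <- unlist(lst)
length(flat)

# The per-element lengths have to be recorded separately
lengths(lst)

# rbind() recycles shorter vectors to the longest length, so short
# rows are filled with repeats of their own values, not with NA
recycled <- do.call("rbind", lst)
recycled[2, ]
```

Padding each element to a common length before binding avoids the recycling.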
Will this do it for you:
x <- readLines(textConnection("A,B,C
B,B
A,AA,C
A,B,BB,BBB,B,B"))
closeAllConnections()
x.s <- strsplit(x, ',')
# determine max length
x.max <- max(sapply(x.s, length))
# create character matrix
x.mat <- matrix(
    sapply(x.s, function(a) c(a, rep(NA, x.max - length(a))))
    , byrow = TRUE
    , ncol = x.max
    )
x.mat
     [,1] [,2] [,3] [,4]  [,5] [,6]
[1,] "A"  "B"  "C"  NA    NA   NA
[2,] "B"  "B"  NA   NA    NA   NA
[3,] "A"  "AA" "C"  NA    NA   NA
[4,] "A"  "B"  "BB" "BBB" "B"  "B"
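A more compact variant of the same padding idea, offered as a sketch and not as something posted in the thread: base R's `length<-` replacement function extends a vector with NA when given a longer length, so it can be applied to every element of the strsplit() result. The names vec, parts, n_max and mat are illustrative, and lengths() assumes R 3.2.0 or later.

```r
# Sample input in the form described in the question
vec <- c("A,B,C", "B,B", "A,AA,C", "A,B,BB,BBB,B,B")

# Split each element on commas; fixed = TRUE treats "," literally
parts <- strsplit(vec, ",", fixed = TRUE)

# The longest element decides the number of columns
n_max <- max(lengths(parts))

# `length<-` pads each vector with NA up to n_max. vapply() returns
# one column per input element, so transpose to get one row each.
mat <- t(vapply(parts, `length<-`, character(n_max), n_max))
mat
```

If a data frame is wanted instead, as.data.frame(mat, stringsAsFactors = FALSE) converts the padded matrix while keeping the columns as character.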
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?