Convert COLON separated format
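If you want something fast, read the file in, strip off each colon and the value that follows it, write the result to a temporary file, and then read it back in with read.table. Note that this keeps only the label and the column indices, not the values. Here is a timing on a 355K-line file: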
temp <- tempfile()
input <- readLines('/temp/colon.txt')
length(input)
[1] 355212
system.time(input <- gsub("(:[0-9]+)", "", input))
   user  system elapsed
   0.72    0.00    0.74
head(input)
[1] "1 5 27 345" "1 5 27 345" "1 5 27 345" "1 5 27 345" "1 5 27 345" "1 5 27 345"
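A quick sanity check of that substitution on a couple of made-up rows (the sample strings below are illustrative, not taken from the original file). The pattern ":[0-9]+" assumes the values are plain integers; with a value such as 0.5 it would leave ".5" attached to the index. A pattern that removes everything up to the next space, shown here as an assumed alternative, may be safer for such files:

```r
# Hypothetical sample rows in libsvm sparse format (not from the original file).
rows <- c("1 5:1 27:3 345:10", "0 2:0.5 9:7")

# Pattern from the transcript: removes a colon followed by digits only.
strip_int <- gsub(":[0-9]+", "", rows)

# Assumed alternative: removes a colon and everything up to the next space,
# so decimal or signed values are dropped completely.
strip_any <- gsub(":[^ ]+", "", rows)

print(strip_int)
print(strip_any)
```

On integer-valued files like the one timed here, both patterns should give the same result.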
writeLines(input, temp)
system.time(newInput <- read.table(temp))
   user  system elapsed
   1.08    0.02    1.13
dim(newInput)
[1] 355212 4
head(newInput)
  V1 V2 V3  V4
1  1  5 27 345
2  1  5 27 345
3  1  5 27 345
4  1  5 27 345
5  1  5 27 345
6  1  5 27 345
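If the values are needed as well as the indices, one possible approach is to split each row into index:value pairs and fill a dense matrix. This is only a rough sketch under some assumptions: the sample rows are made up, every row is assumed to have at least one index:value pair, and the names rows, dense and result are just for illustration.

```r
# Hypothetical sample rows; in practice these would come from readLines().
rows <- c("1 5:1 27:3 345:10", "0 2:4 5:7")

# Split each row on spaces: the first token is the label,
# the remaining tokens are index:value pairs.
tokens <- strsplit(rows, " ", fixed = TRUE)
labels <- as.numeric(vapply(tokens, `[`, character(1), 1))

# One two-column character matrix (index, value) per row.
# Assumes every row has at least one index:value pair.
pairs <- lapply(tokens, function(tk) {
  do.call(rbind, strsplit(tk[-1], ":", fixed = TRUE))
})

# Flatten into (row, column, value) triplets.
row_id    <- rep(seq_along(pairs), vapply(pairs, nrow, integer(1)))
all_pairs <- do.call(rbind, pairs)
col_id    <- as.integer(all_pairs[, 1])
value     <- as.numeric(all_pairs[, 2])

# Fill a dense matrix (zero where no value was listed) by matrix indexing.
dense <- matrix(0, nrow = length(rows), ncol = max(col_id))
dense[cbind(row_id, col_id)] <- value

# Columns are named label, X1, X2, ... up to the largest index seen.
result <- data.frame(label = labels, dense)
dim(result)
```

A dense data.frame gets large quickly when the column indices are high and the rows are numerous; in that case the same row_id, col_id and value vectors could probably be passed to sparseMatrix() from the Matrix package instead.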
On Tue, Oct 9, 2012 at 12:56 AM, Noah Silverman <noahsilverman at ucla.edu> wrote:
I have a bunch of data sets that were created for the libsvm tool. They are in "colon separated sparse format", i.e.

    1 5:1 27:3 345:10

is a row with the label of "1" and only has values in columns 5, 27, and 345.

I want to read these into a data.frame in R. Is there a simple way to do this?

--
Noah Silverman, M.S.
UCLA Department of Statistics
8117 Math Sciences Building
Los Angeles, CA 90095
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.