textConnection taking a long time to open a big string
Two alternate ways to the same result:

  x.1 <- scan(file=, what=rep(list(0),17), fill=T, multi.line=F)
  incomplete.lines <- seq(length(x.1[[17]]))[ is.na(x.1[[17]]) ]

or

  x.1 <- scan(file=, what='')
  x.2 <- strsplit(x.1, "[\\t ]")
  incomplete.lines <- seq(length(x.1))[ unlist(lapply(x.2, length)) < 17 ]

Please read the help for these functions.

HTH  -  tom blackwell  -  u michigan medical school  -  ann arbor  -
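To make the second approach concrete, here is a small self-contained sketch on a made-up character vector (three fields per line instead of 17; the vector 'lines' is invented for illustration, not part of the original post):

```r
# made-up input: each line should have 3 whitespace-separated fields
lines <- c("1 2 3", "4 5", "6 7 8")
fields <- strsplit(lines, "[\t ]+")        # split each line on tabs/spaces
n.fields <- sapply(fields, length)         # field count per line
incomplete.lines <- which(n.fields < 3)    # indices of the short lines
lines[-incomplete.lines]                   # complete lines only
```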
On Wed, 30 Apr 2003 james.holtman at convergys.com wrote:
I was using 'textConnection' to read in a file with about 11,000 lines so that I could detect lines with incomplete data, delete them, and then read the remainder with 'scan'. I am using 1.7.0 on Windows. Here is the output from the script; the textConnection call alone took 51 seconds. Is there a limit on how large a text object can be for use with 'textConnection'?

######## script output ################
x.1 <- scan("/mpstat.ssgdbsv4.030430.txt",what='',sep='\n')
Read 11299 items
str(x.1)
chr [1:11299] "8.3155 32 71 4 1907 122 0 1130 105 167 216 0 3686 32 13 37 18" ...
unix.time(x.in <- textConnection(x.1)) # this takes a long time
[1] 51.96 0.01 53.20 NA NA
sum(nchar(x.1)) # total number of characters in the vector
[1] 944525
unix.time(x.c <- count.fields(x.in)) # this goes pretty fast
[1] 0.14 0.00 0.14 NA NA
table(x.c) # detect incomplete lines
x.c
3 6 17
1 1 11297
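As a follow-up (a sketch, not part of the original exchange): once count.fields has identified the short lines, one way to drop them and parse the rest without a second scan, shown on a tiny invented vector standing in for the real x.1:

```r
x.small <- c("1 2 3", "4 5", "6 7 8")          # stand-in for the real x.1
x.c <- count.fields(textConnection(x.small))   # fields per line: 3 2 3
x.ok <- x.small[x.c == 3]                      # keep only complete lines
x.num <- lapply(strsplit(x.ok, "[\t ]+"), as.numeric)  # numeric fields per line
```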