How long does skipping in read.table take
On Sat, Oct 23, 2010 at 10:07 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
I just tried it:
for(i in 11:16){ #i<-11
  start<-Sys.time()
  print(start)
  flush.console()
  filename<-paste("skipped millions- ",i,".txt",sep="")
  mydata<-read.csv.sql("myfilel.txt", sep="|", eol="\r\n", sql =
"select * from file limit 1000000, (1000000*i-1)")
The SQL statement does not know anything about R variables, so the i inside the quoted string is never evaluated by R. You would need to build the string in R first, something like this:
i <- 1
s <- sprintf("select * from file limit 10, %d", 10*i-1)
s
[1] "select * from file limit 10, 9"
read.csv.sql(..., sql = s, ...)
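Inside the loop from the question, the same idea could look roughly like the sketch below. The helper name make_sql and its chunk argument are hypothetical, and it assumes the intent is to skip the first i-1 chunks and return exactly one chunk. In SQLite's "limit a, b" form the first number is the number of rows to skip and the second is the number of rows to return.

```r
# Hypothetical helper (not part of sqldf): build the SQL for the i-th
# chunk of `chunk` rows. SQLite reads "limit a, b" as: skip a rows,
# then return at most b rows.
make_sql <- function(i, chunk = 1000000) {
  sprintf("select * from file limit %d, %d", (i - 1) * chunk, chunk)
}

make_sql(1, chunk = 10)
make_sql(11)

# Usage inside the loop (requires the sqldf package and the real file,
# so it is left commented out here):
# for (i in 11:16) {
#   mydata <- read.csv.sql("myfilel.txt", sep = "|", eol = "\r\n",
#                          sql = make_sql(i))
# }
```

Note that %d only works here because the computed values are whole numbers that fit in an R integer; for offsets above about 2 billion a different format such as %.0f would be needed.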
Also, if you just want to read the file in chunks, reading from a
connection in R would be sufficient:
k <- 5000 # no of rows per chunk
first <- TRUE
con <- file('myfile.csv', "r")
repeat {
  # skip header
  if (first) hdgs <- readLines(con, 1)
  first <- FALSE
  x <- readLines(con, k)
  if (length(x) == 0) break
  DF <- read.csv(textConnection(x), header = FALSE)
  # process chunk -- we just print last row here
  print(tail(DF, 1))
}
close(con)
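To try the chunked approach without a large file, here is a self-contained sketch that writes a small temporary CSV first. The demo data, the chunk size of 5 and the names col_names, n_rows and last_rows are all made up for illustration. It uses read.csv(text = x) instead of textConnection(x); as far as I know this lets read.csv close the text connection itself, which avoids piling up open connections over many chunks.

```r
# Write a small demo file (made-up data) so the sketch runs on its own.
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:12, value = (1:12)^2), tmp, row.names = FALSE)

k <- 5                       # rows per chunk (tiny, for the demo)
con <- file(tmp, "r")

# Read the header line once and keep the column names for every chunk.
hdgs <- readLines(con, 1)
col_names <- scan(text = hdgs, what = "", sep = ",", quiet = TRUE)

n_rows <- 0
last_rows <- list()
repeat {
  x <- readLines(con, k)     # next k lines; fewer at the end of the file
  if (length(x) == 0) break
  DF <- read.csv(text = x, header = FALSE, col.names = col_names)
  # process chunk: here we count rows and remember the last row
  n_rows <- n_rows + nrow(DF)
  last_rows[[length(last_rows) + 1]] <- tail(DF, 1)
}
close(con)
unlink(tmp)

print(n_rows)
print(do.call(rbind, last_rows))
```

Because the connection stays open between calls, each readLines(con, k) continues where the previous one stopped, so no rows are re-read or skipped.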
Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com