
How long does skipping in read.table take

On Sat, Oct 23, 2010 at 10:52 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
Try smaller chunks.  Presumably R cannot handle chunks that large.
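One way to keep chunks small without paying for skip = n on every call (skip still scans all the skipped lines each time) is to keep a file connection open, so each read.table call resumes where the previous one stopped. A minimal sketch, assuming a small test csv; the names con, chunks, and result are made up for illustration:

```r
## create a small test file
DF <- data.frame(a = 1:25, b = 101:125)
write.table(DF, file = "myfile.csv", quote = FALSE, sep = ",",
	row.names = FALSE)

## open the connection once; successive reads continue from
## the current position instead of rescanning from the top
con <- file("myfile.csv", open = "r")
hdr <- readLines(con, n = 1)            # consume the header line
col_names <- strsplit(hdr, ",")[[1]]
chunks <- list()
repeat {
  chunk <- tryCatch(
    read.table(con, sep = ",", header = FALSE, nrows = 10,
               col.names = col_names),
    error = function(e) NULL)           # "no lines available": done
  if (is.null(chunk)) break
  chunks[[length(chunks) + 1L]] <- chunk
}
close(con)
result <- do.call(rbind, chunks)
```

With 25 data rows and nrows = 10 this reads chunks of 10, 10, and 5 rows.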

Also, you could use RSQLite or sqldf to set up a database and then
read from it in pieces.  Again, don't use chunks larger than what R
can handle.  Here is a self-contained example that you can copy and
paste into an R session.  It works on my Windows system but you might
need to change the eol if you are working on a different platform.
Reading the file into the database is the slowest part, but once it's
there the rest should be reasonably fast.

library(sqldf)

## create test file
DF <- data.frame(a = 1:25, b = 101:125)
write.table(DF, file = "myfile.csv", quote = FALSE, sep = ",",
	row.names = FALSE)

## define connection with attributes
myfile <- file("myfile.csv")
attr(myfile, "file.format") <- list(header = TRUE, sep = ",", eol = "\r\n")

## create new sqlite database
sqldf("attach 'mydb' as new")

## read file into mytab table of mydb database
sqldf("create table mytab as select * from myfile", dbname = "mydb")

## check that it's there
sqldf("select * from sqlite_master", dbname = "mydb")
sqldf("select count(*) from mytab", dbname = "mydb")

## read 5 rows after skipping 10 rows
sqldf("select * from mytab limit 5 offset 10", dbname = "mydb")
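The same limit/offset idea extends to a loop that pages through the whole table in manageable chunks, stopping when a query comes back empty. A sketch, assuming the mydb database and mytab table created above; chunk_size and offset are made-up names:

```r
library(sqldf)

chunk_size <- 5
offset <- 0
repeat {
  chunk <- sqldf(sprintf("select * from mytab limit %d offset %d",
                         chunk_size, offset), dbname = "mydb")
  if (nrow(chunk) == 0) break
  ## process chunk here
  offset <- offset + chunk_size
}
```

With 25 rows in mytab this performs five full reads of 5 rows each, then one empty read that ends the loop.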