I've spent some time trying to wrap my head around reading large CSV
files with the ff package. I think I know how to do it, but I keep running
into problems. I've recreated the issues as best I can with a smaller
example, and maybe someone can help explain them.
The following code creates a CSV file with an integer column, a
character column and a logical column.
-------------------------------------------------
library(ff)
#Create data
size = 2000
fake.data = data.frame("Integer" = round(100000*runif(size)),
                       "Character" = sample(LETTERS, size, replace=TRUE),
                       "Logical" = sample(c(TRUE,FALSE), size, replace=TRUE))
#Write to csv
write.csv(fake.data,"data.csv",row.names=FALSE)
-------------------------------------------------
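As a sanity check on what ends up in the file, here is a minimal sketch using base R only (no ff) that writes a tiny stand-in file with the same three columns and lists the class read.csv infers for each. The names check.file, check.data and inferred are made up for this illustration, and the three rows are hand-picked rather than taken from the real data.

```r
# Sanity check with base R only (no ff): which classes does read.csv infer?
# 'check.file', 'check.data' and 'inferred' are illustrative names only.
check.file = tempfile(fileext = ".csv")
check.data = data.frame(
  "Integer"   = c(8601L, 42L, 77310L),
  "Character" = c("J", "A", "Q"),
  "Logical"   = c(TRUE, FALSE, TRUE)
)
write.csv(check.data, check.file, row.names = FALSE)

# Read the file back and list the class inferred for each column
inferred = sapply(read.csv(check.file), class)
print(inferred)
```

Pointing read.csv at "data.csv" instead of check.file applies the same check to the file created above. The class reported for the letter column depends on the R version and the stringsAsFactors setting.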
Now, to read it in as an 'ffdf' object, I can do the following:
-------------------------------------------------
data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
next.rows = 1005,sep=",")
-------------------------------------------------
That works. But with my current large data set, read.csv.ffdf complains
about the classes of the columns it's importing. I was also experimenting
with first.rows/next.rows, but that's a question for another time. So I'll
try to load the data while specifying the column types (the exact same
command, with colClasses added):
-------------------------------------------------
> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
    next.rows = 1005,sep=",",colClasses = c("integer","integer","logical"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'an integer', got '"J"'

> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
    next.rows = 1005,sep=",",colClasses = c("integer","character","logical"))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
  vmode 'character' not implemented

> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
    next.rows = 1005,sep=",",colClasses = rep("character",3))
Error in ff(initdata = initdata, length = length, levels = levels, ordered = ordered, :
  vmode 'character' not implemented

> data = read.csv.ffdf(x=NULL,file="data.csv",nrows=1001,first.rows = 500,
    next.rows = 1005,sep=",",colClasses = rep("raw",3))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  scan() expected 'a raw', got '8601'
-------------------------------------------------
I just can't find a combination of classes that lets this file be read
in. I really don't understand why the class 'character' won't work for
all of the columns. Any thoughts as to why? I appreciate the help and time.