Hello,
I'm trying to read a file a few rows at a time, so as not to read the entire file into memory. From the "connections" and "readLines" help pages and the R-help archive, it seems this should be possible with read.csv and a file connection, using the "nrows" argument and checking whether "nrow()" of the new batch is zero.
From certain posts, it seemed that read.csv should return "character(0)" when the end of file is reached and there are no more rows to read. Instead, I get an error that there are "no lines available in input." Have I made a mistake with the file, or in calling read.csv?
What is the proper way to check the end-of-file condition with read.csv, such that I could break a while loop reading the data in?
#example, make a test file
con <- file("test.csv","wt")
cat("a,b,c\n", "1,2,3\n", "4,5,6\n", "7,6,5\n", "4,3,2\n", "3,2,1\n",file=con)
unlink(con)
#show the file is valid
con <- file("test.csv","rt")
read.csv(con,header=T)
unlink(con)
#show that readLines ends with "character(0)", like expected
con <- file("test.csv","rt")
readLines(con,n=10)
readLines(con,n=10)
unlink(con)
#show that read.csv end with error
con <- file("test.csv","rt")
read.csv(con,header=T,nrows=10)
read.csv(con,header=F,nrows=10)
unlink(con)
Sincerely,
Stephen Collins
Predictive Modeler
Allstate Insurance Company
sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
loaded via a namespace (and not attached):
[1] tools_2.15.0
On 21-01-2013, at 16:56, "Collins, Stephen" <Stephen.Collins at allstate.com> wrote:
[...]
How about:
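con <- file("test.csv","rt")
hdr <- readLines(con,n=1)  # read the header line once
while( length(tmp <- readLines(con,n=10)) > 0 ) {
  # prepend the header so every chunk gets the same column names
  qq <- read.csv(text=c(hdr,tmp), header=TRUE)
  # do something with qq
}
close(con)  # close() releases the connection; unlink() deletes files by name
qq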
Berend
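If you would rather keep calling read.csv on the connection itself, one possible approach is to catch the end-of-file error. The following is only a sketch: read_chunk is a hypothetical helper name, the chunk size of 2 and the temporary file are illustrative, and catching every error this way would also hide genuine parsing problems, so treat it as a starting point rather than a finished solution.

```r
# Recreate the five-row test file from the original post in a temporary
# location (sep = "" keeps cat() from inserting spaces between lines).
path <- tempfile(fileext = ".csv")
cat("a,b,c\n", "1,2,3\n", "4,5,6\n", "7,6,5\n", "4,3,2\n", "3,2,1\n",
    file = path, sep = "")

# Hypothetical helper: return NULL instead of raising an error when
# read.csv finds no more lines on the connection.
read_chunk <- function(con, ...) {
  tryCatch(read.csv(con, ...), error = function(e) NULL)
}

con <- file(path, "rt")
chunk <- read_chunk(con, header = TRUE, nrows = 2)
cols <- names(chunk)           # remember the column names from the header
total <- 0
while (!is.null(chunk) && nrow(chunk) > 0) {
  names(chunk) <- cols         # later chunks are read without a header
  total <- total + nrow(chunk) # stand-in for real per-chunk work
  chunk <- read_chunk(con, header = FALSE, nrows = 2)
}
close(con)
unlink(path)                   # delete the temporary file by name
```

The loop ends when read_chunk returns NULL, that is, when read.csv has signalled that the connection has no lines left.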
> #example, make a test file
> con <- file("test.csv","wt")
> cat("a,b,c\n", "1,2,3\n", "4,5,6\n", "7,6,5\n", "4,3,2\n", "3,2,1\n",file=con)
> unlink(con)
I don't think this is causing your problem, but unlink() seems like the
wrong function to use here. Don't you mean close()?
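For reference, a small sketch of the difference: close() acts on a connection object, while unlink() deletes the files named in a character vector. The temporary path and the variable names still_there and gone are only illustrative.

```r
path <- tempfile(fileext = ".csv")   # illustrative temporary file
con <- file(path, "wt")
cat("a,b,c\n", "1,2,3\n", file = con, sep = "")
close(con)                           # flush and release the connection

still_there <- file.exists(path)     # the file remains on disk after close()
unlink(path)                         # unlink() removes the file itself
gone <- !file.exists(path)
```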
> #show that read.csv end with error
> con <- file("test.csv","rt")
> read.csv(con,header=T,nrows=10)
> read.csv(con,header=F,nrows=10)
> unlink(con)
See the Value section of ?read.csv. In particular,
"Empty input is an error unless col.names is specified, when a 0-row
data frame is returned: similarly giving just a header line if header =
TRUE results in a 0-row data frame. Note that in either case the columns
will be logical unless colClasses was supplied."
Duncan Murdoch
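Following the help text quoted above, one way to get the nrow() == 0 test the original question asked for is to pass col.names on every read after the first. This is a minimal sketch under that reading of the documentation; chunk_size, n_read and the temporary file are illustrative names, not part of the original post.

```r
# Recreate the five-row test file in a temporary location.
path <- tempfile(fileext = ".csv")
cat("a,b,c\n", "1,2,3\n", "4,5,6\n", "7,6,5\n", "4,3,2\n", "3,2,1\n",
    file = path, sep = "")

chunk_size <- 2                      # illustrative; pick what fits in memory
con <- file(path, "rt")

# The first read consumes the header line and the first chunk_size rows.
chunk <- read.csv(con, header = TRUE, nrows = chunk_size)
cols <- names(chunk)

n_read <- 0
while (nrow(chunk) > 0) {
  n_read <- n_read + nrow(chunk)     # stand-in for real per-chunk work
  # With col.names supplied, empty input yields a 0-row data frame
  # instead of an error, so the loop condition can detect end of file.
  chunk <- read.csv(con, header = FALSE, nrows = chunk_size, col.names = cols)
}
close(con)
unlink(path)
```

As the quoted passage notes, the columns of the final 0-row data frame will be logical unless colClasses is supplied, so pass colClasses as well if the chunks are to be combined with consistent types.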