
how to break while loop reading from connection with read.csv

3 messages · Collins, Stephen, Berend Hasselman, Duncan Murdoch

#
Hello,

I'm trying to read a file a few rows at a time, so as not to read the entire file into memory. From reading the "connections" and "readLines" help pages and the R-help archive, it seems this should be possible with read.csv and a file connection, using the "nrows" argument and checking whether "nrow()" of the new batch is zero.
What is the proper way to check for the end-of-file condition with read.csv, so that I can break out of a while loop that reads the data in?

#example, make a test file
con <- file("test.csv","wt")
cat("a,b,c\n", "1,2,3\n", "4,5,6\n", "7,6,5\n", "4,3,2\n", "3,2,1\n",file=con)
unlink(con)

#show the file is valid
con <- file("test.csv","rt")
read.csv(con,header=T)
unlink(con)

#show that readLines ends with "character(0)", as expected
con <- file("test.csv","rt")
readLines(con,n=10)
readLines(con,n=10)
unlink(con)

#show that read.csv ends with an error
con <- file("test.csv","rt")
read.csv(con,header=T,nrows=10)
read.csv(con,header=F,nrows=10)
unlink(con)



Sincerely,

Stephen Collins
Predictive Modeler
Allstate Insurance Company
R version 2.15.0 (2012-03-30)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.15.0






#
On 21-01-2013, at 16:56, "Collins, Stephen" <Stephen.Collins at allstate.com> wrote:

            
How about:

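con <- file("test.csv","rt")
hdr <- readLines(con,n=1)   # read the header line once, reuse it for every chunk
while( length(tmp <- readLines(con,n=10)) > 0 ) {
    qq <- read.csv(text=c(hdr,tmp), header=TRUE)
    # do something with qq
}
close(con)   # close() rather than unlink(): unlink() deletes files, it does not close connections
qq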


Berend
#
On 13-01-21 10:56 AM, Collins, Stephen wrote:
I don't think this is causing your problem, but unlink() seems like the 
wrong function to use here.  Don't you mean close()?
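
To illustrate the difference, here is a minimal sketch (the temporary file and the variable names are made up for the example): close() flushes and closes a connection and leaves the file on disk, while unlink() takes file names and deletes the files.

```r
# Hypothetical temporary file, used only for this illustration
path <- tempfile(fileext = ".csv")

con <- file(path, "wt")
cat("a,b,c\n", file = con)
close(con)                              # flush and close the connection

exists_after_close <- file.exists(path) # the file is still on disk

unlink(path)                            # delete the file, by name
exists_after_unlink <- file.exists(path)
```

In the original example the file is meant to be kept and re-read, so close(con) is the call that matches the intent.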
See the Value section of ?read.csv.  In particular,

"Empty input is an error unless col.names is specified, when a 0-row 
data frame is returned: similarly giving just a header line if header = 
TRUE results in a 0-row data frame. Note that in either case the columns 
will be logical unless colClasses was supplied."
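
Based on that passage, one possible way to write the loop is sketched below. It assumes a simple header with no quoted fields, and the chunk size, file path, and variable names are chosen only for the example. Supplying col.names makes read.csv return a 0-row data frame at end of file instead of raising an error, so nrow() can serve as the stopping test.

```r
# Build a small test file (temporary path, for illustration only)
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,6,5", "4,3,2", "3,2,1"), path)

con <- file(path, "rt")

# Read the header line once; assumes plain comma-separated names
hdr <- strsplit(readLines(con, n = 1), ",")[[1]]

total_rows <- 0L
n_chunks   <- 0L
repeat {
    # With col.names given, empty input yields a 0-row data frame
    chunk <- read.csv(con, header = FALSE, col.names = hdr, nrows = 2)
    if (nrow(chunk) == 0) break
    # do something with chunk
    total_rows <- total_rows + nrow(chunk)
    n_chunks   <- n_chunks + 1L
}
close(con)
unlink(path)   # remove the temporary file
```

Because each chunk is typed independently, passing colClasses as well is probably worthwhile in real use so that every chunk gets the same column types.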

Duncan Murdoch