read.table: how to ignore errors?

7 messages · Duncan Murdoch, R. Michael Weylandt, Rolf Turner +2 more

#
I get this error from read.table():
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 234 did not have 8 elements
The error is genuine (there is an extra field separator between the 1st and 2nd elements).

1. Is there a way to see this bad line 234 from R without diving into the file?

2. Is there a way to ignore the bad lines and get the data from the good
lines only? (I do want to see the bad lines, but I don't want to stop all
work until an issue that affects 1% of the data is resolved.)

Thanks.

Oh, yeah, a reproducible example:

read.csv from
=====
a,b
1,2
3,4
5,,6
7,8
=====
I want to be able to extract the data frame
  a b
1 1 1
2 3 4
3 7 8

and a list of strings of length 1 containing "5,,6".
#
On 24/01/2012 3:45 PM, Sam Steingold wrote:
You could use readLines.  Skip 233 lines, read one.
I think you would have to read the first part up to line 233, then read 
the part after line 234, then use rbind to join the two parts.  The 
latter might be tricky if you need a header line; it may be easiest to 
rewrite the file to a tempfile().

Duncan Murdoch
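
A minimal sketch of the two suggestions above, not taken from the original message. It assumes the physical line number of the bad line is already known (the header counts as line 1) and that the file is small enough to hold in memory; the temporary file and the names `path`, `bad_line_no`, `bad_line`, and `good` are made up for illustration. Instead of rbind-ing two partial reads, it drops the bad line from the text and parses the rest, which keeps the header handling simple.

```r
# Recreate the example file from the first message in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4", "5,,6", "7,8"), path)

bad_line_no <- 4   # assumption: physical line number, header is line 1

# "Skip 233 lines, read one": scan() can skip lines and read a single
# line back as one string when the separator is a newline.
bad_line <- scan(path, what = "", sep = "\n",
                 skip = bad_line_no - 1, nlines = 1, quiet = TRUE)

# Read everything except the bad line: drop it from the raw text and
# let read.csv() parse what is left, header included.
all_lines <- readLines(path)
good <- read.csv(text = all_lines[-bad_line_no], header = TRUE)

print(bad_line)
print(good)
```

For a very large file, reading all lines into memory may not be practical, in which case rewriting the file to a tempfile() as suggested above is the more realistic route.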
#
Given your domain name, you might also get some use out of the
system() and system2() commands which allow the passing of strings to
the OS command line (and thus the use of tools like grep/sed/awk
within R)

E.g., an idiom I use pretty frequently for interactive data analysis:
(not really related, but I think it makes a good example)

FunctionToAnalyzeSomething <- function(...){
    pdf("junk.pdf")

    # plot stuff

    dev.off()
    # "open" is the macOS command; quote the full path in case it contains spaces
    system(paste("open", shQuote(file.path(getwd(), "junk.pdf"))))
    if(readline("Keep? ") == "y") system("cp junk.pdf FileOutput.pdf")
    unlink("junk.pdf") # or system("rm junk.pdf")
}

I would imagine you could use tryCatch + as.character() to get the bad
line number, and then make a temp file without that line with Unix
tools, and read that in. Some sort of determined.read.table() wrapper
to read.table()...

Musing out loud...
Michael
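
A rough sketch of the tryCatch idea, offered as an illustration rather than a tested recipe. The error raised by read.table() is an ordinary condition object, so conditionMessage() returns its text and a regular expression can pull out the number. The helper name `reported_bad_line` is hypothetical, and the pattern assumes the English wording of the message, so it would need adjusting in a translated locale.

```r
# Hypothetical helper (not part of R): returns the line number named in a
# "line N did not have M elements" error, or NA if no such error occurs.
reported_bad_line <- function(...) {
  tryCatch({
    read.table(...)
    NA_integer_                      # the read succeeded
  }, error = function(e) {
    msg <- conditionMessage(e)       # text of the error condition
    hit <- regmatches(msg, regexpr("line [0-9]+ did not have", msg))
    if (length(hit) == 0L) return(NA_integer_)   # some other error
    as.integer(gsub("[^0-9]", "", hit))
  })
}

# The example data from the first message, passed as text.
csv <- c("a,b", "1,2", "3,4", "5,,6", "7,8")
n <- reported_bad_line(text = csv, sep = ",", header = TRUE)
print(n)
```

The number is counted from where scan() starts reading, and it depends on how many columns read.table() inferred from the first few lines, so it need not be the physical line number of the malformed line. Treat it as a hint and check the surrounding lines before deleting anything.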

On Tue, Jan 24, 2012 at 4:00 PM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:
#
On 25/01/12 09:45, Sam Steingold wrote:
Try:

xxx <- readLines("<filename>")
# Column names from the header line
hhh <- scan(text = xxx[1], what = "", sep = ",", quiet = TRUE)
good <- list()
bad <- character()
for(i in seq_along(xxx)[-1]) {
     tmp <- read.csv(text = xxx[i], header = FALSE)
     if(ncol(tmp) == length(hhh)) {
         names(tmp) <- hhh
         good[[length(good) + 1]] <- tmp
     } else {
         bad <- c(bad, xxx[i])  # keep the raw text of the bad line
     }
}
yyy <- do.call(rbind, good)

HTH

     cheers,

         Rolf Turner
#
This is no good.
What if the data is compressed (or coming from a socket)?
What if the bad line is number 233,000,000?
How do I extract that 234 from the error message? Is there an
exception object or something?
This is awkward. What if there are many errors?

https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14793
#
The first row of your expected data frame should be '1 1 2', right?
Have you tried using count.fields to remove the lines
in the file with the wrong number of fields?  E.g.,
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  line 1 did not have 3 elements
a b
1 1 2
2 3 4
3 7 8

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
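
The commands that produced the output above are not shown in this archive. The following is not Bill's code, only a sketch of what the count.fields approach might look like. It assumes the header line has the correct number of fields and that no quoted field spans several lines; the temporary file and the names `path`, `nf`, `ok`, `bad_lines`, and `good` are invented for the example.

```r
# Recreate the example file from the first message in a temporary location.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4", "5,,6", "7,8"), path)

# Number of fields on each line. Blank-line skipping and comment handling
# are turned off so the counts line up with readLines() below.
nf <- count.fields(path, sep = ",", quote = "\"",
                   blank.lines.skip = FALSE, comment.char = "")
lines <- readLines(path)
stopifnot(length(nf) == length(lines))

expected <- nf[1]                   # assumption: the header is well-formed
ok <- !is.na(nf) & nf == expected   # NA marks lines inside a multi-line quote

bad_lines <- lines[!ok]             # raw text of the rejected lines
good <- read.csv(text = lines[ok], header = TRUE)

print(bad_lines)
print(good)
```

Because count.fields() also accepts a connection, the same idea may carry over to compressed input, though each pass over the data would need a freshly opened connection.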
#
Of course, thanks!
Sounds good, thanks a lot!