Yesterday I changed the headers for a couple of columns in data text files and removed hyphens from within character strings, too. When I tried to re-read these data sources using read.table() I encountered an issue I've not seen before.

Both files were read almost instantly until yesterday's wording changes. Now both files seem to cause R to hang. Rather than the prompt being returned immediately, nothing happens. In emacs the 'working' symbol appears but the read.table() function does not complete.

What might cause this?

Rich
read.table() Issue
5 messages · Rich Shepard, William Dunlap
On Wed, 1 Aug 2012, Rich Shepard wrote:
What might cause this?
I restored these two files from last Friday and they are read into R with no problems. So, I'll make one change at a time and see where things break. Will post results when I have them.

Rich
On Wed, 1 Aug 2012, Rich Shepard wrote:
What might cause this?
Must be computers acting like computers. Restored files from backup, made changes one at a time, and there are no problems reading them into R data frames. My apologies for taking up space here.

Rich
An unmatched quote can make read.table run very slowly when there are lots of lines in the file. E.g.,
z <- rep("A B C", 10^6)
z[2] <- "A \"B C" # unmatched quote on line 2
tf <- tempfile()
cat(file=tf, sep="\n", z)
system.time(z2 <- read.table(tf, skip=2)) # skip bad line
   user  system elapsed
  0.860   0.028   0.887
str(z2)
'data.frame':   999998 obs. of  3 variables:
 $ V1: Factor w/ 1 level "A": 1 1 1 1 1 1 1 1 1 1 ...
 $ V2: Factor w/ 1 level "B": 1 1 1 1 1 1 1 1 1 1 ...
 $ V3: Factor w/ 1 level "C": 1 1 1 1 1 1 1 1 1 1 ...
system.time(z1 <- read.table(tf, skip=1))
[ no return for several minutes on a 64-bit Linux machine ]

On smaller files it quickly gives the error "line 1 did not have 4 elements", along with a warning "incomplete final line found by readTableHeader ...".

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
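As a rough way to locate such a line before calling read.table(), the sketch below counts double-quote characters on each line and reports the lines where the count is odd. The helper name find_unbalanced_quotes is hypothetical (it is not part of base R), and the check assumes that quoted strings never legitimately span more than one line. It tests one quote character at a time; read.table() also treats the apostrophe as a quote character by default, so a second pass with quote = "'" may be worthwhile.

```r
# Hypothetical helper (not part of base R): return the line numbers in a
# text file that contain an odd number of the given quote character.
find_unbalanced_quotes <- function(path, quote = "\"") {
  lines <- readLines(path, warn = FALSE)
  # gregexpr() returns -1 for a line with no match, so count only
  # positive match positions
  n_quotes <- vapply(gregexpr(quote, lines, fixed = TRUE),
                     function(m) sum(m > 0), integer(1))
  which(n_quotes %% 2 == 1)
}

# Small demonstration file with the same defect as in the example above
z <- rep("A B C", 1000)
z[2] <- "A \"B C"            # unmatched quote on line 2
tf <- tempfile()
cat(file = tf, sep = "\n", z)

bad <- find_unbalanced_quotes(tf)
print(bad)                   # line numbers with an odd quote count
```

Because the file is read with readLines() rather than parsed, this check stays fast even on files large enough to make read.table() appear to hang.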
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rich Shepard Sent: Wednesday, August 01, 2012 10:52 AM To: r-help at r-project.org Subject: [R] read.table() Issue
On Wed, 1 Aug 2012, William Dunlap wrote:
An unmatched quote can make read.table run very slowly when there are lots of lines in the file. E.g.,
Bill,

Yes. Turns out that there was no closing quote on a changed header. I found this from an error message on one data file; the other data file didn't generate an error for me to see.

Thanks very much,

Rich
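When a file gives no error message to go on, one possible diagnostic is to read it with quoting switched off, so that a stray quote cannot swallow the following lines, and then look for quote characters left in the data. This is a minimal sketch on a made-up test file, not the data files discussed in this thread; it assumes the fields themselves contain no embedded whitespace, since with quote = "" the quotes no longer protect it.

```r
# Made-up test file: 1000 lines, with an unmatched quote on line 2
z <- rep("A B C", 1000)
z[2] <- "A \"B C"
tf <- tempfile()
cat(file = tf, sep = "\n", z)

# quote = "" makes read.table() treat quote characters as ordinary text,
# so every line is parsed on its own
d <- read.table(tf, quote = "", stringsAsFactors = FALSE)

# Flag rows in which any column still contains a literal double quote
has_quote <- apply(d, 1, function(r) any(grepl("\"", r, fixed = TRUE)))
which(has_quote)
```

The flagged rows can then be checked by hand for a missing opening or closing quote. If the header line is the suspect, the same read with header = FALSE keeps it as the first data row so that it is included in the check.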