
Could "incomplete final line found" be more serious than a warning?

3 messages · Michael Bärtl, Peter Dalgaard, Zhou Fang

#
Dear all,

I've been successfully reading Web of Science-data from tab-delimited 
text files into a data.frame using an R-script based on readLines().
With new data I just downloaded I suddenly get this warning:

   incomplete final line found

I know this warning has already been discussed numerous times but none 
of the previously suggested solutions worked for me, unfortunately; so 
please bear with me:

I shut the warning down using "warn = FALSE", but the data still won't 
get read so this seems to be more serious than a warning.

Adding a blank line or two at the end of the file did NOT help, i.e. R 
still does not read the file.

My old files still work properly, though.
So I opened the text files using Notepad++ and saw that the last lines 
of both old text files (i.e. working) as well as new text files (i.e. 
the ones that don't work for some reason) always end with a tab stop 
followed by a line break. Personally I couldn't tell any difference 
between the ways these files ended. Their endings looked identical to me.

I was using R 2.14.0 (64-bit) on Windows when I discovered the problem. 
So I upgraded to 2.15.0 (64-bit) but the problem persists.

You can see small examples of an old and new file at 
https://www.dropbox.com/s/2joadjo9ce86rij/WoS-old.txt and 
https://www.dropbox.com/s/lp9l1exx4mfws1s/WoS-new.txt, respectively.

Does anybody happen to have an idea of what could cause these problems 
for me?

Thank you very much for your consideration!
#
(Original below)

Looks like someone had the bright idea of changing the export to UTF-16, so for plain ASCII text every second byte is NUL. It works for me with 

x <- readLines(file("~/Downloads/WoS-new.txt", encoding="UTF-16"))

(except that for some reason, x won't print properly although each individual line prints fine. Never mind, who cares as long as it reads...)
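
For anyone who wants to try this without the Dropbox files, here is a minimal sketch. It assumes the new export is UTF-16 with a byte order mark; the two sample lines and the temporary file are made up for illustration and are not taken from WoS-new.txt. It also shows that read.delim() takes the same encoding through its fileEncoding argument.

```r
# Build a small UTF-16LE file with a byte order mark (BOM).
# The content is invented; it only stands in for the new export.
txt <- "AU\tTI\tPY\nSmith, J\tA title\t2012\n"
bom <- as.raw(c(0xff, 0xfe))
bytes <- c(bom, iconv(txt, "UTF-8", "UTF-16LE", toRaw = TRUE)[[1]])
tmp <- tempfile(fileext = ".txt")
writeBin(bytes, tmp)

# Open the connection explicitly so it can be closed afterwards.
con <- file(tmp, encoding = "UTF-16")
x <- readLines(con)
close(con)

# read.delim() accepts the same encoding via fileEncoding.
wos <- read.delim(tmp, fileEncoding = "UTF-16", quote = "",
                  stringsAsFactors = FALSE)
str(wos)
```

If a file turns out to have no byte order mark, "UTF-16LE" may be the safer encoding name to try, but that is a guess that depends on the file and the platform.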

-pd

PS: The reason the printing is wacky is that one line has 148934 characters in it and the print routines pad all lines to the maximum length. Not sure what the point is in that.
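
A small sketch of how one might track down such a line and still get a readable preview. The vector below is invented for illustration; with the real data, x would be the result of readLines().

```r
# Invented stand-in for the lines read from the file: one line is far
# longer than the others, as in the case described above.
x <- c("short line", strrep("a", 2000), "another short line")

# Number of characters per line, and the index of the longest one.
len <- nchar(x)
longest <- which.max(len)

# Preview every line truncated to 40 characters, so that the padding
# to the maximum length no longer swamps the output.
preview <- strtrim(x, 40)
print(preview)
```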

On May 22, 2012, at 18:26, Michael Bärtl wrote:
#
If you look at the new file in raw mode, you'll see that it's chock full of
ASCII nuls, while the old file has none. This is probably what's giving you
the problems, because R does not allow strings containing embedded nul
characters. (I believe this is because nul in strings is pretty dangerous in
programming: it is often used to mark the end of a string, so allowing you
to read it in directly could open the door to various code injection
exploits.)
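
One way to look at a file "in raw mode" from within R is sketched below. count_nul() is a hypothetical helper, not an existing R function, and the two temporary files are invented stand-ins for the old and new exports.

```r
# Hypothetical helper: read a file as raw bytes and count the nul bytes.
count_nul <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  sum(bytes == as.raw(0))
}

# Invented stand-ins: a plain single-byte file and a UTF-16LE file
# holding the same text.
txt <- "AU\tTI\nSmith\tA title\n"
old_file <- tempfile()
new_file <- tempfile()
writeBin(charToRaw(txt), old_file)
writeBin(iconv(txt, "UTF-8", "UTF-16LE", toRaw = TRUE)[[1]], new_file)

n_old <- count_nul(old_file)
n_new <- count_nul(new_file)

# The first bytes of the second file show the alternating 00 pattern.
head(readBin(new_file, what = "raw", n = 16), 16)
```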

To read the new data files, you need some way of dealing with the file as a
raw stream, and stripping out all the nul characters before converting back
to character. Investigate ?readBin...
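
A rough sketch of that approach follows. read_stripping_nul() is a hypothetical helper, and the sample file is made up. It assumes the text is little-endian UTF-16 containing only ASCII characters; anything outside that range (accented author names, for instance) would not survive this treatment intact.

```r
# Hypothetical helper: read the file as raw bytes, drop a leading
# FF FE byte order mark if there is one, strip every nul byte, and
# split the remaining text into lines.
read_stripping_nul <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  if (length(bytes) >= 2 && identical(bytes[1:2], as.raw(c(0xff, 0xfe)))) {
    bytes <- bytes[-(1:2)]
  }
  bytes <- bytes[bytes != as.raw(0)]
  strsplit(rawToChar(bytes), "\r?\n")[[1]]
}

# Invented sample: UTF-16LE text with a BOM and Windows line endings.
txt <- "AU\tTI\r\nSmith\tA title\r\n"
tmp <- tempfile()
writeBin(c(as.raw(c(0xff, 0xfe)),
           iconv(txt, "UTF-8", "UTF-16LE", toRaw = TRUE)[[1]]), tmp)

lines <- read_stripping_nul(tmp)
print(lines)
```

Because stripping bytes throws information away, passing the encoding to file(), as in the other reply, is probably the more robust route whenever the data may contain non-ASCII characters.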

Zhou

--
View this message in context: http://r.789695.n4.nabble.com/Could-incomplete-final-line-found-be-more-serious-than-a-warning-tp4630932p4630944.html
Sent from the R help mailing list archive at Nabble.com.