End of File for binary files

9 messages · rn00b, jim holtman, Duncan Murdoch +1 more

#
Hello,

I'm new to R and I'm writing a function to read binary tables (the binary
version of read.table essentially). I'm having trouble figuring out how to
determine when I reach end-of-file. Can anybody please help?

thanks!
#
If you are reading a 'binary' file, then the end of file is when you
read the last byte; the system will tell you.  Exactly what are you
trying to do?  When you do a read, you typically request the number of
bytes to read and then the system returns the number of bytes read.
But since you did not give an indication of what you are trying to do
or how, it is hard to make an exact suggestion.  Did you have some
experience where you think you did not get an end-of-file indication?
More information is required.
On Sat, Jan 23, 2010 at 9:40 PM, rn00b <forzatogo at gmail.com> wrote:

  
    
#
I am using readBin to continuously read characters from the binary file. I'm
trying to figure out how many characters are in the file. What I would like
to do is something like
(while! EOF)
{
charRead <-.Internal(readBin(con,"character",1L,NA,TRUE,swap))
i++
}

I'm not clear on how to determine the EOF condition in this case.
#
Are you really trying to read in binary?  You are asking for
characters, which would be a null-terminated string.  If you are trying
to read in binary zeroes, this will not work.  What you need to do is
to use 'raw'.  Actually you should create an R script to test out the
various conditions you want.  If you use 'raw', this will read in the
actual bytes in the file.  You can check to see if the length of the
vector is what you requested.  If it is not, then you have reached the
end of file.  It is easy enough to try out the various options to see
what happens.  So if you are trying to read binary, then use 'raw'.
If you are reading null-terminated strings, then 'character' will
work.  If your file consists of binary integers, then use 'integer'
and make sure you know the endianness of the data.  There are lots of
other cases/conditions depending on what you are trying to do.
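
A minimal sketch of the 'raw' approach described above, assuming the goal is simply to count the bytes in the file. The helper name count_bytes and the chunk size are made up for illustration; the only real calls are file(), readBin(), writeBin() and file.info().

```r
# Hypothetical helper: count the bytes in a file by reading raw chunks.
count_bytes <- function(path, chunk_size = 4096L) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  total <- 0
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    total <- total + length(chunk)
    # Fewer bytes than requested (possibly none) means end of file was reached.
    if (length(chunk) < chunk_size) break
  }
  total
}

# Demo file: 10000 bytes, a quarter of them binary zeroes.
tmp <- tempfile()
writeBin(as.raw(rep(c(0, 1, 2, 255), 2500)), tmp)
n_bytes <- count_bytes(tmp)
size_on_disk <- file.info(tmp)$size  # cross-check against the file system
unlink(tmp)
```

Because the data are read as 'raw', the zero bytes are counted like any other byte instead of being treated as string terminators.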
On Sat, Jan 23, 2010 at 10:40 PM, rn00b <forzatogo at gmail.com> wrote:

  
    
#
On 23/01/2010 10:40 PM, rn00b wrote:
You should not be calling .Internal.  It's for internal use, subject to 
change, etc.

Using readBin(...)  you can detect EOF by reading fewer than n items 
when you ask for n.  So the loop would be something like

while (length(charRead <- readBin(con, "character")) > 0) {
   i <- i + 1
}

Duncan Murdoch
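
For anyone who wants to try the loop above, here is a self-contained sketch. It assumes the file holds null-terminated strings (which is what writeBin() produces for a character vector); the temporary file and its three strings are made up for the test.

```r
# Write three null-terminated strings to a temporary binary file.
tmp <- tempfile()
con <- file(tmp, open = "wb")
writeBin(c("alpha", "beta", "gamma"), con)
close(con)

# Read one string per call; readBin() returns a zero-length vector at EOF.
con <- file(tmp, open = "rb")
i <- 0L
while (length(charRead <- readBin(con, what = "character", n = 1L)) > 0) {
  i <- i + 1L
}
close(con)
unlink(tmp)
```

Note that i has to be initialised before the loop, and that it counts strings, not individual characters.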
#
On Sun, Jan 24, 2010 at 3:56 AM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
Is this safe?  Is EOF the only case where readBin() returns fewer
elements than you requested?  Does it depend on the type of connection
you are reading from?  Does it depend on OS?

The help("readBin") does not say much about this, but it says:

"If readBin(what = character()) is used incorrectly on a file which
does not contain C-style character strings, warnings (usually many)
are given. From a file or connection, the input will be broken into
pieces of length 10000 with any final part being discarded."

which seems to suggest that you (at least in special cases) can get
fewer items than requested without hitting EOF.  EOF behavior might be
documented elsewhere in R.

/Henrik
#
If you are using readBin and want to read binary (and not 'character')
data, then use 'raw', which will return a value if it encounters a
binary zero and not think it is the end of a character string.  It is
easy enough to set up a test file and try out a couple of the options
to see what happens.  Not enough information was presented in the
original posting to conclude what was really wanted.  Does the binary
file consist of just bytes, is it integer (32 or 64 bit), is it
floating point, etc.?  Once you understand the structure of the data,
it is not difficult to read in the data.  Binary and 'character' are
not necessarily the same information when trying to read from a file.
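
As an illustration of the "know the structure first" point, here is a sketch for one assumed layout: a file of 32-bit little-endian integers. The layout, the test data and the chunk size of 4 are assumptions for the example, not something stated in the original question.

```r
# Write ten 32-bit little-endian integers to a temporary file.
tmp <- tempfile()
con <- file(tmp, open = "wb")
writeBin(1:10, con, size = 4L, endian = "little")
close(con)

# Read them back four at a time, with the same size and endianness.
con <- file(tmp, open = "rb")
values <- integer(0)
repeat {
  chunk <- readBin(con, what = "integer", n = 4L, size = 4L, endian = "little")
  values <- c(values, chunk)
  # A short read on a file connection signals end of file.
  if (length(chunk) < 4L) break
}
close(con)
unlink(tmp)
```

Reading with the wrong size or endian value would still run without an error; it would just produce different numbers, which is why the layout has to be known up front.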
On Sun, Jan 24, 2010 at 12:15 PM, Henrik Bengtsson <hb at stat.berkeley.edu> wrote:

  
    
#
On 24/01/2010 12:15 PM, Henrik Bengtsson wrote:
I read the passage above to say I might get more than n strings from a 
file containing n of them if they are too long; I don't see it saying 
that I would ever get fewer than n if there are n properly terminated 
strings remaining in the file.

EOF is not the only case in general where you'd get fewer than n strings 
(e.g. a non-blocking connection will only return what's in the buffer), 
but in the usual case of reading from a file, it should be safe.

If you are worried about some particular case, check the source code. 
It's always the final arbiter.

Duncan Murdoch
#
On 24/01/2010 2:12 PM, jim holtman wrote:
I'm not sure I understand what you mean by reading binary data, but you 
should also think about using readChar if you don't intend to read 
null-terminated strings.

Duncan Murdoch
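
A small sketch of the readChar suggestion, assuming fixed-width fields of 4 bytes with no terminators. The field width and the test data are invented for the example.

```r
# Write 12 bytes of text with no nul terminators.
tmp <- tempfile()
writeBin(charToRaw("ABCDEFGHIJKL"), tmp)

# Read fixed-width fields of 4 bytes until nothing is returned.
con <- file(tmp, open = "rb")
fields <- character(0)
while (length(field <- readChar(con, nchars = 4L, useBytes = TRUE)) > 0) {
  fields <- c(fields, field)
}
close(con)
unlink(tmp)
```

As with readBin(), a zero-length result is what ends the loop here.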
