seek(), skip by bits (not by bytes) in binary file
If the structure really changes day by day, then you have to decipher how it is constructed in order to find the correct bit to go to.
If you think you already know which bit to go to, then you know it in a form like "the 3rd bit of the 71st byte", which means the existing seek() function should be sufficient to get to that byte, after which you can pick apart its bits to get the ones you want.
I recommend using the hexView package for this kind of task.
---------------------------------------------------------------------------
Jeff Newmiller <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
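Jeff's suggestion (seek to the byte, then pick apart its bits) might look something like the minimal sketch below. The function name read_bit() is hypothetical, and so are the conventions: offsets are 0-based, and bit 0 is taken to be the least significant bit of the byte. Some file formats number bits from the most significant end instead, so check that against whatever documentation exists for the format.

```r
# Hypothetical helper: read one bit without reading everything before it.
# Assumptions: byte_offset is 0-based; bit_index 0 = least significant bit.
read_bit <- function(path, byte_offset, bit_index) {
  stopifnot(bit_index >= 0, bit_index <= 7)
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = byte_offset, origin = "start")  # jump straight to the byte
  b <- readBin(con, what = "raw", n = 1)
  if (length(b) == 0) stop("byte_offset is at or past the end of the file")
  # rawToBits() returns the 8 bits of a byte, least significant first
  as.integer(rawToBits(b))[bit_index + 1]
}
```

Under these assumptions, "the 3rd bit of the 71st byte" would be read_bit(path, 70, 2). Only one byte is read from the file, no matter how far into it that byte sits.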
Ben quant <ccquant at gmail.com> wrote:
Other people at my firm who know a lot about binary files couldn't figure out the parts of the file that I am skipping over. Part of the issue is that there are several different files like this (with a .dbs extension) that I have to process, and their structures change depending on the source of the files. In short, the problem is over my head, and I was hoping to go right to the correct bit and read, which would make things much easier. I guess not...

Thanks for your help though. Anyone else?

thanks,
ben

On Tue, Jun 19, 2012 at 10:10 AM, jim holtman <jholtman at gmail.com> wrote:
I am not sure why reading through 'bit-by-bit' gets you to where you want to be. I assume that the file has some structure, even though it may be changing daily. You mentioned the various types of data that it might contain; are they all in byte-sized chunks? If you really have data that begins in the middle of a byte and then extends over several bytes, you will have to write some functions that pull out this data and then reconstruct it into an object (e.g., integer, numeric, ...) that R understands. Can you provide some more definition of what the data actually looks like and how you would find the "pattern" of the data?

Almost all systems read byte-sized chunks at the lowest level, and if you really have to get down to the bit level to reconstruct the data, then you have to write the unpack/pack functions. This can all be done once you understand the structure of the data, so some examples would be useful if you want someone to propose a solution.

On Tue, Jun 19, 2012 at 11:54 AM, Ben quant <ccquant at gmail.com> wrote:
Hello,

Has a function been built that will skip to a certain bit in a binary file? As of 2009 the answer was 'no':
http://r.789695.n4.nabble.com/read-binary-file-seek-td900847.html
https://stat.ethz.ch/pipermail/r-help/2009-May/199819.html

If you feel I don't need to (as in the links above), please provide some help. (Note this is my first time working with binary files.) I'm still working on the script, but here is where I am right now.
The for loop is being used because:

1) I have to get down to the correct position, then get the info I want/need. The stuff I am reading through (x) is not fully understood, and it is a mix of various chars, floats, integers, etc. of various sizes, so I don't know how many bytes to read in unless I read them bit by bit. (The information and its structure change daily, so I'm skipping over it.)

2) If I skip everything in one readBin() call, my 'n' value is often up to 20 times too big (I get an error) and/or R won't let me "allocate a vector of size....", etc. So I split it up into chunks (divide by 20, etc.), read each chunk, and then trash each part that is readBin()'d. In the last line I get the data that I want (data1).

Here is my working code:

# I have to read past 'junk' bytes of the to.read file. junk is a huge
# integer, so I divide it up and loop through to.read in parts (jb_part).
divr <- 20
mod <- junk %% divr
jb_part <- junk %/% divr
jb_part_mod <- jb_part + mod  # catch the remainder/modulus
to.read <- file(file.path(dbs_path, dbs_file), "rb")  # connect to the binary file
# loop in chunks to where I want to be
for (i in 1:(divr - 1)) {
  x <- readBin(to.read, "raw", n = jb_part, size = 1)
  x <- NULL  # trash the result b/c I don't want it
}
# read a little more to include the remainder/modulus bytes left over
# from dividing by 20 above
x <- readBin(to.read, "raw", n = jb_part_mod, size = 1)
x <- NULL  # trash it
# finally get the data that I want
data1 <- readBin(to.read, double(), n = some_number, size = size_to_use)
close(to.read)

This works, but it is SLOW! Any ideas on how to get down to the correct bit a bit quicker (pun intended)? :)
Thanks for any help!
Ben
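Since Ben's loop reads exactly junk bytes and throws them away, the whole loop can probably be replaced by a single seek(), which moves the file position without reading anything into memory. That should also sidestep the "allocate a vector of size..." problem from point 2). The sketch below reuses the post's names (junk, some_number, size_to_use); the function name read_after_skip() is hypothetical. If the target is really a bit position p (0-based), one would seek to byte p %/% 8 and pick the bits out of what is read from there. Two caveats: offsets beyond 2 GB rely on the platform's large-file support, and R's documentation for seek() warns about file positioning on Windows.

```r
# Hypothetical helper: skip `junk` bytes with seek() instead of reading and
# discarding them, then read `some_number` doubles of `size_to_use` bytes each.
read_after_skip <- function(path, junk, some_number, size_to_use = 8) {
  con <- file(path, "rb")
  on.exit(close(con))
  # seek() only moves the file position; the skipped bytes are never read
  seek(con, where = junk, origin = "start")
  readBin(con, what = "double", n = some_number, size = size_to_use)
}
```

Usage would be data1 <- read_after_skip(file.path(dbs_path, dbs_file), junk, some_number, size_to_use).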
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
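Jim's "unpack" functions for data that starts in the middle of a byte could be sketched as below. This sketch rests on an assumed convention: bits are numbered least-significant-first within each byte (the order rawToBits() and packBits() use), and multi-byte values are little-endian. A format that packs bits most-significant-first would need the bits of each byte reversed first. Both function names are hypothetical. extract_uint() decodes a small unsigned integer field directly. realign_bytes() shifts whole bytes back onto a byte boundary so that readBin(), which also accepts a raw vector, can decode them as a double, integer, etc.

```r
# Assumed convention: LSB-first bit order within bytes, little-endian values.

# Unsigned integer of `nbits` bits starting at 0-based `bit_offset` in `bytes`
extract_uint <- function(bytes, bit_offset, nbits) {
  stopifnot(is.raw(bytes), nbits >= 1, nbits <= 52)
  bits <- rawToBits(bytes)                  # all bits, LSB first per byte
  idx <- bit_offset + seq_len(nbits)        # 1-based positions of the field
  if (max(idx) > length(bits)) stop("field runs past the end of the data")
  # weight the k-th bit of the field by 2^(k - 1)
  sum(as.integer(bits[idx]) * 2^(seq_len(nbits) - 1))
}

# Re-align `nbytes` whole bytes that start at a 0-based bit offset
realign_bytes <- function(bytes, bit_offset, nbytes) {
  stopifnot(is.raw(bytes), nbytes >= 1)
  bits <- rawToBits(bytes)
  idx <- bit_offset + seq_len(8 * nbytes)
  if (max(idx) > length(bits)) stop("field runs past the end of the data")
  packBits(bits[idx], type = "raw")         # 8 bits per output byte
}
```

For example, with x <- as.raw(c(0xB4, 0x03)), extract_uint(x, 4, 6) gives 59 (the 16-bit little-endian value 0x03B4 shifted right by 4 and masked to 6 bits). A double stored 3 bits into a buffer could be decoded with readBin(realign_bytes(buf, 3, 8), "double", n = 1, size = 8, endian = "little"). As Jim says, none of this helps until the actual layout of the file is known.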