column-binary data
Thanks for the replies. That's not quite what I meant. These data are multipunched to allow more than one variable to be coded in the same column. For example, the first 7 columns of the first card of the data I'm trying to read contain the following:

Column  Rows  Description
------  ----  -------------------
1-5           Serial number
6             Card number
7       Y,X   Sex of respondent
7       0-3   Marital status
7       4-9   Occupational status

I happen to know that the actual punches for the first respondent are 00001, 1, Y, 1, 4. When I use
ip <- readBin(ff,what="raw",n=14,signed=FALSE)
I get
08 00 08 00 08 00 08 00 04 00 04 00 24 20
for these seven columns. When I use raw2bin from package caTools I get:
binip <- raw2bin(ip,"integer",size=2)
8 8 8 8 4 4 8228
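Base R's readBin() can also do the integer step that raw2bin() does here. It accepts either a raw vector or a file connection such as ff, and it can read 2-byte unsigned integers. This is a minimal sketch with two assumptions. First, the byte order is taken to be little-endian, inferred from the bytes above: 24 20 decodes to 8228 only if the low byte comes first. Second, ip is retyped by hand here rather than read from the file.

```r
# The 14 bytes shown above, typed in directly for illustration.
ip <- as.raw(c(0x08, 0x00, 0x08, 0x00, 0x08, 0x00, 0x08, 0x00,
               0x04, 0x00, 0x04, 0x00, 0x24, 0x20))

# readBin() accepts a raw vector as well as a connection; read seven
# unsigned 2-byte little-endian integers, one per card column.
cols <- readBin(ip, what = "integer", n = 7, size = 2,
                signed = FALSE, endian = "little")
cols   # 8 8 8 8 4 4 8228
```

With the file itself, the same call would be applied to ff in place of ip.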
Now I can see that the relationship between binary numbers and punches is this:
binary  punch      binary  punch
------  -----      ------  -----
     1  3             256  9
     2  2             512  8
     4  1            1024  7
     8  0            2048  6
    16  X            4096  5
    32  Y            8192  4
    64  -           16384  -
   128  -           32768  -
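Each punch in the table sets exactly one bit, so decoding a column value amounts to listing its set bits. Below is a minimal sketch using intToBits(). The lookup vector punch_at_bit is a name made up for illustration. It is transcribed from the table, with NA for the bits that carry no punch.

```r
# Punch label for each of the 16 bit positions, least significant bit
# first, transcribed from the table (NA = bit not used by any punch).
punch_at_bit <- c("3", "2", "1", "0", "X", "Y", NA, NA,
                  "9", "8", "7", "6", "5", "4", NA, NA)

# intToBits() returns 32 bits, least significant first; keep the low 16
# and find the positions that are set.
bits <- which(as.integer(intToBits(8228L))[1:16] == 1L)
bits                 # positions 3, 6 and 14
2^(bits - 1)         # bit values 4, 32 and 8192, which sum to 8228
punch_at_bit[bits]   # "1" "Y" "4"
```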
I can also see that the value for column 7 (8228) is the sum of the values of the three punches in that column (Y = 32, 1 = 4 and 4 = 8192; 32 + 4 + 8192 = 8228). What I can't work out is how to get R to recover the punches from either the raw values or the integer values. If anyone can suggest anything I would be very, very grateful!
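One way to get R to work out the punches has two steps. First, test each punch's bit with bitwAnd(). Then split column 7 by the row ranges in the codebook at the top of this message. This is a sketch, not a tested reader. The names punch_bits, punches() and pick() are hypothetical. The column values are the seven integers shown above rather than values read from the file.

```r
# Bit value of each punch row in card order (Y, X, 0-9), from the table;
# bits 64, 128, 16384 and 32768 carry no punch.
punch_bits <- c(Y = 32, X = 16, "0" = 8, "1" = 4, "2" = 2, "3" = 1,
                "4" = 8192, "5" = 4096, "6" = 2048, "7" = 1024,
                "8" = 512, "9" = 256)

# Punches present in one column value, as a character vector.
punches <- function(x) names(punch_bits)[bitwAnd(x, punch_bits) != 0]

# Column values for the first respondent (as returned by raw2bin above).
cols <- c(8, 8, 8, 8, 4, 4, 8228)
card <- lapply(cols, punches)

# Hypothetical helper: keep only the punches in one codebook row range;
# several punches in the same range are joined with commas.
pick <- function(p, allowed) {
  hit <- intersect(p, allowed)
  if (length(hit) == 0) NA_character_ else paste(hit, collapse = ",")
}

serial  <- paste(unlist(card[1:5]), collapse = "")  # columns 1-5
cardno  <- card[[6]]                                 # column 6
sex     <- pick(card[[7]], c("Y", "X"))              # column 7, rows Y,X
marital <- pick(card[[7]], as.character(0:3))        # column 7, rows 0-3
occup   <- pick(card[[7]], as.character(4:9))        # column 7, rows 4-9
```

For the first respondent this gives serial "00001", card "1", sex "Y", marital status "1" and occupational status "4", which matches the known punches.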
David
-----Original Message-----
From: Ted Harding [mailto:Ted.Harding at nessie.mcc.ac.uk]
Sent: Fri 9/16/2005 14:31
To: E-Mail
Cc: David Barron
Subject: Re: [R] column-binary data
On 16-Sep-05 jim holtman wrote:
Each card column has 12 rows, so in binary it comes in as 12 bits. The question is whether it comes as a 16-bit integer or as a string of 12 bits that I have to extract it from. Neither case is that difficult to handle.
Indeed ... as an example of how one could proceed, I "deconstruct" my example below (see at end).
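Jim's second case needs some bit shifting: 12-bit columns packed end to end with no padding. The sketch below assumes a hypothetical layout in which every 3 bytes hold 2 columns, most significant bit first. The real layout of a given file would have to be checked before using anything like this, and unpack12() is a made-up name. (The 16-bit case is a plain readBin() call with size = 2.)

```r
# Hypothetical layout: two 12-bit values packed into every 3 bytes,
# most significant bit first.
unpack12 <- function(r) {
  b <- as.integer(r)
  stopifnot(length(b) %% 3 == 0)
  i <- seq(1, length(b), by = 3)
  first  <- b[i] * 16L + b[i + 1] %/% 16L         # 8 bits + high nibble
  second <- (b[i + 1] %% 16L) * 256L + b[i + 2]   # low nibble + 8 bits
  as.vector(rbind(first, second))                 # restore column order
}

# Two columns, 0xABC and 0x123, packed into the bytes AB C1 23.
unpack12(as.raw(c(0xAB, 0xC1, 0x23)))   # 2748 291
```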
On 9/16/05, Ted Harding <Ted.Harding at nessie.mcc.ac.uk> wrote:
On 16-Sep-05 David Barron wrote:
I have a number of datasets in multipunch column-binary format. Does anyone have any advice on how to read them into R? Thanks. David
Do you mean something like the old HOLLERITH PUNCHED CARD BINARY FORMAT?

1111111110111111101111011111101111110
0000000001000000010000100000010000001
0000010100110000000010000001100010011
1111001010001010000000001100100101001
0111100100011001100001000100001101011
0100010000001100001010010101001110001
0100101000010101001100001010100101101

(here "1" = hole in card, binary representation of 7-bit ASCII encoding, high-order bit on top).
#First, construct a vector ASCII consisting of the printable
#characters:
ASCII<-c(" ","!","\"","#","$","%","&","'","(",")",
"*","+",",","-",".","/","0","1","2","3",
"4","5","6","7","8","9",":",";","<","=",
">","?","@","A","B","C","D","E","F","G",
"H","I","J","K","L","M","N","O","P","Q",
"R","S","T","U","V","W","X","Y","Z","[",
"\\","]","^","_","`","a","b","c","d","e",
"f","g","h","i","j","k","l","m","n","o",
"p","q","r","s","t","u","v","w","x","y",
"z","{","|","}","~")
#Next, a vector of powers of 2:
rad<-2^(6:0)
#Read in the data from stdin():
M<-t(matrix(as.integer(unlist((strsplit(scan(stdin(),
what="character"),split="")))),ncol=7))
#(read 7 lines from stdin by copy&paste:
#1: 1111111110111111101111011111101111110
#2: 0000000001000000010000100000010000001
#3: 0000010100110000000010000001100010011
#4: 1111001010001010000000001100100101001
#5: 0111100100011001100001000100001101011
#6: 0100010000001100001010010101001110001
#7: 0100101000010101001100001010100101101
#8:
#Read 7 items
#and convert the columns to ASCII codes:
R<-rad%*%M
#and see what you've got:
paste(ASCII[R-31],collapse="")
#[1] "HOLLERITH PUNCHED CARD BINARY FORMAT?"
The above can be adapted to whatever your binary data represent
and to how they are laid out in the input.
Others may find a slicker way of doing this.
The only fly in the above ointment is that I haven't located
in R a character-vector constant which consists of the printable
ASCII characters, or a function to convert numerical ASCII code
to characters, so I made my own.
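Current versions of R have intToUtf8() and rawToChar(). Either one converts numeric character codes to text without a hand-built ASCII vector. Below is a minimal sketch of the same decoding, with the seven card rows typed in as strings instead of read from stdin(). The names rows, M and codes are illustrative.

```r
# The seven card rows from the example (high-order bit on top).
rows <- c("1111111110111111101111011111101111110",
          "0000000001000000010000100000010000001",
          "0000010100110000000010000001100010011",
          "1111001010001010000000001100100101001",
          "0111100100011001100001000100001101011",
          "0100010000001100001010010101001110001",
          "0100101000010101001100001010100101101")

# 7 x 37 matrix of 0/1: one card row per matrix row, one card column
# per matrix column.
M <- do.call(rbind, lapply(strsplit(rows, ""), as.integer))

# Weight each row by its bit value and sum down the columns.
codes <- drop(2^(6:0) %*% M)

# Convert the codes straight to characters.
intToUtf8(as.integer(codes))   # "HOLLERITH PUNCHED CARD BINARY FORMAT?"
rawToChar(as.raw(codes))       # same result
```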
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 16-Sep-05 Time: 22:26:16
------------------------------ XFMail ------------------------------