
Getting codebook data into R

I didn't have time to look at this at work, but here is one possible approach.  I did not check how the codebook file is actually structured; I just took what you posted above and cleaned it up a bit, like this:

'caseid',1,12,int
'nbrnaliv',22,22,int
'babysex',56,56,int
'birthwgt_lb',57,58,int
'birthwgt_oz',59,60,int
'prglength',275,276,int
'outcome',277,277,int
'birthord',278,279,int
'agepreg',284,287,int
'finalwgt',423,440,float

and copied it to the clipboard.  Then I read it in using the following syntax:

## read in data layout
codebook <- read.table('clipboard', sep=',', as.is=TRUE)
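
If the clipboard is not convenient (the 'clipboard' connection behaves differently across platforms), one alternative is to keep the layout as text inside the script.  This is only a sketch: layout_text is a name I made up, and the rows are the same ones shown above.

```r
## layout_text is a hypothetical name; the rows are the cleaned-up layout from above
layout_text <- "'caseid',1,12,int
'nbrnaliv',22,22,int
'babysex',56,56,int
'birthwgt_lb',57,58,int
'birthwgt_oz',59,60,int
'prglength',275,276,int
'outcome',277,277,int
'birthord',278,279,int
'agepreg',284,287,int
'finalwgt',423,440,float"

## read the layout from the string instead of the clipboard;
## read.table strips the single quotes around the variable names by default
codebook <- read.table(text = layout_text, sep = ',', as.is = TRUE)
```

The result has the same four unnamed columns (V1 to V4) as the clipboard version, so the rest of the code applies unchanged.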

I will leave it to you to decide how you want to get the codebook into your R session.  Once it is there, you can compute the field widths and the number of columns to skip between fields, and then build a command to read in the data.  Something like this should get you started:

## get number of rows in code book
nr <- nrow(codebook)
## provide names for codebook layout data frame
names(codebook) <- c('variable','begin','end','type')

## compute number of columns to read (and skip) for each variable
## store in the vector read.col
# compute field widths
codebook$width <- codebook$end - codebook$begin + 1

# compute columns to skip between end of one field and 
# beginning of next field
codebook$skip <- c(codebook$begin[-1]-codebook$end[-nr]-1,0)

## build the widths vector required by read.fwf: positive values are
## columns to read, negative values are columns to skip
read.col <- numeric()
## start with a skip if the first field does not begin in column 1
if(codebook$begin[1] > 1) read.col <- -(codebook$begin[1] - 1)
for(i in seq_len(nr)){
  read.col <- c(read.col,codebook$width[i])
  if(codebook$skip[i] > 0) read.col <- c(read.col,-codebook$skip[i])
}
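
As a cross-check on the widths vector, here is a small sketch that builds the same kind of vector without a loop and confirms that the read and skipped columns together span the whole layout.  The three-field layout and the names layout, width, skip and widths are made up for illustration, and it assumes the fields are sorted by begin and do not overlap.

```r
## hypothetical three-field layout, for illustration only
layout <- data.frame(variable = c('a', 'b', 'c'),
                     begin    = c(1, 22, 56),
                     end      = c(12, 22, 57))
n <- nrow(layout)

## field widths and the gap after each field (0 after the last one)
width <- layout$end - layout$begin + 1
skip  <- c(layout$begin[-1] - layout$end[-n] - 1, 0)

## interleave widths with negated skips, then drop the zero-length skips
widths <- as.vector(rbind(width, -skip))
widths <- widths[widths != 0]

## columns read plus columns skipped should cover begin[1] through end[n]
stopifnot(sum(abs(widths)) == layout$end[n] - layout$begin[1] + 1)
```

The same check applied to read.col would be sum(abs(read.col)) compared with the last end column of the codebook.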

## recode type values to R classes
codebook$Rtype <- ifelse(codebook$type %in% c('int','float'),'numeric', 'character')

## now read in the data
fwfdata <- read.fwf('c:/tmp/testpreg.txt', col.names=codebook$variable, 
                     widths=read.col, colClasses=codebook$Rtype)


The code is clearly not bulletproof and does no error checking.  However, it does the job, provided the information you gave is accurate.  If you wanted, you could wrap it all up in a function and pass the data file name and codebook name as parameters.
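
A minimal sketch of such a wrapper follows.  The function name read_codebook_fwf is made up, and for simplicity it takes the codebook as a data frame (columns variable, begin, end, type) rather than a file name.  It assumes the fields are sorted by begin and do not overlap, and it stops with an error otherwise.  The tiny data file at the end is invented just to show a call.

```r
## read_codebook_fwf is a hypothetical name; codebook must have the columns
## variable, begin, end and type, sorted by begin, with no overlapping fields
read_codebook_fwf <- function(datafile, codebook) {
  nr <- nrow(codebook)
  width <- codebook$end - codebook$begin + 1
  ## gap before each field: measured from column 1 for the first field,
  ## from the end of the previous field for the others
  gap <- codebook$begin - c(0, codebook$end[-nr]) - 1
  stopifnot(all(width > 0), all(gap >= 0))
  ## interleave negated gaps with widths, then drop the zero-length gaps
  widths <- as.vector(rbind(-gap, width))
  widths <- widths[widths != 0]
  ## recode type values to R classes
  classes <- ifelse(codebook$type %in% c('int', 'float'), 'numeric', 'character')
  read.fwf(datafile, widths = widths, col.names = codebook$variable,
           colClasses = classes)
}

## made-up example: three fields, with filler characters between them
cb <- data.frame(variable = c('id', 'sex', 'wgt'),
                 begin    = c(1, 5, 8),
                 end      = c(3, 5, 10),
                 type     = c('int', 'int', 'float'),
                 stringsAsFactors = FALSE)
tmp <- tempfile()
writeLines(c('001x1yy2.5', '002x2yy3.0'), tmp)
dat <- read_codebook_fwf(tmp, cb)
```

Computing the gap before each field, rather than after it, also covers a layout whose first field does not start in column 1.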


Hope this is helpful,

Dan

Daniel Nordlund
Bothell, WA USA