
How to import such data into R?

On Sat, 2005-10-15 at 23:54 +0800, ronggui wrote:
There may be an easier way, but here is one possible approach:

First, use scan() to read in the data. Set the 'what' argument to a list
of atomic data types, based upon your specs above. Also, set the
'na.strings' argument to '.'.

This will read the multiple lines for each record into a single record,
based upon there being 23 elements per record (that is,
'length(what)'). Note also the 'multi.line' argument in scan(): it
defaults to TRUE and, because 'what' is a list, lets a single record
span several lines.

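# 23 fields per record: 19 numeric, 1 character (school), 3 numeric;
# a "." in the file marks a missing value
data <- scan("data.txt", 
             what = c(rep(list(numeric(0)), 19), 
                      list(character(0)), 
                      rep(list(numeric(0)), 3)), 
             na.strings = ".")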
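The record reassembly can be tried in isolation. The sketch below uses a
hypothetical three-field layout (year, school, apps), with each record
deliberately wrapped over two lines and '.' marking a missing value.
textConnection() stands in for the data file, and the field names in
'what' are illustrative only, not part of your data.

```r
# Hypothetical toy file: 2 records x 3 fields, each record wrapped over 2 lines
txt <- c("1992 alabama", "6245",
         "1993 arizona", ".")
con <- textConnection(txt)
# A named list for 'what' gives one list element (column) per field;
# multi.line = TRUE (the default) lets a record continue on the next line
rec <- scan(con,
            what = list(year = numeric(0), school = character(0),
                        apps = numeric(0)),
            na.strings = ".", multi.line = TRUE, quiet = TRUE)
close(con)
str(rec)
```

Four input lines become two records, and the '.' in the numeric 'apps'
field is read as NA.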


'data' is now a list, where each list element holds one column from
your original data file. Now use as.data.frame(), which will turn each
list element into a column of a data frame, preserving the data types.
(Character columns are converted to factors unless stringsAsFactors =
FALSE, which became the default in R 4.0.0.)

data <- as.data.frame(data)
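A minimal sketch of this step on a hypothetical list shaped like scan()'s
result; stringsAsFactors = FALSE is passed explicitly so the character
column stays character on any R version, and sapply() confirms one class
per column.

```r
# Hypothetical list, shaped like the output of scan() with a list 'what'
rec <- list(year = c(1992, 1993),
            school = c("alabama", "arizona"),
            apps = c(6245, NA))
df <- as.data.frame(rec, stringsAsFactors = FALSE)
# Check the class of each column: numeric, character, numeric
sapply(df, class)
```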


Now read the column names for the data frame from a text file
containing your field names above, and assign them to the data frame.

Names <- scan("names.txt", what = character(0))
names(data) <- Names
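A mismatch between the number of names read and the number of columns is
an easy mistake to make, so this sketch checks the count before assigning.
The names-file contents and the one-row data frame are hypothetical, and
textConnection() again stands in for 'names.txt'.

```r
# Hypothetical contents of names.txt (whitespace-separated field names)
con <- textConnection(c("year school", "apps"))
Names <- scan(con, what = character(0), quiet = TRUE)
close(con)
# Hypothetical one-row data frame with three columns
df <- data.frame(1992, "alabama", 6245, stringsAsFactors = FALSE)
# Guard: exactly one name per column
stopifnot(length(Names) == ncol(df))
names(df) <- Names
names(df)
```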


Now review the first few rows of 'data':
year  apps top25 ver500 mth500 stufac bowl btitle finfour    lapps
1 1992  6245    49     NA     NA     20    1      0       0 8.739536
2 1993  7677    58     NA     NA     15    1      0       0 8.945984
3 1992 13327    57     36     58     16    0      0       0 9.497547
4 1993 19860    57     36     58     16    1      1       0 9.896463
5 1992 10422    37     28     58     20    0      0       0 9.251675
  d93 avg500 cfinfour    clapps cstufac cbowl cavg500 cbtitle  lapps_1
1   0     NA       NA        NA      NA    NA      NA      NA       NA
2   1     NA        0 0.2064476      -5     0      NA       0 8.739536
3   0     47       NA        NA      NA    NA      NA       0       NA
4   1     47        0 0.3989162       0     1       0       1 9.497547
5   0     43       NA        NA      NA    NA      NA      -1       NA
         school ctop25 bball cbball
1       alabama     NA     0     NA
2       alabama      9     0      0
3       arizona     NA     0     NA
4       arizona      0     1      1
5 arizona state     NA     0     NA


HTH,

Marc Schwartz