On Fri, 11 Apr 2008, Zev Ross wrote:
Hi All,
Can anyone direct me to a read function in R that will let me read in
only the rows of a text file that begin with a particular value, as in
the data below? I would read the entire file in and then subset, but the
files were constructed such that the first two letters determine how
many variables are in the row (different letters mean different numbers
of columns and different column names/types).
I can do this in SAS, but I'd prefer to use R. The approximate SAS code
is below, with the key piece being "if rectype = 'RD' then do".
Thoughts?
If your data are in 'tmp.dat':
txt <- readLines( "tmp.dat" )
con <- textConnection( grep( "^RD", txt, value=TRUE ) )
dat <- read.csv( con, sep='|', header=FALSE)
close(con)
summary( dat[ , 1:3 ] )
  V1      V2          V3
 RD:6    I:6    Min.   :1
                1st Qu.:1
                Median :1
                Mean   :1
                3rd Qu.:1
                Max.   :1
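Since the question says that other record types carry different numbers of columns, the same idea can be extended to read every type into its own data frame. What follows is only a minimal sketch: the names `by_type`, `parse_group`, and `tables` are made up for illustration, the sample lines are copied from the data in this thread, and `read.table(text = ...)` assumes a reasonably recent R. Reading everything as character is a choice made here so that codes such as "01" keep their leading zero (note that V3 came out as 1 in the summary above).

```r
# Sample lines copied from the data posted in this thread
lines <- c(
  "RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||",
  "RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||",
  "RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||"
)

# Group the raw lines by their two-character record type
by_type <- split(lines, substr(lines, 1, 2))

# Parse each group on its own, so each type may have its own column count.
# colClasses = "character" keeps codes such as "01" from turning into 1.
parse_group <- function(x) {
  read.table(text = x, sep = "|", header = FALSE,
             colClasses = "character", quote = "", comment.char = "")
}
tables <- lapply(by_type, parse_group)

names(tables)            # record types found in the input
tables[["RD"]][, 1:3]    # first three columns of the RD records
```

Each element of `tables` can then be given its own column names and types once the layout for that record type is known.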
Alternatively, if you have 'grep' in your system and in the path:
con2 <- pipe( 'grep "^RD" tmp.dat' )
dat2 <- read.csv( con2, sep='|', header=FALSE)
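If no external 'grep' is available and the file is too large to hold comfortably in memory with readLines(), the filtering can also be done in chunks inside R. This is a rough sketch under stated assumptions, not a tested recipe for the real files: `read_prefixed_lines` and `chunk_size` are hypothetical names, the demonstration lines are shortened stand-ins rather than the actual data, and `startsWith()` needs R 3.3.0 or later.

```r
# Hypothetical helper: keep only the lines that start with `prefix`,
# reading the file `chunk_size` lines at a time.
read_prefixed_lines <- function(path, prefix, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  kept <- character(0)
  repeat {
    chunk <- readLines(con, n = chunk_size)
    if (length(chunk) == 0L) break          # end of file reached
    kept <- c(kept, chunk[startsWith(chunk, prefix)])
  }
  kept
}

# Demonstration on a small temporary file (shortened, made-up lines)
tmp <- tempfile(fileext = ".dat")
writeLines(c("RD|I|01|0.6", "RA|I|01|3.7", "RD|I|01|9.5"), tmp)

rd_lines <- read_prefixed_lines(tmp, "RD", chunk_size = 2L)
dat3 <- read.csv(text = rd_lines, sep = "|", header = FALSE)
unlink(tmp)
```

If no line matches, `rd_lines` is empty and the read.csv() call will fail, so a length check before parsing is worth adding in real use.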
See
?connection
?textConnection
?grep
HTH,
Chuck
Zev
RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070109|00:00|2.5||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070112|00:00|13.7||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070115|00:00|7.3||3|||||||||||||
RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||
RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||
infile 'C:\junk\RD_501_88101_2006-0.txt'
dlm='|' firstobs=3 missover;
input rectype $2. @;
if rectype = 'RD' then do;
--
Zev Ross
ZevRoss Spatial Analysis
303 Fairmount Ave
Ithaca, NY 14850
607-277-0004 (phone)
866-877-3690 (fax, toll-free)
zev at zevross.com