
Read text file subsetting rows

3 messages · Charles C. Berry, Zev Ross

#
Hi All,

Can anyone direct me to a read function in R that will read in only the
rows of a text file that begin with a particular value, as in the data
below? I would read the entire file in and then subset, but the files
are constructed so that the first two letters determine how many
variables are in the row (different letters mean different numbers of
columns and different column names/types).

I can do this in SAS, but I'd prefer to use R. The approximate SAS code
is below, the key piece being "if rectype = 'RD' then do".

Thoughts?

Zev


RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070109|00:00|2.5||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070112|00:00|13.7||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070115|00:00|7.3||3|||||||||||||
RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||
RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||
RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||


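infile 'C:\junk\RD_501_88101_2006-0.txt'
 dlm='|' firstobs=3 missover;
input rectype $2. @;  /* read the 2-letter record type, hold the line */
if rectype = 'RD' then do;
  /* input the RD-specific fields here */
end;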
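For comparison, a minimal R analogue of the SAS "if rectype = 'RD'" test is sketched below, assuming the whole file fits in memory as a character vector. It keeps only the lines whose first two characters are "RD" and then parses them with read.table(text = ...). The variable names (lines, rd_lines, rd) are illustrative, and the sample records are copied from the message above rather than read from disk.

```r
# Sample records copied from the message above. With a real file you
# would use readLines("tmp.dat") instead (file name is illustrative).
lines <- c(
  "RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||",
  "RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||",
  "RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||"
)

# R counterpart of the SAS test  if rectype = 'RD':
# keep only lines whose first two characters are "RD".
rd_lines <- lines[substr(lines, 1, 2) == "RD"]

# Parse the kept lines as pipe-delimited fields with no header row.
rd <- read.table(text = rd_lines, sep = "|", header = FALSE)
print(rd[, c(1, 11, 13)])
```

Unlike the SAS step, this reads every line into R before filtering, so it trades memory for simplicity.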
#
On Fri, 11 Apr 2008, Zev Ross wrote:

            
If your data are in 'tmp.dat':
   V1       V2          V3
 RD:6     I:6     Min.   :1
                  1st Qu.:1
                  Median :1
                  Mean   :1
                  3rd Qu.:1
                  Max.   :1

Alternatively, if you have 'grep' on your system and in your path:
See
 	?connection
 	?textConnection
 	?grep

HTH,

Chuck
Charles C. Berry                            (858) 534-2098
                                             Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu	            UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
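The pipe() plus grep route can be tried without a real data file, as sketched below. This assumes a Unix-like shell with grep on the path; the temporary file and the variable names (path, con, rd) are illustrative only. Because the connection is not yet open, read.csv() opens it, reads whatever grep writes to standard output, and closes it, so R never sees the non-RD lines.

```r
# Write a few of the sample records to a temporary file so the sketch
# is self-contained; with real data, set `path` to your own file.
path <- tempfile(fileext = ".dat")
writeLines(c(
  "RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||",
  "RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||"
), path)

# Let the external grep select lines starting with "RD";
# shQuote() protects the file name from the shell.
con <- pipe(paste("grep '^RD'", shQuote(path)))

# read.csv() opens the connection, reads grep's output, and closes it.
rd <- read.csv(con, sep = "|", header = FALSE)
print(rd[, c(1, 11, 13)])
```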
#
Chuck,

Thanks so much, these both work like a charm. The first method, though,
is very, very slow for a large dataset (<100,000), while the second is
reasonable in terms of speed. If you or anyone else has ideas for
speeding up the import, send them my way; otherwise,

con2 <- pipe( 'grep "^RD" tmp.dat' )
dat2 <- read.csv( con2, sep='|', header=FALSE)

works well!

Thank you,

Zev
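Because the first two letters decide how many columns a record has, one possible way to handle every record type in a single pass is to read the file once with readLines(), split the lines on that two-letter prefix, and parse each group with its own read.table() call. This is a sketch, not a measured speed-up: whether it beats the grep pipe on a large file is untested here, although ?read.table notes that specifying colClasses reduces memory use and that comment.char = "" is faster than the default. The names by_type and tables are illustrative.

```r
# Sample records from the first message; in practice use
# lines <- readLines("tmp.dat")   (file name is illustrative).
lines <- c(
  "RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||",
  "RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||",
  "RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||"
)

# Group the lines by their two-letter record type (like SAS `rectype $2.`).
by_type <- split(lines, substr(lines, 1, 2))

# Parse each group on its own, so each record type may have a different
# number of columns. Everything is read as character and comment
# processing is disabled; per-type colClasses or col.names could be
# supplied here instead.
tables <- lapply(by_type, function(x)
  read.table(text = x, sep = "|", header = FALSE,
             colClasses = "character", comment.char = ""))

print(sapply(tables, nrow))
```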