Regular Expressions for "Large" Data Set

Marc Schwartz · 2011-06-07T21:13:44Z

On Jun 7, 2011, at 3:55 PM, Abraham Mathew wrote:

> I'm running R 2.13 on Ubuntu 10.10
>
> I have a data set which is comprised of character strings.
>
> site = readLines('http://www.census.gov/tiger/tms/gazetteer/zips.txt')
>
> dat dat
>
> I want to loop through the data and construct a data frame with the zip code,
> state abbreviation, and city name in separate columns. Given the size of this
> data set, I was wonder


Since the original text file is a CSV file (without a header), just use:

   user  system elapsed 
  0.385   0.033   1.832

'data.frame':	29470 obs. of  8 variables:
 $ V1: int  1 1 1 1 1 1 1 1 1 1 ...
 $ V2: int  35004 35005 35006 35007 35010 35014 35016 35019 35020 35023 ...
 $ V3: Factor w/ 51 levels "AK","AL","AR",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ V4: Factor w/ 16698 levels "02821","04465",..: 150 168 180 7710 10434 348 547 812 1250 7044 ...
 $ V5: num  86.5 87 87.2 86.8 86 ...
 $ V6: num  33.6 33.6 33.4 33.2 32.9 ...
 $ V7: int  6055 10616 3205 14218 19942 3062 13650 1781 40549 39677 ...
 $ V8: num  0.001499 0.002627 0.000793 0.003519 0.004935 ...

  V1    V2 V3         V4       V5       V6    V7       V8
1  1 35004 AL      ACMAR 86.51557 33.58413  6055 0.001499
2  1 35005 AL ADAMSVILLE 86.95973 33.58844 10616 0.002627
3  1 35006 AL      ADGER 87.16746 33.43428  3205 0.000793
4  1 35007 AL   KEYSTONE 86.81286 33.23687 14218 0.003519
5  1 35010 AL   NEW SITE 85.95109 32.94145 19942 0.004935
6  1 35014 AL     ALPINE 86.20893 33.33116  3062 0.000758
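
The commands that produced the output above did not survive in this copy of the message. As a rough sketch only (not necessarily what was originally posted), a header-less CSV like this can be read with read.csv(header = FALSE). The six inline sample lines are reconstructed from the head() output above, and the column names "zip", "state" and "city" are assumptions made for illustration. Reading the ZIP column as character rather than integer keeps any leading zeros, which the int column V2 shown above would drop.

```r
# Sample rows reconstructed from the head() output above (an assumption;
# the real file would be read from the URL in the original question).
sample_lines <- c(
  "1,35004,AL,ACMAR,86.51557,33.58413,6055,0.001499",
  "1,35005,AL,ADAMSVILLE,86.95973,33.58844,10616,0.002627",
  "1,35006,AL,ADGER,87.16746,33.43428,3205,0.000793",
  "1,35007,AL,KEYSTONE,86.81286,33.23687,14218,0.003519",
  "1,35010,AL,NEW SITE,85.95109,32.94145,19942,0.004935",
  "1,35014,AL,ALPINE,86.20893,33.33116,3062,0.000758"
)

# Read the header-less CSV. Columns 2-4 get illustrative names and are
# forced to character so ZIP codes keep leading zeros and no factors appear.
zips <- read.csv(
  text = sample_lines,
  header = FALSE,
  col.names = c("V1", "zip", "state", "city", "V5", "V6", "V7", "V8"),
  colClasses = c("integer", "character", "character", "character",
                 "numeric", "numeric", "integer", "numeric")
)

# Keep only the three columns asked for in the original question.
addr <- zips[, c("zip", "state", "city")]
print(addr)
```

To run this against the real file, the text = sample_lines argument would be replaced by the URL from the original question, assuming it is still reachable.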


HTH,

Marc Schwartz
