Message-ID: <5C21D545-92EF-4240-B9FD-6450238C7EB4@comcast.net>
Date: 2012-08-13T07:28:53Z
From: David Winsemius
Subject: named character question
In-Reply-To: <CACxE24=ED_d8DtF1ZqjoX=wHbhxTaqpWnyWDukL-2TVU-2em9w@mail.gmail.com>

On Aug 12, 2012, at 8:33 PM, Erin Hodgess wrote:

> Dear R People:
>
> Here is a goofy question:
>
> I want to extract the zip code from an address and here is my work  
> so far:
>
>> add1
>                  results.formatted_address
> "200 W Rosamond St, Houston, TX 77076, USA"
>> add1[1][32:36]
> <NA> <NA> <NA> <NA> <NA>
>  NA   NA   NA   NA   NA
>> str(add1)
> Named chr "200 W Rosamond St, Houston, TX 77076, USA"
> - attr(*, "names")= chr "results.formatted_address"

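A note on why the indexing attempt gives NA: add1 is a character vector of length 1, so add1[1][32:36] asks for vector elements 32 through 36, which do not exist. Characters inside a string are reached with substr() instead. A minimal sketch, rebuilding add1 by hand on the assumption that it matches the str() output above:

```r
# Rebuild the named character vector from the question
# (assumption: this matches what str(add1) reported).
add1 <- c(results.formatted_address = "200 W Rosamond St, Houston, TX 77076, USA")

length(add1)   # one element, so vector positions 32:36 do not exist
nchar(add1)    # number of characters inside that single element

# substr() indexes characters within the string, not elements of the vector.
zip_by_position <- substr(add1, 32, 36)
unname(zip_by_position)
```

Fixed positions only work when every address has the zip code in the same place, which is why a pattern-based approach is probably safer.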
 > ttt <- "200 W Rosamond St, Houston, TX 77076, USA"

 > sub("^.+,.+,\\s[[:alpha:]]*\\s([[:digit:]]{5}).+", "\\1", ttt)
[1] "77076"

You will need to determine whether all your addresses have two commas  
before the two-letter state designation. You may not need a pattern as  
specific as this one. An alternative pattern:

 > sub("^.+\\s[[:alpha:]]{2}\\s([[:digit:]]{5}).+", "\\1", ttt)
[1] "77076"
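
One thing to watch with sub(): when the pattern does not match, it returns the input string unchanged rather than NA. A rough sketch for checking a batch of addresses with grepl() before extracting, using the second pattern above. The vector `addrs` and the helper name `extract_zip` are made up for illustration, and only the first address comes from this thread:

```r
# Hypothetical sample; the second address is invented to show a non-match.
addrs <- c("200 W Rosamond St, Houston, TX 77076, USA",
           "Somewhere without a zip code, USA")

zip_pattern <- "^.+\\s[[:alpha:]]{2}\\s([[:digit:]]{5}).+"

# Illustrative helper: return NA where the pattern does not match,
# instead of letting sub() hand back the original string.
extract_zip <- function(x, pattern = zip_pattern) {
  matched <- grepl(pattern, x)
  ifelse(matched, sub(pattern, "\\1", x), NA_character_)
}

zips <- extract_zip(addrs)
zips
```

Note that both patterns end in `.+`, so they expect at least one character after the five digits; an address that stops at the zip code would not match as written.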

-- 

David Winsemius, MD
Alameda, CA, USA