named character question

7 messages · Erin Hodgess, R. Michael Weylandt, Joshua Wiley +3 more

#
Dear R People:

Here is a goofy question:

I want to extract the zip code from an address and here is my work so far:
results.formatted_address
"200 W Rosamond St, Houston, TX 77076, USA"
<NA> <NA> <NA> <NA> <NA>
  NA   NA   NA   NA   NA
Named chr "200 W Rosamond St, Houston, TX 77076, USA"
 - attr(*, "names")= chr "results.formatted_address"
What am I not seeing, please?

Thanks,
Erin
#
It's best if you make these things available to us using dput() in the future.

You're probably looking for the substr() function.

Since _strings_ (not characters) in R are "primitive" (not in the
primitive/internal sense, just in the primordial sense), you can't
subset them with the bracket operators. Brackets index the elements of
the vector, not the characters inside a string, so what you're doing is
something closer to

x <- 1:5

x[30:35]
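
To make the analogy concrete, here is a minimal sketch. The variable name add1 is an assumption taken from later messages in the thread (the original post does not show it), and the index range is only illustrative.

```r
# Assumed reconstruction of the object from the original post
add1 <- c(results.formatted_address = "200 W Rosamond St, Houston, TX 77076, USA")

length(add1)   # a character vector holding a single string
nchar(add1)    # number of characters inside that string

# `[` indexes elements of the vector, so positions past 1 give NA
out_of_range <- add1[30:35]

# substr() indexes characters inside the string
zip <- substr(add1, 32, 36)
```

Note that substr() and nchar() keep the names attribute, so wrap the result in unname() if a bare value is wanted.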

Cheers,
Michael
On Sun, Aug 12, 2012 at 10:33 PM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:
#
Hi Erin,

The first element of the character vector is a string.  You cannot
extract individual characters from a string with brackets; try something like
?nchar

or perhaps better use regular expressions to extract things between
commas after two characters (or whatever logical rule accurately gets
the zip code).
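
One possible reading of that rule, as a rough sketch: look for a comma, a two-letter state code, and then five digits, and capture the pieces with regexec(). The pattern and the variable names here are assumptions for illustration, not something tested against other address formats.

```r
address <- "200 W Rosamond St, Houston, TX 77076, USA"

# Pattern: comma, optional space, two capital letters, space(s), five digits
pattern <- ",\\s*([A-Z]{2})\\s+([[:digit:]]{5})"

m <- regmatches(address, regexec(pattern, address))[[1]]
# m[1] is the full match, m[2] the state, m[3] the zip code
state <- m[2]
zip <- m[3]
```

regexec() is used rather than regexpr() because it also reports the capture groups, so the state and the zip code come back separately.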

Cheers,

Josh
On Sun, Aug 12, 2012 at 8:33 PM, Erin Hodgess <erinm.hodgess at gmail.com> wrote:

  
    
#
Hi,

Try this:
add1<-"200 W Rosamond St, Houston, TX 77076, USA"
add11<-strsplit(add1,split=",")
gsub("TX","",add11[[1]][3])
#[1] "  77076"
A.K.


#
On Aug 12, 2012, at 8:33 PM, Erin Hodgess wrote:

            
> ttt <- "200 W Rosamond St, Houston, TX 77076, USA"

 > sub("^.+,.+,\\s[[:alpha:]]*\\s([[:digit:]]{5}).+", "\\1", ttt)
[1] "77076"

You will need to determine if all your addresses have two commas before
the two-letter state designation. You may not need as specific a
pattern as this. An alternate pattern:

 > sub("^.+\\s[[:alpha:]]{2}\\s([[:digit:]]{5}).+", "\\1", ttt)
[1] "77076"
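
One caveat with sub(): when the pattern does not match, it returns the input string unchanged rather than NA. A hedged sketch of a wrapper that returns NA instead is below. The function name extract_zip and the second address are made up for illustration, and the trailing .+ has been relaxed to .*$ on the assumption that some addresses may end right after the zip code.

```r
# Hypothetical helper: NA when the address has no "ST 12345" part
extract_zip <- function(x) {
  pattern <- "^.+\\s[[:alpha:]]{2}\\s([[:digit:]]{5}).*$"
  ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)
}

addresses <- c("200 W Rosamond St, Houston, TX 77076, USA",
               "Houston, TX, USA")   # made-up address with no zip code
zips <- extract_zip(addresses)
```

Both grepl() and sub() are vectorized, so the same call works on a whole column of addresses.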
#
You are treating add1 as a vector of characters. If you want the zip code and
you know which positions it occupies within the string, use

substr(add1[1], 32, 36)

If you don't know, you could use (but it will get any five-digit number):

regmatches(add1, regexpr("[[:digit:]]{5}", add1))
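
To illustrate the "any five-digit number" caveat, here is a small sketch with a made-up address whose street number also has five digits. Taking the last match instead of the first is only one possible workaround and assumes the zip code is the last five-digit run in the string.

```r
addr <- "12345 Main St, Houston, TX 77076, USA"   # made-up address for illustration

# regexpr() finds only the first match, which here is the street number
first_hit <- regmatches(addr, regexpr("[[:digit:]]{5}", addr))

# gregexpr() finds every match; keep the last one
all_hits <- regmatches(addr, gregexpr("[[:digit:]]{5}", addr))[[1]]
last_hit <- all_hits[length(all_hits)]
```

Also keep in mind that regmatches() with regexpr() drops elements that have no match, so the result can be shorter than the input vector.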

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
#
Hi,

One more method to extract the code:
add1<-"200 W Rosamond St, Houston, TX 77076, USA"
sub(".*\\s.*\\s.*\\s.*\\s.*\\s.*\\s([[:digit:]]{5}).*","\\1",add1)
#[1] "77076"

#or,
sub(".*\\s+([[:digit:]]{5}).*","\\1",add1)
#[1] "77076"


A.K.




