Message-ID: <CAP01uRmCYviO7ab=c1Ea=fjUEC6Xh4QLQs08gMrQHFxtOxNwEg@mail.gmail.com>
Date: 2013-01-14T10:47:16Z
From: Gabor Grothendieck
Subject: Grabbing Specific Words from Content (basic text mining)
In-Reply-To: <CAGuusR-ivuv9F-f5g2mxrhzb3pLCUxcgY3qVt54bRFJ6UzAT3A@mail.gmail.com>

On Mon, Jan 14, 2013 at 4:30 AM, Sachinthaka Abeywardana
<sachin.abeywardana at gmail.com> wrote:
> Hi all,
>
> Suppose I have a data frame with mixed content (name, age and address).
>
> a<-"Name: John Smith Age: 35 Address: 32, street, sub, something"
> b<-data.frame(a)
>
> 1. I want to extract the name, age and address separately from this
> data frame (which may contain more people).
>
> 2. Also, in case I have to deal with it, how would the syntax change if I
> had "Name" as opposed to "Name:" (without the colon)?
>

Try this:


> library(gsubfn)
>
> a <- "Name: John Smith Age: 35 Address: 32, street, sub, something"
> b <- data.frame(a)
> strapplyc(as.character(b$a), "Name: (.*) Age: (.*) Address: (.*)")
[[1]]
[1] "John Smith"                 "35"
[3] "32, street, sub, something"
>
>
> a. <- "Name John Smith Age 35 Address 32, street, sub, something"
> b. <- data.frame(a.)
> strapplyc(as.character(b.$a.), "Name (.*) Age (.*) Address (.*)")
[[1]]
[1] "John Smith"                 "35"
[3] "32, street, sub, something"
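
If a single pattern should cover both forms, one possible approach is to make
the colon optional with ":?" and collect the captured groups into columns. The
following is only a sketch in base R (regexec/regmatches rather than gsubfn).
The second input row and the names x, pattern, fields and result are made up
for illustration, and it assumes every row matches the pattern.

```r
# Sketch: one pattern for both "Name:" and "Name" forms, using base R only.
# The second element of x is invented to show more than one person.
x <- c("Name: John Smith Age: 35 Address: 32, street, sub, something",
       "Name Jane Doe Age 41 Address 7, road, town")

# ":?" makes each colon optional.
pattern <- "Name:? (.*) Age:? (.*) Address:? (.*)"

# regexec/regmatches return, per string, the full match followed by
# the three captured groups.
m <- regmatches(x, regexec(pattern, x))

# Drop the full match and bind the groups into a character matrix.
# Assumes every string matched; a non-matching row would need handling.
fields <- do.call(rbind, lapply(m, function(v) v[-1]))

result <- data.frame(name    = fields[, 1],
                     age     = as.integer(fields[, 2]),
                     address = fields[, 3],
                     stringsAsFactors = FALSE)
print(result)
```

The same ":?" pattern should also work as the second argument to strapplyc
above, since only the regular expression changes.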

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com