Grabbing Specific Words from Content (basic text mining)
5 messages · Sachinthaka Abeywardana, Oliver Keyes, Manjusha Joshi +2 more
On Mon, Jan 14, 2013 at 4:30 AM, Sachinthaka Abeywardana
<sachin.abeywardana at gmail.com> wrote:
Hi all,
Suppose I have a data frame with mixed content (name, age and address).
a<-"Name: John Smith Age: 35 Address: 32, street, sub, something"
b<-data.frame(a)
1. I want to extract the name, age and address separately from this data frame (which may contain more people).
2. Also, in case I have to deal with it, how would the syntax change if I had "Name" as opposed to "Name:" (without the colon)?
Try this:
library(gsubfn)
a <- "Name: John Smith Age: 35 Address: 32, street, sub, something"
b <- data.frame(a)
strapplyc(as.character(b$a), "Name: (.*) Age: (.*) Address: (.*)")
[[1]]
[1] "John Smith"                 "35"
[3] "32, street, sub, something"
a. <- "Name John Smith Age 35 Address 32, street, sub, something"
b. <- data.frame(a.)
strapplyc(as.character(b.$a.), "Name (.*) Age (.*) Address (.*)")
[[1]]
[1] "John Smith"                 "35"
[3] "32, street, sub, something"
--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
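For anyone who prefers not to load gsubfn, the same extraction can be sketched in base R with regexec() and regmatches(). This is only an illustration, not part of the reply above: the two-element vector x, the combined pattern with optional colons (":?") and the result name people are assumptions made for the example, and it assumes every line matches the pattern.

```r
# Base-R sketch (no gsubfn). The ":?" makes the colon after each label optional,
# so one pattern covers both "Name:" and "Name".
x <- c("Name: John Smith Age: 35 Address: 32, street, sub, something",
       "Name Adam Grey Age 25 Address 26, street, sub, something")
pattern <- "^Name:? (.*) Age:? (.*) Address:? (.*)$"

# regexec() + regmatches() return, per line, the full match followed by
# one element per capture group.
m <- regmatches(x, regexec(pattern, x))

# Drop the full match (first element) and bind the groups into rows.
# Assumption: every line matches; a non-matching line would yield character(0).
fields <- do.call(rbind, lapply(m, function(v) v[-1]))

people <- data.frame(Name = fields[, 1],
                     Age = as.integer(fields[, 2]),
                     Address = fields[, 3],
                     stringsAsFactors = FALSE)
people
```

With more people, x would simply be the full column, e.g. as.character(b$a).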
Hi,
You could do either:
Lines<-readLines(textConnection("Name: John Smith Age: 35 Address: 32, street, sub, something
Name Adam Grey Age: 25 Address: 26, street, sub, something"))
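# Add the colon where "Name" lacks one. !grepl() is used rather than -grep(),
# because -grep() selects nothing when no line contains "Name:".
noColon<-!grepl("Name\\:",Lines)
Lines[noColon]<-sub("Name","Name:",Lines[noColon])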
Name<-gsub("Name\\: (.*) Age\\: (.*) Address\\: (.*)","\\1",Lines)
age<-gsub("Name\\: (.*) Age\\: (.*) Address\\: (.*)","\\2",Lines)
Address<-gsub("Name\\: (.*) Age\\: (.*) Address\\: (.*)","\\3",Lines)
dat1<-data.frame(Name,age,Address,stringsAsFactors=FALSE)
dat1
#        Name age                    Address
#1 John Smith  35 32, street, sub, something
#2  Adam Grey  25 26, street, sub, something
#or
noColon<-!grepl("Name\\:",Lines)
Lines[noColon]<-sub("Name","Name:",Lines[noColon])
res<-read.table(text=gsub("Name|Age|Address","",Lines),sep=":",stringsAsFactors=FALSE)[-1]
# Trim leading and trailing whitespace from the character columns
res[sapply(res,is.character)]<-do.call(cbind,lapply(res[sapply(res,is.character)],function(x) sub("^[[:space:]]*(.*?)[[:space:]]*$","\\1",x)))
str(res)
#'data.frame':    2 obs. of  3 variables:
# $ V2: chr  "John Smith" "Adam Grey"
# $ V3: num  35 25
# $ V4: chr  "32, street, sub, something" "26, street, sub, something"
A.K.
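One thing to watch with the gsub() approach: when a line does not match the pattern, gsub() returns that line unchanged, so a malformed record ends up copied into every column. A minimal sketch of a guard follows. The third input line, the helper name grab and the result name dat2 are hypothetical and only there for illustration.

```r
# Sketch: return NA instead of the raw line when a record does not match.
Lines <- c("Name: John Smith Age: 35 Address: 32, street, sub, something",
           "Name Adam Grey Age: 25 Address: 26, street, sub, something",
           "this line has no labels")   # hypothetical malformed record

# ":?" makes each colon optional, so no separate colon-fixing step is needed here
pattern <- "^Name:? (.*) Age:? (.*) Address:? (.*)$"
ok <- grepl(pattern, Lines)

# grab() is a hypothetical helper: capture group i where the line matches, NA otherwise
grab <- function(i) ifelse(ok, sub(pattern, paste0("\\", i), Lines), NA_character_)

dat2 <- data.frame(Name = grab(1),
                   age = as.integer(grab(2)),
                   Address = grab(3),
                   stringsAsFactors = FALSE)
dat2
```

Rows with NA can then be inspected via Lines[!ok] rather than silently passing through.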