Problem with Extracting Hash Tagged Words from Tweets
"The presence of these numbers in square brackets is reporting error."

You mean the square brackets that show up on the left-hand side when you do something like

x <- 1:100
print(x)

? Don't worry -- those aren't part of x -- they're just added on printing to make it easier for the user to see where he is in the vector. They won't be included in any analysis. If you need control over the printing to avoid them, take a look at cat().

Michel
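To make the point about printing concrete, here is a minimal sketch in base R. The `tweets` list is made up for illustration and is not the poster's data; the idea is only that the bracketed labels come from `print()`, not from the object.

```r
x <- 1:100

# print() wraps long vectors and labels each row with the index of its
# first element ([1], and so on). The labels are display-only.
print(x)

# The data itself is unchanged: still 100 integers, no brackets anywhere.
length(x)
is.integer(x)

# cat() writes the values without index labels.
cat(head(x, 10), "\n")

# For a list, [[1]], [[2]], ... are likewise positions, not contents.
# (Hypothetical two-element list, for illustration only.)
tweets <- list("first #tag", "second #other")
tweets[[1]]
```

Nothing has to be stripped before analysis; indexing with `[[ ]]` returns only the stored string.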
On Tue, May 22, 2012 at 11:02 AM, Sarah Goslee <sarah.goslee at gmail.com> wrote:
Hi,

On Tue, May 22, 2012 at 10:55 AM, Adedoyin-Olowe Mariam <mariamolowe2008 at yahoo.com> wrote:
Hi Sarah,

Thanks for your help. I'm sorry my question was not clear enough. Maybe what I should ask is how to remove the tweet numbers that appear in x <- list (i.e. [[1]], [1], [[2]], [2], ...) before running sapply(x, str_extract_all, "#\\<.*?\\>").
Those aren't part of the tweets. Those are the numbers R uses when displaying portions of a list.
The presence of these numbers in square brackets is reporting error.
What error? You'll need to give us an actual reproducible example, since what you are describing is unclear. Although I suppose it's possible that you simply want:
unlist(sapply(x, str_extract_all, "#\\<.*?\\>"))
[1] "#dayatthenews" "#pompeyhacks"  "#portsmouth"   "#southsea"
[5] "#Portsmouth"   "#portsmouth"

It's impossible for me to tell precisely what the problem is.

Sarah
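The flattening step suggested above can be sketched as follows. This is an illustration under stated assumptions, not the code from the thread: the tweets are invented, `extract_tags` is a hypothetical helper, and base R's `regmatches()`/`gregexpr()` stand in for `str_extract_all()` so that the snippet needs no extra packages.

```r
# Made-up tweets for illustration; not taken from the thread's data.
x <- list(
  "alice: morning run done #fitness #Portsmouth",
  "bob: no tags here",
  "carol: ferry delayed again #portsmouth"
)

# Hypothetical helper: returns the hashtags found in a single tweet.
extract_tags <- function(tweet) {
  regmatches(tweet, gregexpr("#[A-Za-z0-9_]+", tweet))[[1]]
}

per_tweet <- lapply(x, extract_tags)  # a list; prints with [[1]], [[2]], ...
all_tags  <- unlist(per_tweet)        # one flat character vector

all_tags

# Count tags, ignoring case.
table(tolower(all_tags))
```

The list form keeps track of which tweet each tag came from; `unlist()` is the step to take once only the pooled tags matter, for example when counting them.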
Thanks. Mariam
________________________________
From: Sarah Goslee <sarah.goslee at gmail.com>
To: Adedoyin-Olowe Mariam <mariamolowe2008 at yahoo.com>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Tuesday, 22 May 2012, 13:53
Subject: Re: [R] Problem with Extracting Hash Tagged Words from Tweets
Hi,
A small reproducible bit of your data would have been nice, and I have
no idea what "manually remove all regular expressions" might mean, but
take a look at this:
library(stringr)

x <- list("marymaryw: Get an insight into how journalists operate at
The News by following #dayatthenews today #pompeyhacks #portsmouth
#southsea", "VouchAR_Ports: £5 instead of £60 for 1 month of unlimited
fitness classes at Outdoor Fitness Leeds - get bikini...
http://t.co/BUrkjtCh #Portsmouth", "BillieRaePhoto: RT @vintagesecret:
My dad has just sent me this picture. Looks like @GunwharfQuays is on
fire?! #portsmouth http://t.co/HbAV7Hw0")
sapply(x, str_extract_all, "#\\<.*?\\>")
[[1]]
[1] "#dayatthenews" "#pompeyhacks"  "#portsmouth"   "#southsea"
[[2]]
[1] "#Portsmouth"
[[3]]
[1] "#portsmouth"
Sarah
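A note for anyone rerunning this with a recent stringr: as far as I know, current versions use the ICU regex engine, which may not treat `\\<` and `\\>` as word boundaries the way the engine used in 2012 did. The sketch below is therefore an assumption-laden alternative, not the method from the thread: it uses the pattern `#\\w+` and invented tweets, and relies on `str_extract_all()` being vectorised over a character vector.

```r
library(stringr)

# Hypothetical tweets, stored as a list as in the example above.
x <- list(
  "alice: morning run done #fitness #Portsmouth",
  "carol: ferry delayed again #portsmouth http://example.com/a"
)

# Flatten the list to a character vector so that str_extract_all() can be
# called once; it is vectorised over its first argument.
tweets <- unlist(x)

# "#\\w+" means: a hash sign followed by one or more word characters.
tags <- str_extract_all(tweets, "#\\w+")

tags          # a list with one character vector per tweet
unlist(tags)  # all hashtags in a single vector
```

Calling the function on the vector directly avoids the nested list that `sapply()` over a list can produce.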
On Tue, May 22, 2012 at 7:00 AM, Adedoyin-Olowe Mariam
<mariamolowe2008 at yahoo.com> wrote:
Hello All,
Can anyone help me solve this problem?
I am trying to extract hash-tagged words from tweets downloaded with
twitteR.
I can extract hash-tagged words from a single tweet using
(stringr) str_extract_all(tweets, "#[a-z//A-Z//0-9]+")
but cannot with more than one tweet at a time unless I manually remove all
regular expressions and tweet numbers such as [[1]] and [1].
I want to automatically extract all #words from a large number of tweets in
one go.
This is what I have done so far by removing all regular expressions
manually:
searchTwitter("#Portsmouth", n=20)
[[1]]
[1] "marymaryw: Get an insight into how journalists operate at The News by
following #dayatthenews today #pompeyhacks #portsmouth #southsea"
[[2]]
[1] "VouchAR_Ports: £5 instead of £60 for 1 month of unlimited fitness
classes at Outdoor Fitness Leeds - get bikini... http://t.co/BUrkjtCh
#Portsmouth"
[[3]]
[1] "BillieRaePhoto: RT @vintagesecret: My dad has just sent me this
picture. Looks like @GunwharfQuays is on fire?! #portsmouth
http://t.co/HbAV7Hw0"
[[4]]
[1] "xangma: RT @vintagesecret: My dad has just sent me this picture.
Looks like @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0"
[[5]]
[1] "vintagesecret: My dad has just sent me this picture. Looks like
@GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0"
[[6]]
[1] "i_amnik: RT @BBCRadioSolent: Can you see the #GunwharfQuays fire?
Eye-witnesses please call - 0845 30 30 961. #Portsmouth."
[[7]]
[1] "vickiredmond: RT @dan_germain: RT @MatMacAulay: Best pic of #Gunwharf
on fire I have seen http://t.co/8LNAiqiD #portsmouth"
[[8]]
[1] "EmilieRosa: Highs of 25 degrees on the island this week!! Beach time
after exams I think! ;) #Portsmouth"
[[9]]
[1] "MrYiff: RT @dan_germain: RT @MatMacAulay: Best pic of #Gunwharf on
fire I have seen http://t.co/8LNAiqiD #portsmouth"
[[10]]
[1] "otbsaad: RT @BBCRadioSolent: BREAKING NEWS - Reports of a large fire
at #GunwharfQuays in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM"
[[11]]
[1] "PN_Newsdesk: #Portsmouth: Ferryspeed looks to build on its past
successes http://t.co/CmDglDkg"
[[12]]
[1] "PN_Newsdesk: #Portsmouth: More room for stalls at top Southsea school
- A SOUTHSEA primary school still has room for people to se...
http://t.co/ucbYWjPR"
[[13]]
[1] "VouchAR_Ports: £14 instead of £30 for a pedicure with foiled transfer
at Forever Young, Stoke-on-Trent - get... http://t.co/P7gJBcl8 #Portsmouth"
[[14]]
[1] "TelArnott: Looking forward to #K1 today! #gym01 #portsmouth"
[[15]]
[1] "dan_germain: RT @MatMacAulay: Best pic of #Gunwharf on fire I have
seen http://t.co/8LNAiqiD #portsmouth"
[[16]]
[1] "dan_germain: RT @portsmouthnews: News: Large fire at Gunwharf Quays -
http://t.co/s9RWpY0i #portsmouth #southsea"
[[17]]
[1] "i_amnik: RT @BBCRadioSolent: BREAKING NEWS - Reports of a large fire
at #GunwharfQuays in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM"
[[18]]
[1] "solentmotorcars: RT @BBCRadioSolent: BREAKING NEWS - Reports of a
large fire at #GunwharfQuays in #Portsmouth. Latest updates on
@BBCRadioSolent 96.1FM"
[[19]]
[1] "HantsChiefAlex: RT @BBCRadioSolent: BREAKING NEWS - Reports of a
large fire at #GunwharfQuays in #Portsmouth. Latest updates on
@BBCRadioSolent 96.1FM"
[[20]]
[1] "BBCRadioSolent: Can you see the #GunwharfQuays fire? Eye-witnesses
please call - 0845 30 30 961. #Portsmouth."
tweets <-c("marymaryw: Get an insight into how journalists operate at The
News by following #dayatthenews today #pompeyhacks #portsmouth #southsea
VouchAR_Ports £5 instead of £60 for 1 month of unlimited fitness classes at
Outdoor Fitness Leeds - get bikini... http://t.co/BUrkjtCh #Portsmouth
BillieRaePhoto RT @vintagesecret My dad has just sent me this picture. Looks
like @GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0 xangma: RT
@vintagesecret My dad has just sent me this picture. Looks like
@GunwharfQuays is on fire?! #portsmouth http://t.co/HbAV7Hw0 vintagesecret
My dad has just sent me this picture. Looks like @GunwharfQuays is on fire?!
#portsmouth http://t.co/HbAV7Hw0iamnik: RT @BBCRadioSolent Can you see the
#GunwharfQuays fire? Eye-witnesses please call - 0845 30 30 961.
#Portsmouth. vickiredmond @MatMacAulay Best pic of#Gunwharf on fire I have
seen http://t.co/8LNAiqiD #portsmouth EmilieRosa: Highs of 25 degrees on the
island this week!! Beach time after exams I think!) #Portsmouth mYiff RT
@dan_germain: RT @MatMacAulay Best pic of #Gunwharf on fire I have seen
http://t.co/8LNAiqiD #portsmouth otbsaad RT @BBCRadioSolent: BREAKING NEWS -
Reports of a large fire at #GunwharfQuays in #Portsmouth. Latest updates on
@BBCRadioSolent 96.1FM PN_Newsdesk #Portsmouth: Ferryspeed looks to build on
its past successes http://t.co/CmDglDkg PN_Newsdesk #Portsmouth More room
for stalls at top Southsea school - A SOUTHSEA primary school still has room
for people to se... http://t.co/ucbYWjPR VouchAR_Ports £14 instead of £30
for a pedicure with foiled transfer at Forever Young, Stoke-on-Trent -
get... http://t.co/P7gJBcl8 #Portsmouth TelArnott Looking forward to #K1
today! #gym01 #portsmouth Best pic of #Gunwharf on fire I have seen
http://t.co/8LNAiqiD #portsmouth dangermain RT @portsmouthnews News Large
fire at Gunwharf Quays - http://t.co/s9RWpY0i #portsmouth #southsea iamnik
RT @BBCRadioSolent BREAKING NEWS - Reports of a large fire at #GunwharfQuays
in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM solentmotorcars RT
@BBCRadioSolent: BREAKING NEWS - Reports of a large fire at #GunwharfQuays
in #Portsmouth. Latest updates on @BBCRadioSolent 96.1FM HantsChiefAlex RT
@BBCRadioSolent BREAKING NEWS - Reports of a large fire at #GunwharfQuays in
#Portsmouth. Latest updates on @BBCRadioSolent 96.1FM BBCRadioSolent Can you
see the #GunwharfQuays fire? Eye-witnesses please call - 0845 30 30 961.
#Portsmouth")
str_extract_all(tweets, "#[a-z//A-Z//0-9]+")
[[1]]
 [1] "#dayatthenews"  "#pompeyhacks"   "#portsmouth"    "#southsea"      "#Portsmouth"    "#portsmouth"    "#portsmouth"
 [8] "#portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#Gunwharf"      "#portsmouth"    "#Portsmouth"    "#Gunwharf"
[15] "#portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#Portsmouth"    "#Portsmouth"    "#Portsmouth"    "#K1"
[22] "#gym01"         "#portsmouth"    "#Gunwharf"      "#portsmouth"    "#portsmouth"    "#southsea"      "#GunwharfQuays"
[29] "#Portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#GunwharfQuays" "#Portsmouth"    "#GunwharfQuays" "#Portsmouth"
Please I need help.
Mariam
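Two details of the attempt above may be worth illustrating, with the caveat that this is a sketch rather than a diagnosis. First, inside a bracket expression `/` is an ordinary character, so `[a-z//A-Z//0-9]` also accepts slashes. Second, storing one tweet per element of a character vector, instead of pasting everything into a single string, keeps each result tied to its tweet. The string `s` is invented; the two tweets are copied from the listing above. Base R's `regmatches()` is used here in place of `str_extract_all()`.

```r
# Inside [...], "/" is an ordinary character, so "//" adds the slash to
# the set of characters allowed in a tag.
with_slash    <- "#[a-z//A-Z//0-9]+"
without_slash <- "#[a-zA-Z0-9]+"

s <- "photos at #gunwharf/quays today"  # made-up text
regmatches(s, gregexpr(with_slash, s))[[1]]
regmatches(s, gregexpr(without_slash, s))[[1]]

# One tweet per element keeps each result tied to its source tweet.
tweets <- c(
  "TelArnott: Looking forward to #K1 today! #gym01 #portsmouth",
  "PN_Newsdesk: #Portsmouth: Ferryspeed looks to build on its past successes http://t.co/CmDglDkg"
)
tags <- regmatches(tweets, gregexpr(without_slash, tweets))
tags
lengths(tags)  # number of hashtags found in each tweet
```

With the tweets kept separate, the `[[1]]`, `[[2]]` labels in the printed result simply say which tweet each group of tags belongs to.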
--
Sarah Goslee
http://www.functionaldiversity.org
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.