Hi Vladimir,
This may fix the NA problem:

vdat<-read.table(text="numberoftweet,tweet,locations,badwords
1,My cat is asleep,London,glum
2,My cat is flying,Paris,dashed
3,My cat is dancing,Berlin,mopey
4,My cat is singing,Rome,ill
5,My cat is reading,Budapest,sad
6,My cat is eating,Amsterdam,annoyed
7,My cat is hiding,Copenhagen,crazy
8,My cat is fluffy,Vilnius,terrified
9,My cat is annoyed,Athens,sick
10,My cat is exercising,Ankara,mortified
11,My cat is dreaming,Kracow,irked
12,My cat is mopey,Vienna,uneasy
13,My cat is glum,Brussels,upset
14,My cat is swinging,Madrid,
15,My cat is crazy,Ljubljana,",
 sep=",",header=TRUE,stringsAsFactors=FALSE)
vdat$badwords[!nchar(vdat$badwords)]<-NA
badwords<-paste(vdat$badwords[!is.na(vdat$badwords)],collapse="|")
names(unlist(sapply(vdat$tweet,grep,pattern=badwords)))

Jim
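For anyone following the thread, here is a minimal sketch of why NAs and blanks spoil the pattern and how filtering them out helps. The vector `words` is a hypothetical stand-in for vdat$badwords, and nzchar() is used here in place of !nchar(); neither comes from the messages themselves.

```r
# Hypothetical stand-in for vdat$badwords: two real words, one blank, one NA
words <- c("glum", "crazy", "", NA)

# Pasting everything turns NA into the literal text "NA" and keeps the blank
naive <- paste(words, collapse = "|")
print(naive)

# Keep only entries that are neither NA nor empty
# (nzchar(NA) is TRUE by default, so the is.na() check is still needed)
clean <- words[!is.na(words) & nzchar(words)]
pattern <- paste(clean, collapse = "|")
print(pattern)
```

With the unfiltered version, the literal "NA" alternative would also match any tweet that happens to contain the letters "NA", which is probably not what is wanted.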
On Sun, Aug 7, 2016 at 6:43 PM, Vladimir <v.grabarnik at gmail.com> wrote:
Hi Jim!
That is exactly what I meant. Your example does the job I was looking for.
Unlike in your example, though, my badwords column is not filled in for every row. For example, it has only 10 values, but there are many more rows. When I introduce NA for the blanks and write

badwords<-paste(vdat$badwords,collapse="|")

it collapses all the values into something like:
word|word|NA|NA
and if I don't introduce NAs when reading the data, the outcome is still like:
word|word|word|word||||||||||||||||
and when I then try

names(unlist(sapply(vdat$tweet,grep,pattern=badwords)))

something goes wrong.
I had this question before, but do you know by any chance how to pick out just the words in the badwords column, without including the NAs or blanks?
Thank you,
Vladimir

2016-08-07 0:19 GMT+01:00 Jim Lemon <drjimlemon at gmail.com>:
Hi Vladimir,
Do you want something like this?

vdat<-read.table(text="numberoftweet,tweet,locations,badwords
1,My cat is asleep,London,glum
2,My cat is flying,Paris,dashed
3,My cat is dancing,Berlin,mopey
4,My cat is singing,Rome,ill
5,My cat is reading,Budapest,sad
6,My cat is eating,Amsterdam,annoyed
7,My cat is hiding,Copenhagen,crazy
8,My cat is fluffy,Vilnius,terrified
9,My cat is annoyed,Athens,sick
10,My cat is exercising,Ankara,mortified
11,My cat is dreaming,Kracow,irked
12,My cat is mopey,Vienna,uneasy
13,My cat is glum,Brussels,upset",
 sep=",",header=TRUE,stringsAsFactors=FALSE)
badwords<-paste(vdat$badwords,collapse="|")
names(unlist(sapply(vdat$tweet,grep,pattern=badwords)))

Jim

On Sat, Aug 6, 2016 at 12:07 AM, Vladimir <v.grabarnik at gmail.com> wrote:
Dear R command,
I was wondering if I could ask for your recommendations on my problem, if
that is fine with you.
Basically, I have a data frame with 5 columns and 10 000 tweets recorded
(rows). Those columns are: numberofatweet (number), tweet (the actual
textual tweet), locations (where the tweet was sent from), and badwords
(words that should not be used on Twitter; this is just a column that is
independent of the tweet number, and it contains only 80 rows with one word
recorded per cell).
My question is whether it is possible to select only the rows where the
"tweet" column (the actual text) contains one of the words from the
badwords column. I tried to use grep and grepl, but nothing seems to be
working.
Thank you in advance,
Vladimir
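The row selection asked about above can be sketched with grepl(), which returns one TRUE/FALSE per tweet and so works directly as a row index. This is only an illustration under assumptions: the miniature `vdat`, the separate `badwords` vector, and the use of \\b word boundaries are hypothetical and not taken from the original data, and the words are assumed to contain no regex metacharacters.

```r
# Hypothetical miniature of the data described above
vdat <- data.frame(
  tweet = c("My cat is asleep", "My cat is glum", "My cat is still crazy"),
  locations = c("London", "Brussels", "Ljubljana"),
  stringsAsFactors = FALSE
)
# Hypothetical bad-word list, assumed already free of NAs and blanks
badwords <- c("glum", "crazy", "ill")

# \\b restricts matches to whole words, so "ill" does not match inside "still"
pattern <- paste0("\\b(", paste(badwords, collapse = "|"), ")\\b")

# grepl() gives a logical vector the same length as the tweet column
flagged <- vdat[grepl(pattern, vdat$tweet), ]
print(flagged)
```

Dropping the \\b parts gives plain substring matching, which is simpler but would also flag words such as "still" because of "ill".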
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.