Matching multiple search criteria (Unlisting a nested dataset, take 2)

I do not have your command of base R, Bert. That is a herculean effort! Here's what I spent my night putting together:

## Load the packages used below
library(dplyr)
library(stringr)
library(purrr)

## Create search terms
## dput(st)
st <- structure(list(word1 = c("technique", "me", "me", "feel", "feel"
), word2 = c("olympic", "abused", "hurt", "hopeless", "alone"
), word3 = c("lifts", "depressed", "depressed", "depressed",
"depressed")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-5L))

## Create tweets
## dput(th)
th <- structure(list(status_id = c("x1047841705729306624", "x1046966595610927105",
"x1047094786610552832", "x1046988542818308097", "x1046934493553221632",
"x1047227442899775488", "x1048126008941981696", "x1047798782673543173",
"x1048269727582355457", "x1048092408544677890"), created_at = c("2018-10-04T13:31:45Z",
"2018-10-02T03:34:22Z", "2018-10-02T12:03:45Z", "2018-10-02T05:01:35Z",
"2018-10-02T01:26:49Z", "2018-10-02T20:50:53Z", "2018-10-05T08:21:28Z",
"2018-10-04T10:41:11Z", "2018-10-05T17:52:33Z", "2018-10-05T06:07:57Z"
), text = c("technique is everything with olympic lifts ! @ body by john ",
"@subtronics just went back and rewatched ur fblice with ur cdjs and let me tell you man. you are the fucking messiah",
"@ic4rus1 opportunistic means short-game. as in getting drunk now vs. not being hung over tomorrow vs. not fucking up your life ten years later.",
"i tend to think about my dreams before i sleep.", "@michaelavenatti @senatorcollins so if your client was in her 20s attending parties with teenagers doesnt that make her at the least immature as hell or at the worst a pedophile and a person contributing to the delinquency of minors?",
"i wish i could take credit for this", "i woulda never imagined. #lakeshow ",
"@philipbloom @blackmagic_news its ok phil! i feel your pain! ",
"sunday ill have a booth in katy at the real craft wives of katy fest @nolabelbrewco cmon yall!everything is better when you top it with tias!order today we ship to all 50 ",
"dolly is so baddd"), lat = c(43.6835853, 40.284123, 37.7706565,
40.431389, 31.1688935, 33.9376735, 34.0207895, 44.900818, 29.7926,
32.364145), lng = c(-70.3284118, -83.078589, -122.4359785, -79.9806895,
-100.0768885, -118.130426, -118.4119065, -89.5694915, -95.8224,
-86.2447285), county_name = c("Cumberland County", "Delaware County",
"San Francisco County", "Allegheny County", "Concho County",
"Los Angeles County", "Los Angeles County", "Marathon County",
"Harris County", "Montgomery County"), fips = c(23005L, 39041L,
6075L, 42003L, 48095L, 6037L, 6037L, 55073L, 48201L, 1101L),
state_name = c("Maine", "Ohio", "California", "Pennsylvania",
"Texas", "California", "California", "Wisconsin", "Texas",
"Alabama"), state_abb = c("ME", "OH", "CA", "PA", "TX", "CA",
"CA", "WI", "TX", "AL"), urban_level = c("Medium Metro",
"Large Fringe Metro", "Large Central Metro", "Large Central Metro",
"NonCore (Nonmetro)", "Large Central Metro", "Large Central Metro",
"Small Metro", "Large Central Metro", "Medium Metro"), urban_code = c(3L,
2L, 1L, 1L, 6L, 1L, 1L, 4L, 1L, 3L), population = c(277308L,
184029L, 830781L, 1160433L, 4160L, 9509611L, 9509611L, 127612L,
4233913L, 211037L), linenumber = 1:10), row.names = c(NA,
10L), class = "data.frame")

## Clean tweets - basically just remove everything we don't need from the text, including punctuation and URLs
th %>%
  mutate(linenumber = row_number(),
         text = str_remove_all(text, "[^\x01-\x7F]"),
         text = str_remove_all(text, "\n"),
         text = str_remove_all(text, ","),
         text = str_remove_all(text, "'"),
         text = str_remove_all(text, "&"),
         text = str_remove_all(text, "<"),
         text = str_remove_all(text, ">"),
         text = str_remove_all(text, "http[s]?://[[:alnum:].\\/]+"),
         text = tolower(text)) -> th

## Create search function that looks for each search term in the provided string, checks whether all three have been found, and returns a logical
srchr <- function(txt) {
  has_olympic   <- str_detect(txt, "olympic")
  has_technique <- str_detect(txt, "technique")
  has_lifts     <- str_detect(txt, "lifts")
  ## TRUE only when all three terms are present
  has_olympic & has_technique & has_lifts
}

## Evaluate tweets for presence of search term
th %>%
  mutate(flag = map_lgl(text, srchr)) -> th_flagged

As far as I can tell, this works. I have to manually enter each set of search terms into the function, which is not ideal. Also, this only generates a TRUE/FALSE for each tweet based on one set of search terms, so I end up with a separate flag column for each set that I would then have to collapse together somehow. I'm sure there's a more elegant solution.
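
One possible way to avoid both the manual entry and the collapsing step is to loop over the rows of the search-term table and combine the results. The sketch below is only an illustration: `match_any_set` is a hypothetical helper name, and the small `terms` and `tweets` objects are made-up stand-ins for `st` and `th$text`. It assumes each row of the table is one set of words that must all appear in a tweet.

```r
library(stringr)

# Hypothetical helper (not from the code above): returns TRUE for each text
# that contains every word in at least one row of `terms`.
match_any_set <- function(text, terms) {
  # One logical vector per row of `terms`:
  # does each text contain all of the words in that row?
  per_set <- lapply(seq_len(nrow(terms)), function(i) {
    words <- unlist(terms[i, ], use.names = FALSE)
    Reduce(`&`, lapply(words, function(w) str_detect(text, fixed(w))))
  })
  # A text is flagged if any one set matched in full
  Reduce(`|`, per_set)
}

# Made-up inputs for illustration only
terms <- data.frame(word1 = c("technique", "feel"),
                    word2 = c("olympic", "alone"),
                    word3 = c("lifts", "depressed"),
                    stringsAsFactors = FALSE)
tweets <- c("technique is everything with olympic lifts !",
            "i feel so alone and depressed",
            "i feel your pain!")

match_any_set(tweets, terms)
# [1]  TRUE  TRUE FALSE
```

With the objects above, the call would presumably look like `th$flag <- match_any_set(th$text, st)`, which yields a single logical column. Note that, like `srchr`, this matches substrings, so a short term such as "me" also matches inside longer words; a word-boundary regex might be preferable if that matters.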

--

Nate Parsons
Pronouns: He, Him, His
Graduate Teaching Assistant
Department of Sociology
Portland State University
Portland, Oregon

503-725-9025
503-725-3957 FAX
On Oct 16, 2018, 7:20 PM -0700, Bert Gunter <bgunter.4567 at gmail.com>, wrote: