
Matching multiple search criteria (Unlisting a nested dataset, take 2)

If you wish to use R, you need to at least understand its basic data
structures and functionality. Expecting that mimicry of code in special
packages will suffice is, I believe, an illusion. If you haven't already
done so, you should go through a basic R tutorial or two (there are many on
the web; some recommendations, by no means necessarily "the best", can be
found here:
https://www.rstudio.com/online-learning/#r-programming).

Having said that, I realized that my previous "solution" using regular
expressions was more complicated than it needed to be and somewhat foolish
(so much for all my "expertise"). A simpler and better approach is to
break up both the tweet texts and your search phrases into vectors of
their "words" (i.e. character strings separated by spaces) using
strsplit(), and then to use R's built-in matching capabilities with %in%.
This is quite straightforward, pretty robust (no regexes to wrestle with),
and does not require "herculean efforts" to understand. The only wrinkle is
some bookkeeping with the "apply" family of functions. These are, as you
may know, the functional programming way of handling iteration (loops), but
they are what I would consider part of "basic" R functionality and worth
spending the time to learn about.
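
As a small illustration of the two building blocks on their own, here is a
sketch using made-up strings (not your data; the variable names are only
for this example):

```r
## made-up example strings, not taken from your data
tweet  <- "I feel SO worthless today"
phrase <- "i feel worthless"

## strsplit() returns a list with one character vector per input string,
## so [[1]] extracts the word vector for our single string
tweet_words  <- strsplit(tolower(tweet),  split = " +")[[1]]
phrase_words <- strsplit(tolower(phrase), split = " +")[[1]]

## %in% gives one logical per phrase word: is it among the tweet's words?
hits <- phrase_words %in% tweet_words

## the phrase "matches" when every one of its words was found
all(hits)
```

Note that word order does not matter here; only the presence of each word
is checked.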

Herewith my better, simpler proposal, using your example data as before:

getwords <- function(x) strsplit(tolower(x), split = " +")
## split text into a vector of lower-cased "words"

phrasewords <- structure(getwords(st$terms), names = st$terms)
## named list of your search word vectors

tweets <- getwords(c(th$text, " i xxxx worthless yxxc ght feel"))
## the tweets + one additional that should match the last phrase

ans <- lapply(phrasewords, function(x)
    apply(sapply(tweets, function(y) x %in% y), 2, all))
## a list indexed by the search phrases,
## with each component a vector of logicals with vec[i] == TRUE iff
## the ith tweet contains all the words in the search phrase
$`me abused depressed`
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$`me hurt depressed`
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$`feel hopeless depressed`
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$`feel alone depressed`
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$`i feel helpless`
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE

$`i feel worthless`
[1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
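
One caveat with the code above: when a search phrase consists of a single
word, sapply() returns a plain vector rather than a matrix, and apply()
then fails. A possible variant that sidesteps this uses vapply() with all()
applied per tweet. The following is only a sketch; the small st and th data
frames are made-up stand-ins so that it runs on its own, and ans2 is just a
name chosen for this example:

```r
## hypothetical stand-ins for the st and th data frames in this thread
st <- data.frame(terms = c("feel alone depressed", "i feel worthless",
                           "depressed"),
                 stringsAsFactors = FALSE)
th <- data.frame(text = c("Feel so alone and depressed",
                          "nothing to see here",
                          " i xxxx worthless yxxc ght feel"),
                 stringsAsFactors = FALSE)

getwords <- function(x) strsplit(tolower(x), split = " +")

phrasewords <- structure(getwords(st$terms), names = st$terms)
tweets <- getwords(th$text)

## for each phrase, one logical per tweet:
## TRUE iff the tweet contains all the words in the phrase
ans2 <- lapply(phrasewords, function(x)
    vapply(tweets, function(y) all(x %in% y), logical(1)))
ans2
```

Because vapply() is told to expect exactly one logical per tweet, the result
has the same shape whether a phrase has one word or several.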

-- Bert

On Wed, Oct 17, 2018 at 9:20 AM Nathan Parsons <nathan.f.parsons at gmail.com>
wrote: