Identify medicines names

Hi Gianpaolo,

It works now, thank you!

But it is not exactly what I need.
Let me explain better.

Your solution is good for identifying which medicines are antibiotics,
and my own solution does that too:

######################################################
matches  <- unlist(sapply(patterns, function(p) grep(p, df$name,
                                                     value = FALSE,
                                                     ignore.case = TRUE)
                          )
                   )
anti <- df[matches,]
########################################################
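
One thing to watch for in the snippet above: when a name matches more
than one pattern, its row index appears more than once in matches, so
that row is duplicated in anti. Also, because df has a single column,
df[matches,] returns a plain vector rather than a data frame. Below is a
minimal sketch of one way around both, assuming the patterns contain no
regex metacharacters. The small df and patterns here are illustrative
subsets of the data further down, not the full objects.

```r
# Illustrative subset of the real data (assumption: names already upper case).
patterns <- c("PENICILINA", "PIPERACILINA", "TAZOBACTAM")
df <- data.frame(name = c(
  "CLORETO DE POTASSIO DRAGEA 600MG",
  "PENICILINA G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
  "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"
), stringsAsFactors = FALSE)

# Combine all patterns into one alternation and test each name once,
# so a row is selected at most once however many patterns it matches.
is_anti <- grepl(paste(patterns, collapse = "|"), df$name, ignore.case = TRUE)

# drop = FALSE keeps the result a data frame even with a single column.
anti <- df[is_anti, , drop = FALSE]
anti
```

This only flags the rows; it does not say which pattern matched.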


But what I need, beyond identifying which medicines are antibiotics, is to:
- Create a new variable that holds the matching name from the patterns
object whenever the medicine is an antibiotic listed there.
I did this with the code below, using the fuzzyjoin::regex_left_join()
function:

#########################################################
# List of antibiotic names to look for (the object called patterns).
patterns <-  c("Oritavancina", "Oxacilina", "Pefloxacino", "Penicilina",
              "Pexiganan",  "Piperacilina-tazobactam","Tazobactam",
              "Pirazinamida", "Plazomicina", "Polimixina B",
              "Posilozid","Piperacilina")
patterns <- toupper(patterns)

# Sample Data frame where I need to find the names from the list above.
df <- data.frame(name =
                     c("CLORETO DE POTASSIO DRAGEA 600MG",
                       "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
                       "CLORETO DE SODIO SOLUCAO INJETAVEL 0,9% 10ML",
                       "CODEINA FOSFATO SOLUCAO ORAL 3MGML 10ML ISCMPA @",
                       "CODEINA FOSFATO SOLUCAO ORAL 3MGML 5ML ISCMPA @",
                       "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
                       "DipiRONA SOLUCAO INJETAVEL 500MGML 2ML",
                       "FUROSEMIDA SOLUCAO INJETAVEL 10MGML 2ML",
                       "HIDROCORTISONA SUCCINATO SODICO PO LIOFILO INJETAVEL 100MG",
                       "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
                       "ONDANSETRONA CLORIDRATO SOLUCAO INJETAVEL 2MGML 4ML",
                       "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
                       "Penicilina G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
                       "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"))


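library(dplyr)  # provides %>% and mutate()
df <- df %>% mutate(name = toupper(name))
patterns <- data.frame(name = patterns)
# The left join keeps every row of df; unmatched rows get NA for the pattern.
results <- fuzzyjoin::regex_left_join(df,
                                      patterns,
                                      by = "name")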
results
#########################################################
Notice in the results object that when the medicine name contains two
drugs ("PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"),
the solution doesn't find "PIPERACILINA-TAZOBACTAM".
Instead, the code creates two rows: one with PIPERACILINA and another
with TAZOBACTAM.
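
To make the goal concrete, here is a rough base R sketch of the result I
am after, not a finished solution. It rests on two assumptions of mine:
that a hyphen in a pattern joins two components which may be separated by
other text in the medicine name (and appear in that order), and that when
several patterns match, the longest one is the one wanted. The name
best_match is just a hypothetical helper, and the data is a small subset
of the objects above.

```r
# Illustrative subset of the patterns and medicine names (already upper case).
patterns <- c("PENICILINA", "PIPERACILINA-TAZOBACTAM", "TAZOBACTAM", "PIPERACILINA")
medicine <- c(
  "CLORETO DE SODIO 0,9% SERINGA PREENCHIDA 5ML",
  "PENICILINA G BENZATINA PO LIOFILO INJETAVEL 1200000UI",
  "PIPERACILINA SODICA 4G + TAZOBACTAM SODICA 0,5G PO LIOFILO INJETAVEL"
)

# Assumption: "-" separates components that may have other text between them,
# so turn it into ".*" to build the regular expressions actually searched for.
regexes <- gsub("-", ".*", patterns, fixed = TRUE)

# Hypothetical helper: return the longest pattern that matches x, or NA.
best_match <- function(x) {
  hit <- vapply(regexes, function(r) grepl(r, x), logical(1))
  if (!any(hit)) return(NA_character_)
  candidates <- patterns[hit]
  candidates[which.max(nchar(candidates))]
}

# One row per medicine, with a single antibiotic name (or NA) attached.
results <- data.frame(
  name = medicine,
  antibiotic = vapply(medicine, best_match, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
results
```

With this, the combined medicine gets one row labelled
PIPERACILINA-TAZOBACTAM instead of two separate rows.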

I hope this explanation is clearer.










On Tue, Apr 6, 2021 at 03:55, Gianpaolo Romeo <
gianpaolo.romeo at gmail.com> wrote: