purrr::map and xml2::read_xml

2 messages · maicel at infomed.sld.cu, Ulrik Stervbo

#
Hi List, I am trying to extract the keywords from 1403 papers in XML
format. I wrote the code below, but version A does not work; it only
works with the modification shown in B. That variation is not the one
I need, because the files it processes do not match the 1403 XML files
in my folder. Could you please tell me where the mistakes are in the
code (A or B) so that I can correct them? The data frame has two
columns: an id and the file paths.

A - Does not work, but it is the one I need.

keyword <-
   muestra %>%
   select(path) %>%
   read_xmlmap(.f = function(x) { read_xml(x) %>%
        xml_find_all( ".//kwd") %>%
        xml_text(trim=T) })

B - It works, but only with a small number of papers.

keyword <-
   muestra %>%
   select(path) %>%
    dplyr::sample_n(50) %>%
    unlist() %>%
   map(.f = function(x) { read_xml(x) %>%
        xml_find_all( ".//kwd") %>%
        xml_text(trim=T) })

Thank you,
Maicel Monzon MD, PHD


----------------------------------------------------------------




--
This message has reached you through the email service that Infomed provides to support the fulfillment of the missions of the National Health System. The person sending this email undertakes to use the service for those purposes and to comply with the established regulations.

Infomed: http://www.sld.cu/
#
Hi Maicel,

I'm guessing that B works on 50 files, and that A fails because there is no
function called 'read_xmlmap'. If the function that you map works well,
removing 'dplyr::sample_n(50)' from 'B' should solve the problem.
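A minimal sketch of A with the mistakes removed. It assumes muestra has an id column and a character path column; the temporary files and the helper name get_keywords are hypothetical stand-ins for the real 1403 papers. Besides replacing the non-existent read_xmlmap with purrr::map, it uses dplyr::pull(path) instead of select(path): select() returns a one-column data frame, and map() over a data frame iterates over its columns, so the function would receive all the paths at once rather than one at a time. That is the job unlist() was doing in B.

```r
library(dplyr)
library(purrr)
library(xml2)

# Hypothetical stand-in for the real folder: three small XML files
dir <- tempfile("papers")
dir.create(dir)
docs <- c(
  "<article><kwd-group><kwd> malaria </kwd><kwd>Cuba</kwd></kwd-group></article>",
  "<article><kwd-group><kwd>dengue</kwd></kwd-group></article>",
  "<article><front/></article>"  # a paper without any <kwd> element
)
paths <- file.path(dir, paste0("paper", 1:3, ".xml"))
walk2(docs, paths, writeLines)  # writeLines(text, con) for each pair

# Hypothetical version of the data frame from the question
muestra <- data.frame(id = 1:3, path = paths, stringsAsFactors = FALSE)

# Read one file and return the trimmed text of every <kwd> node
get_keywords <- function(x) {
  read_xml(x) %>%
    xml_find_all(".//kwd") %>%
    xml_text(trim = TRUE)
}

# Why select() is not enough: map() over the one-column data frame
# iterates once (over the whole column), not once per path
n_calls <- length(map(select(muestra, path), identity))  # 1, not 3

# A, corrected: pull() gives a character vector, one path per call
keyword <- muestra %>%
  pull(path) %>%
  map(get_keywords)

# Optional: keep each paper's id next to its keywords in a list column
muestra_kw <- muestra %>%
  mutate(keywords = map(path, get_keywords))
```

Papers without keywords come back as character(0), so the result keeps one element per file in the same order as muestra.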

If that is not the case, we need a bit more information.
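One way to gather that information, sketched under the assumption (not confirmed in the thread) that the full run fails because some paths do not exist on disk or some files are not well-formed XML; map() stops at the first error, which hides how many files are affected. The sketch checks the paths with file.exists() and wraps the reader in purrr::safely() so every file is attempted and failures are recorded. The file names and get_keywords helper are hypothetical.

```r
library(purrr)
library(xml2)

# Hypothetical test folder: one well-formed file and one truncated file
dir <- tempfile("papers")
dir.create(dir)
paths <- file.path(dir, c("ok.xml", "broken.xml"))
writeLines("<article><kwd>asthma</kwd></article>", paths[1])
writeLines("<article><kwd>asthma</kwd>", paths[2])  # missing </article>

# Paths listed in the data frame that are not actually on disk
not_found <- paths[!file.exists(paths)]

# Read one file and return the trimmed text of every <kwd> node
get_keywords <- function(x) {
  xml_text(xml_find_all(read_xml(x), ".//kwd"), trim = TRUE)
}

# safely() makes each call return list(result = ..., error = ...)
# instead of stopping the whole map() at the first bad file
results <- map(paths, safely(get_keywords))

# Paths whose call produced an error, for closer inspection
failed <- paths[map_lgl(results, ~ !is.null(.x$error))]

# Keywords per file; NULL where reading failed
keyword <- map(results, "result")
```

Printing not_found and failed (and the messages in results[[i]]$error) would give the list the details needed to help further.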

HTH
Ulrik
On Fri, 6 Jan 2017 at 17:08 <maicel at infomed.sld.cu> wrote: