Parsing very large xml datafiles with SAX: How to profile <anonymous> functions?
2 messages · Frederic Fournier, Duncan Temple Lang
Hi Frederic
Perhaps the simplest way to profile the individual functions in your
handlers is to write the individual handlers as regular
named functions, i.e. assigned to a variable in your workspace (or function body),
and then to write the handler functions as wrapper functions that call these
by name:
startElement = function(name, attr, ...) {
# code you want to run when we encounter the start of an XML element
}
myText = function(...) {
# code
}
Now, when calling xmlEventParse(), pass thin wrappers as the handlers:
xmlEventParse(filename,
handlers = list(.startElement = function(...) startElement(...),
.text = function(...) myText(...)))
Then the profiler will see the calls to startElement and myText.
There is a small overhead from the extra layer of function calls, but you will get the profile information.
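The pieces above can be put together into one small script. This is only a sketch under some assumptions: the in-memory document `doc`, the `counts` environment, and the `profFile` name are made up for illustration, and the handler bodies just count things so there is something to check. It assumes the XML package is installed.

```r
library(XML)

# Hypothetical counters, kept in an environment so the handlers can update them.
counts <- new.env()
counts$elements <- 0L
counts$chars <- 0L

# Named handlers: these are the names the profiler can report.
startElement <- function(name, attrs, ...) {
  # runs at the start of every XML element
  counts$elements <- counts$elements + 1L
}
myText <- function(content, ...) {
  # runs for every chunk of character data
  counts$chars <- counts$chars + nchar(content)
}

# Tiny in-memory document standing in for the very large file (assumption).
doc <- "<root><item id='1'>alpha</item><item id='2'>beta</item></root>"

profFile <- tempfile(fileext = ".out")
Rprof(profFile)   # start profiling
invisible(xmlEventParse(doc, asText = TRUE,
    handlers = list(.startElement = function(...) startElement(...),
                    .text         = function(...) myText(...))))
Rprof(NULL)       # stop profiling
```

A document this small finishes before the profiler takes a sample, so the profile file will be essentially empty here. On a realistically large file, replace `doc` with the file name, drop `asText = TRUE`, and inspect `summaryRprof(profFile)$by.total`, where `startElement` and `myText` should then appear as separate rows.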
D.
On 10/26/12 9:49 AM, Frederic Fournier wrote:
Hello everyone,
I'm trying to parse a very large XML file using SAX with the XML package
(i.e., mainly the xmlEventParse function). This function takes as an
argument a list of other functions (handlers) that will be called to handle
particular xml nodes.
When I use Rprof(), all the handler functions are lumped together under
the <Anonymous> label, and I get something like this:
$by.total
                           total.time total.pct self.time self.pct
"system.time"                  151.22     99.99      0.00     0.00
"MyParsingFunction"            149.38     98.77      0.00     0.00
"xmlEventParse"                149.38     98.77      0.00     0.00
".Call"                        149.32     98.73      3.04     2.01
"<Anonymous>"                  146.74     97.02    141.26    93.40   <--- !!
"xmlValue"                       3.04      2.01      0.46     0.30
"xmlValue.XMLInternalNode"       2.58      1.71      0.14     0.09
"standardGeneric"                2.12      1.40      0.50     0.33
"gc"                             1.86      1.23      1.86     1.23
...
Is there a way to make Rprof() identify the different handler functions, so
I can know which one might be a bottleneck? Is there another profiling tool
that would be more appropriate in a case like this?
Thank you very much for your help!
Frederic