Greetings Folks,

When R code (as entered or read from a sourced file) is executed, is it interpreted from the input form every time having once been read in, or do subsequent invocations use an "intermediate" (pre-interpreted) form? Or, putting it another way, is the execution of R code faster the second time round (and later) because the pre-interpretation has already been done once and for all?

[And, for seconds, what is the corresponding situation for S-plus?]

With thanks,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 25-Jun-03  Time: 10:19:00
------------------------------ XFMail ------------------------------
Execution of R code
4 messages · (Ted Harding), Brian Ripley, Peter Dalgaard
I am not sure I fully understand the Qs. There are two phases: 1) the source code is parsed; 2) the parsed code is evaluated. If you run code from source() or a file or the command line, it is parsed and evaluated. However, evaluating a function assignment makes a function object containing the parsed code for the body of the function.

Running code a second time is often faster because of caching of memory (in the chip's caches and in RAM rather than VM). In S-PLUS there are more layers of caching going on: objects are retrieved from disc and (usually) cached in memory, and memory allocated for objects can be re-used rather than re-allocated.

There is no form of pre-compiling to intermediate code on first use (as some Java implementations use), although things like that are in Luke Tierney's long-term plans.

I hope that actually answers your questions.
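The two phases can be separated explicitly in R itself; a minimal sketch using the standard parse(), eval() and body() functions:

```r
# Phase 1: parse the source text into an (unevaluated) expression
ex <- parse(text = "x <- 2 + 3; x * 10")

# Phase 2: evaluate the parsed expression in a fresh environment
env <- new.env()
result <- eval(ex, envir = env)
print(result)          # value of the last expression: 50
print(get("x", env))   # side effect of the assignment: 5

# A function assignment keeps the parsed code as the function's body,
# so it is not re-parsed on each call:
f <- function(i) { while (i < 10) i <- i + 1; i }
print(body(f))         # the stored parse tree, deparsed for printing
print(f(1))            # 10
```

Note that body() returns the parsed code tree itself, not the source text; what you see printed is that tree deparsed back into R syntax.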
On Wed, 25 Jun 2003 Ted.Harding at nessie.mcc.ac.uk wrote:
Greetings Folks,

When R code (as entered or read from a sourced file) is executed, is it interpreted from the input form every time having once been read in, or do subsequent invocations use an "intermediate" (pre-interpreted) form? Or, putting it another way, is the execution of R code faster the second time round (and later) because the pre-interpretation has already been done once and for all?

[And, for seconds, what is the corresponding situation for S-plus?]

With thanks,
Ted.
______________________________________________ R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
On 25-Jun-03 Prof Brian Ripley wrote:
I am not sure I fully understand the Qs. [...] I hope that actually answers your questions.
Thanks, Brian! You have exactly understood, and fully answered, my questions.

Best wishes,
Ted.
Prof Brian Ripley <ripley at stats.ox.ac.uk> writes:
I am not sure I fully understand the Qs. There are two phases: 1) the source code is parsed; 2) the parsed code is evaluated. If you run code from source() or a file or the command line, it is parsed and evaluated. However, evaluating a function assignment makes a function object containing the parsed code for the body of the function.

Running code a second time is often faster because of caching of memory (in the chip's caches and in RAM rather than VM). In S-PLUS there are more layers of caching going on: objects are retrieved from disc and (usually) cached in memory, and memory allocated for objects can be re-used rather than re-allocated.

There is no form of pre-compiling to intermediate code on first use (as some Java implementations use), although things like that are in Luke Tierney's long-term plans.

I hope that actually answers your questions.
One might add that although we don't byte-compile like in Java and
emacs-lisp, the parse tree storage that we use is somewhat more
pre-cooked than the tokenized storage of the ROM BASIC found on early
PCs and their precursors.
One often considers the parsing stage as two processes: Lexical
analysis (the tokenizer) which recognises elementary items such as
keywords, operators, variable names, and constants; and the actual
code tree generation which knows about syntactical structures like for
loops, functions, and compound expressions.
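Both stages can be inspected from within R. As a sketch, utils::getParseData() (available in R since the utils package gained it; it requires parsing with keep.source = TRUE) exposes the tokens the lexical analyser produced:

```r
# Lexical analysis: look at the individual tokens the parser recognised
sf <- parse(text = "while (i < 10) i <- i + 1", keep.source = TRUE)
tokens <- utils::getParseData(sf)

# Keep only terminal tokens (the lexer's output, not grouping nodes)
print(tokens[tokens$terminal, c("token", "text")])
# Shows rows such as WHILE, '(', SYMBOL "i", LT "<", NUM_CONST "10",
# LEFT_ASSIGN "<-", and so on.
```

The non-terminal rows in the same table correspond to the second stage: the syntactic groupings (expressions) built on top of those tokens.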
A code tree for a simple expression like
while ( i < 10 ) i <- i + 1
could be represented as
           while
          /     \
         <       <-
        / \     /  \
       i  10   i    +
                   / \
                  i   1
(apologies to those with proportional screen fonts...) In this
representation, everything is basically functions and arguments: "while"
has two arguments: the loop condition and the body, and those are
calls to a comparison and an assignment function respectively, and so
forth.
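You can poke at exactly this tree in R with quote(), which returns the parsed expression without evaluating it; indexing a call extracts the function and its arguments:

```r
# The code tree for the example: everything is a call,
# element 1 is the function, the rest are its arguments
e <- quote(while (i < 10) i <- i + 1)

print(e[[1]])       # `while`  -- the "function" at the root
print(e[[2]])       # i < 10   -- first argument: the loop condition
print(e[[3]])       # i <- i + 1  -- second argument: the body

# The condition and body are themselves calls:
print(e[[2]][[1]])  # `<`
print(e[[3]][[1]])  # `<-`
print(e[[3]][[3]])  # i + 1
```

So the diagram above is a direct picture of what quote() hands back: nested calls all the way down, with names ("i") and constants (10, 1) at the leaves.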
In compiled languages, parsing is followed by a step that converts the
code tree to machine instructions, but in languages like R it is
easier to interpret the tree directly. One particular aspect of R-like
languages is that you can replace or modify functions programmatically
in between running them, which means that you won't get the gain of an
up-front optimization effort unless you impose special restrictions.
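A small illustration of that point: a caller looks its functions up at call time, so redefining a function between calls changes the behaviour of code that was "compiled" (parsed) long before, which is what defeats naive up-front optimization:

```r
# g calls f, but f is looked up when g runs, not when g is defined
f <- function(x) x + 1
g <- function(x) f(x) * 2
print(g(10))   # 22, since f(10) is 11

# Replace f programmatically between runs of g:
f <- function(x) x - 1
print(g(10))   # 18, since f(10) is now 9
```

An optimizer that had inlined the first f into g would now give the wrong answer, so any such optimization must either be abandoned or invalidated when f changes.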
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)            FAX: (+45) 35327907