The stages of standard function evaluation

Thu, May 3, 2018 5:06 AM

First of all, your message is a little hard to read because you posted 
in HTML.  This list removes the HTML, and often mangles messages, so you 
should always post in plain text.  But in this case your message was 
still pretty readable.

On 02/05/2018 11:04 PM, Andrew Hoerner wrote:

Dear R Help folks --

I have been trying to put together a list of the steps or stages of R
function evaluation, with particular focus on those that have "standard" or
"nonstandard" forms. This is both for my own edification and also because I
am thinking of joining the world of R bloggers and have been trying to put
together some draft posting that might be useful. I seem to have an
affirmative genius for finding incorrect interpretations of R's evaluation
rules; I'm trying to make that an asset.

I am hoping that you can tell me:


    1. Is this list complete, or are there additional stages I am missing?
    2. Have I inserted one or more imaginary stages?
    3. Are the terms I use below to name each stage appropriate, or are
    there other terms more widely used or recognizable?
    4. Is the order correct?

I begin each name with "Standard," to express my belief that each of these
things has a usual or default form, but also that (unless I am mistaken)
almost none of them exist only in a single form true of all R functions. (I
have marked with an asterisk a few evaluation steps that I think may always
be followed).

It is my ultimate goal (which I do not feel at all close to accomplishing)
to determine a way to mechanically test for "standardness" along each of
these dimensions, so that each function could be assigned a logical vector
showing the ways that it is and is not standard. One thing I think is
conceptually or procedurally difficult about this project is that I think
"standardness" should be determined by what a function does, rather than by
how it does it, so that a primitive function that takes unevaluated
arguments positionally could still have standard matching, scoping, etc.,
by internal emulation. A related goal is to identify which evaluation steps
most often use an alternative form, and perhaps determine if there is more
than one such alternative. Finally, an easier short-term goal is simply to
find instances of one or more function with standard and non-standard
evaluation for each evaluation step.

For the most part below I am treating the evaluation of closures as the
standard from which "nonstandard" is defined. However, I do not assume that
other kinds of functions are automatically nonstandard on any particular
dimension below. Most of this comes from the R Language Definition, but
there are numerous places where I am by no means certain that my
interpretation is correct. I have highlighted some of these below with a
????.

I look forward to learning from you.

Warmest regards,

J. Andrew Hoerner


* *Standard function recognition:* recognizing some or all of a string of
code as a function. (Part of code line parsing)

*Standard environment construction:* construction of the execution
environment, and of pointers to the calling and enclosing environments.

*Standard function identification:* Get the name of the function, if any

This may be mangling, but it's really hard to tell whether the 3 
paragraphs above are supposed to be steps, headings, or what.  Assuming 
they are steps, the first one is wrong.

The parser looks at a string and breaks it down into tokens and 
subexpressions, making what you later call an AST.  The first step in 
function evaluation is recognizing that something is a function call, 
not recognizing it as a function.  For example, "mean" is the name of a 
function and also an expression evaluating to a function, "mean(1:10)" 
is a function call.
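
The distinction between a function and a function call can be inspected 
with quote(), which returns the parsed but unevaluated expression.  This 
is only a small sketch; the variable names are made up for illustration.

```r
# quote() returns the parsed expression without evaluating it.
fn_name <- quote(mean)        # a symbol that names a function
fn_call <- quote(mean(1:10))  # a function call

class(fn_name)                # "name"
class(fn_call)                # "call"

# The first element of a call is the expression that specifies the function.
fn_call[[1]]                  # the symbol `mean`

# Evaluating the symbol yields the function object;
# evaluating the call actually runs the function.
is.function(eval(fn_name))    # TRUE
eval(fn_call)                 # 5.5
```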

Once you have a function call, the next step looks at the expression 
used to specify the function.  In "mean(1:10)", that expression is 
"mean", but it could be an arbitrary R expression.  If it is a name like 
"mean" (or a string), then R looks for an object of mode "function" of 
that name in the current evaluation frame, then in its chain of enclosing 
(parent) environments.  These are not "constructed"; the current 
evaluation frame is always known, and contains a pointer to its parent.  
If the function is specified by a 
more complex expression (e.g. in "fn[[1]](1:10)", the expression is 
"fn[[1]]") then that expression is evaluated.  It needs to return a 
function object or an error will be generated.

So these work:

mean(1:10)
list(mean)[[1]](1:10)
"mean"(1:10)

and these don't:

list("mean")(1:10)
c("mean")(1:10)
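
A rough sketch of the lookup rules just described, using hypothetical 
names (shadowed, fn, result).  It shows that a non-function object of the 
same name is skipped, that a complex function expression is evaluated 
first, and that the call fails if that expression does not yield a function.

```r
# A non-function object named `mean` does not hide the function in a call,
# because lookup in function position only accepts objects of mode "function".
shadowed <- function() {
  mean <- "not a function"
  mean(1:10)
}
shadowed()        # 5.5

# A more complex function expression is evaluated like any other expression.
fn <- list(mean, sum)
fn[[2]](1:10)     # 55

# If the expression does not evaluate to a function, the call is an error.
result <- tryCatch(list("mean")(1:10), error = function(e) e)
inherits(result, "error")   # TRUE
```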

So now we have the function.  Its name is irrelevant.

Functions have at least 3 parts, not 2.  They have formals, a body, and 
an environment.  Nowadays they will often have bytecode as well; this is 
a compiled version of the body used in its place during evaluation.
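
The three parts can be pulled out of any closure with formals(), body() 
and environment().  A minimal example, where add is just an illustrative 
function:

```r
add <- function(x, y = 1) x + y

names(formals(add))   # "x" "y"
formals(add)$y        # 1, the default expression for y
body(add)             # x + y
environment(add)      # the environment where add was defined

# Primitive functions are not closures and report none of these parts.
is.primitive(sum)     # TRUE
formals(sum)          # NULL
```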

It is only parsed once.

I have no idea what you are saying in this paragraph.  Positional versus 
named matching has no effect on scoping.  Arguments specified in the 
call are scoped in the calling frame; default values for arguments are 
scoped in the evaluation frame.
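
A small illustration of that scoping rule, with made-up names.  The same 
expression, z * 2, gives different values depending on whether it is 
supplied in the call or comes from the default.

```r
z <- 1

f <- function(x, y = z * 2) {
  z <- 10    # local variable in the evaluation frame
  x + y
}

# x is supplied in the call, so `z` is looked up in the calling frame (z = 1).
# y uses its default, so `z * 2` is evaluated in the evaluation frame (z = 10).
f(z)          # 1 + 20 = 21

# Here both arguments are supplied, so both use the caller's z.
f(z, z * 2)   # 1 + 2 = 3
```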

You missed a step.  As evaluation starts, a new environment is created, 
the evaluation frame.  Its parent is the environment of the function; it 
is initialized with the formal arguments to the function as promises.
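
This can be checked from inside a function, since environment() called 
with no arguments returns the current evaluation frame.  A sketch with 
hypothetical names:

```r
f <- function(x) {
  frame <- environment()   # the evaluation frame of this particular call
  list(
    frame  = frame,
    parent = parent.env(frame),   # the enclosing environment of the frame
    has_x  = exists("x", envir = frame, inherits = FALSE)
  )
}

first  <- f(1)
second <- f(2)

identical(first$parent, environment(f))   # TRUE: parent is the function's environment
identical(first$frame, second$frame)      # FALSE: each call gets a new frame
first$has_x                               # TRUE: the formal is bound in the frame
```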

This is true for both standard and non-standard functions.  All 
arguments are parsed, standard or not, producing promises.  They are 
placed in the evaluation frame, not "passed into the body".

No, parse first, match second, put into evaluation frame third.

No.  Each formal is bound to a promise in the evaluation frame. 
Promises contain an expression (an AST in your terms) and an 
environment.  As previously mentioned, the environment will be the 
calling frame for arguments passed in the call, the evaluation frame for 
arguments specified via defaults.

No.  Arguments are all treated as promises, i.e. un-evaluated 
expressions with an attached environment.  No search is done until later 
when they are evaluated.
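
The laziness of promises is easy to observe.  In this sketch (all names 
are made up, and an environment is used only as a simple event log), an 
argument that is never used is never evaluated, and one that is used is 
evaluated at the point of first use, not when the call starts.

```r
# The argument would raise an error, but the body never touches x.
ignores_arg <- function(x) "body finished without touching x"
ignores_arg(stop("this error is never raised"))

# Record the order of events in an environment.
rec <- new.env()
rec$events <- character()
note <- function(msg) {
  rec$events <- c(rec$events, msg)   # environments are modified in place
  invisible(NULL)
}

uses_arg <- function(x) {
  note("body started")
  x                       # the promise is forced here
  note("body finished")
}

uses_arg({ note("argument evaluated"); 1 })
rec$events
# "body started" "argument evaluated" "body finished"
```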

No, promises contain expressions, and references (pointers) to environments.

That sounds correct.

No.  The body is just an expression.  Typically it's a compound 
statement enclosed in braces, but not necessarily.  No substitutions are 
done.  Later when it is evaluated, symbols in that expression will be 
looked up in the evaluation frame.

This is unnecessarily complex.  Evaluation of the body expression is 
just like evaluation of any other expression.  What is special is that 
the evaluation frame is set as the current frame, and some of the 
objects in it are promises, which have their own special rules.
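
As a rough imitation of this (not what R does internally, since here the 
argument is bound as a plain value rather than a promise), the body can 
be extracted and evaluated by hand like any other expression.  The 
function area is only an example.

```r
area <- function(r) {
  pi * r^2
}

expr <- body(area)
class(expr)     # "{": the body is a call to the brace function

# Evaluate the body with r bound to 2, using the function's environment
# as the enclosure so that `pi` and the operators can be found.
manual <- eval(expr, list(r = 2), environment(area))

identical(manual, area(2))   # TRUE
```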

Again, unnecessary.

I would recommend separating observations like 1) from rules like 2). 
The rules are pretty simple.  The consequences of them can be more complex.

3) is just wrong.  Promises have environments where their expressions 
are evaluated.

Again, this is unnecessary.  The body is just an expression that is 
evaluated in the evaluation frame.

The basic difference between standard evaluation and nonstandard 
evaluation is whether the function looks at the expression in promises, 
or only looks at the value when it is evaluated.  substitute() is the 
usual way to look at the expression, but packages like rlang define others.
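
A minimal contrast between the two, using base R only; the function names 
are hypothetical.

```r
# Standard evaluation: only the value of the argument is used.
standard_fn <- function(x) x

# Nonstandard evaluation: the expression stored in the promise is inspected.
nonstandard_fn <- function(x) substitute(x)

standard_fn(1 + 2)               # 3
nonstandard_fn(1 + 2)            # the unevaluated call 1 + 2

# The expression is never evaluated, so a and b need not exist.
deparse(nonstandard_fn(a + b))   # "a + b"
```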

Other issues that you haven't touched on that probably belong in a 
writeup like this are a description of how ... is handled, the rarely 
used ..1, ..2, etc., and the super-assignment operator <<-.
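
For reference, a short sketch of those three features with illustrative 
names.  It is not a full description of how they work.

```r
# ... collects any extra arguments; ..1 refers to the first of them.
dots_info <- function(...) {
  args <- list(...)   # evaluates and collects every argument in ...
  list(count = length(args), first = ..1)
}
dots_info(10, 20, 30)   # count is 3, first is 10

# <<- assigns to an existing variable found in an enclosing environment,
# here the `count` created by make_counter(), not a new local variable.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()
counter()   # 1
counter()   # 2
```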

Duncan Murdoch
