An iteration protocol

7 messages · Duncan Murdoch, Peter Meilstrup, Lionel Henry +2 more

#
Hi all,

A while back, Hadley and I explored what an iteration protocol for R
might look like. We worked through motivations, design choices, and edge
cases, which we documented here:
https://github.com/t-kalinowski/r-iterator-ideas

At the end of this process, I put together a patch to R (with tests) and
would like to invite feedback from R Core and the broader community:
https://github.com/r-devel/r-svn/pull/130/files?diff=unified&w=1

In summary, the overall design is a minimal patch. It introduces no
breaking changes and essentially no new overhead. There are two parts.

1.  Add a new `as.iterable()` S3 generic, with a default identity
    method. This provides a user-extensible mechanism for selectively
    changing the iteration behavior for some object types passed to
    `for`. `as.iterable()` methods are expected to return anything that
    `for` can handle directly, namely, vectors or pairlists, or (new) a
    closure.

2.  `for` gains the ability to accept a closure for the iterable
    argument. A closure is called repeatedly for each loop iteration
    until the closure returns an `exhausted` sentinel value, which it
    received as an input argument.

Here is a small example of using the iteration protocol to implement a
sequence of random samples:

``` r
SampleSequence <- function(n) {
  i <- 0
  function(done = NULL) {
    if (i >= n) {
      return(done)
    }
    i <<- i + 1
    runif(1)
  }
}

for(sample in SampleSequence(2)) {
  print(sample)
}

# [1] 0.7677586
# [1] 0.355592
```

Best,
Tomasz
#
1. I'm not sure I see the need for the syntax change.  Couldn't this all 
be done in a while or repeat loop?  E.g. your example could keep the 
same definition of SampleSequence, then

  iterator <- SampleSequence(2)
  repeat {
    sample <- iterator()
    if (is.null(sample)) break
    print(sample)
  }

Not as simple as yours, but I think a little clearer because it's more 
concrete, less abstract.

2. It's not clear to me how the for() loop chooses a value to pass to 
the iterator function. (Sorry, I couldn't figure it out from your 
patch.) Is "exhausted" a unique value produced each time for() is 
called?  Is it guaranteed to be unique?  What does a user see if they 
look at it?

Duncan Murdoch
On 2025-08-11 3:23 p.m., Tomasz Kalinowski wrote:
#
Hello,

A couple of comments:

- Regarding the closure + sentinel approach, also implemented in coro
  (https://github.com/r-lib/coro/blob/main/R/iterator.R), it's more robust
  for the sentinel to always be a temporary value. If you store the sentinel
  in a list or a namespace, it might inadvertently close iterators when
  iterating over that collection. That's why the coro sentinel is created
  with `coro::exhausted()` rather than exported from the namespace as a
  constant object. The sentinel can equivalently be created with
  `as.symbol(".__exhausted__.")`; the main thing to ensure robustness is to
  avoid storing it and always create it from scratch.

  The approach of passing the sentinel by argument (which I see in the example
  in your mail but not in the linked documentation of approach 3) also works
  if the iterator loop passes a unique sentinel. Having a default of `NULL`
  makes it likely to get unexpected exhaustion of iterators when a sentinel
  is not passed in, though.

- It's very useful to _close_ iterators for resource cleanup. It's the
  responsibility of an iterator loop (e.g. `for`, but it could be other custom
  tools invoking the iterator) to close them. See
  https://github.com/r-lib/coro/pull/58 for an interesting application of
  iterator closing, allowing robust support of `on.exit()` expressions in
  coro generators.

  To implement iterator closing with the closure approach, an iterator may
  optionally take a `close` argument. A `TRUE` value is passed on exit,
  instructing the iterator to clean up resources.
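
As a rough sketch of the `close` idea under the closure approach: the names `LineIterator` and `take` below are hypothetical, the `close` argument follows the description above rather than any existing API, and a text connection stands in for a real resource.

``` r
# Hypothetical iterator over lines that honours an optional `close` argument.
LineIterator <- function(lines) {
  con <- textConnection(lines)
  is_open <- TRUE
  function(done = NULL, close = FALSE) {
    if (close) {
      # Release the resource once; later close requests are no-ops.
      if (is_open) {
        base::close(con)
        is_open <<- FALSE
      }
      return(invisible(done))
    }
    line <- readLines(con, n = 1)
    if (length(line) == 0) return(done)
    line
  }
}

# Hypothetical loop driver: it owns the responsibility of closing the
# iterator, including when it stops before the iterator is exhausted.
take <- function(iterator, n) {
  on.exit(iterator(close = TRUE), add = TRUE)
  exhausted <- new.env(parent = emptyenv())
  out <- character()
  while (length(out) < n) {
    value <- iterator(exhausted)
    if (identical(value, exhausted)) break
    out <- c(out, value)
  }
  out
}

first_two <- take(LineIterator(c("a", "b", "c")), 2)
print(first_two)
```

Here the driver stops after two items, and the `on.exit()` handler still gives the iterator a chance to release its connection.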

Best,
Lionel
On Mon, Aug 11, 2025 at 3:24 PM Tomasz Kalinowski <kalinowskit at gmail.com> wrote:
#
Passing the sentinel value as an argument to the iteration method is
the approach taken in my package `iterors` on CRAN. If the sentinel
value argument is evaluated lazily, this lets you pass calls to things
like 'stop', 'break' or 'return,' which will be called to signal end
of iteration. This makes for some nice compact and performant
iteration idioms:

iter <- as.iteror(obj)
total <- 0
repeat {total <- total + nextOr(iter, break)}
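
For readers without `iterors` installed, the same lazy-argument idiom can be sketched with a plain closure. The `counter` constructor below is a hypothetical stand-in for an iteror, not the package's implementation; it only relies on R's lazy evaluation of function arguments.

``` r
# Plain-closure stand-in for an iteror; `or` is evaluated only on exhaustion.
counter <- function(n) {
  i <- 0
  function(or) {
    if (i >= n) return(or)  # forcing `or` runs whatever the caller passed
    i <<- i + 1
    i
  }
}

nextElem <- counter(4)
total <- 0
repeat {
  # `break` is a promise here; it fires in this loop once the counter is done.
  total <- total + nextElem(break)
}
print(total)
```

The loop ends when the closure forces its argument, so no explicit comparison against a sentinel is needed in the caller.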

Note that an iteror is just a closure with one optional argument and a
class attribute, so you can skip the S3 `nextOr` method and call it
directly:

nextElem <- as.iteror(obj)
repeat {total <- total + nextElem(break)}

For backward compatibility with the iterators package, the default
sentinel value for iterors is `stop("StopIteration")`.

Note that it is trivial to create a unique sentinel value -- any newly
created closure (i.e. function() NULL) will do, as it will only
compare identical() with itself.

sigil <- \() NULL
nextElem <- as.iteror(obj)  # `next` is a reserved word and cannot be assigned to
while (!identical(item <- nextElem(sigil), sigil)) {
  doStuff(item)
}

Peter Meilstrup

On Mon, Aug 11, 2025 at 5:56 PM Lionel Henry via R-devel
<r-devel at r-project.org> wrote:
#
Clever! If going for non-local returns, probably best for ergonomics to pass in
a closure (see e.g. `callCC()`). If only to avoid accidental jumps while
debugging.

But... do we need more lazy evaluation tricks in the language or fewer? It's
probably more idiomatic to express non-local returns with condition signals
like `stopIteration()`.

There's something to be said for explicit and simple control flow though, via
handling of returned values.

Until you try that in the global env, right? Then the risk of collision slightly
increases. Unless you make your closure more unique via `body()`, but then you
might as well use a conventional sentinel.
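
To make the collision concern concrete, here is a small sketch using only base R. It assumes the default arguments of `identical()`, which ignore srcrefs and bytecode when comparing closures; `make` is a hypothetical helper.

``` r
a <- function() NULL
b <- function() NULL

# Same formals, body, and enclosing environment: identical() treats
# the two closures as the same value.
same_env <- identical(a, b)

# Each call to make() returns a closure with its own enclosing environment,
# which is what keeps the two results apart.
make <- function() function() NULL
fresh_env <- identical(make(), make())

print(c(same_env = same_env, fresh_env = fresh_env))
```

So a closure sentinel is only as unique as its enclosing environment, which is why two of them created in the same scope can collide.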

Best,
Lionel

On Tue, Aug 12, 2025 at 1:45 AM Peter Meilstrup
<peter.meilstrup at gmail.com> wrote:
#
Thank you Lionel, Peter, and Duncan!
Some responses inline below:

Indeed, that's the trade-off! Explicit and verbose vs. simple,
concise, and abstracted away. There are certainly times when I prefer
the former, but the latter is not even an option today. Particularly
in a teaching context, I think the concept of iteration is more
intuitive and faster to teach than the precise mechanics of iteration.
The opportunity to make `for` usable with a broader set of object
types is icing on the cake. (Some of these arguments are fleshed out
further in the README linked in the first email.)

In the draft patch, `for` creates a unique sentinel object, a bare
`OBJSXP`. The iterator closure is called with this sentinel as the
argument, and the closure must return exactly it to indicate
exhaustion.
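
One way to picture this behavior in current R, without the patch, is the sketch below. The helper `iterate()` is a hypothetical stand-in for the patched `for`, and a fresh environment stands in for the bare `OBJSXP` sentinel; both are assumptions for illustration, not code from the patch.

``` r
# Hypothetical stand-in for the patched `for`; not part of the patch itself.
iterate <- function(iterator, body) {
  # A fresh environment is identical() only to itself, so it works as a
  # unique, per-loop sentinel.
  exhausted <- new.env(parent = emptyenv())
  repeat {
    value <- iterator(exhausted)
    if (identical(value, exhausted)) break
    body(value)
  }
  invisible(NULL)
}

# Countdown iterator following the closure protocol from the first email.
Countdown <- function(n) {
  function(done = NULL) {
    if (n <= 0) return(done)
    n <<- n - 1
    n + 1
  }
}

seen <- numeric()
iterate(Countdown(3), function(x) seen <<- c(seen, x))
print(seen)
```

Because the sentinel is created inside the loop driver and never stored anywhere else, no value produced by the iterator can be mistaken for it.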

This approach neatly achieves a few design goals. It introduces no
persistent symbols, keeping the API surface small, and avoids
introducing the ugly edge case of a potential false-positive
exhaustion detection. It has less overhead than a signal. Compared to
a signal, it should also encourage a more local coding style, making
code easier to reason about. Treating errors as values is one idea
that Rust has proven the value of to me, and this value-sentinel
approach is a close cousin of that.

The example `SampleSequence` iterator in the initial email had a
default sentinel value of `NULL`. This was to allow convenient manual
iteration with something like:

```r
it <- SampleSequence(9)
it(); it(); it(); ...
```

Or, if you prefer a more explicit approach:

```r
it <- SampleSequence(9)
repeat { val <- it() %||% break; ... }
```

Or:

```r
repeat { val <- it(break); ... }
```

Or:

```r
while (!is.null(val <- it())) { ... }
```

Or, for maximum robustness:

```r
done_sentinel <- new.env(parent = emptyenv())
while (!identical(done_sentinel, val <- it(done_sentinel))) { ... }
```

This enables a variety of usage patterns with different trade-offs
between convenience and robustness, with `for` able to take the most
robust approach, while allowing the iterator's default sentinel to
prioritize convenience.
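
A small sketch of that trade-off, using a hypothetical `ListIterator` over a list that happens to contain `NULL`: the convenient `is.null()` loop stops early, while the unique-sentinel loop visits every element.

``` r
# Hypothetical iterator over list elements; `done` defaults to NULL
# as in the SampleSequence example.
ListIterator <- function(x) {
  i <- 0
  function(done = NULL) {
    if (i >= length(x)) return(done)
    i <<- i + 1
    x[[i]]
  }
}

items <- list(1, NULL, 3)

# Convenient form: the NULL element is mistaken for exhaustion.
it <- ListIterator(items)
n_convenient <- 0
while (!is.null(val <- it())) n_convenient <- n_convenient + 1

# Robust form: a fresh environment can only match itself.
it <- ListIterator(items)
done_sentinel <- new.env(parent = emptyenv())
n_robust <- 0
while (!identical(done_sentinel, val <- it(done_sentinel))) {
  n_robust <- n_robust + 1
}

print(c(convenient = n_convenient, robust = n_robust))
```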
This is interesting and, to be honest, not a use case we had considered.

Would using `reg.finalizer()` be sufficient for your use case? It
gives less control over timing than `on.exit()`, but can close
resources with something like:

```r
Stream <- function() {
  r <- open_resource()
  reg.finalizer(environment(), \(e) r$close())
  \(done) r$get_next() %||% done
}
```
On Tue, Aug 12, 2025 at 5:20 AM Lionel Henry <lionel at posit.co> wrote:
#
Great stuff, and I like the use of a sentinel as a terminator symbol.

One aspect of this I would like to explore is that of a lazy sequence as a more fundamental language primitive. Generators in for loops are great, but generators returned by lapply() and friends would enable lazy functional transformations and efficient combination of processing steps. At the lowest level I can see this being facilitated by an ALTREP protocol with a similar API to what you propose. 

One big pain point, of course, is parallel processing. A two-level design splitting the iterator index and data generation (like C++ does) could be a better fit if parallelization is desired. Curious to hear your thoughts.

Best, 

Taras