Interrupts (was Re: [Rd] X11 protocol errors ...)
On Thu, 23 Aug 2001, Luke Tierney wrote:
I'm talking about something related but different: controlling the point at which an asynchronous signal is brought into the system (and turned into an exception, if we have a proper exception system). R currently has on.exit, and Robert Gentleman and I proposed a more structured exception mechanism for possible addition to R in the near future. [I sent a posting about the proposed mechanism a while back. So far we have received little feedback, so here is another request: please have a look at http://www.stat.umn.edu/~luke/R/exceptions/simpcond.html and let us know if you have any comments or suggestions.] But that is not the issue here. The issue is whether we allow a SIGINT signal in UNIX (and whatever its analog is on other systems) to interrupt the current calculation immediately, no matter where it might be, or whether we impose more structure. Windows/Mac pretty much force more structure at the C level, since their analogs have to arrive through mechanisms that require explicit polling. So on Windows you know that an expression like x = malloc(n) will not get interrupted between the malloc call and the assignment to x (unless some very low-level tricks are involved). On UNIX, the signal can arrive between those two operations.
That sounds like two problems. The first is making sure that the allocation and assignment happen as an atomic operation (that's not difficult: we just throw a critical section around it). The second is what to do, in the event of an interrupt, with things like already-allocated memory (which was Duncan's point in the earlier message that got snipped). Unfortunately, I think the transactional (is that a word?) database people may be the only ones with a good handle on that particular problem, and that's expensive... I suppose if you're wrapping everything in an environment, an interrupt could just ensure the destruction of the environment, but that doesn't handle global variables. You'd need some sort of stack on each variable that kept weak references to previous values (since R doesn't usually access things by reference unless you force it, right?) until they're garbage collected, at which point their weak references would also be removed from the stack. "Undo Capability, Limited Time Only!" :-)
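To make the "undo" idea a bit more concrete, here is a minimal C sketch of an undo journal. It assumes a single thread, plain double variables, and a fixed-size log in place of the weak references and garbage-collector integration described above; journal_set, journal_commit and journal_rollback are hypothetical names. Every assignment records the old value first; a normal finish commits, and an interrupt handled at a safe point (not inside the signal handler itself) rolls back.

```c
#include <stddef.h>

/* Hypothetical undo journal: each entry remembers a variable's address
   and the value it held before it was overwritten. */
#define JOURNAL_MAX 256

struct undo_entry {
    double *addr;
    double old_value;
};

static struct undo_entry journal[JOURNAL_MAX];
static size_t journal_len = 0;

/* Record the old value, then assign. Returns 0 on success, -1 if full. */
int journal_set(double *addr, double value)
{
    if (journal_len == JOURNAL_MAX)
        return -1;
    journal[journal_len].addr = addr;
    journal[journal_len].old_value = *addr;
    journal_len++;
    *addr = value;
    return 0;
}

/* The computation finished normally: forget the saved values. */
void journal_commit(void)
{
    journal_len = 0;
}

/* An interrupt was caught: restore values newest-first, so a variable
   assigned several times ends up with its oldest saved value. */
void journal_rollback(void)
{
    while (journal_len > 0) {
        journal_len--;
        *journal[journal_len].addr = journal[journal_len].old_value;
    }
}
```

A fixed log is the crudest possible choice; the weak-reference version would let old values be dropped as the collector reclaims them instead of at commit time.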
The safe thing to do on UNIX is to have the signal handler just set a
flag which is then checked at appropriate points. This is the
approach that John Eaton mentioned, and is used by most Scheme systems
I've looked at. I suspect Python and Perl do this as well, but I'll
have to check. This is also the way Java handles thread interrupts.
It would make the UNIX behavior identical to the Windows behavior.
The drawback for systems like R and Octave is that we rely on being
able to use chunks of C/Fortran that can potentially run for a long
time (forever if they happen to get into infinite loops occasionally)
and where it is either impractical or impossible to insert flag
checking code. For those situations it is nice to be able to use a
signal handler to force a jump out of that code. We live without this
ability on Windows/Mac, and don't do too badly there, but it would be
nice not to completely lose this facility on UNIX. Most numerical code
tends to not behave too badly when exited by a longjmp, but there are
no guarantees. For example, if a piece of C code does something like this:
static int inited = FALSE;
if (! inited) {
    inited = TRUE;
    /* ... initialize a table needed for computations ... */
}
/* ... use the table ... */
and a Control-C arrives in the first call after inited = TRUE is executed
but before the table is fully initialized, then future calls to this
function will happily return nonsense.
Step 1. Have anyone who uses statics burned at the stake for not writing threadsafe code ;-)
One option would be to tag routines at library registration time as safe for longjmps or not. That way we can disable longjmp interrupts everywhere except in explicitly marked .C or .Fortran calls (and blocking IO operations). This will ensure that no internal R state gets messed up by asynchronous signals that arrive at an inopportune time.
I think you'll want finer-grained control as well. My guess would be to have external calls execute within a critical section (like Java, if my understanding is correct) unless explicitly marked otherwise, but still allow a critical section to be entered (and left) within the code block. An example would be reading from a URL: while blocked waiting for a connection, you would want the ability to break out without having to wait for a timeout, but if for some reason the transfer must be allowed to complete once it has started, I would want to enter the critical section at that later point in the function.
But this only addresses the C level. On Windows/Mac, the place where a user break is turned into an R exception is (mainly) in the internal eval, where every 1000 calculations (or some such number) the flag is checked and a jump is done if the flag is set. UNIX would work the same way. Since the internals know exactly where this jump can occur, unlike jumps out of a signal handler, they can make sure all internal state is consistent before checking the flag.
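The periodic check in the evaluator can be sketched as follows, assuming a SIGINT handler (not shown) that only sets interrupt_pending. eval_safe_point and toy_eval are hypothetical, and the interval of 1000 is the rough figure from the text. The trade-off is latency: an interrupt may wait up to CHECK_INTERVAL steps before it is noticed.

```c
#include <signal.h>

#define CHECK_INTERVAL 1000   /* "every 1000 calculations (or some such number)" */

static volatile sig_atomic_t interrupt_pending = 0;  /* set by the handler */
static unsigned long eval_count = 0;

/* Hypothetical hook the evaluator calls where all internal state is
   consistent. Returns 1 if evaluation should unwind now. */
int eval_safe_point(void)
{
    if (++eval_count % CHECK_INTERVAL != 0)
        return 0;             /* cheap path: most steps skip the check */
    if (interrupt_pending) {
        interrupt_pending = 0;
        return 1;
    }
    return 0;
}

/* Toy evaluator: attempts n steps, stopping at the first safe point that
   sees a pending interrupt. Returns the number of steps completed. */
long toy_eval(long n)
{
    long i;
    for (i = 0; i < n; i++) {
        if (eval_safe_point())
            break;
        /* ... one evaluation step ... */
    }
    return i;
}
```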
From the R level things look different: the 1000'th eval step can
happen anywhere, so a piece of R code that does
file <- file(file, "w")
on.exit(close(file))
... do something with file ...
has a race condition: an interrupt that arrives between the creation
of the file and the registration of the on.exit handler will leave
the file open. Something along the lines of
without.interrupts({
file <- file(file, "w")
on.exit(close(file))
with.interrupts(... do something with file ...)
})
would be safe but is too awkward in this form. [Using a structured
exception handling mechanism, some sort of try/finally construct,
would make this code cleaner but would not resolve the race
condition.]
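A C analogue of the without.interrupts version, assuming a single-threaded POSIX process: SIGINT is blocked across opening the file and pushing it onto a cleanup stack, so there is no moment at which the file is open but its close has not been registered. The cleanup stack and both functions are hypothetical; they play the roles of on.exit and the R calls in the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

#define MAX_CLEANUPS 16

/* Hypothetical cleanup stack standing in for on.exit(): files to close. */
static FILE *cleanup_files[MAX_CLEANUPS];
static int n_cleanups = 0;

/* Open a file and register its close as a single step. SIGINT is held
   off across both, closing the window between file() and on.exit() in
   the R example. Returns NULL if the open fails or the stack is full. */
FILE *open_with_cleanup(const char *path, const char *mode)
{
    sigset_t block, old;
    FILE *f = NULL;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigprocmask(SIG_BLOCK, &block, &old);        /* cf. without.interrupts */
    if (n_cleanups < MAX_CLEANUPS) {
        f = fopen(path, mode);
        if (f != NULL)
            cleanup_files[n_cleanups++] = f;     /* cf. on.exit(close(file)) */
    }
    sigprocmask(SIG_SETMASK, &old, NULL);        /* cf. with.interrupts */
    return f;
}

/* Run at normal exit or after an interrupt has been caught at a safe
   point: close every registered file, newest first. */
int run_cleanups(void)
{
    int closed = 0;
    while (n_cleanups > 0) {
        fclose(cleanup_files[--n_cleanups]);
        closed++;
    }
    return closed;
}
```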
There are no easy solutions I think, but we need to look at a range of
options and see what works best.
[Threads add the additional problem that an interrupted thread might
be holding a lock, and failure to release the lock could cause
deadlock. Using a structured exception handling mechanism to manage
lock release helps, but race conditions are still potentially an issue
with asynchronous interrupts.]
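For the lock problem, POSIX threads offer one structured mechanism, shown here as a swapped-in illustration rather than anything R does: a cleanup handler pushed with pthread_cleanup_push runs if the thread is cancelled and releases the mutex. Deferred thread cancellation stands in for an interrupt; it is only acted on at cancellation points such as pthread_cond_wait, which avoids the asynchronous race but is the same polling trade-off discussed above. This assumes a POSIX system; older glibc versions need -pthread at link time. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t never_signalled = PTHREAD_COND_INITIALIZER;

static void unlock_mutex(void *m)
{
    pthread_mutex_unlock(m);
}

/* Worker that holds the lock while it waits. If it is cancelled inside
   pthread_cond_wait, the mutex is re-acquired first and the cleanup
   handler then releases it, so no lock is left held. */
static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    pthread_cleanup_push(unlock_mutex, &lock);
    for (;;)
        pthread_cond_wait(&never_signalled, &lock);   /* cancellation point */
    pthread_cleanup_pop(1);   /* not reached; push/pop must pair lexically */
    return NULL;
}

/* Start the worker, cancel it, and report whether the lock is free again:
   1 if released, 0 if still held (the deadlock case), -1 on error. */
int lock_released_after_cancel(void)
{
    pthread_t t;
    void *status;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_cancel(t);
    pthread_join(t, &status);
    if (status != PTHREAD_CANCELED)
        return -1;
    if (pthread_mutex_trylock(&lock) != 0)
        return 0;
    pthread_mutex_unlock(&lock);
    return 1;
}
```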
I'm just sort of pulling stuff out of thin air, and I don't expect any of it to be easy to implement, but here goes :-) Say we have an ideal world where R executes in a bytecode VM like Java's or something similar; I've noticed that this idea pops up every now and then for performance reasons. In that case, why not take it one step further and have the R environment actually be something of a lightweight operating system that manages each of R's user-level threads as a distinct process? (It doesn't have to be bloated, and the nature of the VM means it can be fairly abstract: no need for 'device drivers' in the traditional sense and whatnot.) The 'OS' then handles the context switching and preemption that we'll need anyway, but it also handles interrupts by trapping them from the host operating system (using signal handlers or the particular OS's analogue). The interrupt handler would then be able to forcibly shut down whatever shared resources and memory were allocated to the 'process', in the same way it happens in a real OS. (This would obviously require that the I/O system be abstracted away in internal and C calls, but I think we've already got a good start on that with the connection mechanism, and we want it for other reasons as well.) My thought is that this sort of setup would also change the flavour of the native threading, since it becomes more analogous to developing an SMP operating system, which people already know how to do. You would want to keep the 'OS' bits to an absolute minimum, though, more like an RTOS than a UNIX or something like that (read: primitive :-)).
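A toy version of the bookkeeping such a VM would need, assuming every allocation goes through the VM: each 'process' keeps a list of what it owns, so killing it after an interrupt can release everything, much as an OS reclaims a killed process's memory. vm_process, vm_alloc and vm_kill are hypothetical names, and only memory is tracked here; files and connections would need the same treatment.

```c
#include <stdlib.h>

#define MAX_RESOURCES 32

/* Hypothetical per-"process" record kept by the VM. */
struct vm_process {
    void *resources[MAX_RESOURCES];
    int n_resources;
    int alive;
};

/* Allocate on behalf of a process and remember that it owns the block.
   Returns NULL if the process is dead, its table is full, or malloc fails. */
void *vm_alloc(struct vm_process *p, size_t size)
{
    if (!p->alive || p->n_resources == MAX_RESOURCES)
        return NULL;
    void *mem = malloc(size);
    if (mem != NULL)
        p->resources[p->n_resources++] = mem;
    return mem;
}

/* Called from the VM's interrupt handling (not from the signal handler
   itself) to terminate a process and release everything it owned.
   Returns the number of resources released. */
int vm_kill(struct vm_process *p)
{
    int released = 0;
    while (p->n_resources > 0) {
        free(p->resources[--p->n_resources]);
        released++;
    }
    p->alive = 0;
    return released;
}
```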
luke

--
Luke Tierney
University of Minnesota               Phone: 612-625-7843
School of Statistics                  Fax:   612-624-8868
313 Ford Hall, 224 Church St. S.E.    email: luke@stat.umn.edu
Minneapolis, MN 55455 USA             WWW:   http://www.stat.umn.edu

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Byron Ellis (bellis@hsph.harvard.edu)
"Oook" - The Librarian