X11 protocol errors after all x11 devices are closed (PR#1065) - R-devel

Wed, Aug 22, 2001 5:35 AM #

    >> x11()
    >> dev.off()
    BDR> null device
    BDR> 1
    >> x11()
    >> plot.new()
    BDR> Warning messages:
    BDR> 1: X11 protocol error: BadAccess (attempt to access private resource denied)
    BDR> 2: X11 protocol error: BadAccess (attempt to access private resource denied)

    BDR> This only happens if all x11 devices are shut down, so the X11 connection
    BDR> is restarted.  We had problems with this when the event handlers
    BDR> were changed prior to 1.3.0, and it looks as if the logic is still
    BDR> incorrect.

    BDR> As far as I can see subsequent plots are correct.

    BDR> Carrying on:

    >> dev.off()
    BDR> null device
    BDR> 1
    >> x11()
    >> plot.new()
    BDR> Warning message:
    BDR> X11 protocol error: BadAccess (attempt to access private resource denied)
    BDR> (only one error this time).

Just this morning I found (again! we had something close to this before)
the following related buggy behavior:
After interrupting a plot (which would have taken a few minutes and was
"wrong" anyway), starting another plot, interrupting again [with C-c],
and maybe the same once more,
R started just giving a ">" prompt
but did not react any further at all.
(C-c would return the prompt, but no other reaction was possible)
Only killing the R process helped.

I may try to reproduce more exactly later today.
Martin

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Luke Tierney

Wed, Aug 22, 2001 5:32 PM #

Martin wrote:

I'm surprised we don't get more of these sorts of things on UNIX.  Our
current UNIX interrupt handling approach takes an immediate LONGJMP
out of the signal handler no matter where the signal occurs (except
for two places where signals are suspended).  Any place where an
invariant is temporarily broken, any place where an assignment is not
yet complete, is a potential trouble spot.

I've been meaning to raise this issue at some point: I think we will
need to eventually spend some time re-examining how we want to handle
interrupts.  Right now on Windows/Mac interrupts are only processed at
special points in the evaluation process, which is a bit restrictive
(but perhaps unavoidable due to OS limitations).  On UNIX on the other
hand we LONGJMP out of the signal handler except for the very few
places where the signal gets masked temporarily (GC and one place in
graphics I believe).  The UNIX approach is much too loose even now,
and it becomes even more untenable if we add any kind of threading
support.

We will I think have to come up with a cleaner model for very
selectively enabling interrupt processing, perhaps with some
integration with the external function registration mechanism Duncan
added recently (e.g. marking a function as one where LONGJMP's are
safe).  We will also need some means of controlling interrupt
behaviour from R, such as having some sort of without.interrupts and
with.interrupts functions to be able to program reliable interaction
with interrupts at the R level. (Another sort of thing we might
consider is suspending interrupts during on.exit processing.)

It's a fairly big can of worms, and probably not urgent for now, but
we will need to look at it eventually.

luke

Luke Tierney
University of Minnesota                      Phone:           612-625-7843
School of Statistics                         Fax:             612-624-8868
313 Ford Hall, 224 Church St. S.E.           email:      luke@stat.umn.edu
Minneapolis, MN 55455 USA                    WWW:  http://www.stat.umn.edu

John W. Eaton

Wed, Aug 22, 2001 7:56 PM #

On 22-Aug-2001, Luke Tierney <luke@nokomis.stat.umn.edu> wrote:

| We will I think have to come up with a cleaner model for very
| selectively enabling interrupt processing, perhaps with some
| integration with the external function registration mechanism Duncan
| added recently (e.g. marking a function as one where LONGJMP's are
| safe).

FWIW, Octave doesn't do this correctly either; it just does a longjmp,
same as R, which can result in leaking memory, and possibly other bad
things.

The way Bash handles this is to only set a flag in the handler for
SIGINT, then at safe places in the code, there are

  QUIT;

statements.  These are macros that expand to some code that may
longjmp somewhere depending on the interrupt state.  This method
avoids the problems of jumping out of the signal handler, which may
result in memory leaks or inconsistencies in global state.  The hard
part is inserting the (many!) QUIT statements, though it is probably
somewhat easier if this type of choice is made early on in the life of
the program instead of late.
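
[A minimal sketch of the flag-and-check pattern described above.  It is
not Bash's actual source: the names interrupt_pending, top_level and
run_steps are made up for illustration, and raise() stands in for the
user pressing C-c.]

```c
#include <assert.h>
#include <setjmp.h>
#include <signal.h>

/* Hypothetical names; this only illustrates the pattern. */
static volatile sig_atomic_t interrupt_pending = 0;
static jmp_buf top_level;

/* The handler records that SIGINT arrived and does nothing else. */
static void on_sigint(int sig)
{
    (void)sig;
    interrupt_pending = 1;
}

/* Placed only at points where global state is known to be consistent. */
#define QUIT                                \
    do {                                    \
        if (interrupt_pending) {            \
            interrupt_pending = 0;          \
            longjmp(top_level, 1);          \
        }                                   \
    } while (0)

/* Runs up to `steps` units of work and returns how many were completed
   when an interrupt was noticed at a QUIT point (or `steps` if none). */
static int run_steps(int steps, int interrupt_at)
{
    volatile int done = 0;      /* volatile: read again after longjmp */

    assert(steps >= 0);
    signal(SIGINT, on_sigint);
    if (setjmp(top_level) != 0)
        return done;            /* arrived here via QUIT */

    for (int i = 0; i < steps; i++) {
        if (i == interrupt_at)
            raise(SIGINT);      /* simulated C-c in the middle of a step */
        done = done + 1;        /* the step still runs to completion */
        QUIT;                   /* safe point: act on the interrupt now */
    }
    return done;
}
```

With these definitions, run_steps(5, 2) returns 3: the interrupt arrives
during the third step, but the jump is taken only at the QUIT that
follows it, so the step itself is never left half done.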

I only mention Bash because it is a program that is expected to
handle a lot of interrupts without leaking memory or crashing, and
seems to do so reasonably well.

Packages like R and Octave also have to be able to handle interrupts
in calls to Fortran or C code that may run for a long time, and which
cannot be modified (at least not easily) to add the equivalent QUIT
calls.  I'm not sure what the right solution is for cases like that.

jwe

www.octave.org        | Unfortunately we were hopelessly optimistic in 1954
www.che.wisc.edu/~jwe | about the problems of debugging FORTRAN programs.
                      |                                       -- J. Backus

Duncan Murdoch

Thu, Aug 23, 2001 5:16 AM #

On Wed, 22 Aug 2001 19:32:51 -0500, you wrote:

Delphi protects against this kind of error with "try ... except ..."
and "try ... finally ..." blocks to deal with exceptions.  The first
one executes special code if an exception occurs, the second
guarantees execution of cleanup code.

These are implemented as a linked list of records on the stack which
the exception handler knows how to interpret.  When a particular kind
of exception occurs, all "finally" blocks are executed in the
appropriate order (and the stack base pointer is moved to simulate
exits from all active routines) until an "except" block handling the
particular kind of exception is reached.

C doesn't have these statements built in, but presumably someone has
written macros to do the same sort of thing.  Adding them would be a
lot of work, but would be worthwhile.
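
[A rough sketch of what such macros could look like in C, built on
setjmp/longjmp with a linked list of records on the stack.  This is
neither Delphi's implementation nor any published macro package: TRY,
EXCEPT, FINALLY, END_TRY, throw_code and the rest are hypothetical
names, and the sketch does not cope with a "return" from inside a
protected block.]

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* One record per active protected region, linked through the C stack. */
struct frame {
    jmp_buf env;
    struct frame *prev;
};

static struct frame *frame_top = NULL;  /* innermost active region */
static int pending_code = 0;            /* code of the exception in flight */

/* Pop the innermost record and jump to it. */
static void throw_code(int code)
{
    struct frame *f = frame_top;
    assert(f != NULL);                  /* no protected region is active */
    frame_top = f->prev;
    pending_code = code;
    longjmp(f->env, 1);
}

#define TRY                                     \
    {                                           \
        struct frame f_;                        \
        volatile int rethrow_ = 0;              \
        f_.prev = frame_top;                    \
        frame_top = &f_;                        \
        if (setjmp(f_.env) == 0) {
#define EXCEPT                                  \
            frame_top = f_.prev;                \
        } else {
#define FINALLY                                 \
            frame_top = f_.prev;                \
        } else {                                \
            rethrow_ = 1;                       \
        }                                       \
        {
#define END_TRY                                 \
        }                                       \
        if (rethrow_)                           \
            throw_code(pending_code);           \
    }

/* A small trace buffer so the order of execution can be inspected. */
static char trace[16];
static int trace_len = 0;

static void note(char c)
{
    if (trace_len < 15) {
        trace[trace_len++] = c;
        trace[trace_len] = '\0';
    }
}

static void worker(void)
{
    TRY {
        note('w');
        throw_code(7);
        note('!');              /* skipped */
    } FINALLY {
        note('f');              /* cleanup runs, then the unwind continues */
    } END_TRY
}

static const char *demo(void)
{
    trace_len = 0;
    trace[0] = '\0';
    TRY {
        worker();
        note('!');              /* skipped */
    } EXCEPT {
        note('e');              /* the exception stops here */
    } END_TRY
    return trace;
}
```

Calling demo() leaves "wfe" in the trace: the body of worker() runs, its
"finally" block runs as the exception passes through, and the outer
"except" block handles it.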

Duncan Murdoch

Luke Tierney

Thu, Aug 23, 2001 7:39 AM #

On Thu, Aug 23, 2001 at 08:16:09AM -0400, Duncan Murdoch wrote:

I'm talking about something related but different: controlling the
point at which an asynchronous signal is brought into the system (and
turned into an exception if we have a proper exception system.)  R
currently has on.exit, and Robert Gentleman and I proposed a more
structured exception mechanism for possible addition to R in the near
future.

[I sent a posting about the proposed mechanism a while back.  So far
we have received little feedback, so here is another request: Please
have a look at http://www.stat.umn.edu/~luke/R/exceptions/simpcond.html
and let us know if you have any comments/suggestions]

But that is not the issue here.  The issue is whether we allow a
SIGINT signal in UNIX (and whatever its analog is on other systems) to
interrupt the current calculation immediately, no matter where it
might be, or whether we impose more structure.  Windows/Mac pretty
much force more structure at the C level, since their analogs have to
arrive through mechanisms that require explicit polling.  So on Windows
you know that an expression like

	x = malloc(n)

will not get interrupted between the malloc call and the assignment to
x (unless some very low level tricks are involved).  On UNIX, the
signal can arrive in between those two operations.
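
[For illustration only: one way to close that window on a POSIX system
is to mask SIGINT around the two operations, so that a signal arriving
in between is held pending until the assignment is done.  This is a
sketch, not what R does; guarded_alloc, install_handler and got_sigint
are hypothetical names, and raise() simulates the C-c.]

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int sig)
{
    (void)sig;
    got_sigint = 1;
}

static void install_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);
}

/* Allocates n bytes and stores the pointer in *slot with SIGINT blocked.
   Returns 1 if a SIGINT was waiting when the critical section ended. */
static int guarded_alloc(void **slot, size_t n, int simulate_interrupt)
{
    sigset_t block, old, pending;
    int was_pending;
    void *p;

    assert(slot != NULL);
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    sigprocmask(SIG_BLOCK, &block, &old);   /* enter critical section */

    p = malloc(n);
    if (simulate_interrupt)
        raise(SIGINT);                      /* held pending, not delivered */
    *slot = p;                              /* assignment cannot be skipped */

    sigpending(&pending);
    was_pending = sigismember(&pending, SIGINT);
    sigprocmask(SIG_SETMASK, &old, NULL);   /* pending SIGINT delivered here */
    return was_pending;
}
```

The handler must be installed before the mask is lifted; otherwise the
pending SIGINT would take its default action and end the process.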

The safe thing to do on UNIX is to have the signal handler just set a
flag which is then checked at appropriate points.  This is the
approach that John Eaton mentioned, and is used by most Scheme systems
I've looked at.  I suspect Python and Perl do this as well, but I'll
have to check.  This is also the way Java handles thread interrupts.
It would make the UNIX behavior identical to the Windows behavior.

The drawback for systems like R and Octave is that we rely on being
able to use chunks of C/Fortran that can potentially run for a long
time (forever if they happen to get into infinite loops occasionally)
and where it is either impractical or impossible to insert flag
checking code.  For those situations it is nice to be able to use a
signal handler to force a jump out of that code.  We live without this
ability on Windows/Mac, and don't do too badly there, but it would be
nice not to completely lose this facility on UNIX. Most numerical code
tends to not behave too badly when exited by a longjmp, but there are
no guarantees.  For example, if a piece of C code does something like this:

	static int inited = FALSE;
	if (! inited) {
	    inited = TRUE;
	    ... initialize a table needed for computations ...
	}
	... use the table ...

and a Control-C arrives in the first call after inited=TRUE is executed
but before the table is fully initialized, then future calls to this
function will happily return nonsense.
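
[A sketch of how such a routine could be written so that an abandoned
call does no lasting harm: set the flag only after the table is
complete, so that the next call simply starts the initialization over.
The names square, call_square and interrupt_at are hypothetical, and a
plain longjmp stands in for the jump out of the signal handler; with a
genuinely asynchronous signal one would also have to worry about the
compiler reordering the stores.]

```c
#include <assert.h>
#include <setjmp.h>

#define TABLE_SIZE 8

static jmp_buf top_level;
static int table[TABLE_SIZE];
static int inited = 0;
static int interrupt_at = -1;   /* test hook: index at which to "interrupt" */

static int square(int i)
{
    assert(i >= 0 && i < TABLE_SIZE);
    if (!inited) {
        for (int k = 0; k < TABLE_SIZE; k++) {
            if (k == interrupt_at)
                longjmp(top_level, 1);  /* stands in for the C-c jump */
            table[k] = k * k;
        }
        inited = 1;             /* set last: only a complete table counts */
    }
    return table[i];
}

/* Calls square(i) the way a top level would; returns -1 if the call
   was abandoned part way through. */
static int call_square(int i, int interrupt_index)
{
    interrupt_at = interrupt_index;
    if (setjmp(top_level) != 0)
        return -1;
    return square(i);
}
```

Here call_square(7, 3) is abandoned with the table half filled, yet a
later call_square(7, -1) still returns 49, because the flag was never
set and the table is rebuilt.  With the flag set first, as in the
fragment above, the second call would return whatever happened to be in
the unfilled slot.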

One option would be to tag routines at library registration time as
safe for LONGJMP's or not.  That way we can disable LONGJMP
interrupts everywhere except in explicitly marked .C or .Fortran calls
(and blocking IO operations). This will ensure that no internal R
state gets messed up by asynchronous signals that arrive at an
inopportune time.
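
[A sketch of how such tagging might look, assuming POSIX
sigsetjmp/siglongjmp.  This is not R's registration mechanism: struct
routine, jump_safe, call_routine and long_running are hypothetical
names, and raise() simulates the C-c.  The handler jumps immediately
only while a routine marked as safe is running; otherwise it just sets
a flag that is looked at once the routine has returned.]

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

typedef void (*routine_fn)(void);

/* Hypothetical registration record: the routine plus its safety tag. */
struct routine {
    routine_fn fn;
    int jump_safe;              /* 1 if a jump out of fn is tolerated */
};

static volatile sig_atomic_t interrupt_pending = 0;
static volatile sig_atomic_t jump_allowed = 0;
static sigjmp_buf call_site;

static void on_sigint(int sig)
{
    (void)sig;
    if (jump_allowed) {
        jump_allowed = 0;
        siglongjmp(call_site, 1);   /* also restores the signal mask */
    }
    interrupt_pending = 1;          /* otherwise defer */
}

static void install_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);
}

/* Returns 0 if the routine ran to completion, 1 if it was abandoned by
   an immediate jump, 2 if it completed with an interrupt deferred. */
static int call_routine(const struct routine *r)
{
    assert(r != NULL && r->fn != NULL);
    if (sigsetjmp(call_site, 1) != 0)
        return 1;
    jump_allowed = r->jump_safe;
    r->fn();
    jump_allowed = 0;
    if (interrupt_pending) {
        interrupt_pending = 0;
        return 2;
    }
    return 0;
}

/* Stand-in for a long-running numerical routine. */
static int steps_done = 0;

static void long_running(void)
{
    steps_done = 0;
    for (int i = 0; i < 10; i++) {
        if (i == 4)
            raise(SIGINT);          /* simulated C-c part way through */
        steps_done++;
    }
}
```

With the routine registered as safe, call_routine returns 1 and
long_running stops after four steps; registered as unsafe, the same
routine finishes all ten steps and call_routine returns 2.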

But this only addresses the C level.  On Windows/Mac, the place where
a user break is turned into an R exception is (mainly) in the internal
eval, where every 1000 calculations (or some such number) the flag is
checked and a jump is done if the flag is set.  UNIX would work the
same way.  Since the internals know exactly where this jump can occur,
unlike jumps out of a signal handler, they can make sure all internal
state is consistent before checking the flag.
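
[A toy version of that polling loop, to show the delay it implies.  It
is not R's eval: eval_loop, break_requested and the CHECK_EVERY value
of 1000 are assumptions for illustration, and raise() simulates the
user break.]

```c
#include <assert.h>
#include <signal.h>

#define CHECK_EVERY 1000        /* assumed polling interval */

static volatile sig_atomic_t break_requested = 0;

static void on_sigint(int sig)
{
    (void)sig;
    break_requested = 1;
}

/* Runs up to n evaluation steps; returns the number of steps completed
   when a user break was noticed, or n if none was noticed. */
static long eval_loop(long n, long interrupt_step)
{
    long since_check = 0;

    assert(n >= 0);
    signal(SIGINT, on_sigint);
    break_requested = 0;

    for (long step = 0; step < n; step++) {
        if (step == interrupt_step)
            raise(SIGINT);      /* simulated user break */

        /* ... evaluate one expression; state may be inconsistent here ... */

        if (++since_check >= CHECK_EVERY) {
            since_check = 0;    /* state is consistent: safe to look */
            if (break_requested) {
                break_requested = 0;
                return step + 1;
            }
        }
    }
    return n;
}
```

A break raised at step 1500 is acted on only at the next check, so
eval_loop(5000, 1500) returns 2000.  The interval trades responsiveness
against the cost of looking at the flag.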

At the R level, though, interrupts can still
happen anywhere, so a piece of R code that does

	file <- file(file, "w")
	on.exit(close(file))
	... do something with file ...

has a race condition: an interrupt that arrives between the creation
of the file and the registration of the on.exit handler will leave
the file open.  Something along the lines of

	without.interrupts({
	    file <- file(file, "w")
	    on.exit(close(file))
	    with.interrupts(... do something with file ...)
	})

would be safe but is too awkward in this form. [Using a structured
exception handling mechanism, some sort of try/finally construct,
would make this code cleaner but would not resolve the race
condition.]

There are no easy solutions I think, but we need to look at a range of
options and see what works best.

[Threads add the additional problem that an interrupted thread might
be holding a lock, and failure to release the lock could cause
deadlock.  Using a structured exception handling mechanism to manage
lock release helps, but race conditions are still potentially an issue
with asynchronous interrupts.]

luke

Luke Tierney
University of Minnesota                      Phone:           612-625-7843
School of Statistics                         Fax:             612-624-8868
313 Ford Hall, 224 Church St. S.E.           email:      luke@stat.umn.edu
Minneapolis, MN 55455 USA                    WWW:  http://www.stat.umn.edu

Jan de Leeuw

Thu, Aug 23, 2001 8:40 AM #

You can get the macros from "C Interfaces and Implementations" by
David Hanson, Chapter 4. He has some references to other C work
by Eric Roberts and Steve Maguire on this.

On Thursday, August 23, 2001, at 05:16 AM, Duncan Murdoch wrote:

===
Jan de Leeuw; Professor and Chair, UCLA Department of Statistics;
US mail: 9432 Boulter Hall, Box 951554, Los Angeles, CA 90095-1554
phone (310)-825-9550;  fax (310)-206-5658;  email: deleeuw@stat.ucla.edu
homepage: http://www.stat.ucla.edu/~deleeuw
========================================================
           No matter where you go, there you are. --- Buckaroo Banzai
                    http://www.stat.ucla.edu/~deleeuw/sounds/nomatter.au
========================================================

Thomas Lumley

Thu, Aug 23, 2001 8:50 AM #

On Thu, 23 Aug 2001, Luke Tierney wrote:

This is what is happening in the segfault caused by interrupting
Sys.sleep under Linux.  Presumably the fix is to wait and take the
interrupt after restoring the InputHandler.  It would be nice if someone
who understood this sort of thing could fix it before 1.3.1

	-thomas



Brian Ripley

Thu, Aug 23, 2001 10:05 AM #

On Thu, 23 Aug 2001, Thomas Lumley wrote:

Well, I don't know if I will have time, but I do think I know how to do it.
I suspect I always knew this could happen, but forgot to secure against it,
and it hadn't (hasn't) come up in earnest yet.

Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Byron Ellis

Thu, Aug 23, 2001 10:37 AM #

On Thu, 23 Aug 2001, Luke Tierney wrote:

That sounds like two problems.  The first is how to make sure that the
allocation and assignment happen as an atomic operation (that's not
difficult: we just throw a critical section around it).  The second is
what to do, in the event of an interrupt, with things like allocated
memory (which was Duncan's point in the earlier message that got
snipped).  Unfortunately, I think the transactional (is that a word?)
database people may be the only ones with a good handle on that particular
problem, and that's expensive...  Though I suppose if you're wrapping
everything in an environment, an interrupt could just ensure the
destruction of the environment, but that doesn't handle global variables.
You'd need some sort of stack on each variable that kept weak references
to previous values (since R doesn't usually access things by reference
unless you force it, right?) until they're garbage collected at which
point their weak references will also be removed from the stack--- "Undo
Capability, Limited Time Only!" :-)

Step 1. Have anyone who uses statics burned at the stake for not writing
threadsafe code ;-)

I think you'll want finer grain control as well--- my guess would be to
have external calls execute within a critical section (like Java, if my
understanding is correct) unless explicitly marked otherwise but still
allow for a critical section to be entered (and left) within the code
block. An example would be something like reading from a URL where I may
spend some time blocked waiting for a connection where you would want the
ability to break out without having to wait for a timeout, but if for some
reason once the transfer is started it must be allowed to complete I would
want to engage the critical section later in the function.

I'm just sort of pulling stuff out of thin air and I don't expect this
stuff to be easy to implement, but here goes: :-)

Say, we have an ideal world where R executes in a bytecode VM like Java or
something else---I've noticed that this idea pops up every now and then
for performance reasons. In this case, why not just take it one step
further and have the R environment actually be something of a lightweight
operating system (they don't have to be bloaty and the VM's nature means
it can be fairly abstracted---no need for 'device drivers' in the
traditional sense and whatnot) that manages each of R's user-level threads
as a distinct process. The 'OS' then handles the context switching and
preemption that we'll need anyway but also handles interrupt cases by
trapping them from the operating system (using signal handlers or the
particular OS's analogue). The interrupt handler would then be able to
forcibly shut down whatever shared resources and memory are allocated to the
'process' in the same way it happens in a real OS (this would obviously
require that the I/O system be abstracted away in internal and C calls---
but I think we've already got a good start on that with the connection
mechanism and we want it for other reasons as well). My thought is that
this sort of set up would also change the flavour of the native threading
as well since it actually becomes more analogous to developing an SMP
operating system, which people already know how to do, though you would
want to keep the 'OS' bits to an absolute minimum more like an RTOS than a
UNIX or something like that (read: primitive :-)).

Byron Ellis (bellis@hsph.harvard.edu)
"Oook" - The Librarian




Luke Tierney

Thu, Aug 23, 2001 10:41 AM #

That's a good set of tools for starting from scratch.  But we really
have all the internal harness already in the form of the R context
structure, and Hanson's exception objects are more primitive (not a
criticism: they're designed to be quite simple) than the things we
have in the proposal I mentioned.  Wrapping some macros like Hanson's
or others around what we have isn't hard.

One thing to keep in mind though: whatever tools you use, all code has
to agree on it.  (You can't mix Hanson's macros with R's contexts
since both assume they are the only ones using LONGJMP.)  When the
implementation language provides the mechanism, like C++ or Java
exceptions, you are guaranteed that.  When grafting onto a language
like C you need to put someone in charge and make everyone else play
nice with that one.

luke


Luke Tierney
University of Minnesota                      Phone:           612-625-7843
School of Statistics                         Fax:             612-624-8868
313 Ford Hall, 224 Church St. S.E.           email:      luke@stat.umn.edu
Minneapolis, MN 55455 USA                    WWW:  http://www.stat.umn.edu