reference counting problem in .Primitive's?

-----Original Message-----
From: luke at stat.uiowa.edu [mailto:luke at stat.uiowa.edu]
Sent: Thursday, April 23, 2009 11:06 AM
To: William Dunlap
Cc: r-devel at r-project.org
Subject: Re: [Rd] reference counting problem in .Primitive's?

On Thu, 23 Apr 2009, William Dunlap wrote:

I think the following rather weird expressions show a problem in how
some of the .Primitive functions evaluate their arguments.  I haven't
yet thought of a way that a nonabusive user might run into this
problem.  In each case the first argument, x, is modified in the
course of evaluating the second argument, and then the modified x gets
used as the first argument:

x <- as.integer(1:5); y <- x + { x[3] <- 33L; 1L }; y
[1]  2  3 34  5  6
x <- 2^(0:4); y <- log(x, { x[3] <- 64; 2 }); y
[1] 0 1 6 3 4

The reason I think it looks like a sharing problem (and not an order
of evaluation problem) is that if your modification to x causes it to
use a new block of memory then the unmodified version of x gets
used as the first argument.  E.g.,

x <- as.integer(1:5); y <- x + { x[3] <- 33.3; 1L }; y
[1] 2 3 4 5 6
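
A minimal sketch of one way to make the result independent of the
behavior described above, assuming only R's usual value semantics for
assignment: evaluate each operand in its own statement before
combining them.  The names x0 and inc are illustrative and do not
appear in the original post.

```r
x <- as.integer(1:5)

# Take a snapshot of the first operand before anything modifies x.
# Assignment has value semantics, so later changes to x cannot alter x0.
x0 <- x

# Evaluate the side-effecting second operand in its own statement.
inc <- { x[3] <- 33L; 1L }

# Both operands are now fixed values; the order in which "+" evaluates
# its arguments no longer matters.
y <- x0 + inc
print(y)   # based on the original x
print(x)   # x itself carries the modification
```

Here y is computed from the original contents of x, while x[3] ends up
as 33.  Swapping x0 for x in the last statement would instead use the
modified vector, again without depending on argument evaluation order.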

I haven't yet thought of a way that a nonabusive user might run
into this problem.
An hour after writing this one of our support folks sent me some
user-written code that contained something very close to this idiom;
the second argument to ":" is an altered version of the first argument:

  lengths<-5:1 ; start<-1
  for(i in seq(along=lengths)) {
       thisSeq <- start:((start <- start + lengths[i])-1)
       print(thisSeq)
  }
  [1] 1 2 3 4 5
  [1] 6 7 8 9
  [1] 10 11 12
  [1] 13 14
  [1] 15

That works.  However, if that user had also used 'start[] <- ' instead
of 'start <- ' then they would have run into this bug:

 lengths<-5:1 ; start<-1
 for(i in seq(along=lengths)) {
       thisSeq <- start:((start[] <- start + lengths[i])-1)
       print(thisSeq)
 }
 [1] 1 2 3 4 5
 [1] 10  9
 [1] 13 12
 [1] 15 14
 [1] 16 15

If they use start[] or start[1] consistently in the call to ":" then
they don't hit the bug.
Unless you know of somewhere where it is guaranteed that evaluation
order for : is left to right, then this code is buggy.  (At one point I
either had or seriously thought about having codetools warn about
assignments in arguments other than in a very limited number of
cases.)
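
As a sketch only, the user's loop can be written without any
assignment inside the arguments to ":", so that it does not rely on an
evaluation order.  The variables end and seqs are introduced here for
illustration and are not part of the original code.

```r
lengths <- 5:1
start <- 1
seqs <- vector("list", length(lengths))   # keep each sequence for inspection

for (i in seq_along(lengths)) {
  # Compute the endpoint in its own statement rather than inside start:...
  end <- start + lengths[i] - 1
  seqs[[i]] <- start:end
  print(seqs[[i]])
  # Advance start only after the sequence has been built.
  start <- end + 1
}
```

This prints the same five sequences as the working version above (1 to
5, 6 to 9, 10 to 12, 13 to 14, and 15), and it keeps doing so whether
the updates are written with 'start <- ' or 'start[] <- '.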

As I said previously unless I can convince myself that the current
behavior isn't consistent with _some_ evaluation order in each case
(even if it changes with changes in expressions used) then I don't
think it is worth doing anything about other than explicitly stating
that evaluation order is undefined.

You are probably right.  I have not yet looked at the code but am
virtually certain it does not try to temporarily bump up the NAMED
values on argument values.  Doing so would cure this but probably at
serious cost to performance, as NAMED values of 2 cannot be brought
down again and so cause copying on next modify. (Might be worth
running some tests on that though to see what the cost would be).
So, if NAMED were not limited to 0, 1, or 2, this sort of thing might be
avoided with less pain?
If we had full reference counting I think we could avoid this fairly
easily, but I'm not convinced it is worth avoiding as there are good
reasons to allow indeterminacy in order of evaluation (compiler
optimizations, parallelization, and such) and in any case going to
full reference counting is not realistic without a full rewrite of the
engine (and has its own potential performance issues).

luke
I'm not sure if it is written anywhere that arguments of primitives
(BUILTINS in particular as those are always strict; SPECIALS can be
non-strict but log is strict) are evaluated in any particular order.
All these examples are consistent with _some_ evaluation order, but
not the same one.  It might be possible to show that the results
obtained in these situations will always be consistent with some
evaluation order, in which case documenting that order of evaluation
is unspecified would be good enough for me.  It may also be possible
that an order that does compound expressions first and then symbols
would also solve the issue (I don't think I would want to do this in
the interpreter though because of the performance overhead.)

luke

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa                  Phone:             319-335-3386
Department of Statistics and        Fax:               319-335-3017
    Actuarial Science
241 Schaeffer Hall                  email:      luke at stat.uiowa.edu
Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu
