reference counting problem in .Primitive's?
On Thu, 23 Apr 2009, William Dunlap wrote:
-----Original Message-----
From: luke at stat.uiowa.edu [mailto:luke at stat.uiowa.edu]
Sent: Thursday, April 23, 2009 11:06 AM
To: William Dunlap
Cc: r-devel at r-project.org
Subject: Re: [Rd] reference counting problem in .Primitive's?
On Thu, 23 Apr 2009, William Dunlap wrote:
I think the following rather weird expressions show a problem in how some of the .Primitive functions evaluate their arguments.
In each case the first argument, x, is modified in the course of evaluating the second argument, and then the modified x gets used as the first argument:
x<-as.integer(1:5); y <- x + { x[3]<-33L ; 1L } ; y
[1] 2 3 34 5 6
x<-2^(0:4) ; y <- log(x, { x[3]<-64 ; 2 }) ; y
[1] 0 1 6 3 4
The reason I think it looks like a sharing problem (and not an order-of-evaluation problem) is that if your modification to x causes it to use a new block of memory, then the unmodified version of x gets used as the first argument. E.g.,
x<-as.integer(1:5) ; y <- x + { x[3]<-33.3; 1L} ; y
[1] 2 3 4 5 6
I haven't yet thought of a way that a nonabusive user might run into this problem.
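A minimal sketch of one way to avoid depending on the order in which a primitive such as `+` evaluates its arguments: perform the side effect as a separate statement, and bind the pre-modification value to its own name when that is the value you want. The names x_before and y_after are illustrative only. Because x_before shares its value with x, the assignment x[3] <- 33L makes R copy the vector rather than modify it in place.

```r
# Sketch: make the intended order explicit instead of hiding a side effect
# inside an argument of a primitive.
x <- as.integer(1:5)
x_before <- x          # second name for the current value (no copy yet)
x[3] <- 33L            # side effect as its own statement; the shared value is copied
y <- x_before + 1L     # explicitly uses the unmodified values
print(y)
# [1] 2 3 4 5 6
y_after <- x + 1L      # explicitly uses the modified values
print(y_after)
# [1]  2  3 34  5  6
```

Either result is now determined by the statement order alone, not by how `+` happens to evaluate its operands.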
An hour after writing this, one of our support folks sent me some user-written code that contained something very close to this idiom; the second argument to ":" is an altered version of the first argument:
lengths<-5:1 ; start<-1
for(i in seq(along=lengths)) {
thisSeq <- start:((start <- start + lengths[i])-1)
print(thisSeq)
}
[1] 1 2 3 4 5
[1] 6 7 8 9
[1] 10 11 12
[1] 13 14
[1] 15
That works. However, if that user had also used 'start[] <- ' instead of 'start <- ', then they would have run into this bug:
lengths<-5:1 ; start<-1
for(i in seq(along=lengths)) {
thisSeq <- start:((start[] <- start + lengths[i])-1)
print(thisSeq)
}
[1] 1 2 3 4 5
[1] 10 9
[1] 13 12
[1] 15 14
[1] 16 15
If they use start[] or start[1] consistently in the call to ":", then they don't hit the bug.
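The same loop can be written without an assignment inside the arguments of ":", so the result no longer depends on the order in which ":" evaluates its operands. This is a sketch, not the user's code: thisEnd and seqs are hypothetical names introduced here, and seq_along() stands in for seq(along = ...).

```r
lengths <- 5:1
start <- 1
seqs <- vector("list", length(lengths))   # keep each run for later inspection
for (i in seq_along(lengths)) {
  thisEnd <- start + lengths[i] - 1       # compute the end point first
  thisSeq <- start:thisEnd                # ':' now receives two plain values
  print(thisSeq)
  seqs[[i]] <- thisSeq
  start <- thisEnd + 1                    # advance start in its own statement
}
```

This prints the same five runs as the first, working version (1 to 5, 6 to 9, 10 to 12, 13 and 14, then 15), and writing start[] <- thisEnd + 1 instead would not change that.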
Unless you know of somewhere where it is guaranteed that the evaluation order for : is left to right, this code is buggy. (At one point I either had or seriously thought about having codetools warn about assignments in arguments other than in a very limited number of cases.) As I said previously, unless I can convince myself that the current behavior isn't consistent with _some_ evaluation order in each case (even if it changes with changes in the expressions used), I don't think it is worth doing anything about other than explicitly stating that evaluation order is undefined.
You are probably right. I have not yet looked at the code but am virtually certain it does not try to temporarily bump up the NAMED values on argument values. Doing so would cure this, but probably at serious cost to performance, as NAMED values of 2 cannot be brought down again and so cause copying on the next modify. (It might be worth running some tests on that, though, to see what the cost would be.)
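NAMED is the mechanism R uses to implement copy-on-modify. The sketch below shows only the user-visible side of that mechanism (it does not inspect NAMED itself): when two names share one value, modifying through either name forces a copy. The problem discussed in this thread is the case where a primitive holds an already-evaluated argument value that is not counted as a reference, so an in-place modification of x leaks into it.

```r
x <- 1:5
y <- x          # both names now refer to the same vector; no copy yet
y[3] <- 99L     # the value is shared, so R copies it before modifying y
print(x)
# [1] 1 2 3 4 5
print(y)
# [1]  1  2 99  4  5
```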
So, if NAMED were not limited to 0, 1, or 2, this sort of thing might be avoided with less pain?
If we had full reference counting I think we could avoid this fairly easily, but I'm not convinced it is worth avoiding as there are good reasons to allow indeterminacy in order of evaluation (compiler optimizations, parallelization, and such) and in any case going to full reference counting is not realistic without a full rewrite of the engine (and has its own potential performance issues). luke
I'm not sure if it is written anywhere that arguments of primitives (BUILTINs in particular, as those are always strict; SPECIALs can be non-strict, but log is strict) are evaluated in any particular order. All these examples are consistent with _some_ evaluation order, but not the same one. It might be possible to show that the results obtained in these situations will always be consistent with some evaluation order, in which case documenting that the order of evaluation is unspecified would be good enough for me. It may also be possible that an order that evaluates compound expressions first and then symbols would also solve the issue (though I don't think I would want to do this in the interpreter because of the performance overhead). luke
Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
--
Luke Tierney
Chair, Statistics and Actuarial Science
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke at stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu