I have encountered a strange behavior of the str function - it seems to
modify the object that is displayed. Probably I'm using something
unsupported (objects consisting just of an external reference), but
still I'm curious as to why this happens. I create (in C code) an
EXTPTRSXP and associate a class with it via SET_CLASS. Such objects work
fine until they are passed to str, as the following output demonstrates:
> c<-.MCall("RController","getRController")
> c
[1] "<RController: 0x3be5d0>"
> str(c)
Class 'ObjCid' length 1 <pointer: 0x3be5d0>
> c
<pointer: 0x3be5d0>
> str(c)
length 1 <pointer: 0x3be5d0>
The .MCall basically produces an external reference and assigns a class
(ObjCid) to it. There's a corresponding print method and it works fine.
However, when str is called, it strips the class information from the
object as a repeated call to str also shows:
> str(c); str(c)
Class 'ObjCid' length 1 <pointer: 0x3be5d0>
length 1 <pointer: 0x3be5d0>
Is this behavior intentional, undocumented or simply wrong?
Cheers,
Simon
[Tested with R 2.0.0 release (2004-10-04) on Mac OS X 10.3.5 - I have
currently no other machine to test it on, but I very much suspect that
this is platform-independent.]
The C code used to generate the object:
SEXP class, sref;
/* protect the pointer too: allocVector below may trigger a gc */
PROTECT(sref = R_MakeExternalPtr((void*) obj, R_NilValue, R_NilValue));
PROTECT(class = allocVector(STRSXP, 1));
SET_STRING_ELT(class, 0, mkChar("ObjCid"));
SET_CLASS(sref, class);
UNPROTECT(2);
Destructive str(...)?
7 messages · Simon Urbanek, Luke Tierney, Peter Dalgaard +1 more
On Fri, 29 Oct 2004, Simon Urbanek wrote:
I have encountered a strange behavior of the str function - it seems to modify the object that is displayed. Probably I'm using something unsupported (objects consisting just of an external reference), but
Yes, and I think it is documented somewhere, but I can't lay my hands on it right now.
still I'm curious as to why this happens. I create (in C code) an EXTPTRSXP and associate a class with it via SET_CLASS. Such objects work fine until they are passed to str, as the following output demonstrates:
> c<-.MCall("RController","getRController")
> c
[1] "<RController: 0x3be5d0>"
> str(c)
Class 'ObjCid' length 1 <pointer: 0x3be5d0>
> c
<pointer: 0x3be5d0>
> str(c)
length 1 <pointer: 0x3be5d0>
The .MCall basically produces an external reference and assigns a class (ObjCid) to it. There's a corresponding print method and it works fine. However, when str is called, it strips the class information from the object as a repeated call to str also shows:
> str(c); str(c)
Class 'ObjCid' length 1 <pointer: 0x3be5d0>
length 1 <pointer: 0x3be5d0>
Is this behavior intentional, undocumented or simply wrong?
The issue is almost certainly that something has forgotten/decided not to either set or respect SET_NAMED on the object, so when str does object <- unclass(object) or some such, the original object gets changed. Now the `something' has to be C code: possibly yours but probably something in R itself.

I think this is intentional. External references do not get copied, and the advice I recall is to wrap them in a list for use at R level (and before setting a class on them). In RODBC I took another tack, and attach the reference as an attribute to a `documentation' object.

str() probably ought to be more cautious when it encounters an external reference or similar exotic object, since it will look at list elements and attributes.

Brian
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
On Sat, 30 Oct 2004, Prof Brian Ripley wrote:
The issue is almost certainly that something has forgotten/decided not to either set or respect SET_NAMED on the object, so when str does object <- unclass(object) or some such, the original object gets changed. Now the `something' has to be C code: possibly yours but probably something in R itself.

I think this is intentional. External references do not get copied, and the advice I recall is to wrap them in a list for use at R level (and before setting a class on them). In RODBC I took another tack, and attach the reference as an attribute to a `documentation' object.

str() probably ought to be more cautious when it encounters an external reference or similar exotic object, since it will look at list elements and attributes.
It's probably just unclass itself, not an issue with NAMED. External references are one of a handful of objects that are handled as references to mutable objects rather than as immutable values (the main other one being environments). unclass is destructive when applied to a reference object.

At some point it might make sense to make unclass signal an error when used on a reference object, and clean up the things this breaks, including str and a number of other print methods. On the other hand, the same issue exists with all attributes on reference objects, so the safest approach is to use a wrapper as Brian suggests.

luke
Luke Tierney
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke@stat.uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu
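
[Editorial illustration] The distinction Luke draws between immutable values and mutable reference objects can be sketched in plain R. This is only a minimal sketch under stated assumptions: an environment stands in for the external pointer (both are reference objects, per the message above), the class name ObjCid is borrowed from Simon's post, and the names v, w, e, f, wrapped and stripped are made up for the example. It avoids calling unclass on the reference itself and uses an attribute assignment to show the sharing instead.

```r
# Ordinary vectors have value semantics: modifying a copy leaves the original alone.
v <- 1:3
w <- v
attr(w, "tag") <- "changed"
is.null(attr(v, "tag"))      # TRUE: v is untouched

# Environments are reference objects: e and f name the same object,
# so an attribute set through f is visible through e.
e <- new.env()               # assumption: stand-in for an external pointer
f <- e
attr(f, "tag") <- "changed"
attr(e, "tag")               # "changed"

# Wrapper approach: put the reference inside a list and class the list.
wrapped <- structure(list(ref = e), class = "ObjCid")
stripped <- unclass(wrapped) # copies the list, not the environment
class(wrapped)               # still "ObjCid"
identical(stripped$ref, e)   # TRUE: both lists hold the same reference
```

With the class on the list rather than on the reference, code that unclasses its argument works on a copy of the wrapper, so the caller's object keeps its class.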
Luke Tierney <luke@stat.uiowa.edu> writes:
str() probably ought to be more cautious when it encounters an external reference or similar exotic object, since it will look at list elements and attributes.
It's probably just unclass itself, not an issue with NAMED. External references are one of a handful of objects that are handled as references to mutable objects rather than as immutable values (the main other one being environments). unclass is destructive when applied to a reference object. At some point it might make sense to make unclass signal an error when used on a reference object, and clean up the things this breaks, including str and a number of other print methods. On the other hand, the same issue exists with all attributes on reference objects, so the safest approach is to use a wrapper as Brian suggests.
Argh. I think this means that there is a bug in the tcltk code since tclObj class objects are exactly external references with a class attribute. It doesn't seem to have bitten anyone yet, though. Or were you saying that we should fix str() instead?

Anyways, Tcl objects do provide a rather nice illustration of why reference objects are non-duplicatable (which is the reason behind unclass being destructive). They have a finalizer that decrements the Tcl reference count when the R object is destroyed. To avoid bad things resulting from decreasing the refcount multiple times, duplication would require an increment of the reference count, and R just isn't geared to do that: we'd need to introduce something like an R_RegisterCDuplicator function.
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
Thank you all for your replies. I wrapped the reference in a LISTSXP and everyone's happy (I know that the docs say one ought to do so right away, but I was curious what breaks ;)).
On Oct 30, 2004, at 5:55 PM, Peter Dalgaard wrote:
Anyways, Tcl objects do provide a rather nice illustration of why reference objects are non-duplicatable (which is the reason behind unclass being destructive). They have a finalizer that decrements the Tcl reference count when the R object is destroyed. To avoid bad things resulting from decreasing the refcount multiple times,
Now, hold on a second - I thought the main point of EXTPTR is that the finalizer is called only once, that is when the last instance of the reference is disposed of by the gc (no matter how many copies existed meanwhile). Am I wrong and/or did I miss something? I did some tests which support my view, but one never knows ... Cheers, Simon
Simon Urbanek <simon.urbanek@math.uni-augsburg.de> writes:
Now, hold on a second - I thought the main point of EXTPTR is that the finalizer is called only once, that is when the last instance of the reference is disposed of by the gc (no matter how many copies existed meanwhile). Am I wrong and/or did I miss something? I did some tests which support my view, but one never knows ...
How do you ensure that the finalizer is called once? By *not* copying the reference object! You can have as many references to it as you like (i.e. assign it to multiple variables), and the object itself is not removed until the last reference is gone, but if you modify the object (most likely by setting attributes, but you might also change the C pointer payload in a C routine), all "copies" are changed:
> x <- as.tclObj(pi)
> x
<Tcl> 3.14159265359
> y <- x
> y
<Tcl> 3.14159265359
> mode(x)
[1] "externalptr"
> attr(x, "Simon") <- "Urbanek"
> attributes(y)
$class
[1] "tclObj"

$Simon
[1] "Urbanek"
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)             FAX: (+45) 35327907
Just to be 100% clear, the finalizer is called *at most* once if (as in tcltk) R_RegisterCFinalizer is called. If you want it to be called exactly once, you need to use R_RegisterCFinalizerEx. The issue is that there may not be a final gc(). BTW, str(x) is destructive here too, so we do need to improve str(). I have code written, but access to svn.r-project.org is down (yet again).
> x <- as.tclObj(pi)
> str(x)
Class 'tclObj' length 1 <pointer: 0x860c3f8>
> str(x)
length 1 <pointer: 0x860c3f8>
On 31 Oct 2004, Peter Dalgaard wrote:
How do you ensure that the finalizer is called once? By *not* copying the reference object! You can have as many references to it as you like (i.e. assign it to multiple variables), and the object itself is not removed until the last reference is gone, but if you modify the object (most likely by setting attributes, but you might also change the C pointer payload in a C routine), all "copies" are changed:
> x <- as.tclObj(pi)
> x
<Tcl> 3.14159265359
> y <- x
> y
<Tcl> 3.14159265359
> mode(x)
[1] "externalptr"
> attr(x, "Simon") <- "Urbanek"
> attributes(y)
$class
[1] "tclObj"

$Simon
[1] "Urbanek"
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
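
[Editorial illustration] The "at most once" point about finalizers can be tried from R code as well. The sketch below is an assumption-laden analogue, not the tcltk code: it uses reg.finalizer on an environment instead of R_RegisterCFinalizer on an external pointer, and the names calls, ref and alias are invented for the example. The onexit = TRUE argument of reg.finalizer plays roughly the role described above for R_RegisterCFinalizerEx: it asks R to run the finalizer at the end of the session if the object was never collected.

```r
calls <- 0L
ref <- new.env()      # assumption: stand-in for an external pointer
# onexit = TRUE asks R to run the finalizer at session exit as well,
# if it has not already run by then.
reg.finalizer(ref, function(obj) calls <<- calls + 1L, onexit = TRUE)

alias <- ref          # a second binding to the same object, not a copy
rm(ref)
invisible(gc())       # alias still reaches the object, so no finalization yet
calls_while_reachable <- calls

rm(alias)
invisible(gc())       # the object is now unreachable; the finalizer may run here
calls_after_release <- calls
```

However many bindings pointed at the object, the counter should never exceed one, and it stays at zero for as long as any binding is still live.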