common base functions stripping S3 class - R-devel

Tue, Nov 18, 2014 5:15 AM

On 17/11/2014, 4:23 PM, Murat Tasan wrote:

What I meant is that you can just try it.  If you think your users will
want to subset your object, then you can try it yourself, and you'll see
that you need to write a `[` method.

Duncan Murdoch

The most common motivating example for S3 classes (I've seen) is
overriding plot().
I imagine many people would want to take a base structure (e.g. a
simple vector) and 'class-ify' it solely for the purposes of
encapsulating domain-specific plotting commands:

MyClass <- function(x) structure(x, class = "MyClass")
plot.MyClass <- function(x, ...) {
  ## large complicated plotting function here.
}

Those examples, however, basically never mention the need to then
override/implement many other common methods, `c`, `[`, `unique`,
`as.list`, `as.data.frame`, etc.
I believe this is a _huge_ tripping point for newcomers to R
programming (even if they are not newcomers to programming more
generally).
In my own experience, I had to work backwards by finding methods that
dropped my class, then examine the source for those methods, find the
underlying calls in those methods that dropped the class, and continue
on down the (rabbit hole) call stack... this is hardly ideal for any
programmer, I think, experienced or novice.
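
A minimal sketch of the class-dropping behaviour described above, reusing
the MyClass constructor from the example. The sample values are made up
for illustration; the comments show what the default methods return when
no MyClass-specific method is defined:

```r
# Constructor from the example above.
MyClass <- function(x) structure(x, class = "MyClass")

x <- MyClass(c(3, 1, 2, 1))   # illustrative values only

class(x)          # "MyClass"
class(x[1:2])     # "numeric": the default `[` drops the class
class(unique(x))  # "numeric": unique.default drops it as well
class(c(x, x))    # "numeric": so does the default c()
```

Running each base function on a small object like this is a quicker way
to find out which ones need a method than reading their source.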

In the end, I completely understand your point (e.g. with the sorted
numbers example), and I don't know how to resolve the issue, save
perhaps for more explicit warnings when introducing S3 programming?

My own solution, by the way, is to define a single ancestor class whose
methods either (i) error immediately if some assumptions fail, or (ii)
dispatch to the default method and then restore the class attributes
of the returned object.
Most of my 'useful' classes inherit from this 'dummy' ancestor class,
just to save a lot of re-writing dispatch code.
An example of where I error out immediately is something like `c`,
where I'll check to make sure all args are of the same class...
if they aren't, I could use R's coercion rules, but I've opted for the
'type-safe' approach of refusing to mix classes when dealing with my
own custom classes.
An example of where I opt for preserving class is `[`.
If I write a class where subsetting doesn't make sense, I'll have to
write a fail-fast implementation of `[` for that specific class.
The whole thing seems... inelegant (for lack of a better word), which
is what prompted my post in the first place.
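
A minimal sketch of how such an ancestor class might look. This is not
the code from the post: the class name "ClassKeeper" and the details of
both methods are assumptions made for illustration. The `[` method
delegates to the default and restores the class; the `c` method fails
fast when the arguments do not all share the same class:

```r
# "ClassKeeper" is a hypothetical ancestor class, not a name from the post.

# Preserve the class on subsetting: run the default `[`, then restore it.
`[.ClassKeeper` <- function(x, ...) {
  out <- NextMethod()        # default subsetting drops the class
  class(out) <- class(x)     # put the full class vector back
  out
}

# Fail fast on concatenation unless every argument has the same class.
c.ClassKeeper <- function(...) {
  args <- list(...)
  cls <- class(args[[1L]])
  same <- vapply(args, function(a) identical(class(a), cls), logical(1))
  if (!all(same)) {
    stop("all arguments must have class ", paste(cls, collapse = "/"))
  }
  out <- unlist(lapply(args, unclass))  # concatenate the bare vectors
  class(out) <- cls
  out
}

# A 'useful' class that inherits from the ancestor.
MyClass <- function(x) structure(x, class = c("MyClass", "ClassKeeper"))

a <- MyClass(c(1, 2, 3))
b <- MyClass(c(4, 5))

class(a[2:3])   # "MyClass" "ClassKeeper"
class(c(a, b))  # "MyClass" "ClassKeeper"
# c(a, 1) stops with an error because 1 is a plain numeric
```

A class for which subsetting makes no sense would still need its own `[`
method that stops with an error, as noted above.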

Cheers, and thanks for the discussion and points... they're definitely
helpful in guiding development.

-murat


On Mon, Nov 17, 2014 at 9:19 AM, Duncan Murdoch
<murdoch.duncan at gmail.com> wrote:

On 17/11/2014 10:41 AM, Hadley Wickham wrote:

Generally the idea is that the class should be stripped because R has no
way of knowing if the new object, for example unique(obj), still has the
necessary properties to be considered to be of the same class as obj.
Only the author of the class knows that.  S4 would help a bit here, but
only structurally (it could detect when the object couldn't possibly be
of the right class), not semantically.

There are two possible ways that S3 methods could handle subclasses:

* preserve by default (would also have to preserve all attributes)
* drop by default

If you could rely on either system consistently, I think you could
write correct code. It's very hard when the defaults vary.

(In other words, I agree with everything you said, except I think if
the default was to preserve you could still write correct code)


I don't see how default preserving could work.

For example, I might define a "SortedNumbers" class, which is a vector of
numbers in non-decreasing order.  I could define min() and max() methods for
it which would be really fast, because they only need to look at the first
or last elements.  But a rev() method wouldn't make sense, so I wouldn't
define one of those.

If the rev() default method left the class as "SortedNumbers", then my min()
and max() calculations would end up broken.
So maybe I should have defined a rev() method that just stops with an error.
But classes don't own methods, so I'd have no way of knowing that someone
else defined a new generic (e.g. shuffle()) that broke things.  I don't see
any way around this within the S3 system.

In fact, some default methods do preserve the class, for example the
replacement method `[<-`.  I could take a SortedNumbers vector of the
numbers 1:10, and set element 1 to 11, and end up breaking min() and max().
This is a problem with the current design.
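
A minimal sketch of the SortedNumbers problem, assuming a constructor
that sorts its input and min()/max() methods that only look at the ends
of the vector. The constructor and method bodies are illustrative, not
taken from any package:

```r
# Hypothetical constructor: sort once, then tag the result.
SortedNumbers <- function(x) structure(sort(x), class = "SortedNumbers")

# Fast methods that trust the ordering and only inspect the ends.
min.SortedNumbers <- function(..., na.rm = FALSE) {
  x <- unclass(..1)
  x[1L]
}
max.SortedNumbers <- function(..., na.rm = FALSE) {
  x <- unclass(..1)
  x[length(x)]
}

s <- SortedNumbers(1:10)
s[1] <- 11L      # default `[<-` keeps the class, but the order is now wrong

class(s)         # "SortedNumbers"
min(s)           # 11, although the smallest element is 2
max(s)           # 10, although the largest element is 11
min(unclass(s))  # 2, the answer from the default method
```

Guarding against this would take a `[<-` method for the class that
either re-sorts, drops the class, or stops with an error.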

Probably we should do a better job of documenting which methods preserve the
class and which ones don't.  (For example, `[` doesn't preserve the class,
even though it would be fine to do so in this example.)  But there are a lot
of things to do, and this is one thing that is pretty easy to figure out
without documentation, so I'd say it's a low priority.

Duncan Murdoch