s4 methods and base

8 messages · Marsland, John, Brian Ripley, John Chambers +2 more

#
I'm sure that many people are in the same position as me in that they are
trying to write packages and code that is vaguely "future proof".

Would it be possible to get some guidance on how the R-core team see the
evolution of the "base" package with regard to S4 methods?

There seem to be quite a lot of inconsistencies between S3 and S4 methods
and classes currently, and this (I'm sure) is only to be expected in a period
of transition, e.g. POSIXlt vs POSIXt and POSIXct. And there seem to be dozens
of print methods to convert. It's not an enviable task and I'm sure it will
take time, if indeed you do see R being purged of S3 at some point in
the future.

But I think it would be helpful if there was some general guidance in terms
of the direction R is going.

For my own part I am very impressed with the power of the S4 methods and the
functional nature of the language. But there are some comparisons with (say)
Python that suggest ways to make it even better. Many of these relate to the
base package and its very diverse collection of demos, stats, dates, file
handling, etc. Are there plans to break this up so as to make the
instruction set smaller (an obvious use of S4 methods) and the whole
language more lightweight and more like a scripting language? Others seem to
be thinking in this direction with the addition of "import" and namespaces, but
more detail (or a pointer towards it if it exists!) would be helpful.

Regards,

John Marsland


#
On Tue, 5 Aug 2003, Marsland, John wrote:

            
Those are S3 classes, and there is nothing transitional about them!
There are no S4 classes in `base R' that I am aware of to be inconsistent.
Are you sure you understand the difference?
First you need to define S4 classes, and there are currently no moves to 
do that for the statistical modelling software, for example.
We are still using S3 classes designed a decade ago, and I expect to be 
still using them in another decade.  I even have S4 versions of some of 
them (e.g. lda, multinom) and no plans to use those in R.
6 days later
#
"Marsland, John" wrote:
Your concerns are reasonable, and deserve some thoughtful discussion
from the community.  r-devel is a good place to start.

The topic is not really S4 methods and base, but what S4 methods and
classes imply about S3 methods and classes.  (The S4 methods and classes
are not in package base, and the overall drift of thinking at the moment
is not towards adding to base, but if anything considering unbundling of
some material from base.)

So your request might be rephrased as asking for advice to owners of
software that currently use S3 methods and classes.

There is no proposal to make S3 methods defunct in the foreseeable
future.  Owners of existing code that uses them should not feel
pressured to rewrite the code, solely to keep the code working.

It's technically true that existing S3 methods software can just ignore
the existence of S4 methods and classes, at least under some
circumstances.  But maintainers of software using S3 methods and classes
might want to consider conversions or partial conversions, when and if
they decide to revise the software.  

Similarly, the methods package COULD simply have ignored existing S3
classes.  There would then be no "inconsistencies" because S3 classes
are not "classes" at all in the S4 sense: there is no definition for
them, only objects that contain the class name in their "class"
attribute. But in fact the methods package provides some heuristic
mapping of "old classes" into new classes.  It will be useful to get
feedback if the mapping is not right in specific examples.

To summarize the situation so far:

1.  If S3 classes are not defined as S4 classes, no methods can be
written for S4 generic functions for them, and objects from these
classes cannot be slots in S4 classes.  So some attempt to link the two
seemed worthwhile.

2.  In the methods package, an attempt was made to map known S3 classes
in the R code in base into S4 classes, reflecting S3 inheritance, so
that existing classes could be used for methods and slots.

Here's how that was done:

An object using S3 methods can have one or more strings in its class
attribute.  Heuristically, the first of these strings is interpreted as
"the" class of the object.  Subsequent strings are classes which the
first class inherits from--"superclasses" of the first class, in common
terminology.  (See section A.5 of the "Statistical Models in S" book, pp
463-467.)

With S4 classes, every object has a single class, with an explicit
definition.  That class can have superclasses  (defined as the classes
this class contains).  ALL objects from the class have the same single
string as the value of class(x).

The goal was to map each S3 class into an S4 class, and to infer
superclasses from places where two or more strings were used as a class
attribute.  Clearly a guessing game, because there really is no
"definition" of an S3 class.

So, for example, there appear to be objects with class attribute
c("ordered", "factor"), meaning that the object has main class "ordered"
but inherits from "factor". This gets mapped into S4 classes:

--------------
R> getClass("ordered")
Virtual Class

No Slots, prototype of class "NULL"

Extends: 
Class "factor", directly
Class "oldClass", by class "factor"
R> getClass("factor")
Virtual Class

No Slots, prototype of class "NULL"

Extends: "oldClass"

Known Subclasses: "ordered"
----------------

Both classes are "virtual" in the S4 sense, because you can't say
new("ordered"), and you can't say that because we haven't tried to
legislate what "slots" objects from any of these classes have.  And an
object from a non-virtual S4 class will have a single string as its
class, meaning that S3 inheritance would cease to work.
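
A small sketch of the two views of an ordered factor, using only base R and the methods package. The getClass() printout quoted above may look different in other R versions, so the snippet checks the relationships rather than the printed form.

```r
library(methods)

# An ordered factor carries two strings in its S3 class attribute.
x <- factor(c("low", "high", "low"), levels = c("low", "high"), ordered = TRUE)
class(x)

# S3 inheritance only asks whether a string appears in that attribute;
# which = TRUE reports the position of the string instead of TRUE/FALSE.
inherits(x, "factor")
inherits(x, "factor", which = TRUE)

# The registered S4 mapping records "ordered" as extending "factor".
extends("ordered", "factor")
is(x, "factor")
```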

Undoubtedly, some existing classes were missed and/or misinterpreted. 
Feedback on these (as in your second e-mail) will be helpful.  Yes, it
looks as if POSIXlt slipped through the cracks, because there wasn't any
R code that assigned that as a class.

As for the relation of the POSIX* classes:  Your interpretation (in your
second e-mail) sounds reasonable--that POSIXt is a superclass to both
POSIXlt and POSIXct.  Unfortunately both the literal interpretation of
the documentation and the code itself seem to contradict that.  There
are several instances of expressions such as:
        class(z) <- c("POSIXt", "POSIXct")
This says the opposite:  that POSIXct is a superclass of POSIXt.  There
doesn't seem to be any code that links POSIXlt to POSIXt.
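
To see why the order of the strings matters, here is a minimal sketch of S3 dispatch. The generic describe() and its two methods are invented for illustration; only the class vector is taken from the expression quoted above.

```r
# Hypothetical S3 generic with one method per class string.
describe <- function(x) UseMethod("describe")
describe.POSIXt  <- function(x) "POSIXt method"
describe.POSIXct <- function(x) "POSIXct method"

z <- 0
class(z) <- c("POSIXt", "POSIXct")   # order as in the quoted code
first <- describe(z)                 # dispatch starts at the first string

class(z) <- c("POSIXct", "POSIXt")   # order if POSIXt were the superclass
second <- describe(z)

c(first, second)
```

With the first ordering the POSIXt method is chosen and the POSIXct method is only a fallback, which is the sense in which the quoted code makes POSIXct the superclass.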



3. Anyway, to get back to some general suggestions.  The heuristic
mapping in the methods package has to be just a stop-gap.  We should try
to fix errors and omissions, but as noted, we can't push it much further
at least for objects with more than one string in their class
attribute.  And there are several examples that just can't be included,
because objects start with the same class string, but then follow that
with DIFFERENT superclass strings.  (The use of "aov" and "maov" seems
to be of this form.) No single mapping to S4 classes can capture this
behavior explicitly.
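
A rough way to look at the "aov" case, assuming that aov() in the stats package of the R version at hand still assigns class vectors along these lines (the thread itself does not show them). The data are invented, and after_aov() is a hypothetical helper.

```r
library(stats)

# Toy data, invented for illustration.
d <- data.frame(y1 = c(1.1, 2.3, 2.9, 4.2, 5.1, 5.8),
                y2 = c(0.9, 2.1, 3.2, 3.8, 5.3, 6.1),
                g  = gl(2, 3))

fit1 <- aov(y1 ~ g, data = d)              # single response
fit2 <- aov(cbind(y1, y2) ~ g, data = d)   # matrix response

class(fit1)
class(fit2)

# Hypothetical helper: the strings that follow "aov" in a class attribute.
after_aov <- function(cl) cl[-seq_len(match("aov", cl))]
after_aov(class(fit1))
after_aov(class(fit2))
```

If the two results differ, no single S4 definition of "aov" can state its superclasses.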

There seem to be some examples where the S3 inheritance wasn't
understood or was used in a way inconsistent with ANY S4 class
structure.

A better solution in the long run is to try to convert S3 classes to
non-virtual S4 classes, where possible.

Although owners of S3 class software shouldn't feel threatened, they
might consider conversions, perhaps when revising the software for other
reasons:

 - classes that don't have multiple strings in the class attribute can
often just be converted to a non-virtual S4 class, so long as objects
always have the same attributes.  Attributes go into slots (the slot
must have some specified class, but there are ways to allow some
variation in the actual type of data in the slot).  S3 methods can
generally stay unchanged.

 - classes with multiple strings can also be converted, again if they
have consistent attributes.  In this case, DIRECT S3 methods (those
dispatching from the first string) are OK, but INHERITED S3 methods are
not, so some conversion to S4 methods may be needed.

 - classes that have inconsistent superclasses from one object to
another may just not be convertible, but it's worth looking at examples.
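
The first case in the list might look something like the sketch below. The class name "series", its "units" attribute, and the helper as_series() are all invented for illustration; this is one possible shape for a conversion, not a recommended recipe.

```r
library(methods)

# S3 original: a numeric vector with a "units" attribute and a
# single string in its class attribute.
old <- structure(c(1.5, 2.5, 4.0), units = "cm", class = "series")

# S4 version: the data and the attribute each become a slot
# with a declared class.
setClass("series", representation(values = "numeric", units = "character"))

# Hypothetical helper that builds the S4 object from the S3 one.
as_series <- function(x) {
    new("series", values = as.numeric(unclass(x)), units = attr(x, "units"))
}

s <- as_series(old)
s@values
s@units
isVirtualClass("series")   # non-virtual, so new("series", ...) is allowed
```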

When conversion or re-design is feasible, it's likely worthwhile in the
long run. 

Regards,
 John Chambers

  
    
#
John
I'm curious about the logistics of a partial conversion. Initially I 
think I will avoid classes that inherit from other classes. But what 
general approach would you suggest for classes that have an object of 
another class in their structure. Can the larger object be an S4 class 
and the contained object an S3 class, or would you start the other way 
around, or is it not possible?

Also, would you suggest converting to namespaces before or after 
converting to S4 classes?
(newbie question) Are you using "contains" as a synonym for "extends," 
meaning that many classes can contain the same superclass, or is this 
intending to indicate that the logic is the reverse of (my understanding 
of) S3 logic?
(newbie question) I often use attributes to stick "extra" information on 
an object, like the date of retrieval from a database onto a matrix. If 
the matrix is an S4 matrix class, does this mean that I have to define a 
new class in S4 that extends the matrix class in order to stick on extra 
information?

Thanks,
Paul Gilbert
#
Paul Gilbert wrote:
I would think it's wise to imagine converting all the functions in an S3
inheritance "tree" at the same time.

The basic point is that ordinary formal classes have to have an
unambiguous "structure"; that is, known slots each having a known class.

So the S3 class can't be extended by an S4 class, since the latter
wouldn't then have a known structure.

The other direction is technically feasible, but it feels like a bad
idea in general.

At the moment, namespaces and S4 classes don't work together, so it
really amounts to choosing which matters most to a given project.

The hope is to make them work together for 1.8, but that's still a hope
rather than a certainty.

Yes, roughly.
The S language terminology "Class B extends class A" is analogous to
"Class A is a superclass of class B" in other languages.  Saying "Class
B contains class A" is a special form of "extends" (the most common
one), where B has all the slots of A, and perhaps others as well.  The
setIs() function allows a more general version of "extends" that doesn't
depend on the data structures being compatible.

A good question, which points out some of the tradeoffs involved.  Yes,
if you want to work in terms of ordinary formal classes, the set of
slots (the "structure") is determined by the class, not by the
individual object.

It's fundamental to most formal class systems that objects from a class
have a known structure, in our case meaning that the slots are fixed as
to name and class.  This is indeed more restrictive than the traditional
S approach (dating back well before "white book" classes) of attaching
attributes, on a per-object basis, while retaining the underlying
"structure" (a matrix, as in your example).

It's a basic difference, and the formality has been adopted in many
languages because it makes possible operations that a less formal system
can't do, especially in terms of automating some computations.  Two
examples:  an object can be tested for being a valid member of a class;
and methods can be generated automatically to coerce an object to a
superclass, or to replace the part of an object that corresponds to a
superclass.
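
Both examples can be tried with a pair of toy classes. The class names "A" and "B", the slots, and the validity rule are assumptions made up for this sketch.

```r
library(methods)

# B contains A, so B has A's slot plus one of its own.
setClass("A", representation(d = "numeric"),
         validity = function(object) {
             if (length(object@d) == 1) TRUE else "slot 'd' must have length 1"
         })
setClass("B", representation(e = "character"), contains = "A")

b <- new("B", d = 1, e = "x")

# 1. Validity can be tested because the structure is known.
validObject(b)
msg <- tryCatch(new("B", d = c(1, 2), e = "x"),
                error = function(err) conditionMessage(err))
msg

# 2. Coercion to the superclass, and replacement of that part of the
#    object, are available without writing any methods.
a <- as(b, "A")
as(b, "A") <- new("A", d = 2)
b@d
```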

There are of course ways to add general annotation to a class, e.g., by
having a slot for miscellany (that slot might be a named list).

Regards,
 John

  
    
#
John Chambers wrote:

            
There is no inheritance in the S3 sense when an object contains an 
object of another class. Am I missing something? Are you using this term 
loosely or is there necessarily inheritance in S4 when one object 
contains another object?

I can see that it feels bad, but it seems like the only way to do a 
partial conversion. My problem is that it is much easier to do a big 
project in small bites.

Now if I understand this correctly, and to be pedantic, one would say an 
object of class B contains an object of class A, and the definition 
of class B extends the definition of class A. Or, said differently, a 
class A object is a subset of a class B object, and a class A definition 
is a superclass of a class B definition. Does that make sense?

Would there be a point in formalizing this so that everyone does not 
need to define extensions of all the basic classes with an extra 
miscellany slot? I'm sure this sounds like heresy, but the alternative is 
that many people will be doing somewhat similar things using different 
class and slot names, and there will be a lot of unnecessary 
incompatibility among packages.

Thanks again,
Paul
#
On Wed, Aug 13, 2003 at 10:06:45AM -0400, Paul Gilbert wrote:
In this instance I have to say that it is a good idea to try some of
  these things out, to read John's book, or some other books on object
  oriented programming or to look at some existing packages that use
  S4 methods. There is rather a lot of ground you are trying to cover
  in your questions.

  Inheritance (or lack of it) is directly under the control of the
  programmer. The difference in going from S3 to S4 is that you
  actually can control this in a systematic way. If I have an object
  of class "a", I can extend it in many ways. One might be to create a
  new class "b", say, that adds a new slot (or two). 

  For example,
 
   setClass("a", representation(d="numeric"))
   setClass("b", representation(e="character"), contains="a")

  so that b extends a and instances of a have one slot named d, while
  instances of b have two slots, one named d and one named e.
  If we want to call the generic function foo, and it has a method for
  class a objects, but not for class b objects then we would use the
  method for class a (one of the important points of inheritance).

  Now, I could get b's in other ways,

  setClass("b2", representation(d="numeric", e="character"))

  this class b2 does not extend the class a above (but it looks very
  similar to the class b previously defined). Instances from the two
  would be almost indistinguishable - but they are very different in
  important ways.

  I can also do the following:

  setClass("b3", representation(d="a", e="character"))

  so that the d slot in this case contains objects of class "a"
  but in no way does this class b3 extend class a.

  Calling my generic foo, as described above will not dispatch to the
  a method, it will either find a b method or call the default handler
  (usually an error message at that point).
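
The four class definitions above can be put together into a runnable sketch. The generic foo and its single method are stand-ins for the generic described in the text, and try_foo() is a hypothetical helper that turns the dispatch error into a string.

```r
library(methods)

setClass("a",  representation(d = "numeric"))
setClass("b",  representation(e = "character"), contains = "a")
setClass("b2", representation(d = "numeric", e = "character"))
setClass("b3", representation(d = "a", e = "character"))

# A generic with a method for class "a" only.
setGeneric("foo", function(x) standardGeneric("foo"))
setMethod("foo", "a", function(x) "method for a")

# b extends a, so the method for a is inherited.
foo(new("b", d = 1, e = "x"))

# b2 and b3 do not extend a; with no default method, dispatch fails.
try_foo <- function(obj) tryCatch(foo(obj), error = function(err) "no method")
try_foo(new("b2", d = 1, e = "x"))
try_foo(new("b3", d = new("a", d = 1), e = "x"))
```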

  Using inheritance usually gives you cleaner code and less of it to
  write. But it requires that you think about your classes and methods
  before you start writing them.

  In S3 there is no real sense of inheritance (which is why we are
  moving to S4).

  Robert

  
    
#
Paul Gilbert wrote:
......................
No, I'm not using it loosely, but I am using it as it's used for S4
classes.

"contains" is an argument in defining a class via setClass().  Its
meaning is essentially the mathematical one, as in "Set B contains set
A."  If we think of a class as corresponding to a set of slots, that is,
a set of (name, class) pairs, then "class B contains class A" says that
the set of slots for B contains that for A.

"extends" is a more general concept in the S class system.  See
Programming with Data for details and examples, but the essential
meaning of "class B extends class A"  is that any instance of B can be
used when an instance of A is needed.  As I said, "contains" is the most
common but not the only way to create an "extends" relation.

"inherits" is an S3 concept defined in Statistical Models in S.  In
practice, it says ONLY that "A" appears in the class attribute for a
particular object, and not necessarily as the first string.  One of the
problems with an informal, instance-based, system is that there is no
way to verify what if anything inheritance implies.

If we rephrase the question as "can an S4 class have objects from S3
classes as slots", the answer is yes, but you should "register" the S3
classes to the formal class system, by calling setOldClass (see the
online documentation).  That's what we did for the base package's
classes, as discussed re John Marsland's mail. So, e.g.,
  setOldClass(c("ordered", "factor"))
to register the S3 class "ordered" with its inheritance from "factor".

Registration isn't enforced yet, but I would like to do so, in order to
require that all the slots in a class definition correspond to defined
classes.
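
A sketch of registering an S3 class and using it as a slot. The S3 classes "reading" and "calibrated" and the S4 class "experiment" are invented for illustration and do not come from any package.

```r
library(methods)

# Register the S3 class "calibrated" with its inheritance from "reading";
# this also registers "reading" itself.
setOldClass(c("calibrated", "reading"))

# An S4 class whose first slot holds an object from the S3 class.
setClass("experiment",
         representation(result = "reading", label = "character"))

r  <- structure(list(value = 3.2), class = "reading")
rc <- structure(list(value = 3.1), class = c("calibrated", "reading"))

e1 <- new("experiment", result = r,  label = "raw")
e2 <- new("experiment", result = rc, label = "calibrated")

extends("calibrated", "reading")
inherits(e2@result, "reading")
```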
There are ways to provide a degree of "automatic" class extension for
features that are midway between universal necessities and purely
specialized techniques.  I'm not quite convinced this is a sufficiently
desirable mechanism, but in any case a way to start would be a nice
simple implementation, e.g., as a class with corresponding methods, say,
to extract or assign a "note"; i.e., for generic functions `note(object,
name)' and `note(object, name) <- value'.
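
One possible shape for such a "nice simple implementation", as a sketch only. The generics note() and `note<-` follow the signatures suggested above; the class "annotatedMatrix" and its "notes" slot are assumptions made for this example.

```r
library(methods)

# Hypothetical class: a matrix plus a named list of notes.
setClass("annotatedMatrix", representation(notes = "list"),
         contains = "matrix")

setGeneric("note", function(object, name) standardGeneric("note"))
setGeneric("note<-", function(object, name, value) standardGeneric("note<-"))

# Extract a note by name.
setMethod("note", "annotatedMatrix",
          function(object, name) object@notes[[name]])

# Assign a note by name and return the modified object.
setReplaceMethod("note", "annotatedMatrix",
                 function(object, name, value) {
                     object@notes[[name]] <- value
                     object
                 })

m <- new("annotatedMatrix", matrix(1:4, nrow = 2))
note(m, "retrieved") <- as.Date("2003-08-13")
note(m, "retrieved")
dim(m)   # the object still behaves as a matrix
```

The miscellany lives in one slot with a fixed class ("list"), so the class keeps a known structure while individual objects can still carry different annotations.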

John