
SV4 on R?

5 messages · Brian Ripley, John Chambers, Luke Tierney +1 more

#
Dear R-Developers,

The traffic today on s-news where Terry Therneau, I, and others
are reporting some of the problems we have had using or
converting applications to SV4 reminded me of something
Duncan Temple Lang had mentioned to me a year ago that
I wanted to follow up on.  I recall that Duncan said either that, if SV4
were to be implemented in R, it would not be
the default behavior, or that the new behavior could be
turned off through an option.  Would you mind
bringing me up to date on this?  I sincerely hope
that what I thought Duncan said last year still
applies.  Thank you very much for your consideration.  -Frank

P.S.  If anyone receiving this does not subscribe to s-news
and wants to receive background information please let me know.
#
On Thu, 13 Sep 2001, Frank E Harrell Jr wrote:

            
There are lots of aspects of S4 behaviour.  Some R already has
(connections, preserving logicals in data frames, for example).  I guess
you are concerned about classes. There I don't think a decision has been
taken.  There is a `methods' package in R-devel, written by John Chambers
that is not yet finished (or I believe so, as it is being changed quite
rapidly and is only partially documented).  What is likely is that in R
1.4.0 all objects will have classes, and that functions like identical(),
as() and is() will be available.  So far this has caused only a few
problems, mainly because R has no problems like those the "named" class
presents in S4. (Bit of background: any vector with names in S4 is of class
"named", and that includes lists with names.)

Assuming we decide to keep it, I don't expect that having all objects carry
a class is going to be optional.  You will get classes like "numeric"
from class(x).  I think that's a good move.  It does mean that a test for
null class should be a test of !is.object(x), probably.
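As a small illustration of the point about class(x) and !is.object(x), here is a sketch as it would run in a present-day R session. The helper name has_no_explicit_class is hypothetical and only wraps the test suggested above.

```r
x <- c(1.5, 2.5)            # plain double vector, no class attribute
f <- factor(c("a", "b"))    # factors carry an explicit class attribute

class(x)      # implicit class of a plain double vector
oldClass(x)   # the class attribute itself; NULL when none is set
is.object(x)  # FALSE: no class attribute
is.object(f)  # TRUE: class attribute present

# Hypothetical helper: replaces an old-style is.null(class(x)) test,
# which can no longer succeed once every object reports a class.
has_no_explicit_class <- function(x) !is.object(x)
```

Since class(x) always returns something under the new definition, code that used a null class to mean "plain vector" would need a check along these lines.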

I do think you and Terry and others are being too negative. The problems
are not with designing new S4 classes to do the job, but porting S3 ones.
I believe too few people have tried to re-design to make use of S4 classes
for there to be enough evidence.  FWIW, MASS and nnet have been using
S4 classes for a few years quite successfully: we re-designed the classes.
We stopped because no one else seemed to be moving, and the basic modelling
functions are still confined to S3-style objects.


Brian
#
Prof Brian D Ripley wrote:
In the current 1.4, the new definitions are in a package "methods". 
Unless that package is attached, the old versions (only) apply,
including the definition of class.  So at present, you can have the
old-style class definition if you want.  In the long run, that may
change as Brian suggests.

In the R context, there are a number of future plans such as threading,
namespaces and compiling.  Defining those using the "S3" model seems to
me to be difficult and undesirable.  Luke and Duncan can comment more
authoritatively, since they are mainly responsible.

#
On Thu, Sep 13, 2001 at 01:03:11PM -0400, Frank E Harrell Jr wrote:
One of the reasons for looking at alternatives to the SV3 style
classes is that they are not very compatible with one thing I think we
need fairly badly: some form of name space management.

Name spaces are about controlling which variables in a package are
visible outside a package and about preventing variables in one
package from unintentionally shadowing ones in another.  With name
spaces packages will be more robust; separately developed packages can
be used together more safely.  This is why languages like Tcl and Perl
have added name space facilities in recent years.  For R, name spaces
should also make maintaining the existing R code component of the core
system easier and also allow more internals to be migrated from C to
R.  Name spaces are also needed to support some ideas on byte code
compilation that are being explored.  Some early ideas on how name
spaces might be added to R are available off the developer page,
http://developer.r-project.org/.

Here is the problem with the SV3 approach. Suppose a package
SpecialDates defines a special kind of date class along with a
SpecialDate method for as.character.  A second class DataTree for
managing tree structured data defines a function that plots a tree of
generic objects using the as.character representations of the objects.
The current SV3 approach to dispatch needs to find a definition for
as.character.SpecialDate.  With a single dynamic search path as we
have now that is no problem.  But if we try to use name spaces of any
form I can think of to limit the things that are visible inside the
DataTree package then we also lose its ability to see methods of the
SV3 type that are defined outside the name space, and this severely
limits the usefulness of generic functions.

Put another way, when a generic function is called with the SV3
dispatching approach it takes the name of the generic function, which
might be defined in one name space, and the name of a class whose
methods are defined in a second name space, glues them together to get
the method name, and then needs to look up the method name in the right
place.  But at the call site it has no way of knowing what that right
place might be.
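The lookup problem can be mimicked with plain environments. This is only a sketch: the two environments are hypothetical stand-ins for package name spaces, not the name space design under discussion, and the method body is a placeholder.

```r
# Hypothetical stand-ins for two isolated package name spaces.
special_dates <- new.env(parent = emptyenv())
data_tree     <- new.env(parent = emptyenv())

# SpecialDates defines its method under the conventional S3 name.
assign("as.character.SpecialDate",
       function(x, ...) "a special date",
       envir = special_dates)

# SV3-style dispatch glues the generic name and the class name together ...
method_name <- paste("as.character", "SpecialDate", sep = ".")

# ... and then has to look that name up somewhere.
found_in_special_dates <- exists(method_name, envir = special_dates,
                                 inherits = FALSE)
found_in_data_tree     <- exists(method_name, envir = data_tree,
                                 inherits = TRUE)
```

The name is found only where it was defined. Code running inside data_tree has the glued-together name but no indication of which other environment to search.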

The only way around this that I can see is to provide some form of
registration mechanism for methods, some means of saying explicitly
that this function is a method for that generic function and that
class.  The SV4 model is one way, though certainly not the only way,
to achieve this.  By having explicit registration of classes the S4
model also provides for a way of having name space management for
class names, which is also useful.
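For comparison, here is a minimal sketch of explicit registration using the methods package as it exists in present-day R. The generic describe and the slot day are made up for illustration; only the class name SpecialDate comes from the example above.

```r
library(methods)

# Register the class explicitly (the slot name is an assumption).
setClass("SpecialDate", representation(day = "numeric"))

# Hypothetical generic; any package could define it.
setGeneric("describe", function(x) standardGeneric("describe"))

# Say explicitly: this function is the method for that generic and that class.
setMethod("describe", "SpecialDate",
          function(x) paste("special date, day", x@day))

d <- new("SpecialDate", day = 42)
result <- describe(d)

# The registration can be queried directly, with no name gluing involved.
registered <- existsMethod("describe", "SpecialDate")
```

Dispatch here consults the table of registered methods rather than searching for a function called describe.SpecialDate, so it does not depend on what is visible at the call site.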

luke
#
Dear Luke,

Thank you very much for the explanation and for the
good work you are doing on this.  It sounds very
valuable.  I do hope that it will not lead to
non-downward-compatible changes in the S language
except in perhaps unusual cases not affecting
many functions.

One other thought: Something short of what you
are developing could be implemented in the meantime,
namely "manual name registration" through a database
on r-project.org through which developers and users
could register function and class names.  Users
developing packages who desire to reuse an
already registered name would be able to search
the database to find potential incompatibilities 
or overrides they would need to warn users of the
package about.  Periodically-run automatic
database queries could compile lists of conflicts
that can be used to warn all users, with a
suggestion that those registering names last should
consider renaming things.
I believe that Stata, along with the
many other good decisions they have made, has
a web-driven registration system.  If Stata's
language were decent I would have been using it.
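The periodic conflict query suggested above might look roughly like this. It is a sketch only: the registry is an in-memory data frame, and every name, package and date in it is invented for illustration.

```r
# Hypothetical registry contents; all entries are made up.
registry <- data.frame(
  name       = c("summary.fit", "label", "label", "describe"),
  package    = c("pkgA", "pkgA", "pkgB", "pkgC"),
  registered = as.Date(c("2001-01-10", "2001-02-01",
                         "2001-06-15", "2001-07-01")),
  stringsAsFactors = FALSE
)

# Names claimed by more than one package.
counts         <- table(registry$name)
conflict_names <- names(counts)[counts > 1]

# All registrations involved in a conflict, oldest first within each name.
conflicts <- registry[registry$name %in% conflict_names, ]
conflicts <- conflicts[order(conflicts$name, conflicts$registered), ]

# The most recent registrant of each conflicting name, i.e. the one
# who would be asked to consider renaming.
latest <- conflicts[!duplicated(conflicts$name, fromLast = TRUE), ]
```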

Let me renew a plea I've made before which will
lessen the need for some overrides.  A user-specified
vector of attribute names (e.g., "label", "comment",
"units") that would be carried forward when a vector
or matrix is subsetted would do away with the need for
the few overrides I have in R implementations of my
libraries, for [.* functions.
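To show the kind of override meant here, below is a sketch of a [ method that carries a chosen set of attributes through subsetting. The class name "annotated" and the variable keep_attrs are hypothetical, and this is not the actual code from the libraries mentioned.

```r
# Attribute names to carry forward (the list suggested above).
keep_attrs <- c("label", "comment", "units")

# Hypothetical class "annotated": subsetting keeps the listed attributes.
"[.annotated" <- function(x, ...) {
  present <- intersect(names(attributes(x)), keep_attrs)
  saved   <- attributes(x)[present]
  out     <- NextMethod("[")          # default subsetting drops them
  for (nm in names(saved)) attr(out, nm) <- saved[[nm]]
  class(out) <- oldClass(x)           # restore the class explicitly
  out
}

x <- structure(c(1.2, 3.4, 5.6),
               label = "Weight", units = "kg", class = "annotated")
y <- x[2:3]
```

With a user-settable vector like keep_attrs honoured by the default subsetting code, a per-class method of this sort would not be needed.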

Best regards,

Frank Harrell
Luke Tierney wrote: