S4 accessors

Ross Boylan <ross at biostat.ucsf.edu> writes:
Did you want this offlist?  I'm happy keeping it on the list.
No, I accidentally responded privately and I believe I already resent
my reply to the list.  Sorry about that.  I've cc'd the list for this response.
If anyone else is going to extend your classes, then you are doing
them a disservice by not making these proper methods.  It means that
you can control what happens when they are called on a subclass. 
My style has been to define a function, and then use setMethod if I want
to redefine it for an extension.  That way the original version becomes
the generic.

So I don't see what I'm doing as being a barrier to adding methods.  Am
I missing something?
You are not, but someone else might be: suppose you release your code
and I would like to extend it.  I am stuck until you decide to make
generics.
Originally I tried defining the original using setMethod, but this
generates a complaint about a missing function; that's one reason I fell
into this style.
You have to create the generic first if it doesn't already exist:

   setGeneric("foo", function(x) standardGeneric("foo"))
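A minimal sketch of that order of operations, assuming a hypothetical class A with one numeric slot x. It shows setMethod() failing for a name that has no definition at all, and then succeeding once the generic exists:

```r
library(methods)

# Hypothetical class, used only for illustration.
setClass("A", slots = c(x = "numeric"))

# setMethod() on a name with no existing function definition fails;
# try() captures the error instead of stopping the script.
res <- try(setMethod("undefinedAccessor", "A", function(x) x@x),
           silent = TRUE)
inherits(res, "try-error")  # TRUE

# Creating the generic first makes the same kind of call succeed.
setGeneric("foo", function(x) standardGeneric("foo"))
setMethod("foo", "A", function(x) x@x)

foo(new("A", x = 3))  # 3
```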
For accessors, I like to document them in the methods section of the
class documentation.
This is for accessors that really are methods, not my fake
function-based accessors, right?
Which might be a further argument not to have the distinction in the
first place ;-)

To me, simple accessors are best documented with the class.  If I have
an instance, I will read help on it and find out what I can do with
it.
If you use foo as an accessor method, where do you define the associated
function (i.e., \alias{foo})? I believe such a definition is expected by
R CMD check and is desirable for users looking for help on foo (?foo)
without paying attention to the fact it's a method.
Yes you need an alias for the _generic_ function.  You can either add
the alias to the class man page where one of its methods is documented
or you can have separate man pages for the generics.  This is
painful.  S4 documentation, in general, is rather difficult and IMO
this is in part a consequence of the more general (read more powerful)
generic function based system.
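One way to get a starting point for the class man page is methods::promptClass(), which drafts an Rd skeleton containing the class alias. A minimal sketch, assuming a hypothetical class A and a hypothetical accessor generic xval(); the exact skeleton varies between R versions and may or may not list method aliases, and an \alias{xval} for the generic itself still has to live on this page or a separate one:

```r
library(methods)

# Hypothetical class and accessor generic, for illustration only.
setClass("A", slots = c(x = "numeric"))
setGeneric("xval", function(object) standardGeneric("xval"))
setMethod("xval", "A", function(object) object@x)

# filename = NA asks promptClass() to return the Rd skeleton as a list
# of character vectors instead of writing A-class.Rd to disk.
rd <- promptClass("A", filename = NA)
cat(unlist(rd), sep = "\n")
```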

IOW, I think these are good questions.  They are ones that I struggle
with and do not know of any truly satisfying answers.

Best,

+ seth
Seth Falcon | Computational Biology | Fred Hutchinson Cancer Research Center
http://bioconductor.org
Ross Boylan <ross at biostat.ucsf.edu> writes:
If anyone else is going to extend your classes, then you are doing
them a disservice by not making these proper methods.  It means that
you can control what happens when they are called on a subclass. 

My style has been to define a function, and then use setMethod if I want
to redefine it for an extension.  That way the original version becomes
the generic.

So I don't see what I'm doing as being a barrier to adding methods.  Am
I missing something?
You are not, but someone else might be: suppose you release your code
and I would like to extend it.  I am stuck until you decide to make
generics.
This may be easier to do concretely.
I have an S4 class A.
I have defined a function foo that only operates on that class.
You make a class B that extends A.
You wish to give foo a different implementation for B.

Does anything prevent you from doing 
setMethod("foo", "B", function(x) blah blah)
(which is the same thing I do when I make a subclass)?
This turns my original foo into the catchall method.

Of course, foo is not appropriate for random objects, but that was true
even when it was a regular function.
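Ross's scenario can be sketched directly, assuming hypothetical classes A and B (B extends A) and defining everything in the global environment:

```r
library(methods)

# Hypothetical classes: B extends A.
setClass("A", slots = c(x = "numeric"))
setClass("B", contains = "A")

# foo starts life as an ordinary function.
foo <- function(x) x@x * 2

# Adding a method for B: since foo is not yet a generic, setMethod()
# first turns it into one (printing a message), and the original
# function becomes the default method.
setMethod("foo", "B", function(x) x@x * 10)

foo(new("A", x = 1))  # default method (the original function): 2
foo(new("B", x = 1))  # method for B: 10
```

This only shows the global-environment case. When foo lives in someone else's package namespace, R will typically create a generic for foo in the extending package instead (with a message saying so), which is closer to the situation Seth describes.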

Originally I tried defining the original using setMethod, but this
generates a complaint about a missing function; that's one reason I fell
into this style.
You have to create the generic first if it doesn't already exist:

   setGeneric("foo", function(x) standardGeneric("foo"))
I wonder if it might be worth changing setMethod so that it does this by
default when no function of that name exists. Personally, that would fit the
style I'm using better.

For accessors, I like to document them in the methods section of the
class documentation.

This is for accessors that really are methods, not my fake
function-based accessors, right?
Which might be a further argument not to have the distinction in the
first place ;-)

To me, simple accessors are best documented with the class.  If I have
an instance, I will read help on it and find out what I can do with
it.  

If you use foo as an accessor method, where do you define the associated
function (i.e., \alias{foo})? I believe such a definition is expected by
R CMD check and is desirable for users looking for help on foo (?foo)
without paying attention to the fact it's a method.
Yes you need an alias for the _generic_ function.  You can either add
the alias to the class man page where one of its methods is documented
or you can have separate man pages for the generics.  This is
painful.  S4 documentation, in general, is rather difficult and IMO
this is in part a consequence of the more general (read more powerful)
generic function based system.
As my message indicates, I too am struggling with an appropriate
documentation style for S4 classes and methods.  Since "Writing R
Extensions" has said "Structure of and special markup for documenting S4
classes and methods are still under development." for as long as I can
remember, perhaps I'm not the only one.

Some of the problem may reflect the tension between conventional OO and
functional languages, since R remains the latter even under S4.  I'm not
sure if it's the tools or my approach that is making things awkward; it
could be both!

Ross
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20060927/110a6f57/attachment-0003.pl
John Chambers <jmc at r-project.org> writes:
There is a point that needs to be remembered in discussions of
accessor functions (and more generally).

We're working with a class/method mechanism in a _functional_
language.  Simple analogies made from class-based languages such as
Java are not always good guides.

In the example below, "a function foo that only operates on that
class" is not usually a meaningful concept in R.   
If foo is a generic and the only method defined is for class Bar, then
the statement seems meaningful enough?
Functions are first-class objects and in principle every function
should have a "function", a purpose.  Methods implement that purpose
for particular combinations of arguments.

Accessor functions are therefore a bit anomalous.  
How?  A given accessor function has the purpose of returning the
expected data "contained" in an instance.  It provides an abstract
interface that decouples the structure of the class from the data it
needs to provide to users.

The anomaly is, IMO, a much larger challenge with generic function
based systems.  When the same name for a generic is used in different
packages, you end up with a masking problem.  This scenario is
unavoidable in general, but particularly likely for accessors.  As S4
becomes more prevalent, I suspect that '<pkg>::foo' is going to become
a required idiom for interactive use (other options are available for
package code).

+ seth
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20060927/5ddf5e9c/attachment-0003.pl
There is a point that needs to be remembered in discussions of accessor
functions (and more generally).

We're working with a class/method mechanism in a _functional_ language.
Simple analogies made from class-based languages such as Java are not
always good guides.

In the example below, "a function foo that only operates on that class"
is not usually a meaningful concept in R.  In Java, given the syntax of
the language, a method can only be invoked "on" an object; R (that is,
the S language) is different.  You can intend a function to be
used only on one class, but that isn't normally the way to think about R
software.

Functions are first-class objects and in principle every function should
have a "function", a purpose.  Methods implement that purpose for
particular combinations of arguments.
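To make "methods implement that purpose for particular combinations of arguments" concrete, here is a minimal sketch with a hypothetical generic join(): one purpose, with methods chosen by the classes of both arguments rather than "owned" by a single class:

```r
library(methods)

# Hypothetical generic with a single purpose: join two values.
setGeneric("join", function(x, y) standardGeneric("join"))

# Methods implement that purpose for particular argument combinations.
setMethod("join", signature("numeric", "numeric"), function(x, y) x + y)
setMethod("join", signature("character", "character"),
          function(x, y) paste(x, y))
setMethod("join", signature("character", "numeric"),
          function(x, y) paste(x, format(y)))

join(1, 2)       # 3
join("a", "b")   # "a b"
join("n =", 5)   # "n = 5"
```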

Accessor functions are therefore a bit anomalous.  If they had a
standard syntactic pattern, say get_foo(object), then it would be more
reasonable to think that you're just defining a method for that function
for a given class that happens to have a slot with the particular name,
"foo".

Also, slot-setting functions will be different in R because we deal with
objects, not object references as in Java.  An R-like naming convention
would be something along the lines of
  set_foo(object) <- value
but in any case one will need to use replacement functions to conform to
the way assignments work.
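A minimal sketch of such a replacement function, assuming a hypothetical class Thing with a slot foo and using the set_foo naming from the example above. The generic's name ends in "<-", its last argument is value, and it returns a modified copy that R rebinds to the caller's variable:

```r
library(methods)

# Hypothetical class with a single slot named "foo".
setClass("Thing", slots = c(foo = "numeric"))

# Replacement generics are named with a trailing "<-" and take 'value' last.
setGeneric("set_foo<-", function(object, value) standardGeneric("set_foo<-"))

setReplaceMethod("set_foo", "Thing", function(object, value) {
  object@foo <- value
  validObject(object)
  object  # return the modified copy; no reference is updated in place
})

obj <- new("Thing", foo = 1)
set_foo(obj) <- 42   # evaluated as obj <- `set_foo<-`(obj, value = 42)
obj@foo              # 42
```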
In the Object class system of the R.oo package I have for years worked
successfully with what I call virtual fields.  I find them really
useful and convenient to work with.

It works as follows: if there is a get<Field>(object) function, it is
called whenever object$<field> is evaluated.  If there is no such
function, the internal field '<field>' is accessed directly (from the
environment in which all fields live).  Similarly, object$<field> <- value
checks for set<Field>(object, value), which is called if available.  [I
work with environments/references, so my set functions don't really have
to be replacement functions, but nothing prevents them from being such.]

There are several advantages to doing it this way.  You can protect
fields behind a set function, e.g. to prevent assignment of negative
values:

  circle$radius <- -5
  Error: Negative radius: -5

You can also provide redundant fields in your API, e.g.

  circle$radius <- 5
  print(circle$diameter)
  circle$area <- 4
  print(circle$radius)

and so on.  How the circle is represented internally does not matter
and may change over time.  With such a design you, as a software
developer, don't have to worry about that; the API is stable.  I think
this schema carries over perfectly to S4 and '@'.
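A minimal sketch of how the virtual-field idea might look in S4. This is not R.oo's implementation: it is an illustration using S4 methods for `$` and `$<-` (rather than '@'), a hypothetical Circle class, and a hypothetical `accessors` environment holding the get<Field>/set<Field> functions. Because S4 objects have value semantics, the setters return modified copies:

```r
library(methods)

# Hypothetical class: only the radius is actually stored.
setClass("Circle", slots = c(radius = "numeric"))

# Hypothetical registry of get<Field>/set<Field> functions; keeping them in
# their own environment avoids accidental matches such as getClass().
accessors <- new.env()
accessors$getDiameter <- function(object) 2 * object@radius
accessors$getArea     <- function(object) pi * object@radius^2
accessors$setRadius   <- function(object, value) {
  if (value < 0) stop("Negative radius: ", value)
  object@radius <- value
  object
}
accessors$setDiameter <- function(object, value)
  accessors$setRadius(object, value / 2)
accessors$setArea <- function(object, value)
  accessors$setRadius(object, sqrt(value / pi))

# Return e.g. accessors$getDiameter for ("get", "diameter"), or NULL.
lookup <- function(prefix, name) {
  fname <- paste0(prefix, toupper(substring(name, 1, 1)), substring(name, 2))
  if (exists(fname, envir = accessors, inherits = FALSE)) {
    get(fname, envir = accessors)
  } else {
    NULL
  }
}

# object$field: use get<Field>() if registered, else read the slot.
setMethod("$", "Circle", function(x, name) {
  f <- lookup("get", name)
  if (is.null(f)) slot(x, name) else f(x)
})

# object$field <- value: use set<Field>() if registered, else set the slot.
setReplaceMethod("$", "Circle", function(x, name, value) {
  f <- lookup("set", name)
  if (is.null(f)) {
    slot(x, name) <- value
    x
  } else {
    f(x, value)
  }
})

circle <- new("Circle", radius = 1)
circle$radius <- 5
circle$diameter   # 10
circle$area <- 4
circle$radius     # sqrt(4 / pi), about 1.128
```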

FYI: I used the above naming convention because I started doing this
well before the '_' operator was redefined.

Comment: If you don't want the user to access a slot/field directly, I
recommend naming the slot with a period prefix, e.g. '.radius'.  This at
least gives the user a chance to understand your design, although it
does not prevent them from misusing it.  The period prefix is also the
"standard" for "private" objects, cf. ls(all.names=FALSE/TRUE).

/Henrik
Ross Boylan wrote:
On Tue, 2006-09-26 at 10:43 -0700, Seth Falcon wrote:

Ross Boylan <ross at biostat.ucsf.edu> writes:

If anyone else is going to extend your classes, then you are doing
them a disservice by not making these proper methods.  It means that
you can control what happens when they are called on a subclass.

My style has been to define a function, and then use setMethod if I want
to redefine it for an extension.  That way the original version becomes
the generic.

So I don't see what I'm doing as being a barrier to adding methods.  Am
I missing something?

You are not, but someone else might be: suppose you release your code
and I would like to extend it.  I am stuck until you decide to make
generics.

This may be easier to do concretely.
I have an S4 class A.
I have defined a function foo that only operates on that class.
You make a class B that extends A.
You wish to give foo a different implementation for B.

Does anything prevent you from doing
setMethod("foo", "B", function(x) blah blah)
(which is the same thing I do when I make a subclass)?
This turns my original foo into the catchall method.

Of course, foo is not appropriate for random objects, but that was true
even when it was a regular function.

Originally I tried defining the original using setMethod, but this
generates a complaint about a missing function; that's one reason I fell
into this style.

You have to create the generic first if it doesn't already exist:

   setGeneric("foo", function(x) standardGeneric("foo"))

I wonder if it might be worth changing setMethod so that it does this by
default when no function of that name exists. Personally, that would fit the
style I'm using better.

For accessors, I like to document them in the methods section of the
class documentation.

This is for accessors that really are methods, not my fake
function-based accessors, right?

Which might be a further argument not to have the distinction in the
first place ;-)

To me, simple accessors are best documented with the class.  If I have
an instance, I will read help on it and find out what I can do with
it.

If you use foo as an accessor method, where do you define the associated
function (i.e., \alias{foo})? I believe such a definition is expected by
R CMD check and is desirable for users looking for help on foo (?foo)
without paying attention to the fact it's a method.

Yes you need an alias for the _generic_ function.  You can either add
the alias to the class man page where one of its methods is documented
or you can have separate man pages for the generics.  This is
painful.  S4 documentation, in general, is rather difficult and IMO
this is in part a consequence of the more general (read more powerful)
generic function based system.

As my message indicates, I too am struggling with an appropriate
documentation style for S4 classes and methods.  Since "Writing R
Extensions" has said "Structure of and special markup for documenting S4
classes and methods are still under development." for as long as I can
remember, perhaps I'm not the only one.

Some of the problem may reflect the tension between conventional OO and
functional languages, since R remains the latter even under S4.  I'm not
sure if it's the tools or my approach that is making things awkward; it
could be both!

Ross

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


I'm trying to understand what the underlying issues are here--with the
immediate goal of how that affects my design and documentation
decisions.
Seth Falcon wrote:
John Chambers <jmc at r-project.org> writes:

There is a point that needs to be remembered in discussions of
accessor functions (and more generally).

We're working with a class/method mechanism in a _functional_
language.  Simple analogies made from class-based languages such as
Java are not always good guides.

In the example below, "a function foo that only operates on that
class" is not usually a meaningful concept in R.   
The sense of "meaningful" here is hard for me to pin down, even with
the subsequent discussion.

I think the import is more than formal: R is not strongly typed, so
you can hand any argument to any function and the language will not
complain.

If foo is a generic and the only method defined is for class Bar, then
the statement seems meaningful enough?

This is not primarily a question about implementation but about what the 
user understands.   IMO, a function should have an intuitive meaning to 
the user.  Its name is taking up a "global" place in the user's brain, 
and good software design says not to overload users with too many 
arbitrary names to remember.
It's true that clashing uses of the same name may lead to confusion,
but that need not imply that functions must be applicable to all
objects.  Many functions only make sense in particular contexts, and
sometimes those contexts are quite narrow.

One of the usual motivations for an OO approach is precisely to limit
the amount of global space taken up by, for example, functions that
operate on the class (global in both the syntactic sense and in the
inside your brain sense).  Understanding a traditional OO system, at
least for me, is fundamentally oriented to understanding the objects
first, with the operations on them as auxiliaries.  As you point out,
this is just different from the orientation of a functional language,
which starts with the functions.
To be a bit facetious, if "flag" is a slot in class Bar, it's really not
a good idea to define the accessor for that slot as
  flag <- function(object) object@flag

Nor is the situation much improved by having flag() be a generic, with 
the only method being for class Bar.  We're absconding with a word that 
users might think has a general meaning.  OK, if need be we will have 
different flag() functions in different packages that have _different_ 
intuitive interpretations, but it seems to me that we should try to 
avoid that problem when we can.

OTOH, it's not such an imposition to have accessor functions with a 
syntax that includes the name of the slot in a standardized way:
  get_flag(object)
(I don't have any special attachment to this convention, it's just there 
for an example)
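A minimal sketch of that convention, assuming a hypothetical class Bar with a logical slot flag; the get_<slot> naming is only the example convention from the message above, and nothing in R enforces it:

```r
library(methods)

# Hypothetical class "Bar" with a slot named "flag".
setClass("Bar", slots = c(flag = "logical"))

# The accessor's name embeds the slot name via a get_<slot> pattern, so
# it reads as an accessor rather than as a general-purpose word.
setGeneric("get_flag", function(object) standardGeneric("get_flag"))
setMethod("get_flag", "Bar", function(object) object@flag)

b <- new("Bar", flag = TRUE)
get_flag(b)  # TRUE
```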
I don't see why get_flag differs from flag; if "flag" lends itself to
multiple interpretations or meanings, wouldn't "get_flag" have the
same problem?

Or are you referring to the fact that "flag" sounds as if it's a verb
or action?  That's a significant ambiguity, but there's nothing about
it that is specific to a functional approach.

Functions are first-class objects and in principle every function
should have a "function", a purpose.  Methods implement that purpose
for particular combinations of arguments.

If this is a claim that every function should make sense for every
object, it's asking too much.  If it's not, I don't really see how a
function can avoid having a purpose.  The purpose of accessor
functions is to get or set the state of the object.
Accessor functions are therefore a bit anomalous.  

How?  A given accessor function has the purpose of returning the
expected data "contained" in an instance.  It provides an abstract
interface that decouples the structure of the class from the data it
needs to provide to users.

See above.  That's true _if_ the name or some other syntactic sugar 
makes it clear that this is indeed an accessor function, but not otherwise.
Aside from the fact that I don't see why get_flag is so different from
flag, the syntactic sugar argument has another problem.  The usually
conceived purpose of accessors is to hide from the client the
internals of the object.  To take an example that's pretty close to
one of my classes, I want startTime, endTime, and duration.
Internally, the object only needs to hold 2 of these quantities to get
the 3rd, but I don't want the client code to be aware of which choice
I made.  In particular, I don't want the client code to change from 
duration to get_duration if I switch to a representation that stored
the duration as a slot.
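A minimal sketch of that design, assuming two hypothetical classes: Interval stores start and end, while Interval2 stores start and duration. Client code calling startTime(), endTime() and duration() is the same for both representations:

```r
library(methods)

# Accessor generics: the interface clients use.
setGeneric("startTime", function(object) standardGeneric("startTime"))
setGeneric("endTime",   function(object) standardGeneric("endTime"))
setGeneric("duration",  function(object) standardGeneric("duration"))

# Representation 1 (hypothetical): store start and end; derive duration.
setClass("Interval", slots = c(start = "numeric", end = "numeric"))
setMethod("startTime", "Interval", function(object) object@start)
setMethod("endTime",   "Interval", function(object) object@end)
setMethod("duration",  "Interval", function(object) object@end - object@start)

# Representation 2 (hypothetical): store start and duration; derive end.
setClass("Interval2", slots = c(start = "numeric", dur = "numeric"))
setMethod("startTime", "Interval2", function(object) object@start)
setMethod("endTime",   "Interval2", function(object) object@start + object@dur)
setMethod("duration",  "Interval2", function(object) object@dur)

iv1 <- new("Interval",  start = 10, end = 25)
iv2 <- new("Interval2", start = 10, dur = 15)
c(duration(iv1), duration(iv2))  # 15 15
c(endTime(iv1), endTime(iv2))    # 25 25
```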

Ross
An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: https://stat.ethz.ch/pipermail/r-devel/attachments/20060928/6a43c787/attachment-0003.pl