Documenting classes and methods: was [Rd] Re: R-devel Digest, Vol 3, Issue 23
Many thanks for your helpful comments.
To over-simplify the previous discussion a bit, you are emphasising that
all the aliases written by promptMethods will be needed by future versions
of the help system, so it is important that they be included in .Rd files
documenting methods. I was saying that these aliases are very verbose.
I am not the only one having trouble with the aliases. I have looked
through all the R packages that I know of that use S4 methods and can't
find even one example of a documented method which preserves all the
promptMethods-style aliases. This may be partly because package authors
aren't sure what they should be doing, but I think it has to be taken as a
pretty strong statement from authors that the aliases don't produce a
workable result with the online help system as it currently stands. The
trouble I think is with the html package contents page, which very quickly
becomes too long and cluttered to be useful if all the aliases are included.
Could we satisfy both needs, (i) to have a unique help alias associated
with each method and (ii) to have a package contents page which is
readable, by giving authors control over which aliases are included on the
contents page? Could we have a new command \aliasonly{} for the .Rd files
for aliases which are to be available to the help system but not listed on
the contents page? Authors could use \alias{} for aliases which are to be
listed and \aliasonly{} for aliases which aren't.
It might also be helpful to have an optional argument, as in
\alias[optional.text]{}, to choose the text which is listed on the package
contents page.
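To make the \aliasonly{} proposal concrete, here is a rough sketch in R of how a contents-page builder might treat the two kinds of alias. Note that \aliasonly{} does not exist in the Rd format; it is only the command proposed above, and the helper name extractTag is made up for this illustration.

```r
# Lines of an imaginary .Rd file. \aliasonly{} is the PROPOSED command
# from this message, not an existing Rd command.
rd <- c("\\name{foo-methods}",
        "\\alias{foo}",
        "\\aliasonly{foo-methods}",
        "\\aliasonly{foo,bar1,bar2-method}")

# Hypothetical helper: return the contents of every line of the form \tag{...}
extractTag <- function(lines, tag) {
  pattern <- paste0("^\\\\", tag, "[{](.*)[}]$")
  sub(pattern, "\\1", grep(pattern, lines, value = TRUE))
}

# Every alias remains a valid help topic ...
helpTopics <- c(extractTag(rd, "alias"), extractTag(rd, "aliasonly"))

# ... but only the plain \alias{} entries are listed on the contents page.
contentsEntries <- extractTag(rd, "alias")
```

Under this scheme the help system would still resolve all three topics, while the contents page would list only 'foo'.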
Some other comments on generic functions are interpolated below.
Regards
Gordon
At 12:45 AM 27/05/2003, John Chambers wrote:
Gordon Smyth wrote:
I am another person who has had trouble documenting S4 classes and (particularly) methods. The methods package itself is pretty cool by the way, but it is a pity that there are as yet no guidelines on S4 in the "Writing R Extensions" document. I have actually put together a guide on S4 documentation myself for the use of my own lab which is at http://bioinf.wehi.edu.au/limma/Rdocs.html. I don't pretend that the guide is perfect - I can already see problems with it - but it has proved adequate so far for our own use (writing the limma package) and has gained some more general acceptance from the Bioconductor community. I found it hard to use the skeleton documentation provided by promptMethods.
The "structure" of the skeletons (the \alias lines especially) is intended to be used by the help system. You're not meant to "use" these directly, much of the time. It's the case that the tools to work with the .Rd structure haven't caught up yet, but please don't modify the skeleton's structure arbitrarily.
Suppose for example that I wish to document a method for
generic function 'foo' with argument list (x,y,...) for x of class 'bar1'
and y of class 'bar2':
1. The skeleton .Rd file contains \alias{foo-methods}. If two or more
packages document methods for 'foo', they'll all have the same alias entry,
and the help that a user will get by typing ?"foo-methods" will depend on
which package happens to have been loaded most recently.
Good point, but related to the behavior of "?". It's related to a number of other issues about multiple packages referring to the same generic function. Not likely to change for 1.7.1, but likely to be different in several ways in 1.8.
2. There seems to be no allowance for documenting extra named arguments for
this method which are not specified in the generic. There is no usage
entry, no argument list, and no process for R CMD check to check the
argument list against the definition of the method. In S3 one can write
\usage{\method{generic}{class}} and it would be nice to have an extension
of this facility for S4 methods. I have been abandoning the skeleton
structure produced by promptMethods and have been using \section{Usage} and
\section{Arguments}.
Seems ok to have separate discussion of arguments, but don't "abandon" the rest of the material in the skeleton (see below).
Heavy use of extra arguments in the methods is a little bit worrisome. There is an efficiency penalty, though not likely serious in sizable computations. More basic (this is just my personal view), I like to think of the function as having a single conceptual definition--what it does and (by and large) what arguments it takes to describe what it should do. Then the methods are the implementation. The function description is likely what users, beginning users particularly, want to see. More advanced users and programmers may also be concerned with the implementation. So, most of the time, one would like the function to define the arguments, and the methods to work from these.
In some examples of extra arguments (the S3 print() methods, for instance), these are style-setting parameters, or perhaps control parameters for numeric computations. It might be clearer in such cases to say that "..." is always passed to a (class-dependent) parameter-setting function. Documenting that function is then a separate step. Again, this is just by way of what may help users to understand the functions and help designers to write functions cleanly; not suggesting you should be forced to take this route.
The need for extra arguments seems to increase with the complexity of the task which the generic function does. In R base most uses of generic functions are for language-type functions like 'print' or 'summary', although there are also data analysis functions like 'anova', 'residuals' and 'coefficients'. In Bioconductor we are trying to use S4 generic functions to undertake some quite complex data analysis tasks, for example the 'normalization' of a multi-array microarray experiment. Normalization means to adjust the data for unwanted systematic trends due to technological sources other than the genes and the treatments of interest. The generic function 'normalize' encapsulates a single unifying concept, but the implementation may differ considerably depending on the type of microarrays being used and the factors which seem important to adjust for.
We can and probably should consider using parameter-setting functions. But if we end up with a separate parameter-setting function for every class, and most users need to read the documentation for these functions, then we haven't really achieved a simplification.
Another consideration which seems to increase the importance of the methods relative to the generic function itself is the modular package style of development being used in R. If I am the first author to use the function name 'normalize', do I have the right to specify all the arguments I need for this function as part of the generic definition, or should I minimize the specified arguments to give other authors as much flexibility as possible?
One example of documentation which I have been using as a model is the documentation in R base for the S3 generic function 'residuals'. The ?residuals document explains the concept in generic terms and limits the argument list to 'object' and '...'. One can then read the separate method-documents ?residuals.lm or ?residuals.glm for other arguments. I would be happy if we could do similar with S4 functions and methods.
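The parameter-setting idea discussed above might look roughly like the following in R. This is only a sketch under assumptions: the class 'TwoColorArray', its slot 'M', the control function 'twoColorControl' and its arguments are all invented for illustration, and this 'normalize' is not the Bioconductor generic.

```r
library(methods)

# Invented class standing in for one type of microarray data
setClass("TwoColorArray", representation(M = "numeric"))

# Hypothetical class-dependent parameter-setting function; it would be
# documented on its own help page, separately from the generic.
twoColorControl <- function(method = "median", span = 0.3) {
  method <- match.arg(method, c("median", "none"))
  list(method = method, span = span)
}

# The generic keeps a minimal argument list: just 'object' and '...'
setGeneric("normalize", function(object, ...) standardGeneric("normalize"))

# The method hands '...' to the parameter-setting function instead of
# declaring its own extra named arguments.
setMethod("normalize", "TwoColorArray", function(object, ...) {
  ctrl <- twoColorControl(...)
  if (ctrl$method == "median")
    object@M <- object@M - median(object@M)
  object
})

arr <- new("TwoColorArray", M = c(1, 2, 6))
centred   <- normalize(arr)                   # median-centred values
untouched <- normalize(arr, method = "none")  # values left as they were
```

The generic's documentation can then describe the concept in terms of 'object' and '...', much as ?residuals does, while the class-specific options live with the control function.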
3. The aliases for methods are pretty verbose and make the html contents
page for the package look rather cluttered. I have been deleting the
\alias{foo-methods} alias and been replacing \alias{foo,bar1,bar2-method}
with \alias{foo.bar1.bar2}. I know that using a syntactically valid name
for the alias has the potential problem that a function could actually
exist with that name, but I just like to use something shorter.
Don't do that. It's not what you like that counts, it's what works with the ? function, and your change will wipe out the ability of the help functions to identify correctly which method is being documented. For 1.8 (unfortunately, unlikely to be ironed out for 1.7.1), users should be able to get documentation on the method, say, for function f(x,y) corresponding to signature(x = "character", y = "numeric") by the expression method ? f(x="character", y = "numeric") (or something along these lines). In any case, the \alias lines are crucial to going from any way of requesting method documentation to the correct documentation.
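For reference, the alias form at issue can be generated mechanically from the generic name and the method signature. The sketch below sets up the 'foo', 'bar1', 'bar2' example from this thread; the helper 'methodAlias' and the 'value' slot are invented for illustration and are not part of the methods package.

```r
library(methods)

# Hypothetical helper: build the promptMethods-style alias for one method
# from the generic name and the classes in its signature.
methodAlias <- function(generic, signature) {
  paste0(generic, ",", paste(signature, collapse = ","), "-method")
}

# The example from the thread: generic foo(x, y, ...) with a method for
# x of class 'bar1' and y of class 'bar2'.
setClass("bar1", representation(value = "numeric"))
setClass("bar2", representation(value = "numeric"))
setGeneric("foo", function(x, y, ...) standardGeneric("foo"))
setMethod("foo", signature(x = "bar1", y = "bar2"),
          function(x, y, ...) x@value + y@value)

alias <- methodAlias("foo", c("bar1", "bar2"))
# 'alias' is the string that goes inside \alias{...} in the .Rd file
```

Because the string contains commas and a hyphen, it cannot collide with the name of an ordinary function, which is the risk mentioned above for a shortened form such as foo.bar1.bar2.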
4. There don't seem to be any guidelines for documenting a method with the generic, if the generic happens to be defined in the same package, or with the object class, if the generic dispatches on only one argument. I know that you have thought about this, and in the document http://developer.r-project.org/moreClassMethodIssues.html you refer to the 'addTo' argument for 'promptMethods'. The 'addTo' argument however has not yet been implemented in R.
It would be nice to have a method for finding dynamically all available documentation for methods for a given generic function. I wrote a little prototype function called 'helpMethods' which simply extracts the list of available methods and prompts the user for which help topic they'd like to read. For this to work though, developers need to use a consistent alias system for documenting methods. I haven't seen any package yet which is using the aliases suggested by promptMethods.
Do you think there is any value in my S4 documentation guide? Are there errors or misunderstandings in it which should be corrected before it is adopted as a guideline by Bioconductor?
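The idea behind such a lookup can be sketched as follows. This is not the 'helpMethods' prototype mentioned above, whose code is not shown in this thread; it is a minimal illustration that assumes every method is documented under its promptMethods-style alias, and the function name 'methodHelpTopics' is invented. It relies on findMethodSignatures() from the methods package.

```r
library(methods)

# Small set-up so that the sketch is self-contained
setClass("bar1", representation(value = "numeric"))
setClass("bar2", representation(value = "numeric"))
setGeneric("foo", function(x, y, ...) standardGeneric("foo"))
setMethod("foo", signature(x = "bar1", y = "bar2"),
          function(x, y, ...) x@value + y@value)
setMethod("foo", signature(x = "bar2", y = "bar1"),
          function(x, y, ...) x@value - y@value)

# Hypothetical helper: list the help topics one would expect for the
# currently defined methods of a generic, assuming the alias convention
# "generic,class1,class2-method".
methodHelpTopics <- function(generic) {
  sigs <- findMethodSignatures(generic)  # character matrix, one row per method
  paste0(generic, ",", apply(sigs, 1, paste, collapse = ","), "-method")
}

topics <- methodHelpTopics("foo")
# In an interactive session one could then offer a choice, for example:
#   choice <- menu(topics, title = "Methods documented for foo")
#   if (choice > 0) help(topics[choice])
```

The lookup only works if package authors keep the aliases in a predictable form, which is the point of the consistency requirement.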
It's a useful document to have. The whole area of documentation and online help is being worked on by a number of people, so there is the "moving target" difficulty. You mention in your document altering the output of the promptMethods skeleton. Adding material, up to a point, is OK, but changing or deleting the "\" lines is not a good idea if you want the documentation to work with R's (evolving) help system. As noted, the \alias lines should be left alone. There are a few other points we can discuss off-list, not directly related to this thread.
Are there major changes planned for the documentation system for S4 methods and classes in R in the near future? Is it worth our while spending time working out guidelines now or should we wait a bit until the situation stabilizes?
Commented on above--yes changes are in prospect. Bioconductor may want to encourage documentation even before things settle down--really for the people in the project to assess whether guidelines are helpful at this point. As said, there will be some changes for 1.8, mostly additions to the code that processes the online help requests. It's a fairly good guess that the structure of \ lines, esp. the \alias lines, will be kept or extended, not radically changed, so keeping the current prompt output of these lines would be desirable. If there are changes in the structure, you're more likely to see tools to modify what you have if you follow the current prompt output. In the longer run, it would be useful to have a documentation system based on a more modern form (e.g., XML), making possible more powerful online help software. Duncan Temple Lang and others have done some good work on such systems. My crystal ball is very foggy on what will happen with the R community in this direction. Regards, John
Best wishes Gordon
Date: Fri, 23 May 2003 15:37:50 -0400
From: John Chambers <jmc@research.bell-labs.com>
Subject: Re: [Rd] Documenting S4 classes; debugging them
To: Duncan Murdoch <dmurdoch@pair.com>
Cc: r-devel@stat.math.ethz.ch
Duncan Murdoch wrote:
1. I'm putting together my first package that uses S4 classes and objects. I'd like to document them, but I'm not sure what the documentation should look like, and package.skeleton doesn't produce any at all for the classes or methods.
Hmm, sounds as if it should. Meanwhile, promptClass and promptMethods generate skeleton documentation.
Are there any good examples to follow?
The bioconductor packages (e.g, Biobase) have some examples.
...
John
Duncan Murdoch
______________________________________________ R-devel@stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
-- John M. Chambers jmc@bell-labs.com Bell Labs, Lucent Technologies office: (908)582-2681 700 Mountain Avenue, Room 2C-282 fax: (908)582-3340 Murray Hill, NJ 07974 web: http://www.cs.bell-labs.com/~jmc
---------------------------------------------------------------------------------------
Dr Gordon K Smyth, Senior Research Scientist, Bioinformatics, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Vic 3050, Australia Tel: (03) 9345 2326, Fax (03) 9347 0852, Email: smyth@wehi.edu.au, www: http://www.statsci.org