Documentation issues [Was: Function hints]

Duncan Murdoch <murdoch at stats.uwo.ca> 06/20/06 11:58am >>>
I would like to follow up on another one of the documentation issues raised in the discussion on function hints. Duncan mentioned that the R core were working on preprocessing directives for .Rd files, which could possibly include some sort of include directive. I was wondering if an "includeexamples" directive might also be considered.

It often makes sense to use the same example to illustrate the use of different functions, or perhaps extend an example used to illustrate one function to illustrate another. One way to do this is simply to put

example(fnA)

in the \examples for fnB, but this is not particularly helpful for people reading the help pages as they either need to look at both help pages or run the example. The alternative is to maintain multiple copies of the same code, which is not ideal.

So it would be useful to be able to put

\includeexamples(fnA)

so that the code is replicated in fnB.Rd. Perhaps an include directive could do this anyway, but it might be useful to have a special directive for examples so that R CMD check is set up to only check the original example to save time (and unnecessary effort).

Thanks, that's a good suggestion.  My inclination would be towards just
one type of \include; it could be surrounded by notation saying not to
check it in all but one instance if the author wanted to save testing time.

Fair enough, but at the moment I don't think such notation exists - using \dontrun would skip the check, but would also mean the code would not get run by example(), leading to missing/broken examples. You could introduce a \dontcheck directive but this might be dangerous!
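
The \dontrun trade-off described above can be seen with a small sketch. It assumes a throwaway help page for the thread's placeholder fnB; slow_or_unsafe_call() is a hypothetical name, and tools::Rd2ex() is used here only as a convenient way to look at the extracted example code.

```r
# Minimal Rd source for the thread's placeholder function fnB.
# slow_or_unsafe_call() is a hypothetical name used only for illustration.
rd <- c(
  "\\name{fnB}",
  "\\alias{fnB}",
  "\\title{Placeholder help page}",
  "\\description{Illustration only.}",
  "\\examples{",
  "x <- 1:3",
  "\\dontrun{",
  "y <- slow_or_unsafe_call(x)",
  "}",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd, rd_file)

# Extract the example code from the Rd file into an R script.
ex_file <- tempfile(fileext = ".R")
tools::Rd2ex(rd_file, out = ex_file)
ex <- readLines(ex_file)
cat(ex, sep = "\n")
```

In the extracted script the \dontrun part should appear only as comments, so it is neither checked nor executed by example(), which is the problem raised above.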

Heather
On a related issue, it would be nice if source() had an option to print comments contained in the source file, so that example() and demo() could print out annotation.
Yes, this has been a long-standing need, but it's somewhat tricky 
because of the way source currently works:  it parses the whole file, 
then executes the parsed version.  The first step loses the comments, so 
you see a deparsed version when executing.  What I think it should do is 
have pointers back from the parsed version to the original source code, 
but that needs fairly low level changes.  This is some of the missing 
"infrastructure" I mentioned below.

Duncan Murdoch
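
A minimal sketch of the parse-then-deparse behaviour described above, assuming a two-line toy script. It calls parse() and deparse() directly rather than source(), so it only illustrates where the comments go missing.

```r
# A toy "source file" with a full-line comment and a trailing comment.
src <- c(
  "# Compute a toy total",
  "total <- 1 + 2  # trailing remark"
)

# Step 1: parse the whole text; without source references the comments
# are not part of the parsed expressions.
exprs <- parse(text = src, keep.source = FALSE)

# Step 2: what can be echoed afterwards is only a deparsed version.
shown <- vapply(exprs, function(e) paste(deparse(e), collapse = "\n"),
                character(1))
print(shown)
```

Only the assignment survives; both comments are gone by the time anything could be printed.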
Heather

Dr H Turner
Research Assistant
Dept. of Statistics
The University of Warwick
Coventry
CV4 7AL

Tel: 024 76575870
Fax: 024 7652 4532
Url: www.warwick.ac.uk/go/heatherturner 

<Mark.Bravington at csiro.au> 06/20/06 01:43am >>>
[This is not about the feasibility of a "hints" function-- which would
be incredibly useful, but perhaps very very hard to do-- but about some
of the other documentation issues raised in Hadley's post and in
Duncan's reply]

WRTO documentation & code together: for several years, I've successfully
used the 'mvbutils' package to keep every function definition & its
documentation together, editing them together in the same file--
function first, then documentation in plain-text (basically the format
you see if you use "vanilla help" inside R). Storage-wise, the
documentation is just kept as an attribute of the function (with a print
method that hides it by default)-- I also keep a text backup of the
combination. Any text editor will do. When it's time to create a
package, the Rd file is generated automatically.
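
The general idea of keeping documentation as an attribute of the function can be sketched roughly as follows. This is not the 'mvbutils' implementation; with_doc(), the "documented" class and the "doc" attribute name are all hypothetical names chosen for the illustration.

```r
# Hypothetical helper: attach plain-text documentation to a function.
with_doc <- function(f, doc) {
  attr(f, "doc") <- doc
  class(f) <- c("documented", class(f))
  f
}

# Print method that hides the documentation attribute by default.
print.documented <- function(x, ...) {
  bare <- unclass(x)
  attr(bare, "doc") <- NULL
  print(bare, ...)
  invisible(x)
}

square <- with_doc(
  function(x) x^2,
  c("square(x)", "Returns the square of a numeric vector.")
)

square(4)                              # still callable as usual
print(square)                          # shows the code only
cat(attr(square, "doc"), sep = "\n")   # documentation on request
```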

For me, it's been extremely helpful to keep function & documentation
together during editing-- it greatly increases the chance that I will
actually update the doco when I change the code, rather than putting it
off until I've forgotten what I did. Also, writing Rd format is a
nightmare (again, personal opinion)-- being able to write plain-text
makes the whole documentation thing bearable.

The above is not quite to the point of the original post, I think, which
talks about storing the documentation as commented bits *inside* the
function code. However, I'm not sure the latter is really desirable;
there is some merit in forcing authors to write an explicit "Details" or
"Description" section that is not just a paraphrase of programming
comments, and such sections are unlikely to fit easily inside code. At
any rate, I wouldn't want to have to interpret my *own* programming
comments as a usage guide!

WRTO automatic "usage" sections: it is easy to write code to do this
('prompt', and there is also some in 'mvbutils'-- not sure if it's in
the current release though) but at least as far as the "usage" section
goes, I think people should be "vigorously encouraged" to write their
own, showing as far as possible how one might actually *use* the
function. For many functions, just duplicating the argument list is not
helpful to the user-- a function can often be invoked in several
different ways, with different arguments relevant to different
invocations. I think it's good to show how this can be done in the
"usage" section, with comments, rather than deferring all practical
usage to "examples". For one thing, "usage" is near the top, and so
gives a very quick reminder without having to scroll through the entire
doco; for another, "usage" and "arguments" are visually adjacent,
whereas "examples" can be widely separated from "arguments".
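
For reference, this is roughly what the automatic route gives you. The sketch assumes a toy function scale_by() (a made-up name) and asks prompt() for its list-style result instead of writing an Rd file.

```r
# Toy function, invented for the illustration.
scale_by <- function(x, factor = 2) x * factor

# filename = NA asks prompt() to return the documentation shell as a list
# rather than writing scale_by.Rd to disk.
doc_shell <- utils::prompt(scale_by, filename = NA, name = "scale_by")

# The generated usage section simply mirrors the argument list.
cat(doc_shell$usage, sep = "\n")
```

The generated usage is a restatement of the argument list, which is the point made above: it says nothing about the different ways the function might actually be called.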

My general point here is: the documenting process should be as
painless as possible, but not more so. Defaults that are likely to lead
to unhelpful documentation are perhaps best avoided.
For this general reason, I applaud R's fairly rigid documentation
standards, even though I frequently curse them. (And I would like to see
some bits more rigid, such as compulsory "how-to-use-this" documentation
for each package!)

The next version of 'mvbutils' will include various tools for easy "live
editing" and automated preparation of packages-- I've been using them
for a while, but still have to get round to finishing the documentation
;) 

Mark Bravington
CSIRO Mathematical & Information Sciences
Marine Laboratory
Castray Esplanade
Hobart 7001
TAS

ph (+61) 3 6232 5118
fax (+61) 3 6232 5012
mob (+61) 438 315 623

-----Original Message-----
From: r-devel-bounces at r-project.org 
[mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
Sent: Tuesday, 20 June 2006 12:39 AM
To: hadley wickham; R-devel
Subject: Re: [Rd] [R] Function hints

I've moved this from R-help to R-devel, where I think it is 
more appropriate, and interspersed comments below.

On 6/19/2006 8:51 AM, hadley wickham wrote:
One of the recurring themes in the recent useR! conference was that
many people find it difficult to find the functions they need for a
particular task.  Sandy Weisberg suggested a small idea he would like
to see: a hints function that, given an object, lists likely
operations.  I've done my best to implement this function using the
tools currently available in R, and my code is included at the bottom
of this email (I hope that I haven't just duplicated something already
present in R).  I think Sandy's idea is genuinely useful, even in the
limited form provided by my implementation, and I have already
discovered a few useful functions that I was unaware of.

While developing and testing this function, I ran into a few problems
which, I think, represent underlying problems with the current
documentation system.  These are typified by the results of running
hints on an object produced by glm (having class c("glm", "lm")).  I
have outlined (very tersely) some possible solutions.  Please note
that while these solutions are largely technological, the problem is
at heart sociological: writing documentation is no easier (and perhaps
much harder) than writing a scientific publication, but the rewards
are fewer.

Problems:

 * Many functions share the same description (eg. head, tail).
Solution: each rdoc file should only describe one method. Problem:
Writing rdoc files is tedious, there is a lot of information
duplicated between the code and the documentation (eg. the usage
statement) and some functions share a lot of similar information.
Solution: make it easier to write documentation (eg. documentation
inline with code), and easier to include certain common descriptions
in multiple methods (eg. new include command)

I think it's bad to document dissimilar functions in the same file, but
similar related functions *should* be documented together.  Not doing
this just adds to the burden of documenting them, and the risk of
modifying only part of the documentation so that it is inconsistent.
The user also gets the benefit of seeing a common description all at
once, rather than having to decide whether to follow "See also" links.

Your solutions would both be interesting on their own merits regardless
of the above.  We did decide to work on preprocessing directives for .Rd
files at the R core meetings; some sort of include directive may be
possible.

I don't think I would want complete documentation mixed with the
original source, but it would certainly be interesting to have partial
documentation there.  (Complete documentation is too long, and would
make it harder to read the source without a dedicated editor that could
hide it.  Though ESS users may see it as a reasonable requirement to
have everyone use the same editor, I don't think it is.)  However, this
is a lot of work, depending on infrastructure that is not in place.

 * It is difficult to tell which functions are commonly
used/important. Solution: break down by keywords. Problem: keywords
are not useful at the moment.  Solution:  make better list of keywords
available and encourage people to use it.  Problem: people won't
unless there is a strong incentive, plus good keywording requires
considerable expertise (especially in building up list).  This is
probably insoluble unless one person systematically keywords all of
the base packages.

I think it is worse than that.  There are concepts in packages that just
don't arise in base R, and hence there would be no keywords for them
other than "misc", even if someone redesigned the current system.
Keywording is hard, and it's not clear to me how to do much better than
we currently do.

We do already have user-defined keywords (via \concept), but these are
not widely used.

 * Some functions aren't documented (eg. simulate.lm, formula.glm) -
typically, these are methods where the documentation is in the
generic.  Solution: these methods should all be aliased to the generic
(by default?), and R CMD check should be amended to check for this
situation.  You could also argue that this is a deficiency with my
function, and easily fixed by automatically referring to the generic
if the specific isn't documented.

I'd say it's a deficiency of your function.  You might want to look at
the code in get("?") and .helpForCall() to see how those functions work
out things like

?simulate(x)

where x is an lm object.  (But notice that .helpForCall is an 
undocumented internal function; don't depend on its implementation 
working forever).
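
As a rough illustration of the lookup involved (not the code behind "?" or .helpForCall), the following sketch walks the class vector of an object and reports the first S3 method found for a generic. dispatched_method() is a hypothetical helper, and it ignores implicit classes and S4.

```r
# Hypothetical helper: which S3 method would a call generic(x) reach?
dispatched_method <- function(generic, x) {
  for (cl in c(class(x), "default")) {
    if (!is.null(utils::getS3method(generic, cl, optional = TRUE))) {
      return(paste(generic, cl, sep = "."))
    }
  }
  NA_character_
}

# A glm fit has class c("glm", "lm"), as in the original post.
fit <- glm(dist ~ speed, data = cars)

dispatched_method("simulate", fit)
dispatched_method("formula", fit)
```

A hints-style function could use the same walk to fall back to the generic's help page when the specific method has no documentation of its own.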

 * It can't supply suggestions when there isn't an explicit method
(ie. .default is used), this makes it pretty useless for basic
vectors.  This may not really be a problem, as all possible operations
are probably too numerous to list.

 * Provides full name for function, when best practice is to use
generic part only when calling function.  However, getting precise
documentation may require that full name.
No, not if the call syntax above is used.

  I do the best I can
(returning the generic if specific is alias to a documentation file
with the same method name), but this reflects a deeper problem that
the name you should use when calling a function may be different to
the name you use to get documentation.

 * Can only display methods from currently loaded packages.  This is a
shortcoming of the methods function, but I suspect it is difficult to
find S3 methods without loading a package.

Relatively trivial problems:

 * Needs wide display to be effective.  Could be dealt with by
breaking description in a sensible manner (there may already be R code
to do this.  Please let me know if you know of any)
I think strwrap() may do what you want.
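
A small sketch of that suggestion, assuming a long "Task" description of the kind hints() prints; the width of 40 and the hanging indent are arbitrary choices for the illustration.

```r
# A long description, similar to the titles stored in the help database.
task <- paste(
  "Fitting generalized linear models, specified by giving a symbolic",
  "description of the linear predictor and a description of the error",
  "distribution"
)

# Break it into lines narrower than 40 characters, indenting continuations.
wrapped <- strwrap(task, width = 40, exdent = 2)
cat(wrapped, sep = "\n")
```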
 * Doesn't currently include S4 methods.  Solution: add some more code
to wrap showMethods

 * Personally, I think sentence case is more aesthetically pleasing
(and more flexible) than title case.
It's quite hard to go from existing title case to sentence case, because
we don't have any markup to indicate proper names.  One would think it
would be easier to go in the opposite direction, but in fact the same
problem arises:  "van Beethoven" for example, not "Van Beethoven".
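
Both directions of the problem can be shown with deliberately naive converters. naive_sentence() and naive_title() are hypothetical helpers written for this illustration, and the example titles are made up.

```r
# Naive title case -> sentence case: lower-case everything after the
# first character.
naive_sentence <- function(s) {
  paste0(toupper(substring(s, 1, 1)), tolower(substring(s, 2)))
}

# Naive sentence case -> title case: capitalise the first letter of
# every word.
naive_title <- function(s) {
  gsub("\\b([a-z])", "\\U\\1", s, perl = TRUE)
}

naive_sentence("The Student t Distribution")  # proper name loses its capital
naive_title("sonatas by van Beethoven")       # "van" wrongly becomes "Van"
```

Without markup for proper names, neither function can know which words to leave alone.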

Hadley

hints <- function(x) {
I don't like the name "hints".  I think we already have too many ways 
into the help system:

help
?
help.search
apropos
etc.?

I like your function, but I'd rather see it attached to one of the 
existing help functions, probably help.search().  For example,

help.search(x)

could look for functions designed to work with the class of x, if it had
one.  (There's some ambiguity here:  perhaps x contains a string, and I
want help on that string.)

Anyway, thanks for your efforts on this so far; I hope we end up with 
something that can make it into the next release.

Duncan Murdoch

	db <- eval(utils:::.hsearch_db())
	if (is.null(db)) {
		help.search("abcd!", rebuild=TRUE, agrep=FALSE)
		db <- eval(utils:::.hsearch_db())
	}

	base <- db$Base
	alias <- db$Aliases
	key <- db$Keywords

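	# all.methods() names its argument 'classes'; spell it out rather than
	# relying on partial argument matching
	m <- all.methods(classes=class(x))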
	m_id <- alias[match(m, alias[,1]), 2]
	keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])

	f.names <- cbind(m, base[match(m_id, base[,3]), 4])
	f.names <- unlist(lapply(1:nrow(f.names), function(i) {
		if (is.na(f.names[i, 2])) return(f.names[i, 1])
		a <- methodsplit(f.names[i, 1])
		b <- methodsplit(f.names[i, 2])

		if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
	}))

	hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
	hints <- hints[order(tolower(hints[,1])),]
	hints <- rbind(    c("--------", "---------------"), hints)
	rownames(hints) <- rep("", nrow(hints))
	colnames(hints) <- c("Function", "Task")
	hints[is.na(hints)] <- "(Unknown)"

	class(hints) <- "hints"
	hints
}

print.hints <- function(x, ...) print(unclass(x), quote=FALSE)

all.methods <- function(classes) {
	methods <- do.call(rbind,lapply(classes, function(x) {
		m <- methods(class=x)
		# possible filter on visible methods: m[attr(m, "info")$visible]
		t(sapply(as.vector(m), methodsplit))
	}))
	rownames(methods[!duplicated(methods[,1]),])
}

methodsplit <- function(m) {
	parts <- strsplit(m, "\\.")[[1]]
	if (length(parts) == 1) {
		c(name=m, class="")
	} else{
		c(name=paste(parts[-length(parts)], collapse="."),
		  class=parts[length(parts)])
	}	
}
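
A short usage sketch for methodsplit(), repeating its definition so the block runs on its own. It shows what the last-dot rule gets right and where it has to guess; the three function names are ordinary base/stats names picked for the illustration.

```r
# Same logic as methodsplit() above: everything before the last dot is
# taken as the generic, everything after it as the class.
methodsplit <- function(m) {
  parts <- strsplit(m, "\\.")[[1]]
  if (length(parts) == 1) {
    c(name = m, class = "")
  } else {
    c(name = paste(parts[-length(parts)], collapse = "."),
      class = parts[length(parts)])
  }
}

methodsplit("simulate.lm")       # generic "simulate", class "lm"
methodsplit("print.data.frame")  # splits as "print.data" / "frame"
methodsplit("t.test")            # a generic itself, split as "t" / "test"
```

The last two cases show why a name alone cannot always say which part is the generic, which is the "deeper problem" about calling names versus documentation names mentioned in the post.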

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help 
PLEASE do read the posting guide! 
http://www.R-project.org/posting-guide.html 

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel 
