Documentation issues [Was: Function hints]
Duncan Murdoch <murdoch at stats.uwo.ca> 06/20/06 11:58am >>>
On 6/20/2006 5:18 AM, Heather Turner wrote:
I would like to follow up on another one of the documentation issues raised in the discussion on function hints. Duncan mentioned that R core were working on preprocessing directives for .Rd files, which could possibly include some sort of include directive. I was wondering if an "includeexamples" directive might also be considered. It often makes sense to use the same example to illustrate the use of different functions, or perhaps extend an example used to illustrate one function to illustrate another. One way to do this is simply to put example(fnA) in the \examples for fnB, but this is not particularly helpful for people reading the help pages, as they either need to look at both help pages or run the example. The alternative is to maintain multiple copies of the same code, which is not ideal. So it would be useful to be able to put \includeexamples(fnA) so that the code is replicated in fnB.Rd. Perhaps an include directive could do this anyway, but it might be useful to have a special directive for examples so that R CMD check is set up to only check the original example to save time (and unnecessary effort).
Thanks, that's a good suggestion. My inclination would be towards just one type of \include; it could be surrounded by notation saying not to check it in all but one instance if the author wanted to save testing time.

Fair enough, but at the moment I don't think such notation exists - using \dontrun would skip the check, but would also mean the code would not get run by example(), leading to missing/broken examples. You could introduce a \dontcheck directive but this might be dangerous! Heather
On a related issue, it would be nice if source() had an option to print comments contained in the source file, so that example() and demo() could print out annotation.
Yes, this has been a long-standing need, but it's somewhat tricky because of the way source currently works: it parses the whole file, then executes the parsed version. The first step loses the comments, so you see a deparsed version when executing. What I think it should do is have pointers back from the parsed version to the original source code, but that needs fairly low level changes. This is some of the missing "infrastructure" I mentioned below.

Duncan Murdoch
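The comment loss described above can be reproduced in a minimal sketch. It assumes source references are switched off (`keep.source = FALSE`) to mimic the parse-then-deparse path described; the lines in `src` are made up for illustration.

```r
# Two lines of made-up source: a comment-only line and a line with a trailing comment
src <- c("# add two numbers", "x <- 1 + 2  # inline comment")

# Parse without source references, mimicking the behaviour described above
exprs <- parse(text = src, keep.source = FALSE)

# Deparse each parsed expression back to text
deparsed <- vapply(as.list(exprs),
                   function(e) paste(deparse(e), collapse = "\n"),
                   character(1))

print(deparsed)  # neither comment is part of the deparsed code
```

Only the assignment survives the round trip, because comments never become part of the parsed expressions.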
Heather

Dr H Turner
Research Assistant
Dept. of Statistics
The University of Warwick
Coventry CV4 7AL

Tel: 024 76575870
Fax: 024 7652 4532
Url: www.warwick.ac.uk/go/heatherturner
<Mark.Bravington at csiro.au> 06/20/06 01:43am >>>
[This is not about the feasibility of a "hints" function-- which would
be incredibly useful, but perhaps very very hard to do-- but about some
of the other documentation issues raised in Hadley's post and in
Duncan's reply]
WRTO documentation & code together: for several years, I've successfully
used the 'mvbutils' package to keep every function definition & its
documentation together, editing them together in the same file--
function first, then documentation in plain-text (basically the format
you see if you use "vanilla help" inside R). Storage-wise, the
documentation is just kept as an attribute of the function (with a print
method that hides it by default)-- I also keep a text backup of the
combination. Any text editor will do. When it's time to create a
package, the Rd file is generated automatically.
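The idea of keeping documentation as an attribute of the function, hidden by a print method, might be sketched as follows. This is not the 'mvbutils' implementation: the helper `add_doc`, the class name `docfunction` and the attribute name `"doc"` are all hypothetical names chosen for illustration.

```r
# Hypothetical helper: attach plain-text documentation to a function
add_doc <- function(f, doc) {
  attr(f, "doc") <- doc
  class(f) <- c("docfunction", "function")
  f
}

# Print method that hides the documentation attribute by default
print.docfunction <- function(x, ...) {
  bare <- x
  attr(bare, "doc") <- NULL    # drop the documentation from the copy
  attr(bare, "class") <- NULL  # drop the class so the default print is used
  print(bare, ...)
  invisible(x)
}

sq <- add_doc(function(x) x^2, "Squares its argument.")
print(sq)                      # shows only the function definition
cat(attr(sq, "doc"), "\n")     # the documentation is still attached
```

The function remains callable as usual; the documentation simply travels with it until it is needed.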
For me, it's been extremely helpful to keep function & documentation
together during editing-- it greatly increases the chance that I will
actually update the doco when I change the code, rather than putting it
off until I've forgotten what I did. Also, writing Rd format is a
nightmare (again, personal opinion)-- being able to write plain-text
makes the whole documentation thing bearable.
The above is not quite to the point of the original post, I think, which
talks about storing the documentation as commented bits *inside* the
function code. However, I'm not sure the latter is really desirable;
there is some merit in forcing authors to write an explicit "Details" or
"Description" section that is not just a paraphrase of programming
comments, and such sections are unlikely to fit easily inside code. At
any rate, I wouldn't want to have to interpret my *own* programming
comments as a usage guide!
WRTO automatic "usage" sections: it is easy to write code to do this
('prompt', and there is also some in 'mvbutils'-- not sure if it's in
the current release though) but at least as far as the "usage" section
goes, I think people should be "vigorously encouraged" to write their
own, showing as far as possible how one might actually *use* the
function. For many functions, just duplicating the argument list is not
helpful to the user-- a function can often be invoked in several
different ways, with different arguments relevant to different
invocations. I think it's good to show how this can be done in the
"usage" section, with comments, rather than deferring all practical
usage to "examples". For one thing, "usage" is near the top, and so
gives a very quick reminder without having to scroll through the entire
doco; for another, "usage" and "arguments" are visually adjacent,
whereas "examples" can be widely separated from "arguments".
My general point here is: the documenting process should be as
painless as possible, but not more so. Defaults that are likely to lead
to unhelpful documentation are perhaps best avoided.
For this general reason, I applaud R's fairly rigid documentation
standards, even though I frequently curse them. (And I would like to see
some bits more rigid, such as compulsory "how-to-use-this" documentation
for each package!)
The next version of 'mvbutils' will include various tools for easy "live
editing" and automated preparation of packages-- I've been using them
for a while, but still have to get round to finishing the documentation
;)
Mark Bravington
CSIRO Mathematical & Information Sciences
Marine Laboratory
Castray Esplanade
Hobart 7001
TAS
ph (+61) 3 6232 5118
fax (+61) 3 6232 5012
mob (+61) 438 315 623
-----Original Message-----
From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
Sent: Tuesday, 20 June 2006 12:39 AM
To: hadley wickham; R-devel
Subject: Re: [Rd] [R] Function hints

I've moved this from R-help to R-devel, where I think it is more appropriate, and interspersed comments below.

On 6/19/2006 8:51 AM, hadley wickham wrote:
One of the recurring themes in the recent useR! conference was that many people find it difficult to find the functions they need for a particular task. Sandy Weisberg suggested a small idea he would like to see: a hints function that, given an object, lists likely operations. I've done my best to implement this function using the tools currently available in R, and my code is included at the bottom of this email (I hope that I haven't just duplicated something already present in R). I think Sandy's idea is genuinely useful, even in the limited form provided by my implementation, and I have already discovered a few useful functions that I was unaware of.

While developing and testing this function, I ran into a few problems which, I think, represent underlying problems with the current documentation system. These are typified by the results of running hints on an object produced by glm (having class c("glm", "lm")). I have outlined (very tersely) some possible solutions. Please note that while these solutions are largely technological, the problem is at heart sociological: writing documentation is no easier (and perhaps much harder) than writing a scientific publication, but the rewards are fewer.

Problems:

* Many functions share the same description (eg. head, tail). Solution: each rdoc file should only describe one method. Problem: Writing rdoc files is tedious, there is a lot of information duplicated between the code and the documentation (eg. the usage statement) and some functions share a lot of similar information. Solution: make it easier to write documentation (eg. documentation inline with code), and easier to include certain common descriptions in multiple methods (eg. new include command)
I think it's bad to document dissimilar functions in the same file, but similar related functions *should* be documented together. Not doing this just adds to the burden of documenting them, and the risk of modifying only part of the documentation so that it is inconsistent. The user also gets the benefit of seeing a common description all at once, rather than having to decide whether to follow "See also" links. Your solutions would both be interesting on their own merits regardless of the above. We did decide to work on preprocessing directives for .Rd files at the R core meetings; some sort of include directive may be possible. I don't think I would want complete documentation mixed with the original source, but it would certainly be interesting to have partial documentation there. (Complete documentation is too long, and would make it harder to read the source without a dedicated editor that could hide it. Though ESS users may see it as a reasonable requirement to have everyone use the same editor, I don't think it is.) However, this is a lot of work, depending on infrastructure that is not in place.
* It is difficult to tell which functions are commonly used/important. Solution: break down by keywords. Problem: keywords are not useful at the moment. Solution: make better list of keywords available and encourage people to use it. Problem: people won't unless there is a strong incentive, plus good keywording requires considerable expertise (especially in building up list). This is probably insoluble unless one person systematically keywords all of the base packages.
I think it is worse than that. There are concepts in packages that just don't arise in base R, and hence there would be no keywords for them other than "misc", even if someone redesigned the current system. Keywording is hard, and it's not clear to me how to do much better than we currently do. We do already have user-defined keywords (via \concept), but these are not widely used.
* Some functions aren't documented (eg. simulate.lm, formula.glm) - typically, these are methods where the documentation is in the generic. Solution: these methods should all be aliased to the generic (by default?), and R CMD check should be amended to check for this situation. You could also argue that this is a deficiency with my function, and easily fixed by automatically referring to the generic if the specific isn't documented.
I'd say it's a deficiency of your function. You might want to look at the code in get("?") and .helpForCall() to see how those functions work out things like

?simulate(x)

where x is an lm object. (But notice that .helpForCall is an undocumented internal function; don't depend on its implementation working forever).
* It can't supply suggestions when there isn't an explicit method (ie. .default is used), this makes it pretty useless for basic vectors. This may not really be a problem, as all possible operations are probably too numerous to list.

* Provides full name for function, when best practice is to use generic part only when calling function. However, getting precise documentation may require that full name.
No, not if the call syntax above is used.

I do the best I can (returning the generic if specific is alias to a documentation file with the same method name), but this reflects a deeper problem that the name you should use when calling a function may be different to the name you use to get documentation.

* Can only display methods from currently loaded packages. This is a shortcoming of the methods function, but I suspect it is difficult to find S3 methods without loading a package.

Relatively trivial problems:

* Needs wide display to be effective. Could be dealt with by breaking description in a sensible manner (there may already be R code to do this. Please let me know if you know of any)
I think strwrap() may do what you want.
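As a minimal sketch of how strwrap() could break a long description for a narrow display (the description text and the width of 40 are made up for illustration):

```r
# A long, made-up description of the kind shown in the "Task" column
desc <- paste("Fitting generalized linear models, specified by a symbolic",
              "description of the linear predictor and the error distribution")

# Break the text into pieces shorter than 40 characters,
# indenting continuation lines by two spaces
wrapped <- strwrap(desc, width = 40, exdent = 2)

cat(wrapped, sep = "\n")
```

strwrap() returns a character vector with one element per output line, so the pieces can be pasted back into a table or printed directly.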
* Doesn't currently include S4 methods. Solution: add some more code to wrap showMethods

* Personally, I think sentence case is more aesthetically pleasing (and more flexible) than title case.
It's quite hard to go from existing title case to sentence case, because we don't have any markup to indicate proper names. One would think it would be easier to go in the opposite direction, but in fact the same problem arises: "van Beethoven" for example, not "Van Beethoven".
Hadley
hints <- function(x) {
I don't like the name "hints". I think we already have too many ways into the help system: help, ?, help.search, apropos, etc. I like your function, but I'd rather see it attached to one of the existing help functions, probably help.search(). For example, help.search(x) could look for functions designed to work with the class of x, if it had one. (There's some ambiguity here: perhaps x contains a string, and I want help on that string.) Anyway, thanks for your efforts on this so far; I hope we end up with something that can make it into the next release.

Duncan Murdoch
  # Fetch the help.search() database, building it first if it is missing
  db <- eval(utils:::.hsearch_db())
  if (is.null(db)) {
    help.search("abcd!", rebuild=TRUE, agrep=FALSE)
    db <- eval(utils:::.hsearch_db())
  }

  base <- db$Base
  alias <- db$Aliases
  key <- db$Keywords

  # Methods for the classes of x, matched to help-page ids via the aliases
  m <- all.methods(classes=class(x))
  m_id <- alias[match(m, alias[,1]), 2]
  keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])

  # Use the documented name when its generic part matches that of the
  # method; otherwise keep the method name
  f.names <- cbind(m, base[match(m_id, base[,3]), 4])
  f.names <- unlist(lapply(1:nrow(f.names), function(i) {
    if (is.na(f.names[i, 2])) return(f.names[i, 1])
    a <- methodsplit(f.names[i, 1])
    b <- methodsplit(f.names[i, 2])
    if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
  }))

  # Assemble the sorted two-column table of function names and tasks
  hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
  hints <- hints[order(tolower(hints[,1])),]
  hints <- rbind( c("--------", "---------------"), hints)
  rownames(hints) <- rep("", nrow(hints))
  colnames(hints) <- c("Function", "Task")
  hints[is.na(hints)] <- "(Unknown)"
  class(hints) <- "hints"
  hints
}
print.hints <- function(x, ...) print(unclass(x), quote=FALSE)
# Names of the methods for the given classes, keeping the first method
# found for each generic
all.methods <- function(classes) {
  methods <- do.call(rbind, lapply(classes, function(x) {
    m <- methods(class=x)
    t(sapply(as.vector(m), methodsplit)) # m[attr(m, "info")$visible]
  }))
  rownames(methods[!duplicated(methods[,1]),])
}

# Split a method name at its last dot into a name part and a class part
methodsplit <- function(m) {
  parts <- strsplit(m, "\\.")[[1]]
  if (length(parts) == 1) {
    c(name=m, class="")
  } else {
    c(name=paste(parts[-length(parts)], collapse="."),
      class=parts[length(parts)])
  }
}
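A quick check of methodsplit() as posted may help when reading the code above. The definition is repeated so the snippet runs on its own; the three input names are just examples. Because the split is made at the last dot, a class name that itself contains a dot (such as data.frame) ends up divided between the two parts.

```r
# methodsplit() as posted above, repeated here so the snippet is self-contained
methodsplit <- function(m) {
  parts <- strsplit(m, "\\.")[[1]]
  if (length(parts) == 1) {
    c(name = m, class = "")
  } else {
    c(name = paste(parts[-length(parts)], collapse = "."),
      class = parts[length(parts)])
  }
}

simple  <- methodsplit("summary.glm")       # name "summary",    class "glm"
dotted  <- methodsplit("print.data.frame")  # name "print.data", class "frame"
generic <- methodsplit("print")             # name "print",      class ""

print(rbind(simple, dotted, generic))
```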
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel