-----Original Message-----
From: r-devel-bounces at r-project.org
[mailto:r-devel-bounces at r-project.org] On Behalf Of Duncan Murdoch
Sent: Tuesday, 20 June 2006 12:39 AM
To: hadley wickham; R-devel
Subject: Re: [Rd] [R] Function hints
I've moved this from R-help to R-devel, where I think it is
more appropriate, and interspersed comments below.
On 6/19/2006 8:51 AM, hadley wickham wrote:
One of the recurring themes at the recent useR! conference was that
many people find it difficult to locate the functions they need for a
particular task. Sandy Weisberg suggested a small idea he would like
to see: a hints function that, given an object, lists likely
operations. I've done my best to implement this function using the
tools currently available in R, and my code is included at the end
of this email (I hope that I haven't just duplicated something already
present in R). I think Sandy's idea is genuinely useful, even in the
limited form provided by my implementation, and I have already
discovered a few useful functions that I was unaware of.
While developing and testing this function, I ran into a few issues
which, I think, reflect underlying problems with the current
documentation system. These are typified by the results of running
hints on an object produced by glm (which has class c("glm", "lm")). I
have outlined (very tersely) some possible solutions. Please note
that while these solutions are largely technological, the problem is
at heart sociological: writing documentation is no easier (and often
much harder) than writing a scientific publication, but the rewards
are fewer.
Problems:
* Many functions share the same description (e.g. head, tail).
Solution: each Rd file should describe only one method. Problem:
writing Rd files is tedious, there is a lot of information
duplicated between the code and the documentation (e.g. the usage
statement), and some functions share a lot of similar information.
Solution: make it easier to write documentation (e.g. documentation
inline with code), and easier to include certain common descriptions
in multiple methods (e.g. a new include command).
I think it's bad to document dissimilar functions in the same file, but
similar related functions *should* be documented together. Not doing
this just adds to the burden of documenting them, and the risk of
modifying only part of the documentation so that it is inconsistent.
The user also gets the benefit of seeing a common description all at
once, rather than having to decide whether to follow "See also" links.

Your solutions would both be interesting on their own merits,
regardless of the above. We did decide to work on preprocessing
directives for .Rd files at the R core meetings; some sort of include
directive may be possible.

I don't think I would want complete documentation mixed with the
original source, but it would certainly be interesting to have partial
documentation there. (Complete documentation is too long, and would
make it harder to read the source without a dedicated editor that
could hide it. Though ESS users may see it as a reasonable requirement
to have everyone use the same editor, I don't think it is.) However,
this is a lot of work, depending on infrastructure that is not in place.
* It is difficult to tell which functions are commonly
used/important. Solution: break them down by keywords. Problem:
keywords are not useful at the moment. Solution: make a better list
available and encourage people to use it. Problem: people won't
unless there is a strong incentive, plus good keywording requires
considerable expertise (especially in building up the list). This is
probably insoluble unless one person systematically keywords all of
the base packages.
I think it is worse than that. There are concepts in packages that
just don't arise in base R, and hence there would be no keywords for
them other than "misc", even if someone redesigned the current system.
Keywording is hard, and it's not clear to me how to do much better
than we currently do.

We do already have user-defined keywords (via \concept), but these
are not widely used.
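As a small illustration of what reliable keywords would buy: the hints() code at the end of this email already collects each method's Rd keywords into a `keywords` list but never uses it. The helper below, `drop_internal()`, is hypothetical (not part of R); it assumes one keyword vector per method and hides methods whose help pages carry the standard \keyword{internal}. The method names in the usage line are made-up placeholders.

```r
# Hypothetical helper (not part of R): given method names and, for each
# one, the character vector of Rd keywords on its help page, drop the
# methods whose pages are marked with the keyword "internal".
drop_internal <- function(methods, keywords) {
  is_internal <- vapply(keywords, function(k) "internal" %in% k, logical(1))
  methods[!is_internal]
}

# Usage with placeholder names and keyword vectors
drop_internal(c("a.foo", "b.foo", "c.foo"),
              list("print", c("models", "internal"), character(0)))
```

Any other keyword (or a \concept entry) could be used as the filter in the same way; the hard part, as noted above, is getting the keywords assigned consistently in the first place.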
* Some functions aren't documented (e.g. simulate.lm, formula.glm);
typically, these are methods where the documentation is in the
generic. Solution: these methods should all be aliased to the generic
(by default?), and R CMD check should be amended to check for this
situation. You could also argue that this is a deficiency of my
function, easily fixed by automatically referring to the generic if
the specific method isn't documented.
I'd say it's a deficiency of your function. You might want to look at
the code in get("?") and .helpForCall() to see how those functions
work out things like
?simulate(x)
where x is an lm object. (But notice that .helpForCall is an
undocumented internal function; don't depend on its implementation
working forever.)
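A minimal sketch of the fallback Hadley mentions (refer to the generic when a method has no help page of its own), assuming the help lookup can be reduced to a yes/no predicate. `help_topic()` is a hypothetical name, not part of R; its default predicate assumes utils::help() returns a zero-length result when no topic matches, which is worth checking on your R version. It tries progressively shorter dot-separated prefixes, so a generic whose own name contains dots, such as as.data.frame, is still found.

```r
# Hypothetical helper (not part of R): choose the help topic to show for
# an S3 method name, falling back to the generic when the method itself
# is undocumented. `has_help` is any function(topic) returning TRUE/FALSE.
help_topic <- function(method,
                       has_help = function(topic) length(utils::help(topic)) > 0) {
  if (has_help(method)) return(method)
  parts <- strsplit(method, ".", fixed = TRUE)[[1]]
  # Try the longest prefix first: "as.data.frame.foo" -> "as.data.frame" -> ...
  for (k in rev(seq_len(length(parts) - 1))) {
    generic <- paste(parts[seq_len(k)], collapse = ".")
    if (has_help(generic)) return(generic)
  }
  NA_character_
}

# Usage with a stand-in predicate instead of the real help system
documented <- c("simulate", "formula", "print.lm")
help_topic("simulate.lm", function(t) t %in% documented)  # "simulate"
```

Passing the predicate in keeps the name-resolution logic separate from the help database, so the same function works whether the lookup goes through help(), the help.search database, or something else.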
* It can't supply suggestions when there isn't an explicit method
(i.e. .default is used), which makes it pretty useless for basic
vectors. This may not really be a problem, as all possible functions
are probably too numerous to list.
* Provides the full name for a function, when best practice is to use
only the generic part when calling the function. However, getting
precise documentation may require the full name.
No, not if the call syntax above is used.
I do the best I can (returning the generic if the specific method is
an alias to a documentation file with the same method name), but this
reflects a deeper problem: the name you should use when calling a
function may be different from the name you use to get documentation.
* Can only display methods from currently loaded packages. This is a
shortcoming of the methods function, but I suspect it is difficult to
find S3 methods without loading a package.
Relatively trivial problems:
* Needs a wide display to be effective. Could be dealt with by
breaking the description in a sensible manner (there may already be
functions to do this; please let me know if you know of any).
I think strwrap() may do what you want.
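For instance, a sketch assuming the two-column Function/Task layout that hints() produces: `wrap_hints()` is a hypothetical helper, not part of R, that uses strwrap() to fold each description into the space left after the name column and shows the name only on the first line of each entry. The descriptions in the usage call are just sample strings.

```r
# Hypothetical helper (not part of R): lay out "Function / Task" pairs,
# using strwrap() to fold long descriptions so each line fits `width`.
wrap_hints <- function(fun, task, width = getOption("width")) {
  name_width <- max(nchar(fun))
  # Columns left for the description after the name and a 2-space gap
  task_width <- max(10, width - name_width - 2)
  rows <- mapply(function(f, t) {
    wrapped <- strwrap(t, width = task_width)
    if (length(wrapped) == 0) wrapped <- ""
    # Name on the first line only; format() pads every entry to name_width
    lhs <- format(c(f, rep("", length(wrapped) - 1)), width = name_width)
    paste(lhs, wrapped, sep = "  ")
  }, fun, task, SIMPLIFY = FALSE, USE.NAMES = FALSE)
  unlist(rows)
}

cat(wrap_hints(c("anova", "simulate"),
               c("Analysis of Deviance for Generalized Linear Model Fits",
                 "Simulate Responses"),
               width = 40), sep = "\n")
```

Defaulting `width` to getOption("width") means the table adapts to the current console rather than assuming a wide one.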
* Doesn't currently include S4 methods. Solution: add code to wrap
showMethods.
* Personally, I think sentence case is more aesthetically pleasing
(and more flexible) than title case.
It's quite hard to go from existing title case to sentence case,
because we don't have any markup to indicate proper names. One would
think it would be easier to go in the opposite direction, but in fact
the same problem arises: "van Beethoven", for example, not "Van
Beethoven".
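On the S4 item: rather than parsing the printed output of showMethods(), one possible sketch asks the methods package directly. `s4_methods_for()` is a hypothetical helper; it assumes every generic returned by getGenerics() can be queried with existsMethod(), which only reports methods defined for exactly that class (inherited methods are not listed). The Point class and norm2 generic are toy definitions for illustration.

```r
library(methods)

# Hypothetical helper (not part of R): names of S4 generics that have a
# method defined directly for class `cls` (inherited methods not included).
s4_methods_for <- function(cls) {
  gens <- as.character(getGenerics())
  has_method <- vapply(gens, function(g) {
    # Some generics may not be reachable from here; treat errors as "no"
    isTRUE(tryCatch(existsMethod(g, cls), error = function(e) FALSE))
  }, logical(1))
  sort(unique(gens[has_method]))
}

# Usage: a toy class with a single method
setClass("Point", representation(x = "numeric", y = "numeric"))
setGeneric("norm2", function(p) standardGeneric("norm2"))
setMethod("norm2", "Point", function(p) sqrt(p@x^2 + p@y^2))
s4_methods_for("Point")
```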
Hadley
hints <- function(x) {
I don't like the name "hints". I think we already have too many ways
into the help system:
help
?
help.search
apropos
etc.
I like your function, but I'd rather see it attached to one of the
existing help functions, probably help.search(). For example,
help.search(x)
could look for functions designed to work with the class of x, if it
had one. (There's some ambiguity here: perhaps x contains a string,
and I want help on that string.)
Anyway, thanks for your efforts on this so far; I hope we end up with
something that can make it into the next release.
Duncan Murdoch
  # Load the help.search database, building it first if necessary
  db <- eval(utils:::.hsearch_db())
  if (is.null(db)) {
    help.search("abcd!", rebuild=TRUE, agrep=FALSE)
    db <- eval(utils:::.hsearch_db())
  }
  base <- db$Base
  alias <- db$Aliases
  key <- db$Keywords

  # Methods for the classes of x, and the help topic each is aliased to
  m <- all.methods(classes=class(x))
  m_id <- alias[match(m, alias[,1]), 2]
  keywords <- lapply(m_id, function(id) key[key[,2] %in% id, 1])

  # Name to display: the documented name if it shares the generic with
  # the method, otherwise the full method name
  f.names <- cbind(m, base[match(m_id, base[,3]), 4])
  f.names <- unlist(lapply(seq_len(nrow(f.names)), function(i) {
    if (is.na(f.names[i, 2])) return(f.names[i, 1])
    a <- methodsplit(f.names[i, 1])
    b <- methodsplit(f.names[i, 2])
    if (a[1] == b[1]) f.names[i, 2] else f.names[i, 1]
  }))

  # Two-column table of names and descriptions, sorted by name
  hints <- cbind(f.names, base[match(m_id, base[,3]), 5])
  hints <- hints[order(tolower(hints[,1])), , drop=FALSE]
  hints <- rbind(c("--------", "---------------"), hints)
  rownames(hints) <- rep("", nrow(hints))
  colnames(hints) <- c("Function", "Task")
  hints[is.na(hints)] <- "(Unknown)"
  class(hints) <- "hints"
  hints
}
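print.hints <- function(x, ...) {
  print(unclass(x), quote=FALSE)
  invisible(x)
}

# Names of the S3 methods for the given classes, keeping only the first
# method found for each generic (so, for class c("glm", "lm"), a glm
# method takes precedence over the corresponding lm method)
all.methods <- function(classes) {
  methods <- do.call(rbind, lapply(classes, function(x) {
    m <- methods(class=x)
    # One row per method, named by the method: generic name and class
    t(sapply(as.vector(m), methodsplit))
  }))
  rownames(methods[!duplicated(methods[,1]), , drop=FALSE])
}

# Split "generic.class" at the last dot; a name without a dot has no class
methodsplit <- function(m) {
  parts <- strsplit(m, "\\.")[[1]]
  if (length(parts) == 1) {
    c(name=m, class="")
  } else {
    c(name=paste(parts[-length(parts)], collapse="."),
      class=parts[length(parts)])
  }
}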
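Duncan's suggestion of folding this into help.search() could be prototyped as a thin wrapper. `search_help()` is a hypothetical name, not a proposed API; it resolves the ambiguity he mentions by treating a single character string as an ordinary search pattern and anything else as an object whose classes are looked up with methods(). Stripping a trailing "*" is a defensive assumption for versions of R in which methods() marked non-visible methods that way.

```r
# Hypothetical wrapper (not part of utils): a single string is passed to
# help.search() as usual; any other object is looked up by class instead.
search_help <- function(x, ...) {
  if (is.character(x) && length(x) == 1) {
    return(utils::help.search(x, ...))
  }
  found <- lapply(class(x), function(cl) {
    # Full method names such as "print.glm"; strip any trailing "*"
    sub("\\*$", "", as.vector(utils::methods(class = cl)))
  })
  unique(unlist(found))
}

fit <- stats::glm(dist ~ speed, data = datasets::cars)
head(search_help(fit))
```

Because class(fit) is c("glm", "lm"), the result includes both glm and lm methods; combining this with the precedence rule in all.methods() would hide lm methods that a glm method overrides.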