Could I voice my support for the sixth point raised by John Fox? Many users would find such a development to be enormously useful.

"(6) As has been pointed out, e.g., by Duncan Murdoch, solving the function-locating problem is best done by a method or methods that automatically accommodate the growing and changing set of contributed packages on CRAN. Why not, as previously has been proposed, replace the current static (and, in my view, not very useful) set of keywords in R documentation with the requirement that package authors supply their own keywords for each documented object? I believe that this is the intent of the concept entries in Rd files, but their use certainly is not required or even actively encouraged. (They're just mentioned in passing in the Writing R Extensions manual.)"

**********************************************************
Cliff Lunneborg, Professor Emeritus, Statistics & Psychology, University of Washington, Seattle
cliff at ms.washington.edu
The hidden costs of GPL software?
9 messages · Cliff Lunneborg, Duncan Murdoch, John Fox +6 more
2 days later
On Fri, 19 Nov 2004 13:59:23 -0800, "Cliff Lunneborg" <cliff at ms.washington.edu> quoted John Fox:
Why not, as previously has been proposed, replace the current static (and, in my view, not very useful) set of keywords in R documentation with the requirement that package authors supply their own keywords for each documented object? I believe that this is the intent of the concept entries in Rd files, but their use certainly is not required or even actively encouraged. (They're just mentioned in passing in the Writing R Extensions manual.)
That would not be easy and won't happen quickly. There are some problems:

- The base packages mostly don't use \concept. (E.g. base has 365 man pages, only about 15 of them use it). Adding it to each file is a fairly time-consuming task.

- Before we started, we'd need to agree as to what they are for. Right now, I think they are mainly used when the name of a concept doesn't match the name of the function that implements it, e.g. "modulo", "remainder", "promise", "argmin", "assertion". The need for this usage is pretty rare. If they were used for everything, what would they contain?

- Keywording in a useful way is hard. There are spelling issues (e.g. optimise versus optimize); our fuzzy matching helps with those. But there are also multiple names for the same thing, and multiple meanings for the same name.

Duncan Murdoch
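For readers who have not met a \concept entry, the following is a minimal sketch of how one sits in an Rd file and of how fuzzy matching treats a spelling variant. The help page `mod_demo` and its contents are invented for illustration and belong to no real package; `tools::parse_Rd()`, `agrep()` and `help.search()` are standard R functions, and the `help.search()` call is only made in an interactive session.

```r
# A made-up Rd page (hypothetical name "mod_demo") carrying two \concept entries.
rd_text <- c(
  "\\name{mod_demo}",
  "\\alias{mod_demo}",
  "\\title{Remainder after division}",
  "\\description{Illustrative page; not part of any real package.}",
  "\\concept{modulo}",
  "\\concept{remainder}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_text, rd_file)

# Parse the page and collect the tag of each top-level element.
rd <- tools::parse_Rd(rd_file)
tags <- vapply(rd, function(x) {
  tag <- attr(x, "Rd_tag")
  if (is.null(tag)) "" else tag
}, character(1))

# Pull out the text of every \concept entry.
pieces <- unclass(rd)[tags == "\\concept"]
concepts <- trimws(as.character(unlist(pieces)))
print(concepts)

# Fuzzy matching copes with spelling variants such as optimise/optimize.
candidates <- c("optimize", "optim", "nlm")
hits <- agrep("optimise", candidates, max.distance = 0.1, value = TRUE)
print(hits)

# In an interactive session, restrict the help search to concept entries.
if (interactive()) print(help.search("modulo", fields = "concept"))
```

Fuzzy matching of this kind handles the spelling issue only; it does nothing for the harder cases of several names for one thing or several meanings for one name.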
Dear Duncan,

I don't think that there is an automatic, nearly costless way of providing an effective solution to locating R resources. The problem seems to me to be analogous to indexing a book. There's an excellent description of what that process *should* look like in the Chicago Manual of Style, and it's a lot of work. In my experience, most book indexes are quite poor, and automatically generated indexes, while not useless, are even worse, since one should index concepts, not words. The ideal indexer is therefore the author of the book.

I guess that the question boils down to how important is it to provide an analogue of a good index to R? As I said in a previous message, I believe that the current search facilities work pretty well -- about as well as one could expect of an automatic approach. I don't believe that there's an effective centralized solution, so doing something more ambitious than is currently available implies farming out the process to package authors. Of course, there's no guarantee that all package authors will be diligent indexers.

Regards,
John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Duncan Murdoch
Sent: Monday, November 22, 2004 8:55 AM
To: Cliff Lunneborg
Cc: r-help at stat.math.ethz.ch
Subject: Re: [R] The hidden costs of GPL software?

On Fri, 19 Nov 2004 13:59:23 -0800, "Cliff Lunneborg" <cliff at ms.washington.edu> quoted John Fox:
[...]
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Having just finished an index I would like to second John's comments. Even as an author, it is difficult to achieve some degree of completeness and consistency. Of course, maybe a real whizz at clustering could assemble something very useful quite easily. All of us who have had the frustration of searching for a forgotten function would be grateful.

url:    www.econ.uiuc.edu/~roger        Roger Koenker
email   rkoenker at uiuc.edu            Department of Economics
vox:    217-333-4558                    University of Illinois
fax:    217-244-6678                    Champaign, IL 61820
On Nov 23, 2004, at 7:48 AM, John Fox wrote:
[...]
On Tue, 2004-11-23 at 17:40, roger koenker wrote:
Having just finished an index I would like to second John's comments. Even as an author, it is difficult to achieve some degree of completeness and consistency. Of course, maybe a real whizz at clustering could assemble something very useful quite easily. All of us who have had the frustration of searching for a forgotten function would be grateful.
You mean SOM?
Jari Oksanen <jarioksa at sun3.oulu.fi>
I think John has exactly the right image -- index to a book -- but I disagree with his conclusions.

I read somewhere that an index should not be done by the author. It was probably written by someone who was bored of indexing, but the logic was precisely because indices should be about concepts. The author of a package will have one concept for a function but not all of the concepts that come from various fields of study. I suspect that no one outside of finance would think to index "sd" with "volatility" for (a not very good) example.

There could be an index builder that accepts a search phrase and the function or package that is the successful answer to the search. If this were open, then R users could contribute to the index who don't feel qualified to submit code. It could also help defuse the frustration of taking too long to find a function by allowing a way to ensure that the exact same thing doesn't happen to others.

Amazon has a function that says those who bought "The Chicago Manual of Style" also bought Strunk and White. In the same way, the R index could provide a list of terms that overlap the given search term. For example if we search for "goodness of fit", then "hypothesis test" might be one of the related terms that pops up.

No, I'm not volunteering to build the system.

Patrick Burns

Burns Statistics
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
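The index builder described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame layout and the function names `contribute`, `lookup` and `related_terms` are hypothetical, the sample rows are invented, and a real system would need shared storage and some moderation. "Related terms" are taken here to be other phrases that led to the same answer, which is one possible reading of the overlap idea, not the only one.

```r
# Hypothetical user-contributed index: each row records that a search phrase
# was successfully answered by a function. The rows are invented examples.
index <- data.frame(
  phrase = c("volatility", "standard deviation", "goodness of fit",
             "hypothesis test", "hypothesis test"),
  answer = c("stats::sd", "stats::sd", "stats::chisq.test",
             "stats::chisq.test", "stats::t.test"),
  stringsAsFactors = FALSE
)

# Add one contribution (phrases are stored in lower case).
contribute <- function(index, phrase, answer) {
  rbind(index,
        data.frame(phrase = tolower(phrase), answer = answer,
                   stringsAsFactors = FALSE))
}

# Return the answers recorded for a phrase.
lookup <- function(index, phrase) {
  unique(index$answer[index$phrase == tolower(phrase)])
}

# Return other phrases that share at least one answer with the given phrase.
related_terms <- function(index, phrase) {
  answers <- lookup(index, phrase)
  shares <- index$answer %in% answers & index$phrase != tolower(phrase)
  unique(index$phrase[shares])
}

index <- contribute(index, "Dispersion", "stats::sd")
print(lookup(index, "volatility"))
print(related_terms(index, "goodness of fit"))
```

Because a contribution is just a phrase and an answer, users who do not write code could still add rows, which is the point of keeping the index open.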
John Fox wrote:
[...]
Patrick Burns wrote:
[....] No, I'm not volunteering to build the system.
Too bad! ;-) Indeed, the idea of indexing tens of thousands of functions is unlikely to appeal to many of us! Why not consider testing such ideas at the package level? I mean, building a system that, given a search phrase, points out the packages of interest (those on CRAN, of course) would be a more reasonable amount of work. Then looking at the online help of that particular package would be the small additional effort required of the user. The problem here is with heterogeneous packages (the XXXXmisc, and the like)...

And... No, I'm not volunteering to build the system either.

Best,

Philippe Grosjean
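A package-level search of the kind suggested here might look something like the sketch below. It is an assumption-laden stand-in: it searches the titles of locally installed packages rather than CRAN (which would need network access), the function name `find_packages` is hypothetical, and the phrase is treated as a case-insensitive regular expression.

```r
# Search the titles of installed packages for a phrase.
# The phrase is used as a case-insensitive regular expression.
find_packages <- function(phrase, lib.loc = NULL) {
  pkgs <- installed.packages(lib.loc = lib.loc, fields = "Title")
  titles <- pkgs[, "Title"]
  hit <- grepl(phrase, titles, ignore.case = TRUE)
  data.frame(package = unname(pkgs[hit, "Package"]),
             title = unname(titles[hit]),
             stringsAsFactors = FALSE)
}

res <- find_packages("stats package")
print(res)
```

A title search like this shows the weakness noted above: a heterogeneous "misc" package has a title that says little about the functions inside it.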
At 11/23/2004 11:45 AM Tuesday, Patrick Burns wrote:
...There could be an index builder that accepts a search phrase and the function or package that is the successful answer to the search. If this were open, then R users could contribute to the index who don't feel qualified to submit code. It could also help defuse the frustration of taking too long to find a function by allowing a way to ensure that the exact same thing doesn't happen to others. [...] No, I'm not volunteering to build the system.
Nor am I, but as one of those users, I would very gladly contribute to it.
Michael Prager, Ph.D.
Population Dynamics Team, NMFS SE Fisheries Science Center
NOAA Center for Coastal Fisheries and Habitat Research
Beaufort, North Carolina 28516
http://shrimp.ccfhrb.noaa.gov/~mprager/
On Tue, 23 Nov 2004, Philippe Grosjean wrote:
Patrick Burns wrote:
[....] No, I'm not volunteering to build the system.
Too bad! ;-) Indeed, the idea of indexing tens of thousands of functions is unlikely to appeal to many of us! Why not consider testing such ideas at the package level? I mean, building a system that, given a search phrase, points out the packages of interest (those on CRAN, of course) would be a more reasonable amount of work. Then looking at the online help of that particular package would be the small additional effort required of the user. The problem here is with heterogeneous packages (the XXXXmisc, and the like)...
This mail archive works well if the questions are well posed and answered:
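# Search the R mail archive through the Google search page at RURL.
help.search.archive <- function(string) {
  RURL <- "http://www.google.com/u/newcastlemaths"
  # URLencode() escapes spaces etc., so phrases like "goodness of fit" work
  RSearchURL <- paste(RURL, "?q=", URLencode(string, reserved = TRUE), sep = "")
  browseURL(RSearchURL)
  return(invisible(0))
}

# Search r-project.org through Google's site search.
help.search.google <- function(string) {
  RURL <- "http://www.google.com/search"
  RSearchURL <- paste(RURL, "?sitesearch=r-project.org&q=",
                      URLencode(string, reserved = TRUE), sep = "")
  browseURL(RSearchURL)
  return(invisible(0))
}

help.search.archive('volatility')  # may soon show Dr. Harrell's example
help.search.google('volatility')   # may show enough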
Is there package data that is not searchable through the google search?
Dave
Dave Forrest
drf at vims.edu (804)684-7900w
drf5n at maplepark.com (804)642-0662h
http://maplepark.com/~drf5n/