
Suggestions for packages / help / index (long mail)

6 messages · Eric Lecoutre, Gabor Grothendieck, Adaikalavan Ramasamy +1 more

#
Hi R-users and developers,

This month may have seen one of the biggest threads ever on the R-related 
mailing lists: the one about "GPL software" and "hidden costs" (as of 
today, the thread is still open - and active!).
Lots of mails in this thread are not really relevant to the original mail, 
sent by Philippe Grosjean.
Nevertheless, most of the mails are of interest, and one of my conclusions 
was that there is a real need for better help and indexing facilities.
I have spent some time thinking about it. Like everybody, I end up with: 
"this is not an easy problem at all" and "what we have *is* already very 
good". Indeed!
What follows is a sketch of thoughts and proposals. I tend to think 
some of these proposals are "low-cost" and could improve the life of R 
beginners.

First, I have to say I will put myself in the position of a real 
beginner (say a first-year student):
A user who has practiced for some years will find it easier to crawl 
through all the rich material available. His experience will help him 
easily find the package and the function relevant to his problem; he has 
learned to use help.search() and so on. And he will use R-help wisely, 
following the posting guide.
By contrast, a beginneR will have more and more difficulty entering the R 
world, as it is constantly growing (leading to the famous supposed 
"hidden costs"). Making this poweR your own is not easy, especially if your 
daily task is specialized: you will have difficulty digging through all the 
material to find the nuggets that will help you (and thanks to the 
community, there are so many nuggets... it may be hard to choose between 
gold and platinum).

What we have for now is a document listing keywords. Advanced users will 
know that those keywords are to be used by package maintainers, feeding the 
help system build chain.
This keyword database is very pertinent. Its content, which was 
inherited in part from S, has been carefully worked out. And it works 
well (try help.search("graphs"), which will give you very interesting 
results, provided you have some packages installed...). I think 
this keyword list could have even more uses.

1. As the R community grows, it may be time to add some terms to this 
keyword list. Think about the SciViews bundle on which Philippe is working. 
Most packages in it are related to GUIs. Wouldn't the keyword GUI be 
useful? It could be worth giving the community one month to suggest new 
entries (I am also thinking about econometrics). Then, the R core team 
would decide whether candidates are eligible or not.

2. DESCRIPTION files for packages could have a new field, keywords, allowing 
the author to add keywords to the package (minimum one).


Here is what we could end up with:

package		keyword(s)		
---------------------------------------------
abind		Basics, manip, array
accuracy	Statistics
acepack		Statistics, regression
adapt		Mathematics
ade4		multivariate
...
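
As a rough illustration of point 2, here is a minimal sketch of how such a 
field might be read. It is written in Python only to keep it short and 
testable; the field name "Keywords", the comma-separated syntax and the 
helper names are assumptions made for this example, not an existing R 
convention.

```python
def parse_description(text):
    """Parse a simplified DESCRIPTION-style file.

    Assumes 'Field: value' lines, with indented lines continuing the
    previous field. This is a simplification for illustration only.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Continuation line: append to the field being read.
            fields[current] += " " + line.strip()
        elif ":" in line:
            name, value = line.split(":", 1)
            current = name.strip()
            fields[current] = value.strip()
    return fields


def package_keywords(fields):
    """Return the keyword list from the hypothetical 'Keywords' field."""
    raw = fields.get("Keywords", "")
    keywords = [k.strip() for k in raw.split(",") if k.strip()]
    if not keywords:
        # The proposal asks for a minimum of one keyword per package.
        raise ValueError("at least one keyword is required")
    return keywords


# Illustrative input; the 'Keywords' field does not exist in real packages.
EXAMPLE = """Package: abind
Keywords: Basics, manip,
    array
"""

fields = parse_description(EXAMPLE)
keywords = package_keywords(fields)
print(fields["Package"], keywords)  # abind ['Basics', 'manip', 'array']
```

Rejecting an empty list mirrors the "minimum one" rule suggested above.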


3. Package keywords could be used to propose "automatic" bundles and/or 
lists of packages (treating keywords as categories). Thus, CRAN 
sites could have a listing of all packages, but also a listing of all 
packages related to Mathematics, to multivariate (statistics) and so on. 
And one could offer to install a whole bunch of packages at one time. 
Thus (provided adequate keywords exist), a beginner interested in 
multivariate statistics could easily set up his R with an adequate set of 
starting packages. The same goes for econometrics, geostatistics, and any 
other field of application.
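
The category listings of point 3 amount to inverting a package-to-keywords 
mapping. A minimal sketch, using the example entries from the table above 
(Python for brevity; the function names are hypothetical, and matching is 
made case-insensitive as an assumption):

```python
from collections import defaultdict

# Example entries taken from the table above.
PACKAGE_KEYWORDS = {
    "abind": ["Basics", "manip", "array"],
    "accuracy": ["Statistics"],
    "acepack": ["Statistics", "regression"],
    "adapt": ["Mathematics"],
    "ade4": ["multivariate"],
}


def packages_by_keyword(package_keywords):
    """Invert package -> keywords into keyword -> sorted package list."""
    index = defaultdict(list)
    for package, keywords in package_keywords.items():
        for keyword in keywords:
            index[keyword.lower()].append(package)
    return {keyword: sorted(pkgs) for keyword, pkgs in index.items()}


def bundle_for(keyword, package_keywords):
    """Return the packages that would form the bundle for one keyword."""
    return packages_by_keyword(package_keywords).get(keyword.lower(), [])


print(bundle_for("Statistics", PACKAGE_KEYWORDS))  # ['accuracy', 'acepack']
```

The resulting list is what one would hand to an installer in order to 
install a whole category at once.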

4. What would really be useful then (I think) is a sort of PACKAGES_INDEX 
that would come with R. Explanation: the index for one package would be its 
keywords (with a high weight) plus all its functions and their associated 
keywords (with lower weights). When downloading and installing the 
newest R, there would be a flat text file containing that (not so 
big). We could also add a function that refreshes this file.
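
To make the weighting idea of point 4 concrete, here is one possible sketch 
(in Python for brevity). The weights 3 and 1, the tab-separated layout and 
the example function keywords are all assumptions made for illustration; 
the proposal itself only says "high" and "lower" weights.

```python
PACKAGE_WEIGHT = 3.0   # assumed weight for a package-level keyword
FUNCTION_WEIGHT = 1.0  # assumed weight for each function-level keyword


def build_index_entry(package_keywords, function_keywords):
    """Combine package and function keywords into keyword -> total weight."""
    weights = {}
    for keyword in package_keywords:
        key = keyword.lower()
        weights[key] = weights.get(key, 0.0) + PACKAGE_WEIGHT
    for keywords in function_keywords.values():
        for keyword in keywords:
            key = keyword.lower()
            weights[key] = weights.get(key, 0.0) + FUNCTION_WEIGHT
    return weights


def to_flat_text(index):
    """Serialize the index as 'package<TAB>keyword<TAB>weight' lines."""
    lines = []
    for package in sorted(index):
        for keyword in sorted(index[package]):
            weight = index[package][keyword]
            lines.append(f"{package}\t{keyword}\t{weight:g}")
    return "\n".join(lines)


# Illustrative data: one package with one documented function.
entry = build_index_entry(
    ["Basics", "manip", "array"],
    {"abind": ["manip", "array"]},
)
flat = to_flat_text({"abind": entry})
print(flat)
```

A "refresh" function would simply rebuild this file from the current list 
of packages.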

5. Then, we could update "help.search", which would begin by listing 
information on installed packages and could then suggest other packages 
available on CRAN.
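
A minimal sketch of the lookup described in point 5, again in Python and 
under stated assumptions: the index is a package -> {keyword: weight} 
mapping, the set of installed packages is given, and the function name and 
the example weights are hypothetical. This is not how help.search() works 
today.

```python
def search(index, installed, term):
    """Return (installed hits, other hits) for a keyword.

    Each hit is a (package, weight) pair; hits are ordered by decreasing
    weight, then by package name.
    """
    term = term.lower()
    hits = [(pkg, weights[term]) for pkg, weights in index.items()
            if term in weights]
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    local = [hit for hit in hits if hit[0] in installed]
    remote = [hit for hit in hits if hit[0] not in installed]
    return local, remote


# Illustrative index; the weights are made up for the example.
INDEX = {
    "accuracy": {"statistics": 3.0},
    "acepack": {"statistics": 3.0, "regression": 4.0},
    "ade4": {"multivariate": 5.0},
}

local, remote = search(INDEX, installed={"acepack"}, term="Statistics")
print("installed:", local)          # installed: [('acepack', 3.0)]
print("available on CRAN:", remote) # available on CRAN: [('accuracy', 3.0)]
```

Keeping the two lists separate lets the output show local help first and 
the CRAN suggestions afterwards.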

6. The final point has already been discussed in the past. It is about misc 
packages and pieces of code. I propose the creation of 5 packages:
	- miscGraphics (keywords: misc, Graphics)
	- miscStatistics (keywords: misc, Statistics)
	- miscMathematics (keywords: misc, Mathematics)
	- miscBasics (keywords: misc, Basics)
	- miscProgramming (keywords: misc, Programming)
With what I proposed above, they would be accessible as a bunch by selecting 
the packages in category "misc", and each would also be listed in its own 
category ("Graphics",...).
Each of those packages would have a maintainer, and a new mailing list (say 
R-misc) could be set up to discuss pieces of code that could enter one 
package or another. Yes, I volunteer to maintain one of those.



There is some work here for all 6 points, but not so much. What is great is 
that we already have most of the necessary pieces. And we only use the 
KEYWORDS file...
Please let me know what you think about these suggestions. If there is 
interest, I may ask for other volunteers to implement one or more of 
them.

Eric

Eric Lecoutre
UCL /  Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium

tel: (+32)(0)10473050
lecoutre@stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre

If the statistics are boring, then you've got the wrong numbers. -Edward 
Tufte
#
Eric Lecoutre <lecoutre <at> stat.ucl.ac.be> writes:

: 6. Final point has already been discussed in the past. It is about misc 
: packages and pieces of code. I propose the creation of 5 packages:
: 	- miscGraphics (keywords: misc, Graphics)
: 	- miscStatistics (keywords: misc, Statistics)
: 	- miscMathematics (keywords: misc, Mathematics)
: 	- miscBasics (keywords: misc, Basics)
: 	- miscProgramming (keywords: misc, Programming)

Rather than preset the categories perhaps evolving them would
be better, just starting out with a single Misc package and then 
decomposing it into multiple packages as the categories become
clear.
#
At 15:06 24/11/2004, Gabor Grothendieck wrote:
Those categories are taken from KEYWORDS (master entries). I guess it 
wouldn't be difficult to still have substantial entries for those packages, 
if some misc package maintainers would do the job of breaking their packages 
into pieces. BTW, I have to admit this choice is not easy to make for 
several reasons, the main one being keeping the ability to modify one's 
own contributions.
For those packages, a collaborative platform such as SourceForge and so 
on, with sync ability, could be a good choice.

Eric


#
Eric Lecoutre <lecoutre <at> stat.ucl.ac.be> writes:

:
: At 15:06 24/11/2004, Gabor Grothendieck wrote:
: >Eric Lecoutre <lecoutre <at> stat.ucl.ac.be> writes:
: >
: >: 6. Final point has already been discussed in the past. It is about misc
: >: packages and pieces of code. I propose the creation of 5 packages:
: >:       - miscGraphics (keywords: misc, Graphics)
: >:       - miscStatistics (keywords: misc, Statistics)
: >:       - miscMathematics (keywords: misc, Mathematics)
: >:       - miscBasics (keywords: misc, Basics)
: >:       - miscProgramming (keywords: misc, Programming)
: >
: >Rather than preset the categories perhaps evolving them would
: >be better, just starting out with a single Misc package and then
: >decomposing it into multiple packages as the categories become
: >clear.
: 
: Those categories are taken from KEYWORDS (master entries). I guess it 
: wouldn't be difficult to still have substantial entries for those packages, 
: if some misc package maintainers would do the job of breaking their packages 
: into pieces. BTW, I have to admit this choice is not easy to make for 
: several reasons, the main one being keeping the ability to modify one's 
: own contributions.
: For those packages, a collaborative platform such as SourceForge and so 
: on, with sync ability, could be a good choice.
: 
: Eric

Sorry, I did not understand the keyword connection you were making.
My comment was based on the 80/20 idea that if 80% of the software
gets contributed to 20% of the packages, i.e. one of them, then perhaps
having 5 is superfluous.  If the categories are made afterwards, rather
than before, one can construct them to ensure a more even number of
routines in each.
3 days later
#
I am coming a bit late to the thread, so apologies if I am missing
something. I believe that it would be more useful to index individual
functions by keyword than to index whole packages.

I think we may have overlooked Prof. Harrell's suggestion
(https://stat.ethz.ch/pipermail/r-sig-gui/2004-November/000410.html)
during the "Hidden costs of GPL software" thread.

His site (http://biostat.mc.vanderbilt.edu/s/finder/finder.html) is
quite useful. If this were turned into a wiki or something similar,
perhaps it could be of much more benefit.

Regards, Adai
On Wed, 2004-11-24 at 14:15, Eric Lecoutre wrote:
#
Adaikalavan Ramasamy wrote:

            
I'd rather support John Fox's idea of encouraging the use of \concept{} 
entries much more strongly.

Uwe Ligges