Hi R-users and developers,
This month may have seen one of the biggest threads ever seen on R-related
mailing lists, the one about "GPL software" and "hidden costs" (as of
today, the thread is still open - and active!).
Lots of mails in this thread are not really relevant to the original mail,
sent by Philippe Grosjean.
Nevertheless, most of the mails are of interest, and one of my conclusions
was that there is a real need for "help/index related" stuff.
I have spent some time thinking about it. Like everybody, I end up with:
"this is not an easy problem at all" and "what we have *is* still very
good". Indeed!
What you will find below is a sketch of thoughts/proposals. I tend to think
some of those proposals are "low-cost" and could improve the life of R
beginners.
First, I have to say I will put myself in the situation of a real
beginner (say, a first-year student):
A user who has practiced for some years will find it easier to crawl
through all the rich material available. His experience will help him
easily find the package relevant to his problem and the function he needs;
he has learned to use help.search() and so on. And he will wisely use
R-help, following the guidelines.
By contrast, a beginneR will have more and more difficulty entering the R
world, as it is constantly growing (leading to the famous supposed
"hidden costs"). Mastering poweR is not easy, especially if your daily
task is specialized: you will have difficulty digging through all the
material to find those nuggets that will help you (and thanks to the
community, there are so many nuggets... it may be hard to choose between
gold and platinum).
What we have for now is a document listing keywords. Advanced users will
know that those keywords are meant to be used by package maintainers,
feeding the build chain of the help system.
This keyword database is very pertinent. Its content, which has been
inherited in part from S, has been carefully worked out. And it works
well (try help.search("graphs"): it will provide you with very
interesting stuff - provided you have some packages installed...). I think
that this keyword list may have even more uses.
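
For readers who have not met the keyword list yet, here is a minimal sketch of how it can be looked at and queried from an R session. It assumes a standard installation where the KEYWORDS file sits in the doc directory of R; "hplot" is used as an example entry and the search is restricted to the graphics package only to keep it quick.

```r
# Location of the KEYWORDS list shipped with R (assumed to be in R's doc dir).
keywords_file <- file.path(R.home("doc"), "KEYWORDS")
if (file.exists(keywords_file)) {
  # Show the first lines of the keyword list.
  print(head(readLines(keywords_file), 10))
}

# Search the installed help pages by keyword rather than by free text.
# "hplot" (high-level plots) is one entry of the list.
hits <- help.search(keyword = "hplot", package = "graphics")

# 'hits' has class "hsearch"; the matching help pages are in hits$matches.
n_matches <- nrow(hits$matches)
```

Printing `hits` lists the matching help pages; dropping the `package` argument searches all installed packages.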
1. As the R community grows, it may be time to add some terms to this
keyword list. Think about the SciViews bundle on which Philippe is working.
Most packages in it are linked to GUI stuff. Wouldn't the keyword GUI be
useful? It could be worth giving the community one month to suggest new
entries (I am also thinking about econometrics stuff).
Then, the R core team would decide whether the candidates are eligible.
2. DESCRIPTION files for packages could have a new field, keywords,
allowing the author to add keywords to his package (minimum one).
Here is what we could end up with:
package        keyword(s)
---------------------------------------------
abind          Basics, manip, array
accuracy       Statistics
acepack        Statistics, regression
adapt          Mathematics
ade4           multivariate
...
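
To make point 2 concrete, here is a small sketch of what reading such a field could look like. The "Keywords" field is the proposal itself, not an existing DESCRIPTION field, and the DESCRIPTION content below is a made-up fragment written to a temporary file; only read.dcf() is an existing R function.

```r
# Hypothetical DESCRIPTION fragment; "Keywords" is the proposed new field.
desc_lines <- c(
  "Package: abind",
  "Title: Combine multi-dimensional arrays",
  "Keywords: Basics, manip, array"
)
desc_file <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_file)

# DESCRIPTION files use the Debian control file format, which read.dcf() parses
# into a character matrix with one column per field.
desc <- read.dcf(desc_file)

# Split the comma-separated field into individual keywords.
keywords <- strsplit(desc[1, "Keywords"], ",\\s*")[[1]]
unlink(desc_file)
```

A check that at least one keyword is present would then be a one-line test on `length(keywords)`.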
3. Package keywords could be used to propose "automatic" bundles and/or
lists of packages (for that purpose, consider keywords as categories).
Thus, CRAN sites could have a listing of all packages, but also a listing
of all packages related to Mathematics, to multivariate (statistics) and
so on. And one could offer to install a whole bunch of packages at one
time. Thus (and provided adequate keywords exist), the beginner
interested in multivariate statistics would easily set up his R with
adequate starting packages. Same for econometrics, geostatistics, and any
other field of application.
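
A rough sketch of point 3, using the example table from point 2 as data. The function name install_category() and its arguments are hypothetical; by default it only reports which packages it would install, so nothing is downloaded unless dry_run is switched off.

```r
# Package -> keywords, taken from the example table above.
pkg_keywords <- list(
  abind    = c("Basics", "manip", "array"),
  accuracy = "Statistics",
  acepack  = c("Statistics", "regression"),
  adapt    = "Mathematics",
  ade4     = "multivariate"
)

# Invert the mapping: one entry per keyword, listing the packages tagged with it.
pkgs <- rep(names(pkg_keywords), lengths(pkg_keywords))
kws  <- unlist(pkg_keywords, use.names = FALSE)
by_keyword <- split(pkgs, kws)

# Hypothetical helper: install every package of one category in one go.
install_category <- function(category, index = by_keyword, dry_run = TRUE) {
  pkgs <- index[[category]]
  if (is.null(pkgs)) stop("unknown category: ", category)
  if (!dry_run) install.packages(pkgs)  # only touches CRAN when asked to
  invisible(pkgs)
}

statistics_pkgs <- by_keyword[["Statistics"]]
multivariate_pkgs <- install_category("multivariate")
```

The same `by_keyword` list is all a CRAN mirror would need to generate one listing page per category.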
4. What would really be useful then (I think) is a sort of PACKAGES_INDEX
that would come with R. Explanation: the index of a package would be its
keywords (with a high weight) plus all its functions and their associated
keywords (with lower weights). When downloading and installing the
newest R, there would be a flat text file containing that (not so
big). We could also add a function that refreshes this file.
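
Here is one possible shape for such an index, as a sketch only. The weights (10 for a package keyword, 1 for each function carrying the keyword), the keyword assignments, the function name build_index() and the tab-separated file layout are all assumptions made for illustration.

```r
# Invented example data: package-level keywords and per-function keywords.
pkg_kw <- list(
  ade4    = "multivariate",
  acepack = c("Statistics", "regression")
)
fun_kw <- list(
  ade4    = c(dudi.pca = "multivariate", s.label = "hplot"),
  acepack = c(ace = "regression", avas = "regression")
)

# Hypothetical builder: one row per (package, keyword) with a summed weight.
build_index <- function(pkg_kw, fun_kw, w_pkg = 10, w_fun = 1) {
  pkg_rows <- data.frame(
    package = rep(names(pkg_kw), lengths(pkg_kw)),
    keyword = unlist(pkg_kw, use.names = FALSE),
    weight  = w_pkg,
    stringsAsFactors = FALSE
  )
  fun_rows <- data.frame(
    package = rep(names(fun_kw), lengths(fun_kw)),
    keyword = unlist(fun_kw, use.names = FALSE),
    weight  = w_fun,
    stringsAsFactors = FALSE
  )
  aggregate(weight ~ package + keyword,
            data = rbind(pkg_rows, fun_rows), FUN = sum)
}

index <- build_index(pkg_kw, fun_kw)

# Store the index as a flat text file and read it back.
index_file <- tempfile("PACKAGES_INDEX")
write.table(index, index_file, sep = "\t", quote = FALSE, row.names = FALSE)
index_read <- read.delim(index_file, stringsAsFactors = FALSE)
unlink(index_file)
```

Refreshing the file would amount to re-running build_index() on up-to-date keyword data and overwriting the file.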
5. Then, we could update "help.search" so that it first lists
information on installed packages and then potentially suggests other
packages available on CRAN.
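
A sketch of how the lookup in point 5 might work against such an index. suggest_packages() is a hypothetical name, the index rows and weights are invented, and the real help.search() is left untouched; the sketch only shows the "installed or not" split.

```r
# Invented index rows: package, keyword, weight (higher means more relevant).
index <- data.frame(
  package = c("ade4", "ade4", "acepack", "acepack", "MASS"),
  keyword = c("multivariate", "hplot", "Statistics", "regression", "regression"),
  weight  = c(11, 1, 10, 12, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank packages for a keyword and flag the missing ones.
suggest_packages <- function(keyword, index,
                             installed = rownames(installed.packages())) {
  hits <- index[tolower(index$keyword) == tolower(keyword), , drop = FALSE]
  hits <- hits[order(-hits$weight), , drop = FALSE]
  hits$installed <- hits$package %in% installed
  rownames(hits) <- NULL
  hits
}

# Pretend only MASS is installed, so acepack shows up as a CRAN suggestion.
res <- suggest_packages("regression", index, installed = "MASS")
```

Rows with `installed == FALSE` are the ones that would be shown as "also available on CRAN".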
6. The final point has already been discussed in the past. It is about misc
packages and pieces of code. I propose the creation of 5 packages:
- miscGraphics (keywords: misc, Graphics)
- miscStatistics (keywords: misc, Statistics)
- miscMathematics (keywords: misc, Mathematics)
- miscBasics (keywords: misc, Basics)
- miscProgramming (keywords: misc, Programming)
With what I proposed before, they would be accessible as a bunch by
selecting packages for the category "misc", and each would also be listed
in its own category ("Graphics",...).
Each of those packages would have a maintainer, and a new mailing list (say
R-misc) could be set up to talk about pieces of code that could enter one
package or another. Yes, I volunteer to maintain one of those.
There is some work here for all 6 points, but not so much. What is great is
that we already have most of the necessary stuff. And we only use the
KEYWORDS file...
Please let me know what you think about those suggestions. If there is
interest, I may ask for other volunteers to set up one or more of those
suggestions.
Eric
Eric Lecoutre
UCL / Institut de Statistique
Voie du Roman Pays, 20
1348 Louvain-la-Neuve
Belgium
tel: (+32)(0)10473050
lecoutre@stat.ucl.ac.be
http://www.stat.ucl.ac.be/ISpersonnel/lecoutre
If the statistics are boring, then you've got the wrong numbers. -Edward
Tufte
Suggestions for packages / help / index (long mail)
6 messages · Eric Lecoutre, Gabor Grothendieck, Adaikalavan Ramasamy +1 more
Eric Lecoutre <lecoutre <at> stat.ucl.ac.be> writes:
: 6. Final point has already been discussed in the past. It is about misc
: packages and pieces of code. I propose the creation of 5 packages:
: - miscGraphics (keywords: misc, Graphics)
: - miscStatistics (keywords: misc, Statistics)
: - miscMathematics (keywords: misc, Mathematics)
: - miscBasics (keywords: misc, Basics)
: - miscProgramming (keywords: misc, Programming)

Rather than preset the categories perhaps evolving them would be better,
just starting out with a single Misc package and then decomposing it into
multiple packages as the categories become clear.
At 15:06 24/11/2004, Gabor Grothendieck wrote:
>Rather than preset the categories perhaps evolving them would
>be better, just starting out with a single Misc package and then
>decomposing it into multiple packages as the categories become
>clear.

Those categories are taken from KEYWORDS (master entries). I guess it
would not be difficult to still have substantial entries for those
packages, if some maintainers of misc packages were willing to break their
packages into pieces. BTW, I have to admit this choice is not easy to make
for several reasons, the main one being to keep the ability to modify
one's own contributions.
For those packages, a collaborative platform such as SourceForge, with
sync ability, could be a good choice.

Eric
Eric Lecoutre <lecoutre <at> stat.ucl.ac.be> writes:
: Those categories are taken from KEYWORDS (master entries).

Sorry, I did not understand the keyword connection you were making.
My comment was based on the 80/20 idea: if 80% of the software gets
contributed to 20% of the packages, i.e. one of them, then perhaps having
5 is superfluous. If the categories are made afterwards, rather than
before, one can construct them to ensure a more even number of routines.
3 days later
I am coming a bit late to the thread, so apologies if I am missing
something.

I believe that it would be more useful to index functions to particular
keywords than a package itself. I think we may have overlooked
Prof. Harrell's suggestion
(https://stat.ethz.ch/pipermail/r-sig-gui/2004-November/000410.html)
during the "Hidden costs of GPL software" thread. His site
(http://biostat.mc.vanderbilt.edu/s/finder/finder.html) is quite useful.
If this were turned into a wiki or something similar, perhaps it could be
of much greater benefit.

Regards, Adai
______________________________________________ R-devel@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Adaikalavan Ramasamy wrote:
> I believe that it would be more useful to index functions to particular
> keywords than a package itself. [...]

I'd rather support John Fox's idea of encouraging the use of \concept{}
entries much more strongly.
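
For those who have not used them: \concept{} entries are free-text index terms in Rd files, and help.search() can already look into them. A minimal sketch follows; the Rd lines in the comments belong to a hypothetical help page, and the search term is just an example.

```r
# In a hypothetical man/myfun.Rd, a maintainer can add free-text concepts
# next to the fixed keywords taken from the KEYWORDS list:
#   \concept{ridge regression}
#   \keyword{regression}

# Search only the concept entries of installed help pages
# (restricted to 'stats' here to keep the search quick).
res <- help.search("regression", fields = "concept", package = "stats")

# Matches, if any, are in res$matches.
n_concept_hits <- nrow(res$matches)
```

Whether this returns anything depends entirely on how many help pages actually declare concepts, which is the point of encouraging their use.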
Uwe Ligges