popular R packages - R-help

Gabor Grothendieck · 2009-03-07T19:57:54Z

I would like to get some idea of which R packages are popular, and what R is used for in general. Are there any statistics available on which R packages are downloaded often, or is there something like a package survey? Something similar to http://popcon.debian.org/ maybe? Any tips are welcome!

-----
Jeroen Ooms * Dept. of Methodology and Statistics * Utrecht University
Visit http://www.jeroenooms.com to explore some of my current projects.

(Ted Harding)

Sun, Mar 8, 2009 2:48 PM

On 08-Mar-09 20:06:21, Rolf Turner wrote:

Maybe ... ! (I have sometimes got very good answers from bad data,
precisely by analysing how they were bad -- including ascertaining
a change of lab technician from platykurtosis and, once, identifying
potential occasions of theft from delivery lorries from anomalies in
their cargo docs).
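Ted's aside mentions platykurtosis: a distribution flatter than the normal, i.e. one with negative excess kurtosis. As a hedged illustration only, the Python sketch below computes the population excess kurtosis m4 / m2^2 - 3 from the second and fourth central moments. The readings are invented, and pooling two offset batches (say, two technicians with different systematic biases) is offered as one plausible way such flattening can arise; it is an assumption, not a description of Ted's actual case.

```python
def excess_kurtosis(values):
    """Population (biased) excess kurtosis: m4 / m2**2 - 3.

    Roughly 0 for large normal samples; negative means platykurtic (flatter,
    lighter-tailed), positive means leptokurtic (more peaked, heavier-tailed).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Invented readings, for illustration only.
peaked = [0] * 8 + [-3, 3]              # most values sit at the centre
pooled = [9, 10, 11] + [19, 20, 21]     # two offset batches pooled together

print(round(excess_kurtosis(peaked), 3))  # 2.0
print(round(excess_kurtosis(pooled), 3))  # -1.898
```

The pooled sample comes out strongly negative because the two batches leave little mass near the overall mean and no long tails, which is exactly the "flat-topped" shape the term describes.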

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09                                       Time: 21:48:43
------------------------------ XFMail ------------------------------

Emmanuel Charpentier

Sun, Mar 8, 2009 3:45 PM

On Sunday 08 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote:

I question 1) the usefulness of the effort necessary to get the data;
and 2) the very concept of data mining, which seems to be the rationale
for this proposed effort.

Furthermore (but this is seriously off-topic), I seriously despise the
very idea of "popularity" in scientific debates... "Everybody does it"
is *not* a valid argument. Nor "Everyone knows...".

					Emmanuel Charpentier

Barry Rowlingson

Sun, Mar 8, 2009 4:02 PM

2009/3/8 Emmanuel Charpentier <charpent at bacbuc.dyndns.org>:

As long as we agree that package downloads != popularity, then we have
useful data.

 Usefulness of the data? Let's think...

 Suppose we discover that spatstat is downloaded 100 times more than
splancs is. Both packages compute K-functions of spatial data. Pretend
there's an enhancement to K-function computation that could be
implemented in spatstat and/or splancs. Why bother doing it in
splancs?

 Currently the only usage stats we have are even worse measures such
as number of mentions in R-help or number of bug reports. Or maybe
citation counts, but who would make important decisions based on
those?

 I'd love to go 'Hmmm how many people are using my package?' and get
an exact answer. Given the impossibility of that information, I'd love
to go 'Hmmm how many people downloaded my package?', a good
approximation to which is not beyond the bounds of our technology. Web
pages have had annoying 'this piece of software has been downloaded
443535 times' banners (often enclosed in <blink> tags) since 1996. Yes,
it would require some effort at each CRAN site, but the CRAN mirror
site maintainers might be interested in doing this. If they don't
want to, then fine.
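Barry's point is that a rough per-package download count is within easy reach once a mirror keeps web server logs. A minimal Python sketch, assuming Apache-style access-log lines that request source tarballs under a /src/contrib/ path, using CRAN's <package>_<version>.tar.gz file naming. The log lines, version numbers and the function name count_downloads are hypothetical, and real mirror logs may be laid out differently.

```python
import re
from collections import Counter

# Hypothetical log format: Apache-style lines whose request field looks like
#   "GET /src/contrib/<package>_<version>.tar.gz HTTP/1.1"
# (CRAN names source tarballs <package>_<version>.tar.gz; the rest is assumed.)
TARBALL_RE = re.compile(r'"GET \S*/src/contrib/([A-Za-z0-9.]+)_[^/\s]+\.tar\.gz ')


def count_downloads(log_lines):
    """Count tarball requests per package name in an iterable of log lines."""
    counts = Counter()
    for line in log_lines:
        match = TARBALL_RE.search(line)
        if match:  # ignore requests that are not source tarballs
            counts[match.group(1)] += 1
    return counts


# Made-up log lines for illustration only.
sample = [
    '10.0.0.1 - - [08/Mar/2009:16:02:00 +0000] "GET /src/contrib/spatstat_0.0-1.tar.gz HTTP/1.1" 200 1024',
    '10.0.0.2 - - [08/Mar/2009:16:03:00 +0000] "GET /src/contrib/splancs_0.0-1.tar.gz HTTP/1.1" 200 1024',
    '10.0.0.3 - - [08/Mar/2009:16:04:00 +0000] "GET /src/contrib/spatstat_0.0-1.tar.gz HTTP/1.1" 200 1024',
    '10.0.0.4 - - [08/Mar/2009:16:05:00 +0000] "GET /index.html HTTP/1.1" 200 512',
]
print(count_downloads(sample).most_common())  # [('spatstat', 2), ('splancs', 1)]
```

Counts from several mirrors would have to be summed, and robots or repeated requests from one machine inflate the numbers, which is one more reason downloads != popularity.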

Barry

Dirk Eddelbuettel

Sun, Mar 8, 2009 4:11 PM

On 8 March 2009 at 23:45, Emmanuel Charpentier wrote:

| On Sunday 08 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote:
| > Once you have data, you have an option of using or discarding it. But if you
| > have no data, you have no option.  How is that better?
| 
| I question 1) the usefulness of the effort necessary to get the data;
| and 2) the very concept of data mining, which seems to be the rationale
| for this proposed effort.
 
Re 1), Popcon is used for a few actual tasks, for example as a guide in the
knapsack problem of deciding which of the 20,000+ packages should be placed on
the first DVD, which on the second, and so on, simply to minimise disk swapping
when installing.  That's useful in my book, and solves a real problem.
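The disc-placement task Dirk describes can be approximated with a simple greedy heuristic. The Python sketch below is not Debian's actual procedure: it assumes each package has a known size and a popcon install count, ranks packages by installs per megabyte (the usual greedy rule for a knapsack), and places each one on the first disc that still has room. The class Package, the function assign_to_discs, and all names, sizes and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Package:
    name: str
    size_mb: int   # hypothetical archive size
    installs: int  # hypothetical popcon install count


def assign_to_discs(packages, disc_capacity_mb):
    """Greedy heuristic: rank by installs per MB, then first-fit onto discs.

    Returns a list of discs, each a list of package names, in disc order.
    """
    discs, free = [], []  # package names per disc, remaining MB per disc
    ranked = sorted(packages, key=lambda p: p.installs / p.size_mb, reverse=True)
    for pkg in ranked:
        if pkg.size_mb > disc_capacity_mb:
            raise ValueError(f"{pkg.name} is larger than one disc")
        for i, room in enumerate(free):
            if pkg.size_mb <= room:  # first disc with enough space left
                discs[i].append(pkg.name)
                free[i] -= pkg.size_mb
                break
        else:  # no existing disc has room: start a new one
            discs.append([pkg.name])
            free.append(disc_capacity_mb - pkg.size_mb)
    return discs


packages = [
    Package("pkg-a", 60, 600),
    Package("pkg-b", 50, 400),
    Package("pkg-c", 30, 300),
    Package("pkg-d", 20, 20),
]
print(assign_to_discs(packages, disc_capacity_mb=100))
# [['pkg-a', 'pkg-c'], ['pkg-b', 'pkg-d']]
```

Ranking by value density is fast and usually good, but it is not guaranteed to be optimal; an exact solution to this multiple-knapsack problem is much more expensive to compute.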

Also, and back to R, consider the relevant page for 'r-base' on Debian (and
forgive them the ugly gnuplot chart)

	http://qa.debian.org/popcon.php?package=r-base

This clearly shows a couple of things:

 - about 3% of all machines participating have r-base-core [ the main R
   package ] installed

 - 89% of those also install r-recommended (which pulls in VR, lattice, ...)

 - 63% of those install the all-in package r-base (which pulls in
   r-recommended and the documentation package)

 - r-mathlib is not very well used

 - the debug package r-base-core-dbg is possibly underused [ it lets you
   run gdb with matching debug symbols simply by installing this package,
   without having to rebuild; these -dbg packages are very useful but eat up
   lots of mirror space, and whether they could or should be removed was a
   recent internal question ]

Likewise, you can look at other CRAN packages. Here is lme4:

	http://qa.debian.org/popcon.php?package=lme4

which is installed on only about 0.3% of all machines.
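Dirk's percentages are nested, each applying to a subset of the participating machines. Assuming "of those" always refers to the r-base-core machines, and that every machine with lme4 installed also has r-base-core (lme4 needs R), the shares work out roughly as below. The inputs are the rounded figures quoted above, so the results are approximate.

```latex
\begin{align*}
P(\text{r-recommended}) &\approx 0.03 \times 0.89 \approx 0.027 \quad (\approx 2.7\% \text{ of all machines})\\
P(\text{r-base}) &\approx 0.03 \times 0.63 \approx 0.019 \quad (\approx 1.9\% \text{ of all machines})\\
P(\text{lme4} \mid \text{r-base-core}) &\approx \frac{0.003}{0.03} = 0.1 \quad (\text{about one R machine in ten})
\end{align*}
```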

| Furthermore (but this is seriously off-topic), I seriously despise the
| very idea of "popularity" in scientific debates... "Everybody does it"
| is *not* a valid argument. Nor "Everyone knows...".

TTBOMK nobody suggested this. 

Dirk

Three out of two people have difficulties with fractions.

Hadley Wickham

Sun, Mar 8, 2009 6:00 PM

Here are a few other uses that I would put the data to:

 * In my tenure case, grant applications, etc., I can say how many
people have downloaded my packages.

 * If relatively few people are using a package, I'd know that I
either need to promote the package more, or improve it so that it is
useful to more people.

 * At a higher level, it would be interesting to see what types of
packages are most frequently downloaded.  Modelling packages? Graphics
packages? Packages for particular applications? ...

Hadley

http://had.co.nz/