popular R packages

30 messages · Gabor Grothendieck, Thomas Adams, David Winsemius +13 more

Messages 26–30 of 30

#
On 08-Mar-09 20:06:21, Rolf Turner wrote:
Maybe ... ! (I have sometimes got very good answers from bad data,
precisely by analysing how they were bad -- including ascertaining
a change of lab technician from platykurtosis and, once, identifying
potential occasions of theft from delivery lorries from anomalies in
their cargo docs).

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 08-Mar-09                                       Time: 21:48:43
------------------------------ XFMail ------------------------------
#
On Sunday 08 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote:
I question 1) the usefulness of the effort necessary to get the data ;
and 2) the very concept of data mining, which seems to be the rationale
for this proposed effort.

Furthermore (but this is seriously off-topic), I seriously despise the
very idea of "popularity" in scientific debates... "Everybody does it"
is *not* a valid argument. Nor "Everyone knows...".

					Emmanuel Charpentier
#
2009/3/8 Emmanuel Charpentier <charpent at bacbuc.dyndns.org>:
As long as we agree that package downloads != popularity then we have
useful data.

 Usefulness of the data? Let's think...

 Suppose we discover that spatstat is downloaded 100 times more than
splancs is. Both packages compute K-functions of spatial data. Pretend
there's an enhancement to K-function computation that could be
implemented in spatstat and/or splancs. Why bother doing it in
splancs?

 Currently the only usage stats we have are even worse measures such
as number of mentions in R-help or number of bug reports. Or maybe
citation counts, but who would make important decisions based on
those?

 I'd love to go 'Hmmm how many people are using my package?' and get
an exact answer. Given the impossibility of that information, I'd love
to go 'Hmmm how many people downloaded my package?', a good
approximation to which is not beyond the bounds of our technology. Web
pages have had annoying 'this piece of software has been downloaded
443535 times' banners (often enclosed in <blink> tags) since 1996. Yes
it would require some effort at each CRAN site, but maybe the CRAN
mirror site maintainers might be interested in doing this. If they
don't want to, then fine.
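
A minimal sketch of what such counting could look like on a mirror, assuming the web server writes Apache-style access logs and that source tarballs are requested as /src/contrib/<package>_<version>.tar.gz. The log lines, the regular expression and the function name `count_downloads` are illustrative assumptions, not an existing CRAN tool.

```python
import re
from collections import Counter

# Assumed request shape: GET .../src/contrib/<package>_<version>.tar.gz
# Only requests answered with HTTP status 200 are counted.
TARBALL = re.compile(
    r'"GET \S*/src/contrib/([A-Za-z][A-Za-z0-9.]*)_[^/\s"]+\.tar\.gz[^"]*" 200 '
)

def count_downloads(log_lines):
    """Count successful tarball requests per package name."""
    counts = Counter()
    for line in log_lines:
        match = TARBALL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Made-up log lines, for illustration only.
sample_log = [
    '192.0.2.1 - - [08/Mar/2009:21:00:00 +0000] "GET /src/contrib/spatstat_1.0.tar.gz HTTP/1.1" 200 1234',
    '192.0.2.2 - - [08/Mar/2009:21:01:00 +0000] "GET /src/contrib/spatstat_1.0.tar.gz HTTP/1.1" 200 1234',
    '192.0.2.3 - - [08/Mar/2009:21:02:00 +0000] "GET /src/contrib/splancs_1.0.tar.gz HTTP/1.1" 200 999',
    '192.0.2.4 - - [08/Mar/2009:21:03:00 +0000] "GET /src/contrib/splancs_1.0.tar.gz HTTP/1.1" 404 0',
]

for package, n in count_downloads(sample_log).most_common():
    print(package, n)
```

This counts requests, not users: repeated downloads, other mirrors and binary packages are all ignored, so the result is at best the approximation described above.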

Barry
#
On 8 March 2009 at 23:45, Emmanuel Charpentier wrote:
| On Sunday 08 March 2009 at 13:22 -0500, Dirk Eddelbuettel wrote:
| > Once you have data, you have an option of using or discarding it. But if you
| > have no data, you have no option.  How is that better?
| 
| I question 1) the usefulness of the effort necessary to get the data ;
| and 2) the very concept of data mining, which seems to be the rationale
| for this proposed effort.
 
Re 1), Popcon is used for a few actual tasks, for example guiding the
knapsack problem of which of the 20,000+ packages should be placed on the
first DVD, which on the second, and so on, simply to minimise disk swapping
when installing.  That's useful in my book, and solves a real problem.
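
As a rough illustration of that packing idea, here is a greedy first-fit sketch that ranks packages by install count and fills the disks in order. It is a heuristic under stated assumptions, not an exact knapsack solver and not a description of how Debian actually builds its images; the package names, sizes, counts and the function name `assign_to_disks` are all invented, and dependencies between packages are ignored.

```python
def assign_to_disks(packages, capacity):
    """Greedy sketch: put the most-installed packages on the earliest disk.

    packages: list of (name, size, install_count) tuples
    capacity: size limit per disk, in the same unit as size
    Returns a list of disks, each a list of package names.
    """
    ranked = sorted(packages, key=lambda p: p[2], reverse=True)
    disks, free = [], []
    for name, size, _count in ranked:
        if size > capacity:
            raise ValueError(f"{name} does not fit on any disk")
        # First fit: use the earliest disk that still has enough room.
        for i, room in enumerate(free):
            if size <= room:
                disks[i].append(name)
                free[i] -= size
                break
        else:
            # No existing disk has room, so start a new one.
            disks.append([name])
            free.append(capacity - size)
    return disks

# Invented sizes and install counts, for illustration only.
packages = [
    ("pkg_a", 30, 3000),
    ("pkg_b", 10, 2700),
    ("pkg_c", 40, 50),
    ("pkg_d", 5, 100),
]
print(assign_to_disks(packages, capacity=50))
```

With these made-up numbers the three most-installed packages land on the first disk and the rarely installed large one on the second, which is the effect wanted for reducing disk swaps.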

Also, and back to R, consider the relevant page for 'r-base' on Debian (and
forgive them the ugly gnuplot chart)

	http://qa.debian.org/popcon.php?package=r-base

This clearly shows a couple of things:

 - about 3% of all machines participating have r-base-core [ the main R
   package ] installed

 - 89% of those also install r-recommended (which pulls in VR, lattice, ...)

 - 63% of those have the all-in package r-base installed (which pulls in
   r-recommended and documentation package)

 - r-mathlib is not very well used

 - the debug package r-base-core-dbg is possibly underused [ it allows you to
   run gdb by installing this package containing matching debug symbols
   without having to rebuild; these dbg packages are very useful but eat up
   lots of mirror space, whether they could or should be removed was a recent
   internal question ]

Likewise, you can look at other CRAN packages. Here is

	http://qa.debian.org/popcon.php?package=lme4

which is only about 0.3% of all machines.

| Furthermore (but this is seriously off-topic), I seriously despise the
| very idea of "popularity" in scientific debates... "Everybody does it"
| is *not* a valid argument. Nor "Everyone knows...".

TTBOMK nobody suggested this. 

Dirk
#
Here are a few other uses that I would put the data to:

 * In my tenure case, grant applications etc, I can say how many
people have downloaded my packages.

 * If relatively few people are using a package, I'd know that I
either need to promote the package more, or improve it so that it is
useful to more people.

 * At a higher level, it would be interesting to see what types of
packages are most frequently downloaded.  Modelling packages? Graphics
packages? Packages for particular applications? ...

Hadley