[Bioc-devel] Package download stats inflated? (specifically cummeRbund)

Hi Loyal,

A high ratio of downloads to unique IPs is not, by itself, a reason
to doubt that these numbers are a true representation of the
downloads. We've seen this before. See for example the stats for the
ChIPpeakAnno package:

   http://bioconductor.org/packages/stats/bioc/ChIPpeakAnno.html

The package was downloaded 67k times in Oct/Nov 2011 from only 573
distinct IPs, so the ratio there is about 117 downloads per IP.
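
As a quick check of the ratio quoted above, here is the arithmetic in
Python. It assumes the rounded "67k" means 67,000 downloads, so the
result is only approximate:

```python
# Figures quoted above for ChIPpeakAnno, Oct/Nov 2011.
# 67_000 stands in for the rounded "67k" (an assumption), so the
# ratio is approximate.
downloads = 67_000
distinct_ips = 573

downloads_per_ip = downloads / distinct_ips
print(f"{downloads_per_ip:.0f} downloads / IP")
```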

The first time we saw this kind of massive, repetitive downloading was
for the biomaRt package, more than a year ago. We investigated and
found that most downloads (> 95%) were coming from a single IP (which
belonged to a university somewhere in the US). We don't know for sure
why they needed to download the same package thousands of times every
day for more than 20 days in a row, but one explanation could be that
they were using some kind of dumb script to install biomaRt on each
node of a big cluster. What's strange, though, is that we saw the
deluge of downloads for a single package (biomaRt) and not for a
subset of Bioconductor packages (I would expect the people in charge
of a cluster to install more than one BioC package). But maybe they
were testing a script on one package, then realized they could improve
it (to download each package only once), and then used the improved
script to actually deploy Bioconductor on their cluster. Hard to
know...

Anyway, because such massive repetitive downloads are possible, maybe
we should put more emphasis on the number of distinct IPs. This number
is probably more representative of the number of users and is
therefore a better indicator of how much a package is actually used.
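
To illustrate why the two numbers can diverge so much, here is a
minimal sketch in Python. It is not how the Bioconductor stats are
actually computed: the function name, the input format (one client IP
per download) and the log itself are made up for the example.

```python
from collections import Counter


def summarize_downloads(ips):
    """Summarize one package's downloads.

    `ips` is a hypothetical list holding one client IP per download.
    """
    if not ips:
        return {"downloads": 0, "distinct_ips": 0, "top_ip_share": 0.0}
    per_ip = Counter(ips)
    # Number of downloads made by the single busiest IP.
    _, top_count = per_ip.most_common(1)[0]
    return {
        "downloads": len(ips),
        "distinct_ips": len(per_ip),
        "top_ip_share": top_count / len(ips),
    }


# Made-up log: one IP fetches the package 96 times, four others once each.
log = ["192.0.2.1"] * 96 + [
    "198.51.100.7",
    "198.51.100.8",
    "203.0.113.5",
    "203.0.113.9",
]
summary = summarize_downloads(log)
print(summary)
```

In this made-up log the raw download count is 100 while only 5
distinct IPs are involved, and a single IP accounts for 96% of the
downloads.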

Cheers,
H.

On 05/23/2012 02:54 PM, lgoff at csail.mit.edu wrote: