Hi Bioc-devel,

I am the maintainer of the cummeRbund package, and since I'm not exactly sure whom I should ask, I decided to post to the bioc-devel list. Since this is my first Bioc package, I have been keenly interested in the download stats that are tracked and visible on the Bioconductor website, here: http://bioconductor.org/packages/stats/index.html

Specifically, I'm noticing that the number of downloads for the cummeRbund package seems to far outpace the number of unique IP addresses downloading it: http://bioconductor.org/packages/stats/bioc/cummeRbund.html

For a few months there was a mean of 10-20 downloads per unique IP address, and the current month is on track to reach about 36 downloads/IP (which looks to be about 8.7% of all BioC package downloads this month so far). Looking at several other packages, this does not seem to be typical: most of the packages in the top 30 list have a ratio of about 1.8-3 downloads/IP.

As ecstatic as these numbers make me, I'm certain there is some underlying reason for this inflation that is not being represented here, but without anything else to go on, I'm not sure where it is coming from. I would obviously like to have an honest representation of the number of downloads for my package, and I was hoping that someone with access to these data could help me track down the cause of this download inflation. (If these numbers are in fact a true representation of the downloads, I would very much like to find out more about the demographics, if possible.)

Any and all advice or information is appreciated! Thanks to all, and a special thanks to everyone who helps keep BioC such an amazing project. I have enjoyed the benefits of Bioconductor for the past 5+ years, and I'm very happy that I can finally start to contribute back to this wonderful project. (Also, I look forward to meeting some of you at BioC 2012 this year!)

Thanks in advance!

Cheers,
Loyal Goff (lgoff at csail.mit.edu)
NSF Postdoctoral Fellow
Computer Science and Artificial Intelligence Laboratory, MIT
& Stem Cells and Regenerative Biology Department, Harvard University
& The Broad Institute
[Bioc-devel] Package download stats inflated? (specifically cummeRbund)
2 messages · lgoff at csail.mit.edu, Hervé Pagès
Hi Loyal,

A high ratio between the number of downloads and the number of unique IPs is not in itself a reason to doubt that these numbers are a true representation of the downloads. We've seen this before. See for example the stats for the ChIPpeakAnno package: http://bioconductor.org/packages/stats/bioc/ChIPpeakAnno.html

The package was downloaded 67k times in Oct/Nov 2011 from only 573 distinct IPs, so here the ratio is 117 downloads/IP.

The first time we saw this kind of massive repetitive downloading was for the biomaRt package, more than a year ago. We investigated it and discovered that most downloads (> 95%) were coming from a single IP (the IP itself belonged to a university somewhere in the US). We don't know for sure why they needed to download the same package again and again, thousands of times every day for more than 20 days in a row, but one explanation could be that they were using some kind of dumb script to install biomaRt on each node of a big cluster.

What's strange, though, is that we saw the deluge of downloads for a single package (biomaRt) and not for a subset of Bioconductor packages (I would expect the people in charge of a cluster to install more than one BioC package). But maybe they were testing a script on one package, then realized they could improve it (to download each package only once), and then used the improved script to actually deploy Bioconductor on their cluster. Hard to know...

Anyway, because such massive repetitive downloads are possible, maybe we should put more emphasis on the number of distinct IPs. This number is probably more representative of the number of users and is therefore a better indicator of how much a package is actually used.

Cheers,
H.
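For readers who want to check the two figures discussed above (downloads per distinct IP, and the share of downloads coming from the single busiest IP), here is a minimal sketch in Python. It assumes a hypothetical input of one client IP string per download; the function name `download_summary`, the input format, and the sample addresses are made up for illustration and say nothing about how the Bioconductor stats are actually computed.

```python
from collections import Counter


def download_summary(ips):
    """Summarize one package's downloads.

    `ips` is a hypothetical list with one client IP string per download.
    """
    if not ips:
        raise ValueError("need at least one download record")
    counts = Counter(ips)              # downloads per IP
    total = sum(counts.values())       # total number of downloads
    distinct = len(counts)             # number of distinct IPs
    top_ip, top_count = counts.most_common(1)[0]
    return {
        "downloads": total,
        "distinct_ips": distinct,
        "downloads_per_ip": total / distinct,
        "top_ip": top_ip,
        "top_ip_share": top_count / total,
    }


# Made-up log: one address downloads 960 times, 40 others once each.
log = ["10.0.0.1"] * 960 + [f"192.0.2.{i}" for i in range(1, 41)]
summary = download_summary(log)
print(summary)
```

With this made-up log, a single address accounts for 96% of the downloads, which pushes the downloads/IP ratio to about 24 even though only 41 distinct IPs are involved. That is the kind of pattern that makes the distinct-IP count the more informative number.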
On 05/23/2012 02:54 PM, lgoff at csail.mit.edu wrote: [...]
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319