R in the NY Times
on 01/08/2009 01:12 PM Andrew Choens wrote:
On Thu, 2009-01-08 at 10:42 -0600, Stas Kolenikov wrote:
A really good measure for R would be the total number of downloads of r-base for all platforms from all CRAN mirrors (and I would expect that number can be found from the servers' logs). Given that it is so easy to download everything nice and clean and up to date, I doubt anybody is distributing CD-ROMs with R install files among friends and colleagues. SAS (and Stata, and SPSS, and Minitab, and...) should have their (internal) number of licenses sold (and yes, those come on disks initially), but those numbers are badly blurred by network licenses, and are commercial secrets anyway.
The number of r-core downloads is definitely NOT representative of the number of people using R. If you use R on Windows or OS X, you will obviously download R from the mirrors. However, this methodology would effectively ignore many users of R on Linux. I use R on a regular basis and I have it installed on three separate systems, all running Ubuntu. In all of these cases, I am downloading and installing r-core from the Ubuntu Mirror in the USA, not from CRAN.
I would also note that R has been available via the Fedora yum repos for some time, which, as with the Debian/Ubuntu repos, would be missed in just counting CRAN downloads. There are quite a few other Linux distributions that have a similar infrastructure in place where R is available as an 'add-on' or where the main distribution itself includes R. Additionally, there are many folks who build R from source code, using the updated source tarballs via FTP or, as I do, by getting the source code right from the R subversion repo. These too would not be counted in a CRAN-based count.
Of course, the number of Linux users is minuscule compared to the number of Windows users, but I think it is safe to say that Linux users are, in general, a more tech-savvy group than Windows users and are more likely to be comfortable using R's interactive programming interface. I think it is also fair to say that MANY (though not all) Linux users would be uncomfortable installing SPSS or SAS or Stata onto their open-source system and would prefer to use R. Thus, Linux users probably account for a higher proportion of R's user base than they do in the general computing population, although I do not claim to actually know this proportion. Ehh. Comparing the popularity of computer software is incredibly tricky to do, especially when some of the software being compared is open-source.
Correct. Trying to extrapolate the number of users from any of these measures is quite complex, if doable at all. Even the posting frequencies I used yesterday need to be taken with a grain of salt when trying to get a sense of growth. As Dirk noted, the many R-SIG-* e-mail lists have offloaded some level of traffic from R-Help, which may account for the rate of growth in R-Help posts declining somewhat since 2004, as Gabor pointed out, even though the absolute number of annual posts continues to increase. Reading the posts on SAS-L since yesterday via Google RSS, where the NYT article was also posted, I see that some have noted that SAS itself offers online support forums (http://support.sas.com/forums/index.jspa). From a quick review, it looks like the SAS.com forums date back to perhaps early 2006, thus possibly accounting for some of the recent leveling of posts on SAS-L.
HTH,
Marc Schwartz