
# of users of R, and biological examples of the use of R

16 messages · Ramon Diaz-Uriarte, A.J. Rossini, Martin Maechler +9 more

#
Dear All,

With a colleague we are writing a paper where we show how R is a very nice
tool to deal with some issues in the analyses of data in evolutionary biology. 
For the intro, I wonder if

1) Anybody has any rough idea of how many people might be using R or how many
people have downloaded R, or similar (I am aware answering this question might
require divinatory powers...).

2) Have/are any of you using R in papers in the biological sciences (specially
evolutionary biology, ecology, behavior)?

Thanks,

Ramon
#
RD> 2) Have/are any of you using R in papers in the biological
    RD>    sciences (specially evolutionary biology, ecology, behavior)?

I'm working with a group of microbiologists who are using it (for viral
evolution).

Papers won't be published for a bit, though, since it's only in the
beginning...

best,
-tony
#
Ramon> Dear All, With a colleague we are writing a paper where we show
    Ramon> how R is a very nice tool to deal with some issues in the
    Ramon> analyses of data in evolutionary biology.  For the intro, I
    Ramon> wonder if

    Ramon> 1) Anybody has any rough idea of how many people might be using
    Ramon> R or how many people have downloaded R, or similar (I am aware
    Ramon> answering this question might require divinatory powers...).

Without having such powers, I can report what the mailing lists look like:

    % cat r-announce r-help r-devel |sort|uniq|wc -l
    911
	(w/o sort|uniq  it's 1140);  
	r-help alone has 635

which indicates that 911 different e-mail addresses are subscribed to
the R mailing lists (very few of these are themselves mailing lists;
however, quite a few will point to the same person).
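As a rough illustration of the count above, the following Python sketch does what the `sort|uniq|wc -l` pipeline does: pool the subscriber lists and count distinct addresses. The function name and the example addresses are made up for illustration, and lower-casing the addresses is an added assumption that the shell pipeline does not make.

```python
def count_unique_subscribers(lists):
    """Return (total entries, distinct addresses) across several subscriber lists."""
    # Pool all lists, dropping blank entries; lower-casing is an assumption
    # of this sketch (sort|uniq compares lines case-sensitively).
    pooled = [
        addr.strip().lower()
        for members in lists
        for addr in members
        if addr.strip()
    ]
    return len(pooled), len(set(pooled))


# Hypothetical toy data standing in for r-announce, r-help and r-devel.
r_announce = ["a@example.org", "b@example.org"]
r_help = ["b@example.org", "c@example.org"]
r_devel = ["c@example.org", "d@example.org"]

total, distinct = count_unique_subscribers([r_announce, r_help, r_devel])
print(total, distinct)  # 6 4
```

On the real lists the two numbers would correspond to the 1140 and 911 reported above.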

Now, the mailing lists probably contain almost no undergraduate students,
and these *are* using R at least in many of the courses...

Further, graduate students and scientific staff in many organizations use R,
but only the more "aficionados" among them are subscribed to an R list.
[factor of 3 ?]

Other guesses?

[Then what would "uses R" mean at all?
 o  >= 1 hour per week ?
 o  (one of) your major tool(s) for statistical data analysis?
 o  ??
]

----
{Now some musings, don't take me too seriously: I'm tired, it's hot, ....}

At one moment in time I had dreamed of putting a "feature" into R which
`registered' a user automatically when using R for the first time
(by sending an e-mail to some "R counter" on the internet);
but even if for a good purpose, it feels too much like "Big Brother" 
and "Virus/Worm"like behavior.. (and looks painful for me as MS-ignorant
 to work on a home Win-PC which only occasionally connects to the Net).

Linux has (had) an optional Linux Users counter, via a nice web interface;
however, I think it never reached a state where it counted more than a
tiny fraction of users....

Martin
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Tue, 20 Jun 2000, Martin Maechler wrote:

            
Yeah, but as long as you let the users know, it shouldn't be that bad. It
might also be worthwhile to have a look at the Pine First-use statistics:
<URL:http://staff.washington.edu/corey/pine-stats/>

Best,

Kjetil
#
KK> Yeah, but as long as you let the users know, it shouldn't be
    KK> that bad. It might also be worthwhile to have a look at the
    KK> Pine First-use statistics:
    KK> <URL:http://staff.washington.edu/corey/pine-stats/>

That's semi-biased data.  I can't count the number of times I end up
punching the wrong keys when I've "got to" send mail using pine, and end
up registering myself yet again...  over the years, it must be in the
low 40s, by now...  (I usually nuke the pine configuration files in my
directory by force of habit).
#
Martin Maechler <maechler at stat.math.ethz.ch> writes:
Robert once guesstimated 10000 users, which would mean that roughly
one in 10 signs up for mailing lists. That could well be the case, the
mailing list traffic seems comparable to early days of s-news.
<We could also count the "sold items" since we're on both SuSE and
RedHat CDs (and Debian but do their sales get counted?). Of course one
thing is buying a program another is using it, but hey!, has that ever
stopped others?>
Estimated 1% it seems. It's still there (and has me as #4115 out of
148315) at counter.li.org.
#
On 20 Jun 2000, Peter Dalgaard BSA wrote:

            
Sounds plausible, and the MathSoft estimated ratio is more like 30 (but
then their user base will have a more hierarchical support structure).
As I don't know who here is signed up to R-help (but I do for S-news),
I am guessing a bit, but I'd say we had a 20:1 ratio (counting anyone who
uses R for several hours per year) in my dept for each of R and S.
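The users-per-subscriber ratios floated so far (the factor of 3, roughly one in 10, the 20:1 departmental guess, and the MathSoft figure of about 30) can be applied to the 911 distinct addresses to see the spread they imply. This is only a sketch: the variable names are illustrative, and the ratios are the thread's guesses, not measurements.

```python
subscribers = 911  # distinct addresses on the R lists, as counted earlier in the thread

# Users-per-subscriber ratios guessed in the thread (labels are illustrative).
ratios = {
    "factor of 3": 3,
    "one in 10": 10,
    "20:1 in one department": 20,
    "MathSoft-like ratio of 30": 30,
}

estimates = {label: subscribers * ratio for label, ratio in ratios.items()}
for label, users in estimates.items():
    print(f"{label}: about {users} users")

# Conversely, a guess of 10000 users implies this many users per subscriber.
implied_ratio = 10_000 / subscribers
print(round(implied_ratio, 1))  # 11.0
```

The estimates range from under 3,000 to over 27,000, so the choice of ratio dominates the answer.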
#
On Tue, Jun 20, 2000 at 06:41:39PM +0200, Martin Maechler wrote:
My 0.01Euro.

  I checked a moment ago. From my department, there are 
  2 subscriptions to r-announce and 1 (me) to r-help and r-devel.
  Thinking of the users, my estimates are
  (a) staff and post-graduate students: about 20
  (b) under-graduate students: about 300 (many of these (50 to 100?) 
      can be regarded as 'regular users', e.g., R installed on their
      home computer and used not only for the courses for which
      R is required).

  Looking at the subscriptions from other statistics departments in Italy,
  I suspect that our ratio between subscribers and users is one of
  the highest.

  guido




#
OK - my 0.01UKP's worth ...

As statisticians we should above all be able to make a good estimate (with
confidence intervals) of the R use.  I see three methods for sizing:

1 Taking a sample from the list, which will itself be biased (probably
self-selected and therefore biased further), and asking these people to
estimate/count how many users there are,

2 In a future version, storing automatically in /usr/lib/R or wherever a
list of the users the first time they use R (ie when .R is set up).  The
number of unique entries on this list can then be requested via an email
to the installer's email address (requested at download time).

3 A snowball sample, starting with the present 911 list members, where
people indicate (a) their applications area, (b) platform, (c...y) other
information and (z) nominate other users by email address, returning this
to a dedicated list address. The information is processed automatically,
checked against already known addresses and any new addresses emailed with
the questionnaire.

Method (1) seems to be under discussion at the moment, method (2) would
only asymptote as people downloaded the new versions but is simplest and
method (3) could provide useful further information re expert users etc
since it would be from the actual users rather than possibly the sysadmin
person.
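The snowball sample of method (3) can be sketched as a breadth-first walk over nominations: start from the known list members, collect the addresses they nominate, and send the questionnaire only to addresses not already known. Everything named here (the `snowball` function, the `nominations` mapping, the toy addresses) is hypothetical, and a real survey would also have to deal with non-response.

```python
from collections import deque


def snowball(seeds, nominations):
    """Return every address reached by following nominations from the seeds.

    `nominations` maps an address to the addresses that person nominated;
    people who did not reply are simply absent from the mapping.
    """
    known = set(seeds)
    to_contact = deque(seeds)
    while to_contact:
        person = to_contact.popleft()
        for nominee in nominations.get(person, []):
            if nominee not in known:  # check against already known addresses
                known.add(nominee)
                to_contact.append(nominee)  # a new address gets the questionnaire
    return known


# Hypothetical toy data: one seed, three further users found via nominations.
nominations = {
    "a@example.org": ["b@example.org", "c@example.org"],
    "b@example.org": ["c@example.org", "d@example.org"],
    "d@example.org": ["a@example.org"],
}
found = snowball(["a@example.org"], nominations)
print(len(found))  # 4
```

The size of the final set is a lower bound on the number of users, since anyone never nominated by a reachable respondent stays invisible.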

This is not only of academic interest but could be used if necessary when
looking for commercial sponsors, equipment, grants etc.  The email
addresses should of course be kept private - we wouldn't want them
escaping into Outlook Express or a commercial list.

John

#
Just another random datapoint: Debian has a little known package called
popularity-contest. When this optional package is present, and if the survey
participation is enabled, a list of installed packages is emailed out. [1] 
The aggregated results are on http://www.debian.org/~apenwarr/popcon/

As of last night, it reported 735 participating hosts. Of these, 35 used R.
(Select the 'math' section to find the r-base package.)

With a few brave assumptions, [2] we get a guestimate of 107,000 R users on
Linux alone. [3] 

Dirk


[1] This is as anonymous as it can be given the constraints. A random md5sum
hash is used to distinguish between the participating hosts and mail-headers
are dropped as soon as possible. 

[2] Let's assume that the ratio estimate is not biased and that Debian has
15% of the Linux installations which itself now stand at 15 million users.

[3] Mind the ~/.signature taken from fortune(1). I did say brave assumptions.
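One reading of footnote [2] that reproduces the 107,000 figure is to multiply the 35-in-735 share by 15% of 15 million. The sketch below does that arithmetic and, as an added assumption not made in the message, attaches a normal-approximation 95% interval that treats the participating hosts as a simple random sample. Note that this product strictly scales to the assumed Debian share of Linux installations; all variable names are illustrative.

```python
import math

participating_hosts = 735   # hosts reporting to popularity-contest
hosts_using_r = 35          # of these, hosts where r-base counted as used
linux_users = 15_000_000    # assumption from footnote [2]
debian_share = 0.15         # assumption from footnote [2]

share = hosts_using_r / participating_hosts
debian_installations = linux_users * debian_share
point_estimate = share * debian_installations
print(round(point_estimate))  # 107143

# Assumption of this sketch: hosts are a simple random sample, so the share
# has a binomial standard error. The thread itself doubts this.
std_error = math.sqrt(share * (1 - share) / participating_hosts)
low = (share - 1.96 * std_error) * debian_installations
high = (share + 1.96 * std_error) * debian_installations
print(f"{low:.0f} to {high:.0f}")
```

The interval only reflects sampling noise in the 35-out-of-735 share; the uncertainty in the 15% and 15 million assumptions is not captured.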
#
Useful info from Dirk but this measures number of R Linux installations,
not users.  If 100K is an approximate number of R Linux installations,
then the likely number of Linux R users could be somewhat more.  In my
case, I have one installation and one user since only I use it but I am
sure that there are many cases with 2, 10, 100 etc users per installation.

In addition, as Dirk says, there will be a substantial number of non-Linux
users on other Unix platforms as well as Windows.

Anyone care to make a guess?  I would estimate that there are 1/2 million
actual R users but this could be out by a sd */2 at least.  
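One way to read "out by a sd */2" is as a multiplicative spread: one standard deviation corresponds to halving or doubling the estimate, which is symmetric on a log scale. A small sketch under that interpretation, with illustrative variable names:

```python
import math

central = 500_000  # the guess of half a million users
factor = 2         # "sd */2": one standard deviation is a factor of two

low = central / factor
high = central * factor
print(int(low), int(high))  # 250000 1000000

# The interval is symmetric around the guess on a log scale.
log_gap_up = math.log(high) - math.log(central)
log_gap_down = math.log(central) - math.log(low)
print(math.isclose(log_gap_up, log_gap_down))  # True
```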

John
On Wed, 21 Jun 2000, Dirk Eddelbuettel wrote:

            
#
On Wed, 21 Jun 2000 j.logsdon at lancaster.ac.uk wrote:

            
[...]

I guess that there also are some cases with several installations per
user; I have seven installations; one sparc-linux, three i386-linux,
and three Windows! On only five machines, though.

Göran

#
On Wed, 21 Jun 2000 j.logsdon at lancaster.ac.uk wrote:

            
Also cases with fewer than 1 user, ie "Hmm. R looks like an interesting
package. Maybe I should install it in case I need to do some statistics
some time."
or
"We're a math department, we want all of the mathematical packages"
or even people who install everything.

Obviously that last group will be overrepresented in the Debian popularity
contest.


	-thomas

#
OK, we're statisticians so let's use some real data and not only
guestimates ... in

	http://www.ci.tuwien.ac.at/~leisch/cran-http.report/

you find some usage statistics about the CRAN *master* site (with all
traffic inside our domain removed). Beware that not every hit is a
potential user as search engines (``crawlers'') heavily bias the log
files. It's also only data on our server, no cran.(ch|dk|uk|us|...) or
statlib (with its own mirrors). Also, obviously, all people using the
version from their Linux distribution are missing.

But it's something to play with :-)

Have fun,
Fritz

#
What I forgot: FTP and rsync are also missing (webalizer seems not to
like their log files).

.f
FL> OK, we're statisticians so let's use some real data and not only
FL> guestimates ... in

FL> 	http://www.ci.tuwien.ac.at/~leisch/cran-http.report/

FL> you find some usage statistics about the CRAN *master* site (with all
FL> traffic inside our domain removed). Beware that not every hit is a
FL> potential user as search engines (``crawlers'') heavily bias the log
FL> files. It's also only data on our server, no cran.(ch|dk|uk|us|...) or
FL> statlib (with its own mirrors). Also, obviously, all people using the
FL> version from their Linux distribution are missing.

FL> But it's something to play with :-)

FL> Have fun,
FL> Fritz

#
On Wed, Jun 21, 2000 at 08:36:23AM -0700, Thomas Lumley wrote:
Well, popularity-contest tries to be a little smarter and uses atime and
ctime as reported by find(1) to differentiate between 'used' and 'installed
but not used' as well as 'recently installed' software.

The full result for r-base is 35 'recently used', 49 'installed but not
used' and 8 'recently upgraded'. 
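To make the three categories concrete, here is a hedged sketch of how access and change times could be turned into such labels, followed by the shares implied by the counts above. The 30-day window and the order of the checks are assumptions of this sketch, not popularity-contest's actual rules, and `classify` is a hypothetical helper.

```python
from collections import Counter

SECONDS_PER_DAY = 86_400


def classify(atime, ctime, now, window_days=30):
    """Label a package from its files' access and change times (in seconds).

    Hypothetical rule: a recent ctime means the package was just installed or
    upgraded; otherwise a recent atime means it was used.
    """
    window = window_days * SECONDS_PER_DAY
    if now - ctime <= window:
        return "recently upgraded"
    if now - atime <= window:
        return "recently used"
    return "installed but not used"


# Counts reported for r-base in the message above.
reported = Counter({
    "recently used": 35,
    "installed but not used": 49,
    "recently upgraded": 8,
})
hosts_with_r = sum(reported.values())
used_share = reported["recently used"] / hosts_with_r
print(hosts_with_r, f"{used_share:.0%}")  # 92 38%
```

So of the 92 hosts reporting r-base, a little over a third counted as recent users.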

However, your criticism might still be valid here, as upon installation the
postinst script uses perl to adjust R_PAPERSIZE based on paperconf(1).
This probably updates the atime upon package upgrade.

Dirk