Validation of R
Jim_Garrett at bd.com writes, quoting a software tester at BD.com:
1. SAS has a team of professional software testers who spend their time coming up with test cases that are as esoteric and odd as they can think of (within the limits of their specifications). She was not convinced that a large community of users is sufficient to flush out obscure bugs. In her view (not surprisingly), software testers will look at software with a unique eye. (Which I think is true--but an army of users also does pretty well.)
Does she know how many such software testers are actively involved in testing the accuracy and reliability of statistical procedures in SAS, or is she just assuming that there will be a large number? People often assume that a commercial software company has legions of programmers working on program development and testing, and frequently this is not the case. In a typical software company many more employees work on marketing, customer support, etc. than on development and testing. I remember when a person told me that they expected that MathSoft (now Insightful) would have 'at least a dozen' people working on the development of lme and nlme. I knew that the actual number was zero, because José Pinheiro and I wrote and contributed that code, and neither of us works for Insightful. I'm sure that most informal guesses of the number of professional software testers working on the accuracy and reliability of statistical procedures in SAS will be overestimates.

I'm surprised that in this discussion of validation no one has quoted ideas from "The Cathedral and the Bazaar" by Eric Raymond (http://www.catb.org/~esr/writings/). He makes some very perceptive observations in that essay, including the observation that bug detection and fixing is one of the few aspects of software development that can be parallelized (provided, of course, that those detecting the bugs have access to the sources). A succinct expression is "Given enough eyeballs, all bugs are shallow". In that sense I think it could be said that there are a lot more software testers working on R than on any other statistical software system.

Another important consideration in assessing the reliability of open source software is that the people who develop it do so because they are interested in it, not because it is "just a job". This makes it much more likely that the person developing open source software will work on getting it "right", not just on getting it ready to ship out the door.
A person once asked me why the functions for probability densities, cumulative distribution functions, and quantiles in R were demonstrably better than those in commercial software packages. I said that it was because we had an unfair advantage: they just have a bunch of programmers working on their code, and we have Martin (Maechler). To the other programmers, getting good answers is a job requirement; to Martin, getting the best possible answer is a passion.
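The kind of numerical care at issue can be illustrated with the tail probabilities of the normal distribution. The sketch below is mine, not from the original post, and uses Python's standard library rather than R so it is self-contained; it contrasts a naive 1 - CDF computation with one that evaluates the upper tail directly, which is the approach R's p-functions expose through their lower.tail argument (e.g. pnorm(x, lower.tail = FALSE)):

```python
from math import erf, erfc, sqrt

def norm_sf(x):
    """Upper-tail probability P(X > x) for a standard normal,
    computed directly via erfc so precision survives far into the tail."""
    return 0.5 * erfc(x / sqrt(2.0))

def norm_sf_naive(x):
    """Naive 1 - CDF: once the CDF rounds to 1.0 in double precision
    (around x = 8.3), the subtraction cancels to exactly 0."""
    return 1.0 - 0.5 * (1.0 + erf(x / sqrt(2.0)))

print(norm_sf(10))        # ~7.62e-24: a meaningful tail probability
print(norm_sf_naive(10))  # 0.0: catastrophic cancellation
```

Getting this right for every distribution, in both tails and on the log scale, is exactly the sort of unglamorous work that distinguishes "good enough to ship" from "the best possible answer".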