The hidden costs of GPL software?

37 messages · Philippe GROSJEAN, Jan P. Smit, (Ted Harding) +17 more

#
Hello,

In the latest 'Scientific Computing World' magazine (issue 78, p. 22), there
is a review on free statistical software by Felix Grant ("doesn't have to
pay good money to obtain good statistics software"). As far as I know, this
is the first time that R is even mentioned in this magazine, given that it
usually discusses commercial products.

In this article, the analysis of R is interesting. It is admitted that R is
great software with a lot of potential, but: "All in all, R was a good
lesson in the price that may have to be paid for free software: I spent many
hours relearning some quite basic things taken for granted in the commercial
package." Those basic things are related to data import, producing basic
plots, etc., and the author calls for a more intuitive GUI to smooth the
learning curve a little.

There are several R GUI projects ongoing, but these are progressing very
slowly. The main reason is, I believe, that relatively few of the
programmers working on R are interested in this area. Most people wanting
such a GUI are basic users who do not (cannot) contribute... And if they
eventually become more knowledgeable, they tend to have other interests.

So, is this analysis correct: are there hidden costs for free software like
R in the time required to learn it? At least currently, for the people I
know (biologists, ecologists, oceanographers, ...), this is perfectly true.
This is even an insurmountable barrier for many of them, and they
have given up (they come back to Statistica, Systat, or S-PLUS using
exclusively functions they can reach through menus/dialog boxes).

Of course, the solution is to have a decent GUI for R, but this is a lot of
work, and I wonder if the intrinsic mechanism of GPL is not working against
such a development (leading to a very low pool of programmers actively
involved in the elaboration of such a GUI, in comparison to the very large
pool of competent developers working on R itself).

Do not misunderstand me: I am not giving up on my GUI project. I am just
wondering if there is a general, ineluctable mechanism that leads to the
current R / R GUI situation as it stands... and consequently to a "general
rule" that there are indeed, most of the time, "hidden costs" in free
software, due to the longer time required to learn it. I am sure there are
counter-examples; however, my feeling is that, for Linux, Apache, etc., the
GUI (if there is one) often lags far behind the potential of the
software, leading to a steep learning curve in order to use all these
features.

I would be interested in your impressions and ideas on this topic.

Best regards,

Philippe Grosjean  

..............................................<??}))><........
 ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons-Hainaut University, Pentagone
( ( ( ( (    Academie Universitaire Wallonie-Bruxelles
 ) ) ) ) )   6, av du Champ de Mars, 7000 Mons, Belgium  
( ( ( ( (       
 ) ) ) ) )   phone: + 32.65.37.34.97, fax: + 32.65.37.33.12
( ( ( ( (    email: Philippe.Grosjean at umh.ac.be
 ) ) ) ) )      
( ( ( ( (    web:   http://www.umh.ac.be/~econum
 ) ) ) ) )
..............................................................
#
Dear Philippe,

Very interesting. The URL of the article is 
http://www.scientific-computing.com/scwsepoct04free_statistics.html.

Best regards,

Jan Smit
#
On 17-Nov-04 Philippe Grosjean wrote:
Hi Philippe,
Thanks for a most interesting post on this question. Further
comments below. Felix Grant's article is excellent, and well
balanced.
It would better represent the balanced view of the article
to further quote:

  "In fact, the whole file menu in R looks either elegantly
   uncluttered or frighteningly obscure, depending on your point
   of view."

  "It [the effort of learning] is the price paid, just as the
   dollars or euros for a commercial package would be. For
   that price, I've learned a great deal -- and not only
   about R. And I shall remember it when I next have to find
   a heavyweight solution for a big problem presented by a
   small charitable client with an invisible budget. It's a
   huge, awe-inspiring package -- easier to perceive as such
   because the power is not hidden beneath a cosmetic veneer."

This last remark is, in my view, particularly significant.
See below.

Non-GUI vs GUI is not intrinsically linked to Free Software
as such. There are well-known FS programs which are essentially
GUI-based -- as an easy example, consider all the FS Web
Browsers such as Netscape, Mozilla, ... . If you want the
graphics experiences offered by the Web, you're in a graphics
screen anyway, and so it may as well be programmed around
a GUI. Others, such as OpenOffice, have deliberately built
on a GUI approach in order to emulate The Other Thing.

There are a lot of FS programs which offer a GUI, usually
somewhat on the basic side, which nonetheless encapsulates
the entire functionality of the program and saves the user
the task of composing a possibly complex command-line or
even a script.

The comment "hidden beneath a cosmetic veneer" is, in my
view, somewhat directly linked to commercial software.
If you sell software, you want a big market. So you want
to include the people who will never learn how to work
software from a command line; and the sweeter the taste of
the eye candy, the more such people will feel enjoyment
in using the software. The fact that their usage is limited
to what has been pre-programmed into the menus is not going
to affect many such people, since typically their usage
is limited to a very small subset of what is in fact possible.
This in turn leads, of course, to the phenomenon of
"software-driven analysis", where people only do what the
GUI allows (or, more precisely, easily allows); and this
leads on in turn to a culture in which people tend to believe
that Statistics is what they can do with a particular
software package.

S-Plus does its best to compromise: as well as GUI access
to a pretty wide range of functions, there is the Command
Line Window where the user can explicitly type in commands.
(I dare say many R users, in S-Plus, may tend to work in
the latter since they are already used to it.) But, as always
in a GUI, one can tend to get lost in the ramifications.
Also, things like the big arrays of tiny icons you get when
you click on the "2D Plots" or "3D Plots" buttons in the
S-Plus toolbar can be trying on the eyes and time-consuming
to pick through.

Often, I think, in the Free Software world, people get involved
because they want to produce something which achieves a task.
Once they have a program which does that, then their aim is
satisfied. The GUI, in many cases, would be additional work
which would add nothing to what the software can do in terms
of tasks to be achieved. So in such cases, yes, I would tend
to agree that there is an intrinsic mechanism that discourages
work on a GUI for its own sake. You can add to that the fact
that once a developer has got to the point of creating such
software, successful in the tasks, they may have got beyond
the point at which they can readily sympathise with users who
have not acquired such skills: they no longer perceive, from
their own experience, that there is a problem.

However, this leaves people like you, having colleagues who
"come back to Statistica, Systat, or S-PLUS using exclusively
functions they can reach through menus/dialog boxes." By this
experience, you are aware of the problem, and rightly feel
that they would be helped by having access to the sort of
GUI/Menu interface that they are used to using.

One genuine benefit that the GUI offers, especially to
beginners with a particular software package, is that the
resources of the software can perhaps more easily and rapidly
be explored through the GUI, rather than searching laboriously
through the documentation of functions, extra packages, and
so on. This means that they more readily come to perceive
what is available, though of course this is limited to what
the GUI will show them. But a good "Help" window can break
that barrier.

Perhaps R itself is less helpful than it might be in this
respect. The R-help list bristles with queries of the form
"How can I do X?", which I think is evidence of a problem.
While some of these queries clearly originate from people
who have taken no trouble to explore readily accessible
information, many others cannot be so easily dismissed.

If you know something about what you're after, once you
realise that a judiciously formulated "help.search" can
throw up a lot of possibilities you are well on your way.
So, for instance (as in a recent query about 2-D Fourier
transform for spatial data) 'help.search("fourier")' gives
relevant information.
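
As a minimal sketch of this kind of search (assuming a standard R
installation; the search term is simply the one from the example above),
help.search() can be paired with apropos(), which matches the names of
objects on the search path rather than the text of help pages. Both only
see what is installed locally.

```r
# Search the installed help pages for a keyword (titles, aliases, concepts).
hits <- help.search("fourier")
print(hits)

# The result also holds a data frame of matches that can be inspected.
# str() is used here rather than assuming particular column names.
str(hits$matches)

# apropos() matches object names on the search path, so it helps when
# you can guess part of a function's name.
apropos("fft")
```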

This, though, still fails for information in packages which
you have not installed. Perhaps I'm about to reveal my own
culpable ignorance here, but I'm not aware of a "full R info"
package which would be installed as part of R-base, being
a database of info about R-base itself and also every current
additional package, such that a "help.search" would show
all resources -- including those not installed -- which
match a query (and flag the non-installed ones as such so
that the user knows what to install for a particular purpose).

Whether this needs to be supplemented by a GUI is a point
that could be discussed from several points of view.
Philippe's biological/oceanographic users no doubt would
be considerably helped, provided they can in due course
come to the point where they can start to work "beyond
the GUI" (if indeed they need to).

Personally, however, I find that GUI work is slower and
more error-prone than command-line work. Swanning the
mouse around the screen, visually identifying icons and
buttons, clicking on this and that in order to see whether
it's what you want, and so on, is much more time-consuming
than typing in a command.
And God help you if you accidentally click on something
destructive!

I'll close with an immortal quotation (from Charles Curran,
of the UK Unix Users Group):

  "I can touch-type, but I can't touch-mouse"

Best wishes to all,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 17-Nov-04                                       Time: 12:34:31
------------------------------ XFMail ------------------------------
#
On 11/17/04 12:34, Ted Harding wrote:
This is one of the purposes of my R search page.  I have all
packages installed.  You can also search the help list, etc., in
the same search.  Some people have bookmarks for it.  Of course
you need to be connected to the internet.

I think that any attempt to replicate this for a single user, or
even the packages, would be difficult.

BUT, it might help to install just the help pages for all
packages, without the packages themselves.  Then help.search()
would find things.  (I have no interest in figuring out how to do
this, but maybe someone else does.)

Jon
#
I'm a big advocate -- perhaps even fanatic -- of making R easier for
novices in order to spread its use, but I'm not convinced that a GUI
(at least in the traditional form) is the most valuable approach.

Perhaps an overly harsh summary of some of Ted Harding's statements
is: You can make a truck easier to get into by taking off the wheels, but
that doesn't make it more useful.

In terms of GUIs, I think what R should focus on is the ability for users
to make their own specialized GUI, so that a knowledgeable programmer
at an installation can create a system that is easy for unsophisticated
users for the limited number of tasks that are to be done.  The ultimate
users may not even need to know that R exists.

I think Ted Harding was on the mark when he said that it is the help
system that needs enhancement.  I can imagine a system that gets the
user to the right function and then helps fill in the arguments; all of the
time pointing them towards the command line rather than away from
it.

The author of the referenced article highlighted some hidden costs of R,
but did not highlight the hidden benefits (because they were hidden from
him).  A big benefit of R is all of the bugs that aren't in it (which may or
may not be due to its free status).

Patrick Burns

Burns Statistics
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
#
On Wed, 17 Nov 2004 14:27:49 +0000, Patrick Burns
<pburns at pburns.seanet.com> wrote :
I think there is (slow) movement towards that.  Certainly it's
possible now (you can add menus to Rgui in Windows, you can do nice
things like Rcmdr using TCL/TK on any platform).   However, designing
a nice GUI is very hard work.

That would be helpful, and the only really difficult part would be the
first part:  getting the user to the right function.  help.search()
sometimes works, but often people ask for the wrong thing.

After that, R knows a lot about the structure of its help files, so it
could display all of the arguments with their defaults and the help
text that corresponds to each argument, as well as the help text for
the rest of the help file.
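
A rough illustration of how much R already knows about a function's
arguments, using only base functions (read.table is an arbitrary example
here, not one named in the thread). Pairing each argument with its help
text would additionally need the parsed help files and is not shown.

```r
# Show the argument list with defaults, as it would appear in a call.
args(read.table)

# formals() returns the same information as a named list, which a
# help tool could walk through to prompt for each argument in turn.
f <- formals(read.table)
names(f)

# Default value of a single argument.
f$header
```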

Probably the main obstacle to getting this is finding someone with the
time and interest to do it.

Duncan Murdoch
#
All:

I have much enjoyed the discussion. Thanks to all who have contributed.

Two quick comments:

1. The problem of designing a GUI to make R's functionality more accessible
is, I believe, just one component of the larger issue of making
statistical/data analysis functionality available to those who need to use
it but do not have sufficient understanding and background to do so
properly. I certainly include myself in this category in many circumstances.
A willingness and commitment to learning ( = hard work!) is the only
rational solution here, and saying that one doesn't have the time really
doesn't cut it for me. Ditto for R language functionality?

2. However, R has many attractive features for data manipulation and
graphics that make it attractive for common tasks that are now done most
frequently with (ugh!) Excel (NOT Statistica, Systat, et al.). For this
subset of R's functionality a GUI would be attractive. However, writing a
good GUI for graphing that even begins to take advantage of R's flexibility
and power in this arena is an enormous -- perhaps an impossible -- task.
Witness the S-Plus graphics GUI, which I think is truly awful (and appears
to thwart more than it helps, at least from many of the queries one sees on
that news list). So I'm not sanguine.

Again, thanks to all for a thoughtful and enjoyable discussion.

-- Bert Gunter
Genentech Non-Clinical Statistics
South San Francisco, CA
 
"The business of the statistician is to catalyze the scientific learning
process."  - George E. P. Box
#
I agree with Bert.  Thanks to all who contributed.  I'd like to 
add one comment I didn't see in the thread so far: 

      The corporate legal where I work is deathly afraid of the GNU 
General Public License (GPL), because if we touch GPL software 
inappropriately with our commercial software, our copyrights are 
replaced by the GPL.  This in turn means we can't charge royalties, 
which means we can't repay the investors who covered our initial 
development costs, and we file for bankruptcy.  The rabid capitalists 
meet the rabid socialists and walk away, shaking their heads.  (Sec. 2.b 
of the GPL:  "You must cause any work that you distribute or publish, 
that in whole or in part contains or is derived from the Program or any 
part thereof, to be licensed as a whole at no charge to all third 
parties under the terms of this License."  We can get around this by 
packaging accesses to GPL software as separately installed add-on(s), 
because then only the add-on(s) would be covered by the GPL.)  Our 
corporate legal is more concerned about a possible lawsuit from a 
possible competitor than from the R Foundation, but the threat is still 
real and still being adjudicated in other cases. 

      If the GPL were not so tight on this point, someone could 
commercialize a GUI for R without having to offer their source code 
under the GPL. 

      However, even without this change, R seems to be the platform of 
choice for new statistical algorithm development by a growing portion of 
the international scientific community.  Moreover, from my experience 
with this listserve, the technical support here is far superior to 
anything I've experienced with any other software in the 40+ years since 
I wrote my first Fortran code. 

      Best Wishes,
      spencer graves
#
This has been an interesting discussion. I make the following comment with 
hesitation, since I have neither the time nor the ability to implement it 
myself.

Using CLI software, an infrequent user has trouble remembering the known 
functions needed and trouble finding new ones (especially as that user gets 
older).  What might help is an added help facility more oriented towards 
tasks, rather than structured around functions or packages.

Such a help facility might have a tree structure.

Want help?  Are you looking for information on (1) data manipulation or (2) 
analysis?  If (1), do you want to (3) import or export data, (4) 
transform data, (5) reshape data, or (6) select data?  If (2), do you want 
to (7) fit a model or (8) make a graph?  And so on....

Once appropriate function(s) are located, the user would be directed (by 
hyperlinks) to the existing help framework.
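
The tree described above could be sketched in R roughly as follows. The
names task_tree and find_functions are hypothetical, and the functions
listed at each leaf are only a small, arbitrary selection from the
standard packages, not a proposal for the real contents.

```r
# Hypothetical task tree: each leaf names existing functions.
task_tree <- list(
  "data manipulation" = list(
    "import or export data" = c("read.table", "write.table"),
    "transform data"        = c("transform"),
    "reshape data"          = c("reshape"),
    "select data"           = c("subset")
  ),
  "analysis" = list(
    "fit a model"  = c("lm", "glm"),
    "make a graph" = c("plot", "hist")
  )
)

# Hypothetical helper: walk the tree along a path of task names and
# return the function names found at that leaf.
find_functions <- function(tree, path) {
  node <- tree
  for (step in path) {
    node <- node[[step]]
    if (is.null(node)) stop("no such task: ", step)
  }
  node
}

fns <- find_functions(task_tree, c("data manipulation", "reshape data"))
fns

# Hand over to the existing help framework for the first match.
help(fns[1])
```

The point of the sketch is only that the task layer can sit on top of
the existing help pages without replacing them.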

That could help the problem of knowing what you want to do, but not what it 
is called.  I think that "Introductory Statistics with R" is a step in that 
direction for the basics, as MASS is for more complex matters.  The 
question is whether such material can be incorporated into a help system 
that will allow users to find, more easily, what they need.  That largely 
depends, it seems to me, on a great deal of work by volunteers.

I agree also with the suggestion that a dedicated editor (or add-in) that 
could supply arguments for functions might be considerable help.

MHP
#
Thank you all for your comments on this topic (including a couple offline).
To summarize your comments:

- "Hidden" costs, perhaps better called "indirect" costs, are not so easy to
calculate. The cited paper,
http://www.scientific-computing.com/scwsepoct04free_statistics.html, offers
interesting advice from someone used to testing and writing about
commercial software. Indeed, the whole context around the use of
(statistical) software should be taken into account, which would also reveal
indirect costs for commercial packages. It is the Total Cost of
Ownership (TCO) that should really be considered in this context.

- This discussion is connected with the many discussions for and against an R
GUI, or any other tool that would facilitate the use of R at the price of
losing one big advantage: currently, you have to know what you are doing to
get a result with R... What kind of nonsense would we get from naive people
if they could obtain results with little or no knowledge?

- R is viewed by some as a statistical development platform, mainly for the
scientific community. It excels there, but is it even desirable to have it
also used "by the masses"?

- ***Many of you call for a better help system, to find a function more
easily, rather than for a GUI***. I think this point is very important and should
be placed somewhere high in the "to do" list in order to make R more
accessible to beginners/occasional users!

- There is no possibility of making a commercial GUI for R (thanks to the
GPL), and volunteer R developers tend to work on a problem until they get
the solution they need... And this rarely leads to the development of a GUI
on top of it for statistical analyses. In this way, yes, there is an
intrinsic mechanism that makes R a program by experts, for experts.

- A GUI could cover only the bare essentials, is rather inflexible, etc.
For all these reasons, how would it help in learning such a feature-rich
environment as R? This is not the solution to the problem.

- It is more a question of education: it takes as much time to find a
function in a menu/dialog box as to consult help pages to find the right
function. However, some categories of people are more accustomed to clicking
and dragging than to reading help pages!

- GUIs, by providing access to a limited amount of analyses in an inflexible
way, lead to the phenomenon of "software-driven analysis" where the way data
are analyzed is dependent on the software used.

- Only commercial software cares about eye candy to make clients happier
to use it (and thus to sell more): "hidden beneath a
cosmetic veneer" in the original paper. R does not care, because there is
nothing to sell. So, as a consequence, you face the bare power, but sorry,
no eye candy!

- GUI work is slower and more error-prone... So, this should be considered
in the hidden costs AFTER the learning stage... in favor of R!

- "User-friendly" software tends to make a lot of assumptions (to present the
analysis in an easier way), and does not tell you about them. These could
lead to nonsense in some cases, and the user does not even know, precisely
because these assumptions are not explained!

- The author of the paper talks about hidden costs, but he does not talk
about hidden benefits, because he does not even notice them: ***all the bugs
that aren't in it*** (I add: transparency of the code + the possibility for
everyone to propose a patch = a big part of the success of Open Source
software, especially for data analysis software)!

That's all, I think, for the summary!

Otherwise:
Patrick Burns <pburns at pburns.seanet.com> wrote :
Duncan Murdoch [murdoch at stats.uwo.ca] answered:
Hmm, excuse me, but I think that SciViews and JGR *already* do that... So
it appears that at least two people have already spent their time and
focused their interest on this topic. Also, functions for such purposes will
be added to the R GUI API, meaning they will be available for wider use.
And I am close to a solution under Windows where hitting a combination of
keys in ANY program will display a function tip with arguments, or a
contextual completion list for R code.

Finally:

It seems that a GUI for R is not just lacking, it is purposely lacking...
And there are many arguments in favor of this lack. That is fine for most R
users. But could you, please, consider these examples:

1) I teach basic biostat with R/SciViews-R/R Commander. It is a real
success and almost all my students install it on their computers and start
using it...
So, the next year, I teach them an advanced biostat course with R. I decide
to give up the GUI and to present analyses like PCA, MDS, LDA,
clustering, etc. directly in R. For each analysis, I make a small script
(10 lines or so), I explain it and show them how it works and how they can
edit it to analyze other data. It is a fiasco! It seems that a psychological
barrier induced by this unfamiliar object (the script) tends to obscure
everything in the minds of my students. The feedback I got is along these
lines: most of the students who started to use R seem disgusted after this
second course, and they switch back to other software with a GUI! When I ask
them, they say: SciViews-R/R Commander is nice but limited to simple
analyses. For other analyses, the R scripts are just too complex for me, so
I prefer to use different software.

2) Second case: I write an original analysis and I want to make it widely
available for oceanographers. Most of them do not want to learn the S
language, and never will. They obviously need a simple and easy GUI on top of my R
function, because they want to run the analysis without knowing all the
details...

Obviously, these are concrete examples where a GUI would be a benefit...
unless one considers that R should be restricted to experts only!

Best regards,

Philippe Grosjean
#
Patrick Burns wrote:
I really agree with you Patrick.  To me the keys are having better help 
search capabilities, linking help files to case studies or at least 
detailed examples, having a navigator by keywords (a rudimentary one is 
at http://biostat.mc.vanderbilt.edu/s/finder/finder.html), having a 
great library of examples keyed by statistical goals (a la BUGS examples 
guides), and having a menu-driven skeleton code generator that gives 
beginners a starting script to edit to use their variable names, etc. 
Also I think we need a discussion board that has a better "memory" for 
new users, like some of the user forums currently on the web, or using a 
wiki.

Frank
#
Hopefully my experience with R may add something to this discussion.

I majored in computer science in 1983, with minors in mathematics and
statistics.  As this was in the days when computers were largely big
centralised boxes with remote terminals, I didn't get to use computers
for stats while I was at uni.

Fast forward to a couple of years ago, and I've got to start "doing
statistics" on the computer for the type of work I now do.  A friend
pointed me to R, so off I went.  Between 1983 and then, I did a lot of
development, testing, documentation, management, troubleshooting, etc
work, so I think it's fair to say that, while my statistics knowledge
needed a top up, my computing background was very strong.

As of today, after approx 2 years of using R for relatively ad-hoc
tasks every few weeks, here are my thoughts about it:
- it's extremely powerful and well-maintained; kudos to everyone involved
- it's extremely concise; you can do a huge amount of work in very few
lines of code
- provided a particular task is close to one I've already done before,
using R I can extract info from a set of data at an amazing rate. 
Tasks that would take me an hour or so with another programming
language or toolset, may take me under a minute using R (obviously
depending on the size of the dataset)

Problems arise whenever I need to step outside my existing R knowledge
base, and use a feature or function that I haven't used before:
- the help documentation in general desperately needs work,
particularly the examples.  My thinking is that examples should pretty
much lead you through a trivial exercise using the tool being
discussed.  This is very rarely the case with R, and the examples seem
to assume you fully understand how e.g. a library works and just need
a simple reminder of the syntax.  For the purposes of comparison,
compare the documentation that comes with the Perl language; even if
you don't know what a function or keyword does, you can pretty much
read through the given examples and work it out without difficulty
- the GUI is pretty much just a working area on the screen; it's just
not "helpful".  It would probably be reasonably simple to add menu or
toolbar options to help a user identify how they can actually achieve
a particular task in R (e.g. select a function from a drop-down list,
and get one-liner documentation about what it does), but that hasn't
been done.  Many of the questions asked on this list (which are often
answered with "RTFM") are of the nature "I've got this conceptually
simple task to do, but I can't find out how to do it using R.  Please
help"; this is gratifying to me personally, since I frequently
encounter the same problem.  These issues are extremely frustrating,
as you often know the answer will be a one-liner but you may struggle
for hours or days trying to find it

As I said above, once you understand how to do a particular task in R,
you can leverage that knowledge to do similar tasks amazingly quickly;
the productivity that comes with using R in this context is
incredible.  However, that productivity tends to disappear when you
need to take even a small step outside your existing R knowledge base.

Now maybe I'm the only occasional R user out here, and everyone else
is using it 8 hours a day and acquired my 2 years' worth of knowledge
in their first week of use.  I doubt that is actually the case, and
the rest of us could really do with some help from the GUI.

Finally, please don't think I don't appreciate the mass of effort
required to get R to its current state.  I do, and it's made my life a
lot easier than it would otherwise have been.

Regards

Dave Mitchell
#
On 17 Nov 2004, at 2:27 pm, Patrick Burns wrote:

            
I think this is spot on.  My situation is that I am a scientist turned 
system administrator, and R is a package which I am increasingly being 
asked to install for the use of scientists at this Institute.  I am by 
no means a statistician;  the statistics I learned in A-level maths 
almost 20 years ago were as far as I got, and most of that I have 
forgotten.  But I like to have some understanding of the software 
packages I am asked to support, so I've been looking at R with a view 
to learning some of its more basic functions.  It looks potentially 
very useful to me anyway for summarising activity on the supercomputing 
cluster that I run.

So I'm a newbie to R, armed with only a very basic knowledge of 
statistics (I know the difference between a Normal and a Poisson 
distribution at least, and with a bit of prodding could probably 
remember a binomial distribution too).  I'm an experienced programmer 
in several languages, and a PhD-level scientist.

And yet I have still found R really quite hard to learn, and this is 
principally because the on-line help is a reference manual.  I'm sure 
it's a fabulous resource if you're a statistician who uses R every day, 
but for me it's not very helpful.

The R Intro PDF is good, but it would be nice if it were integrated 
better, with hyperlinks to the reference documentation, or to other 
parts of the introduction, for those platforms that support such things 
(it looks like this was intended for MacOS X, which is the version I am 
playing with for my own use, although the version I maintain for users 
is on Linux [ and would be on Alpha/Tru64 too if I could get it to pass 
its tests ]) but the on-line help link to the Intro on the Aqua R 
version brings up a blank page, so I'm using the generic PDF document 
instead.

I think the GUI question has nothing to do with the hidden costs of the 
GPL, or otherwise.  This is the age-old ease-of-use versus power and 
capability argument.

I don't think a fancy GUI is necessary - the GUI aspects that have been 
added to R on Mac OS X are sufficient.  I get the impression that the 
real power of R is the fact that really it's a programming language, 
and should probably be treated and learned as such.  Quite apart from 
the fact that a GUI will necessarily be a somewhat restricted subset of 
the total functionality, and a lot slower to use once you've taken the 
effort to learn the software, I think there is another danger, which I 
have already seen in other pieces of software in the bioinformatics 
community.  Users frequently run completely pointless analyses through 
the GUI wrappers we provide.  The users using the command line 
interfaces typically do much more sensible things.

If you make a piece of software trivial for a user to use without 
thinking about what they're doing, then the users won't think.  I may 
not know much about statistics, but what little I do know is that 
understanding exactly what form of analysis or significance test is 
required to be meaningful is a real skill that takes a lot of 
experience to master.   Having to perform that analysis with written 
commands means that your method is recorded, and could be published, 
and more importantly be checked and reproduced by other researchers.  
It also gives you ample time to think about what you're doing, rather 
than just bashing out a pretty graph which actually has no real meaning 
whatsoever.

Any GUI to R could (and should) be able to store the command line 
equivalent to what it has just done, to satisfy the reproducible 
criterion above, but I suspect it could still lead to some pretty 
shoddy work being done by careless and lazy scientists, and we get 
enough of that already.
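
As a rough illustration of the point above, a GUI action could be 
wrapped so that the equivalent command is saved before it is run.  This 
is only a sketch under assumptions: `run_and_log` and `command_log` are 
hypothetical names, not part of R or of any existing R GUI.

```r
# Hypothetical helper: record the R command a GUI action stands for, then run it.
command_log <- character(0)

run_and_log <- function(command) {
  # Keep the command text so the analysis can be replayed or published later
  command_log <<- c(command_log, command)
  # Parse and evaluate the command in the global environment
  eval(parse(text = command), envir = globalenv())
}

# A menu item "Summary of a variable" might translate to:
run_and_log("x <- c(2, 4, 4, 5, 7)")
run_and_log("summary(x)")

# The recorded commands form a script that others can re-run
writeLines(command_log)
```

The log is plain R code, so it can be saved to a file and passed to 
source() by anyone who wants to check the result.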

Tim
#
Hmmmm, interesting thread and minds will not be
changed but regarding GUIs...I thought S (aka R) was a
PROGRAMMING LANGUAGE with a statistical and numerical
slant, and not a statistics application. ;O)  

Certainly there is an important place for GUIs but I
believe that it is very much overemphasized in modern
computer culture. My experience and bias--and I
started in the 1960's-- is that except for 'trivial'
uses, GUIs are a detriment to any reasonably complex
CREATIVE computational task. They are adequate for the
simple, common task. But even then, typing a command
or two is not overly taxing--- particularly when
compared to navigating layer upon layer of submenus as
is sometimes needed. If I need to, I will add a
little syntactical sugaring when coding and move on. 

GUIs encourage a passive approach to using computers
when solving problems. In addition, it is regrettable
that a lot of people in the 'workplace' will carry out
incomplete and/or incorrect quantitative work because
of the real or perceived limitations of the particular
(GUI) apps they are using. There is no inclination to
go beyond the menu and even then many menu items
gather 'electronic dust'.

Finally, there are times for many of us when work
'goes home' at the end of the day. That just comes
with the territory. I (and most others) cannot afford
the luxury of S-plus, Statistica, SPSS, etc. at home.
So in a sense there is a very real 'loss of
productivity' cost associated with using commercial
software. Now that does bring us around to supporting
R doesn't it? (Mea culpa. And I resolve to do better!)
What value does one put on the vitality of the R
community?

Best regards,
Michael Grant, Ph.D. 

* The requirements for creating packages are on
target,  and have the desired impact on both the
quality and breadth of R.
--- Philippe Grosjean <phgrosjean at sciviews.org> wrote:

            
#
On 18 Nov 2004, at 10:27 am, Tim Cutts wrote:

            
I should correct myself here, and note that there are some 
cross-references within the PDF document, it's not completely devoid of 
them.

Tim
#
Tim Cutts wrote:
In that respect you should have a look at the Emacs/XEmacs ESS package. 
This package combines the power of the command line with a reproducible 
record of what has been done to generate graphs or whatever you like. 
It also comes with a nice reference card (PDF), which is very helpful 
for learning common shortcuts and increasing your productivity. I 
wouldn't necessarily call ESS a GUI in the traditional sense, though.


When I started using R I was inclined to use the R Commander GUI. After 
fiddling with it for a while I came to the conclusion that its 
possibilities are, at least for the moment, really limited. 
Furthermore, some things irritated me, e.g. having to work out which 
buttons to push to achieve a specific task. If I hit the wrong button, 
I was hardly able to find out what had actually gone wrong.

Nevertheless, for me as a beginner in GNU R, who never used S before, 
but primarily SPSS and BMDP in earlier times, it is a long way to 
gaining some control of the advanced aspects of R. This is true despite 
the fact that I took statistics courses for several years and have 
experience in research projects (social sciences and epidemiology), 
so I would agree that using GNU R has some hidden costs for me!

To sum up, what I need is an extensive example-based help system, 
focused on how to do things in R. In part this is already there, e.g. 
SimpleR from Verzani (contributed docs area) etc.

Hopefully I can contribute to this in future, since it seems to me 
invaluable to learn R by going through example-based lessons (some are 
found in vignette() ).
These are much more comprehensible to me than the short reference-like 
entries in the current help system, mostly because of their very 
technical approach (the same can be said of the official GNU R 
manuals, especially "The R Language", which wasn't a great help for me 
when I took my first look at GNU R). In this context something like 
the GuideMaps of Vista comes to mind!

But to be as clear as possible, I think GNU R is great and I 
appreciate all the efforts done by the R core team and associates!

Nevertheless, it seems worthwhile to rethink the help system in R with 
respect to those who may have a good understanding of statistics, but 
lack basic experience in finding their way into the sophisticated 
world of the R/S languages.



Regards

Thomas
#
At 11/18/2004 07:01 AM Thursday, Thomas Schönhoff wrote:

            
(I posted similar material before, but it was moved to R-devel, and I 
wanted to express a bit of it here.)

I have frequently felt, like Thomas, that what could make R easier to use 
is not a GUI, but a help system more focused on tasks and examples, rather 
than on functions and packages.  This has obvious and large costs of 
development, and I am unlikely to contribute much myself, for reasons of 
time and ability.  Yet, I mention it for the sake of this discussion.

Such a help system could be a tree (or key) structure in which through 
making choices, the user's description of the desired task is gradually 
narrowed.  At the end of each twig of the tree would be a list of suggested 
functions for solving the problem, hyperlinked into the existing help 
system (which in many ways is outstanding and has evolved just as fast as R 
itself).  This could be coupled with the continued expansion of the number 
of examples in the help system.
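
To make the idea concrete, such a key could be sketched as a nested 
list in R.  This is only an illustration under assumptions: the task 
names, the choice of functions at each twig, and the names `task_tree` 
and `suggest` are all hypothetical and not part of any existing help 
system.

```r
# Hypothetical task tree: inner nodes are choices, leaves are suggested functions.
task_tree <- list(
  "import data" = list(
    "text file"   = c("read.table", "read.csv"),
    "fixed width" = c("read.fwf")
  ),
  "plot data" = list(
    "one variable"  = c("hist", "boxplot"),
    "two variables" = c("plot")
  )
)

# Follow a sequence of choices down the tree and return what is found there.
suggest <- function(tree, choices) {
  node <- tree
  for (choice in choices) {
    if (!is.list(node) || is.null(node[[choice]])) {
      stop("no such branch: ", choice)
    }
    node <- node[[choice]]
  }
  # At an inner node, list the next choices; at a leaf, list the functions
  if (is.list(node)) names(node) else node
}

suggest(task_tree, "import data")                  # next choices to offer
suggest(task_tree, c("import data", "text file"))  # functions to look up
```

Each function name returned at a twig could then be passed to help() 
to reach the existing reference pages.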

Now I must express appreciation for what exists already that helps in this 
regard:  MASS (in its many editions), Introductory Statistics with R, 
Simple R, and the other free documentation that so many authors have 
generously provided.  Not to mention the superlative contribution of R 
itself, and the work of the R development team.  It is beyond my 
understanding how something so valuable and well thought out has been 
created by people with so many other responsibilities.

Mike
#
Hello,

I appreciate many comments and the various points of view, especially
because there are a couple of clear explanations why several people do not
need (or even do not want) a GUI for R!

Another part of the discussion seems to switch to the never-ending question
of "what kind of GUI"... which will never be answered, because there is not
one best GUI, and it also depends on the use (both the application and the
user). I have long hesitated to propose, on R-SIG-GUI and the R GUI
projects web site, posting a description of one or several "prototype"
GUI(s) we would like for R, with the intention of including all the good
ideas everybody on this list has.

I never did that, because I am pretty sure it is useless! Now, I feel that
one guy, with a clear view of what he wants, a lot of free time, a lot of
energy, and some decent skills in programming, is actually required to make
real what he has in his head! Indeed, it is such a huge job that several
people are required! Here are the topics currently developed (sorry if I
don't cite Bioconductor stuff: I don't know it):

- Most of the "low-level" work is done, I think, like interface with
graphical toolkits: tcltk by Peter Dalgaard, of course, but many others
(Gtk, wxPython, ...), a better control of Rgui under Windows (ongoing,
Duncan Murdoch), ESS, ... All this is already available, even if one could
always argue that it is not optimal in some respects.

- A better console (multiple-lines editing, syntax coloring, code tip
presenting the syntax of a function when you type it, contextual completion
list, ...). This is an ongoing project in both JGR and SciViews-R.

- A better table editor: RKward team.

- A classical menus/dialog box approach: John Fox's R commander,

- An object explorer: JGR, RKward, SciViews-R, experimental functions in R,

- A "plug-in" approach, that is, a piece of code that brings a GUI for a
targeted analysis and builds R code for you: RKward team, but also some
functions in svDialogs (part of the SciViews bundle, R GUI API),

- Interactive documents mixing formatted text, graphs, etc... with R
input/output: Rpad, Sweave (not interactive), and some other,

- Rich-formatted output of R objects (in/out, views, reporting,...): Eric
Lecoutre's R2HTML + SciViews-R,

- Code editor with interaction with R: Tinn-R, WinEdt, Emacs, and many
others, 

- IDE (humm, some code editors are not so far away from an IDE, but there is
still some lack here),

- A R GUI API: SciViews.

I hope all these projects will continue, will mature, and their developers
will ultimately realize that they provide complementary pieces of a giant
puzzle and start to work together. This is when it will become most
exciting! I hope also that it will result in an original GUI that keeps most
of the spirit of R, that is, not a simplified point&click UI, leading to
meaningless analyses by lazy people, but a real tool whose goal is to make R
easier and faster to learn for beginners, and pretty usable for occasional
users.

Maybe I am just a dreamer, but all I read in this discussion reinforces my
conviction that an **innovative** GUI would be a good addition to R: most
criticisms clearly relate to the kind of inflexible GUI, with a forest of
menus and submenus, and other bad things one could find. I never have
advocated, and never will advocate, such a GUI!

For sure, the alternative GUI will only support you in writing R code, and
will deliver plenty of help to achieve this goal. I think it is possible...
with enough people collaborating in a common project! I think the latter
point is really the problem: not enough people, too many projects! Is it a
consequence of the way R is developed (GPL)? Well, I think so, but only
partly. It is also the consequence of ego (everybody wants to be the leader
of his own project), and a lack of communication (R-SIG-GUI is not what one
would call an active list!) Or, maybe, a "good GUI" for R is a fuzzy target
and it is not possible to crystallize enough effort around a common goal
to reach it!

Anyway, although R GUI projects are progressing very slowly, I think that
only when a "good GUI" is available for R will we be able to evaluate
whether there really are "hidden costs" in R, as Felix Grant suggests in
his paper.

Best regards and thank you all for your comments and suggestions.

Philippe Grosjean
#
On Thu, 2004-11-18 at 03:24 -0800, Michael Grant wrote:
"R is a language and environment for statistical computing and
graphics."


I think that this is a critical point and that there is, to my mind, a
false predicate at play here.

That predicate is that somehow one should be able to rapidly learn R (or
any programming language for that matter) solely via the available
online reference help or via the freely provided documentation (whether
via R Core or via Contributors).

How many people here have learned to use C, FORTRAN, SAS, VBA, Perl or
any other language strictly by using built-in reference help systems? If
any, it will be a very small proportion.

Sure, SAS comes with documentation that can be measured in
hernia-inducing tonnage, but at a substantial annual cost, which I have
referenced here and elsewhere previously. R is free.

Is there anyone who has learned to code in C that does not have a copy
of K&R someplace on their shelf, probably along with copies of other
both general and application specific C references published by
Prentice-Hall, Addison-Wesley, McGraw-Hill or Hayden?

It has been years since I actively coded in C, but I have almost 3
shelves filled with C reference books. I have books dating back to the
early 80's for 80x86 Assembly, MS-DOS/BIOS interrupts and Windows API
technical references and other such books that I used to use on a daily
basis in a former life.

For Linux, I have two shelves filled with various O'Reilly and other
references running the gamut from general Linux stuff to Perl,
Procmail, Postfix, Bash, Regex, Emacs, Admin, Firewalls and others.

For R, I have most of a shelf filled with multiple references, including
three of the four editions of MASS (somehow I missed the 2nd edition). I
have a copy of Peter's ISwR (because on occasion I have an acute attack
of cerebral flatulence and have to go back to basics) along with copies
of Pinheiro & Bates, Fox, Maindonald & Braun, Krause & Olson, Everitt &
Rabe-Hesketh and V&R's S Programming. I have copies of the "White Book"
and the "Green Book" and I have copies of Harrell and Therneau &
Grambsch for specific applications of R.

There are a fair number of already published books on R/S with more
coming by Faraway, Heiberger & Holland, Verzani and others including a
new series from Springer.

My point being that the old philosophy of "No Pain, No Gain" is a
component of the learning curve with R. R is not going to be for
everybody. That's why there are other "point and click" statistical
_applications_ like JMP (albeit not cheap). They are relatively easy,
but at the same time, they are self-limiting. No single math/statistical
"product" is going to meet the needs of the entire spectrum of the
potential user space.

As I have mentioned previously, I am a firm believer in Pareto's 80/20
Rule. In this case, you develop a "product" to meet the needs of 80% of
your target user space, because you will go "bankrupt" meeting the needs
of the other 20%. Said differently, meeting the needs of the other 20%
will consume 80% of your development resources, restricting your ability
to meet the needs of the larger audience.

Having spent 12 years previously with a commercial medical software
company, I will also suggest that typically 20% of your user base will
consume 80% of your support resources.

I will also note that having been on both sides of that equation, the
support provided here within this community is superb and has no peer in
the commercial arena.

In R's case, the 80% of the user space has perhaps been extended by the
kind offerings of those who have made specialty packages available via
CRAN, BioC and others.

It takes a certain level of commitment and time with R to become
effective with it.

That commitment includes, in my mind, supplementing the available _free_
documentation that has kindly been provided by R Core and others, with
other available resources. That does not mean that everyone needs to get
on Amazon.com and spend hundreds of $YOUR_MONETARY_UNIT on books. Many
are available via libraries and/or other resources, especially for those
here in academic environments.

This is a community effort folks and not everything is going to be
provided to you free of charge, with that notion being either in actual
financial cost or time.

It appears that, since this is not the first time this subject has come
up, there is strong interest in building a c("new", "different",
"better", ...) documentation/help system for R. That's fine. For those
that have interest in pursuing this, perhaps the time has come for a
group to form a new r-sig-doc list and move forward with the development
of a framework for a new system that can be developed and implemented by
that same group and then provided back to the community. 

Writing technical and user documentation is a specialty skill set unto
itself and perhaps those with the requisite skill sets will contribute
them for the benefit of all.

For those that do not have the skills and/or the time to contribute, I
would urge you to financially contribute to the R Foundation in whatever
way you can afford. Through that mechanism you will support the
community at large and the future development and enhancement of R.

There is no "hidden cost" here and certainly not one that is unique to
GPL software. The cost is self-evident and it is measured in time and 
$YOUR_MONETARY_UNITs. "Time is money" as they say and that is the same
whether you are using GPL software or a commercial proprietary product. 

A key difference here, if any, is that none of us have paid anything for
R, where a portion of that "revenue" would go to support a dedicated
documentation team. In this case, it is "If you want it, you will need
to design and build it."

Best regards,

Marc Schwartz
#
On 17-Nov-04 Patrick Burns wrote:
Yes, perhaps overly harsh ... but if you had said instead
"by deflating the tyres" then I think I'd agree that you were spot on!

Otherwise I agree with your other comments.

All best wishes,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861  [NB: New number!]
Date: 18-Nov-04                                       Time: 16:57:20
------------------------------ XFMail ------------------------------
#
On Wed, 17 Nov 2004, Mike Prager wrote:
...
...

Another good (non-GUI) tool for the CLI is keyword completion.  R in ESS
does this, giving you lists of possible functions, variables and objects,
or feedback if there isn't any.  R's CLI completes, but only with
filenames in the current directory.
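
Outside ESS, a rough substitute for completion is to ask R itself for 
the names that match a pattern.  A minimal sketch using apropos() from 
the utils package; the patterns here are only examples, and the exact 
list returned depends on which packages are attached.

```r
# List visible objects whose names start with "read."
matches <- apropos("^read\\.")
print(matches)

# Restrict the search to functions whose names contain "mean"
mean_functions <- apropos("mean", mode = "function")
print(mean_functions)
```

The first argument is treated as a regular expression, so "^read\\." 
matches names beginning with "read." rather than names merely 
containing it.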

Dave
#
Mike Prager wrote:
...

I second all of that.  What you are describing, Mike, could be done with 
a community-maintained wiki, with easy-to-add hyperlinks to other sites. 
Just think what a great value it would be to the statistical community 
to have an ever-growing set of examples with all code and output, taking 
a cue from the BUGS examples guides.  The content could be broken down 
by major areas (data import examples, data manipulation examples, many 
analysis topics, many graphics topics, etc.).  Ultimately the more 
elaborate case studies could be peer-reviewed (a la the Journal of 
Statistical Software) and updated.

Frank
#
> To sum up, what I am in need to is an extensive example
    > based help-system, focused on how to do things in R. In
    > parts this is already there, i.e. SimpleR from Verzani
    > (contributed docs area) etc.

I have a nice set of extensive help with documentation sitting on
my shelf:

   - Peter Dalgaard. Introductory Statistics with R. Springer,
     2002. ISBN 0-387-9 

   - William N. Venables and Brian D. Ripley. Modern Applied
     Statistics with S. Fourth Edition. Springer, 2002. ISBN
     0-387-95457-0.  

   - Jose C. Pinheiro and Douglas M. Bates. Mixed-Effects Models
     in S and S-Plus. Springer, 2000. ISBN 0-387-98957-0.  

I suspect that I would have spent the money on these books even
if I'd started by spending money for S-plus, instead of R.  But
I've never seen the S-plus help system, so I may be wrong.

See http://www.r-project.org/doc/bib/R-publications.html and
http://www.r-project.org/doc/bib/R_bib.html for yet more.

Mike
#
On Thu, 18 Nov 2004, Frank E Harrell Jr wrote:
...
There is a wiki at http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl but it
doesn't seem to get much use.

Last time I was hunting for help on R, I made the page
http://fawn.unibw-hamburg.de/cgi-bin/Rwiki.pl?SearchFunctions
 and in particular:

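# Search the R mailing-list archives through a Google site-search page
help.search.archive <- function(string) {
  RURL <- "http://www.google.com/u/newcastlemaths"
  # URL-encode the query so that spaces and special characters survive
  RSearchURL <- paste(RURL, "?q=", URLencode(string, reserved = TRUE), sep = "")
  # Open the search results in the default web browser
  browseURL(RSearchURL)
  invisible(0)
}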

help.search.archive('wiki') # example

Dave