
R's documentation

11 messages · Kjetil Halvorsen, Romain Francois, Patrick Burns +5 more

#
Zheng, Xin (NIH) [C] wrote:
Please be specific about __how__ you are confused by __which__ documents 
and I am convinced you will get sarcasm-free replies.

Romain
#
Zheng, Xin (NIH) [C] wrote:
I absolutely agree.

This is an area where novices as well
as seasoned R users can contribute to the R
project.

If you find some documentation that is
confusing, then you can write a message
about it that states:

1) Precisely what you find confusing.

2) (optional but very helpful) A
proposed rewrite of the passage that
you think fixes the problem.


Once you attempt step 2 a few times, you
may come to appreciate the difficulty of
writing good documentation.

Help files are a particularly difficult
medium -- they need to be clear (to novices,
not just to the person who wrote the function),
and they need to be short enough that people
might read them.


Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of "The R Inferno" and "A Guide for the Unwilling S User")
#
Which version of R is this?  From the BUGS section in NEWS for R 2.9.0 
patched:

     o median.default() was altered in 2.8.1 to use sum() rather
       than mean(), although it was still documented to use mean().
       This caused problems for POSIXt objects, for which mean() but
       not sum() makes sense, so the change has been reverted.

So this was a bug, but it has already been fixed (see the comments in 
the posting guide and especially the FAQ about checking if things have 
already been changed).
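A small sketch of why the NEWS entry matters, assuming a current R in which the revert is in place: for a date-time vector of even length, median.default() has to average the two middle values, and mean() is defined for POSIXct while sum() is not. The timestamps are made up for illustration.

```r
# Two made-up timestamps; with an even count the median averages the middle pair
x <- as.POSIXct(c("2009-05-29 10:00:00", "2009-05-29 12:00:00"), tz = "UTC")

mean(x)                         # works: mean() has a POSIXct method
res <- try(sum(x), silent = TRUE)
inherits(res, "try-error")      # TRUE: sum() is not defined for "POSIXt" objects
median(x)                       # 11:00 UTC, computed with mean() of the middle pair
```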
On Fri, 29 May 2009, Zheng, Xin (NIH) [C] wrote:
#
On Fri, May 29, 2009 at 11:01 AM, Zheng, Xin (NIH) [C]
<zhengxin at mail.nih.gov> wrote:
The svn log shows when each file was updated, and it's pretty
clear that they are constantly being updated.
http://developer.r-project.org/R.svnlog.2009

There are third-party introductions and many books available.
#
Thanks for the information. I am curious why the documentation was not changed in 2.8.1; that would have been easier than writing something in NEWS.

But I DO see 'sum' rather than 'mean' in 2.9.0 now. It hasn't been reverted yet, despite what NEWS says, and it's still documented to use 'mean'.
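One way to answer "which version of R is this?" and to see what the installed function actually does is to print the version string and the function itself. A sketch, assuming a current R release (so the output will differ from what 2.9.0 showed):

```r
R.version.string          # the version of R that is running
median.default            # printing a function (no parentheses) shows its source
# Does the installed median.default() average the middle pair with mean()?
any(grepl("mean(", deparse(median.default), fixed = TRUE))
```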

-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk] 
Sent: Friday, May 29, 2009 12:28 PM
To: Zheng, Xin (NIH) [C]
Cc: r-help at r-project.org
Subject: Re: [R] R's documentation

#
On Fri, May 29, 2009 at 05:20:24PM +0100, Patrick Burns wrote:
I think that some kind of a glossary would be helpful.  Then I would know
whether certain words or phrases are R-specific or whether they come from
statistics, so I'd at least know where to continue digging.

A text explaining how data frames *are meant to be used* would be helpful.
The intro to data frames is clear (collection of vectors of same length),
but it left me clueless about how functions interpret the data inside.  It
finally clicked for me when I was reading an introduction to lattice graphics
in which I actually had to display a built-in data set.  Such a basic
concept should be explained somewhere without the user needing to basically
reverse-engineer the concept.  In other words, the "Introduction to R"
should contain something about "long" and "wide" data formats.  Or at least
links to proper descriptions should be given (plyr, reshape packages).
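To make the "long" versus "wide" distinction concrete, here is a sketch using base R's stats::reshape() on a small made-up table (the column names id, score.1, score.2 are hypothetical); the plyr and reshape packages mentioned above offer alternatives.

```r
# Made-up wide data: one row per subject, one column per time point
wide <- data.frame(id = 1:2, score.1 = c(5, 7), score.2 = c(6, 9))

# Long data: one row per (subject, time) observation
long <- reshape(wide, direction = "long",
                varying = c("score.1", "score.2"),  # columns to stack
                idvar = "id", timevar = "time")     # names of the key columns
long[order(long$id, long$time), ]

# Formula interfaces such as aggregate() typically expect this long layout
aggregate(score ~ time, data = long, FUN = mean)
```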

Implicit conversions are vague.  If variable x is a factor, what does
x==8 do?   Convert 8 to string and compare to one of the levels of x?
Compare as.numeric(x) with 8?  A simple experiment reveals this, but
help("==") does not shed light on the issue. (".. or other objects
for which methods have been written.")  This raises a bunch of questions:
What kinds of objects are there in R?  How do I find an object's methods?
How do I find the overload of == that compares factors and integers (or at
least HELP for a particular overload)?  The help on "==" is precise, but
utterly useless for somebody who does not already know 1) what == does,
and 2) all the other wider concepts mentioned in the help text.
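The experiment alluded to above can be sketched like this (a made-up factor, current R assumed). For ==, the factor method converts the factor to its level labels before comparing, so 8 is matched against the label "8", not against the internal integer code; the last line is the usual idiom for getting the original numbers back.

```r
f <- factor(c(10, 8, 10))
levels(f)                  # "8" "10": the labels, sorted
as.integer(f)              # 2 1 2: the internal codes, not the original values
f == 8                     # FALSE TRUE FALSE: compared against the labels
as.integer(f) == 8         # FALSE FALSE FALSE: the codes never equal 8
as.numeric(levels(f))[f]   # 10 8 10: indexing by the codes recovers the numbers
```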

[And so on... this was just the example that has been bothering me lately.  In
general, more cross-referencing between documentation topics might be helpful.
"SEE ALSO" is not sufficient; hyperlinking would be much more effective because
it hints at whether a topic is documented or not.]

I'm an experienced developer, yet it took me three months to move from
5-dimensional arrays and fudging with apply() margins to "proper" use of
data frames.  Had I needed somewhat simpler data manipulation or graphics, I
would have thrown R out of the window, as I have many times before.
Things *should not* be that way.  For an example of what I consider to be
well-structured documentation, please see

http://doc.qtsoftware.com/4.5/how-to-learn-qt.html

which made it possible for me to figure out reasonably quickly how to do what I
needed without the need for internet searches or asking on mailing lists.

[And so on, and so on... I can only describe the help text as "opaque".  Reading
it feels like reading a foreign language that I'm not very proficient in.]
#
Zeljko Vrba wrote:
http://www.burns-stat.com/pages/Spoetry/glossary.pdf

may (partially) satisfy this part of your wishlist.

Pat
#
On May 30, 2009, at 2:01 AM, Zeljko Vrba wrote:

Actually the help page also says:
"If the two arguments are atomic vectors of different types, one is  
coerced to the type of the other, the (decreasing) order of precedence  
being character, complex, numeric, integer, logical and raw."
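A few one-liners illustrating the coercion rule just quoted, as they behave in current R:

```r
TRUE == 1L     # TRUE:  logical is coerced to integer
1L == 1.0      # TRUE:  integer is coerced to double ("numeric")
"1" == 1       # TRUE:  1 is coerced to the character string "1"
"1.0" == 1     # FALSE: as.character(1) is "1", not "1.0"
```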

But the recurring puzzle of "what factors really are" is at work here:

 > x <- factor(8)
 > typeof(x)
[1] "integer"
 > as.integer(x)
[1] 1
 > x == 8
[1] TRUE
 > is.numeric(x)
[1] FALSE

 > typeof(x)
[1] "integer"
 > is.integer(x)
[1] FALSE

 > mode(x)
[1] "numeric"
 > is.numeric(x)
[1] FALSE
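What a factor "really is" shows up once its class is stripped: an integer vector of codes that carries a levels attribute and a class attribute. A short sketch reusing the factor from the transcript:

```r
x <- factor(8)
attributes(x)   # a list with $levels ("8") and $class ("factor")
unclass(x)      # the bare integer code 1, still carrying attr(,"levels")
# typeof() reports the storage type ("integer"), whereas is.integer() and
# is.numeric() are documented to return FALSE for factors, hence the
# apparently contradictory answers above
```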
Actually objects have classes (which determine what functions can be
applied), while generic functions have methods:
?class
?methods
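In practice those two help pages translate into a few calls for exploring what dispatch will do. A sketch; the exact lists printed depend on the R version and on which packages are loaded:

```r
x <- factor(8)
class(x)                       # "factor": the class drives method dispatch
methods(class = "factor")      # S3 methods available for factors
methods("Ops")                 # methods of the Ops group generic, which covers ==
getS3method("Ops", "factor")   # source of the method used for x == 8
```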
I remember similar problems in getting to the point where I could use  
data frames. In frustration I decided to construct a PowerPoint
(actually an OO.o presentation) that assembled the various accessor and
constructor functions for the components of dataframes.
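A compact version of such a cheat sheet of data frame constructors and accessors might look like this. The data are made up; stringsAsFactors = FALSE is spelled out because its default changed to FALSE only in R 4.0.0.

```r
df <- data.frame(id = 1:3, grp = c("a", "b", "a"), y = c(2.5, 3.0, 4.5),
                 stringsAsFactors = FALSE)
str(df)                 # column names, types and first values
dim(df); names(df)      # rows x columns, column names
df$y                    # one column as a vector
df[["grp"]]             # the same; [[ also accepts a name stored in a variable
df[df$grp == "a", ]     # rows where grp is "a" (all columns)
df[, c("id", "y")]      # selected columns (all rows)
df$z <- df$y * 2        # add a new column
```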
The usual advice is to get a book. In years gone by Venables and  
Ripley's MASS was the standard, but more recently Dalgaard and others  
have offered their best efforts at an "intro to R". My library  
includes MASS ed 2, Sarkar's Lattice, Spector's Data Manipulation with  
R, Harrell's RMS, Wood's "GAMs: An Intro with R".