An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090529/9cc8f47b/attachment-0001.pl>
R's documentation
11 messages · Kjetil Halvorsen, Romain Francois, Patrick Burns +5 more
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090529/4a9b76bb/attachment-0001.pl>
Zheng, Xin (NIH) [C] wrote:
Sometimes I get confused by R's documentation. It seems the documentation is not well maintained and updated. Does anyone have a similar feeling? I don't mean to offend anyone. I hope R will get better and better. But documentation is a very important factor that can attract people or drive them away. Xin Zheng
Please be specific about __how__ you are confused by __which__ documents and I am convinced you will get sarcasm-free replies. Romain
Romain Francois Independent R Consultant +33(0) 6 28 91 30 30 http://romainfrancois.blog.free.fr
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20090529/76d89c86/attachment-0001.pl>
Zheng, Xin (NIH) [C] wrote:
Sometimes I get confused by R's documentation. It seems the documentation is not well maintained and updated. Does anyone have a similar feeling? I don't mean to offend anyone. I hope R will get better and better. But documentation is a very important factor that can attract people or drive them away.
I absolutely agree. This is an area where novices as well as seasoned R users can contribute to the R project.

If you find some documentation that is confusing, then you can write a message about it that states:
1) Precisely what you find confusing.
2) (optional but very helpful) A proposed rewrite of the passage that you think fixes the problem.

Once you attempt step 2 a few times, you may come to appreciate the difficulty of writing good documentation. Help files are a particularly difficult medium: they need to be clear (to novices, not just to the person who wrote the function), and they need to be short enough that people might read them.

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of "The R Inferno" and "A Guide for the Unwilling S User")
Xin Zheng
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Which version of R is this? From the BUGS section in NEWS for R 2.9.0 patched:
o median.default() was altered in 2.8.1 to use sum() rather
than mean(), although it was still documented to use mean().
This caused problems for POSIXt objects, for which mean() but
not sum() makes sense, so the change has been reverted.
So this was a bug, but it has already been fixed (see the comments in
the posting guide and especially the FAQ about checking if things have
already been changed).
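A quick way to see why the NEWS entry singles out POSIXt objects is to try both functions on a date-time vector. This is a sketch run against a current R release, not the 2.8.1/2.9.0 code under discussion; the object names and times are invented for illustration.

```r
# Two date-times two hours apart (values chosen arbitrarily for illustration)
times <- as.POSIXct(c("2009-05-29 11:00:00", "2009-05-29 13:00:00"), tz = "UTC")

# mean() has a POSIXct method: the result is the midpoint, 12:00 UTC
mid <- mean(times)
print(mid)

# sum() is not defined for date-times, so this signals an error
res <- try(sum(times), silent = TRUE)
inherits(res, "try-error")
```

Adding two points in time has no meaning, while their average does; that asymmetry is what made the switch from mean() to sum() a problem.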
On Fri, 29 May 2009, Zheng, Xin (NIH) [C] wrote:
Ok, please take a look at '?median'. Can you see "However, the default
method makes use of 'sort' and 'mean'"? Then let's look at
'median.default'. I do see 'sort'. But where is 'mean'? At first
glance I didn't catch the exact point of its algorithm. Why not say
more clearly that it "makes use of partial sorting and calculating the mean
of the middle values"?
Maybe I'm too strict. It's not really a bug. Let's stop this thread.
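The wording proposed above (partial sorting, then the mean of the middle value or values) can be written out as a short function. `median_sketch` is a hypothetical name; the body roughly follows the shape of `median.default` in recent R versions, but it is a simplified illustration (NA handling is reduced to dropping NAs), not the actual source of any release.

```r
# Hypothetical helper: median via partial sorting plus mean() of the middle value(s)
median_sketch <- function(x) {
  x <- x[!is.na(x)]              # simplification: silently drop NAs
  n <- length(x)
  if (n == 0L) return(x[NA_integer_])
  half <- (n + 1L) %/% 2L
  if (n %% 2L == 1L) {
    # odd length: only position `half` needs to be in its sorted place
    sort(x, partial = half)[half]
  } else {
    # even length: place the two middle positions, then average them with mean()
    mean(sort(x, partial = half + 0L:1L)[half + 0L:1L])
  }
}

median_sketch(c(5, 1, 3))      # 3
median_sketch(c(4, 1, 3, 2))   # 2.5
```

Using mean() in the even case means any class with a mean() method can pass through the same code, which is the point of the NEWS entry about POSIXt.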
From: Kjetil Halvorsen [mailto:kjetilbrinchmannhalvorsen at gmail.com]
Sent: Friday, May 29, 2009 11:16 AM
To: Zheng, Xin (NIH) [C]
Cc: r-help at r-project.org
Subject: Re: [R] R's documentation
On Fri, May 29, 2009 at 11:01 AM, Zheng, Xin (NIH) [C] <zhengxin at mail.nih.gov> wrote:
Sometimes I get confused by R's documentation. It seems the documentation is not well maintained and updated. Does anyone have a similar feeling? I don't mean to offend anyone. I hope R will get better and better. But documentation is a very important factor that can attract people or drive them away.
Do you have any specific example? Without examples it is difficult to say much.
Do you refer to the core part of R, or to contributed packages?
My impression is that when some functions are changed, the documentation is generally changed at the same time. And I am sure that is the goal, so if you know a case where it did not happen you should file a bug report.
kjetil
Xin Zheng
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595
On Fri, May 29, 2009 at 11:01 AM, Zheng, Xin (NIH) [C] <zhengxin at mail.nih.gov> wrote:
Sometimes I get confused by R's documentation. It seems the documentation is not well maintained and updated. Does anyone have a similar feeling? I don't mean to offend anyone. I hope R will get better and better. But documentation is a very important factor that can attract people or drive them away.
The svn log (http://developer.r-project.org/R.svnlog.2009) shows when each file was updated, and it's pretty clear that they are constantly being updated. There are third-party introductions and many books available.
Thanks for your information. I am curious why the documentation was not changed in 2.8.1. That's easier than writing something in NEWS.
But I DO see 'sum' rather than 'mean' in 2.9.0 now. It has not been reverted yet, despite what NEWS says, and it is still documented to use 'mean'.
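To check what the default method actually calls in the R version you are running, rather than relying on the help page, you can print its source. A minimal sketch: `getS3method()` is used so it works whether or not `median.default` is exported, and the `grep()` pattern is only an illustrative filter.

```r
# Fetch the default median method and turn its source into character lines
src <- deparse(getS3method("median", "default"))

# Show only the lines that mention sort(), mean() or sum()
grep("sort|mean|sum", src, value = TRUE)

# The installed version, to compare against the relevant NEWS entry
R.version.string
```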
-----Original Message-----
From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
Sent: Friday, May 29, 2009 12:28 PM
To: Zheng, Xin (NIH) [C]
Cc: r-help at r-project.org
Subject: Re: [R] R's documentation
On Fri, May 29, 2009 at 05:20:24PM +0100, Patrick Burns wrote:
If you find some documentation that is confusing, then you can write a message about it that states:
I think that some kind of a glossary would be helpful. Then I would know
whether certain words or phrases are R-specific or whether they come from
statistics, so I'd at least know where I should continue digging.
A text explaining how data frames *are meant to be used* would be helpful.
The intro to data frames is clear (a collection of vectors of the same length),
but it left me clueless about how functions interpret the data inside. It
finally clicked for me when I was reading an intro to lattice graphics
in which I actually had to display a built-in data set. Such a basic
concept should be explained somewhere without the user needing to
reverse-engineer it. In other words, the "Introduction to R"
should contain something about "long" and "wide" data formats, or at least
give links to proper descriptions (plyr, reshape packages).
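A small illustration of the "wide" and "long" layouts, using base R's `reshape()` (plyr and reshape are alternatives). The data and column names are invented; the point is that in long format each row is one observation and each column one variable, which is the layout formula-based and lattice functions expect.

```r
# Wide format: one row per subject, one column per measurement occasion (invented data)
wide <- data.frame(id = 1:2, score.1 = c(5, 6), score.2 = c(7, 8))

# Long format: one row per (subject, occasion) observation
long <- reshape(wide, direction = "long",
                varying = list(c("score.1", "score.2")),
                v.names = "score", timevar = "time", times = 1:2,
                idvar = "id")
long

# Functions that group by a variable can now refer to it directly
tapply(long$score, long$time, mean)
```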
Implicit conversions are vague. If variable x is a factor, what does
x==8 do? Convert 8 to string and compare to one of the levels of x?
Compare as.numeric(x) with 8? Simple experiment reveals this, but
help("==") does not shed light on the issue. (".. or other objects
for which methods have been written.") This raises a bunch of questions:
What kind of objects are there in R? How do I find an object's methods?
How do I find the overload of == that compares factors and integers (or at
least HELP for a particular overload)? The help on "==" is precise, but
utterly useless for somebody who does not already know 1) what == does,
and 2) all the other wider concepts mentioned in the help text.
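What `x == 8` does for a factor can be checked directly. In current R, `==` on a factor goes through the `Ops.factor` method, which compares the level labels (as character) rather than the internal integer codes. A small sketch with invented values:

```r
x <- factor(c(8, 10, 8))         # levels "8" and "10"; internal codes 1, 2, 1

x == 8                           # labels compared as character: TRUE FALSE TRUE
as.integer(x) == 8               # internal codes compared: FALSE FALSE FALSE
as.numeric(as.character(x)) == 8 # labels converted to numbers first: TRUE FALSE TRUE
```

The second line is the usual trap: `as.integer()` returns the codes, not the numbers that the labels spell out.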
[And so on.. this was just the example that was lately bothering me. In
general, more cross-referencing between documentation topics might be helpful.
"SEE ALSO" is not sufficient; hyperlinking would be much more effective because
it hints at whether a topic is documented or not.]
I'm an experienced developer, yet it took me three months to move from
5-dimensional arrays and fudging with apply() margins to "proper" use of
data frames. Had I needed somewhat simpler data manipulation or graphics, I
would have thrown R out of the window, as I have many times before.
Things *should not* be that way. For an example of what I consider to be
well-structured documentation, please see
http://doc.qtsoftware.com/4.5/how-to-learn-qt.html
which made it possible for me to figure out reasonably quickly how to do what I
needed without the need for internet searches or asking on mailing lists.
[And so on, and so on.. I can only describe the help text as "opaque". Reading
it feels like reading a foreign language that I'm not very proficient in.]
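The shift from multi-dimensional arrays summarised with `apply()` margins to data frames can be seen in miniature below. The array and its dimension names are invented; `as.table()` followed by `as.data.frame()` turns an array into a long data frame with one column per dimension, after which grouping is done by naming columns instead of counting margins.

```r
# A 2 x 2 x 2 array with named dimensions (invented data)
arr <- array(1:8, dim = c(2, 2, 2),
             dimnames = list(a = c("a1", "a2"), b = c("b1", "b2"), c = c("c1", "c2")))

# Array style: sum over dimension c by keeping margins 1 and 2
by_margin <- apply(arr, c(1, 2), sum)

# Data-frame style: one row per cell, with columns a, b, c and value
df <- as.data.frame(as.table(arr), responseName = "value")
by_group <- tapply(df$value, df[c("a", "b")], sum)

all(by_margin == by_group)   # same totals either way
```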
Zeljko Vrba wrote:
On Fri, May 29, 2009 at 05:20:24PM +0100, Patrick Burns wrote:
If you find some documentation that is confusing, then you can write a message about it that states:
I think that some kind of a glossary would be helpful. Then I would know whether certain words or phrases are R-specific or whether they come from statistics, so I'd at least know where I should continue digging.
http://www.burns-stat.com/pages/Spoetry/glossary.pdf may (partially) satisfy this part of your wishlist. Pat
On May 30, 2009, at 2:01 AM, Zeljko Vrba wrote:
Implicit conversions are vague. If variable x is a factor, what does
x==8 do? Convert 8 to string and compare to one of the levels of x?
Compare as.numeric(x) with 8? Simple experiment reveals this, but
help("==") does not shed light on the issue. (".. or other objects
for which methods have been written.")
Actually the help page also says: "If the two arguments are atomic vectors of different types, one is coerced to the type of the other, the (decreasing) order of precedence being character, complex, numeric, integer, logical and raw." But the recurring puzzle of "what factors really are" is at work here:

> x <- factor(8)
> typeof(x)
[1] "integer"
> as.integer(x)
[1] 1
> x == 8
[1] TRUE
> is.numeric(x)
[1] FALSE
> typeof(x)
[1] "integer"
> is.integer(x)
[1] FALSE
> mode(x)
[1] "numeric"
> is.numeric(x)
[1] FALSE
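The transcript becomes less puzzling once a factor's internal structure is printed. The sketch below uses invented values; `as.numeric(levels(x))[x]` is the conversion idiom given in the R FAQ for recovering the numbers from a factor whose labels are numeric.

```r
x <- factor(c(8, 10, 8))

unclass(x)                 # the integer codes 1 2 1, plus a "levels" attribute
levels(x)                  # the labels: "8" "10"
typeof(x)                  # "integer": the storage type of the codes
class(x)                   # "factor": what decides which methods are used

as.numeric(levels(x))[x]   # recover the numbers the labels spell out: 8 10 8
```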
This raises a bunch of questions: What kind of objects are there in R? How do I find an object's methods?
Actually objects have classes (which determine what functions can be applied), while functions have methods: ?class ?methods
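Following the pointer to `?class` and `?methods`, a few calls show how to go from an object to the code that handles `==` for it. A sketch with an invented factor; the exact list printed by `methods()` depends on the R version and the packages loaded.

```r
x <- factor(c(8, 10, 8))

class(x)                    # "factor"
inherits(x, "factor")       # TRUE

# S3 methods written for objects of class "factor"
methods(class = "factor")

# == belongs to the "Ops" group generic; this fetches the factor method
ops_method <- getS3method("Ops", "factor")
ops_method                  # print its source
```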
How do I find the overload of == that compares factors and integers (or at least HELP for a particular overload)? The help on "==" is precise, but utterly useless for somebody who does not already know 1) what == does, and 2) all the other wider concepts mentioned in the help text. [And so on.. this was just the example that was lately bothering me. In general, more cross-referencing between documentation topics might be helpful. "SEE ALSO" is not sufficient; hyperlinking would be much more effective because it hints at whether a topic is documented or not.] I'm an experienced developer, yet it took me three months to move from 5-dimensional arrays and fudging with apply() margins to "proper" use of data frames.
I remember similar problems in getting to the point where I could use data frames. In frustration I decided to construct a PowerPoint (actually an OO.o Presentation) that assembled the various accessor and constructor functions for the components of data frames.
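Along the lines of such a summary, here is a compact reminder of common constructor and accessor calls for data frames. The data frame and its column names are invented for illustration.

```r
# Constructor: a data frame is a list of equal-length column vectors
df <- data.frame(id = 1:3, group = c("a", "b", "a"), score = c(2.5, 4, 3.5),
                 stringsAsFactors = FALSE)

names(df)                   # column names
dim(df)                     # rows, columns: 3 3
df$score                    # one column, as a vector
df[["group"]]               # the same kind of access, with the name as a string
df[df$group == "a", ]       # rows selected by a logical condition
df[, c("id", "score")]      # columns selected by name
subset(df, score > 3, select = c(id, score))
df$double <- 2 * df$score   # add a new column
str(df)                     # compact overview of column types
```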
Had I needed somewhat simpler data manipulation or graphics, I would have thrown R out of the window, as I have many times before. Things *should not* be that way. For an example of what I consider to be well-structured documentation, please see http://doc.qtsoftware.com/4.5/how-to-learn-qt.html which made it possible for me to figure out reasonably quickly how to do what I needed without the need for internet searches or asking on mailing lists. [And so on, and so on.. I can only describe the help text as "opaque". Reading it feels like reading a foreign language that I'm not very proficient in.]
The usual advice is to get a book. In years gone by Venables and Ripley's MASS was the standard, but more recently Dalgaard and others have offered their best efforts at an "intro to R". My library includes MASS ed 2, Sarkar's Lattice, Spector's Data Manipulation with R, Harrell's RMS, Wood's "GAMs: An Intro with R".
David Winsemius, MD Heritage Laboratories West Hartford, CT