I think there are a couple of things in ?hist that are not quite as
clear as they could be.
(1)
freq: logical; if 'TRUE', the histogram graphic is a representation
of frequencies, the 'counts' component of the result; if
'FALSE', _relative_ frequencies ("probabilities"), component
'density', are plotted. Defaults to 'TRUE' _iff_ 'breaks'
are equidistant (and 'probability' is not specified).
Unless I'm missing something, the 'density' component is NOT relative
frequency or 'probability' in any reasonable sense, country-specific
biases notwithstanding, except in the very special case where
all(diff(breaks) == 1). Thus, the above description is confusing and
probably even wrong.
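The distinction can be checked directly. Below is a minimal sketch, assuming a small made-up vector `x` and unequal breaks chosen only for illustration; it compares the 'density' component with the per-cell relative frequencies and confirms that the density, not the relative frequency, is what integrates to one.

```r
# Illustrative data and breaks (assumptions, not from the original report).
x <- c(0.5, 1.5, 2.5, 3.5, 4.5, 6, 7, 9)
h <- hist(x, breaks = c(0, 2, 4, 10), plot = FALSE)

rel_freq <- h$counts / sum(h$counts)  # proportion of observations per cell
widths   <- diff(h$breaks)            # cell widths

h$density                  # relative frequency divided by cell width
rel_freq                   # differs from h$density unless every width is 1
sum(h$density * widths)    # total area under the histogram
```

The two vectors coincide only when every cell has width one, which is the special case all(diff(breaks) == 1) mentioned above.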
Also, it seems to me that hist cannot draw a relative frequency
histogram at all (which is not a bad thing, but it's of course very
important to the undergrads we're teaching intro stats and R to). This
should be explicitly mentioned.
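One possible workaround, sketched here under the assumption of equal-width cells and the same made-up data, is to let hist do the binning and then draw the proportions with barplot. This is only an illustration, not something ?hist itself describes.

```r
# Illustrative data (an assumption); equal-width cells from 0 to 10.
x <- c(0.5, 1.5, 2.5, 3.5, 4.5, 6, 7, 9)
h <- hist(x, breaks = 0:10, plot = FALSE)

rel_freq <- h$counts / sum(h$counts)  # proportions, summing to one

# space = 0 makes adjacent bars touch, as in a histogram;
# names.arg labels each bar with the midpoint of its cell.
barplot(rel_freq, names.arg = h$mids, space = 0,
        xlab = "x", ylab = "Relative frequency")
```

With unequal cell widths the bars drawn this way would all have the same width, so the picture would no longer be a proper histogram.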
(2)
breaks: one of:
...
* a single number giving the number of cells for the
histogram,
...
This is not quite true. 'breaks' is used in 'pretty', so it's more a
suggestion than an exact specification. I'm not sure whether or not
the behaviour should be changed (what's the point of having ``pretty''
breakpoints anyway?), but if not, the documentation should be
clarified.
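A quick way to see that a single number is only a suggestion is to compare the requested number of cells with what comes back. The sketch below uses made-up data; the call to pretty is shown for comparison only, on the assumption that hist passes the number on to it roughly in this form.

```r
# Illustrative data (an assumption).
x <- c(0.5, 1.5, 2.5, 3.5, 4.5, 6, 7, 9)

h <- hist(x, breaks = 7, plot = FALSE)  # ask for 7 cells
h$breaks                                # breakpoints actually used
length(h$counts)                        # number of cells actually produced

pretty(range(x), n = 7)                 # "pretty" breakpoints for comparison
```

If the cell count must be exact, passing an explicit vector of breakpoints avoids the rounding to pretty values.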
I'll be happy to provide a patch if these changes are considered reasonable.
Deepayan
hist.default documentation
4 messages · Deepayan Sarkar, Duncan Murdoch
On 6/17/2005 8:58 AM, Deepayan Sarkar wrote:
> I think there are a couple of things in ?hist that are not quite as
> clear as they could be.
>
> (1)
>
> freq: logical; if 'TRUE', the histogram graphic is a representation
> of frequencies, the 'counts' component of the result; if
> 'FALSE', _relative_ frequencies ("probabilities"), component
> 'density', are plotted. Defaults to 'TRUE' _iff_ 'breaks'
> are equidistant (and 'probability' is not specified).
>
> Unless I'm missing something, the 'density' component is NOT relative
> frequency or 'probability' in any reasonable sense, country-specific
> biases notwithstanding, except in the very special case where
> all(diff(breaks) == 1). Thus, the above description is confusing and
> probably even wrong.

I agree.

> Also, it seems to me that hist cannot draw a relative frequency histogram at all (which is not a bad thing, but it's of course very important to the undergrads we're teaching intro stats and R to). This should be explicitly mentioned.

I'm not sure about this. Is it really worth mentioning something if you can't do it? Are you thinking of just giving a reference to barplot?

> (2)
>
> breaks: one of:
> ...
> * a single number giving the number of cells for the
> histogram,
> ...
>
> This is not quite true. 'breaks' is used in 'pretty', so it's more a
> suggestion than an exact specification. I'm not sure whether or not
> the behaviour should be changed (what's the point of having ``pretty''
> breakpoints anyway?), but if not, the documentation should be
> clarified.

I like the pretty breakpoints. It is good to label the breakpoints, and ugly to have labels at other than pretty points. I'd clarify by changing "giving" to "suggesting".

> I'll be happy to provide a patch if these changes are considered reasonable.

Please do.

Duncan Murdoch
On 6/17/05, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:
> On 6/17/2005 8:58 AM, Deepayan Sarkar wrote:
>> I think there are a couple of things in ?hist that are not quite as
>> clear as they could be.
>>
>> (1)
>>
>> freq: logical; if 'TRUE', the histogram graphic is a representation
>> of frequencies, the 'counts' component of the result; if
>> 'FALSE', _relative_ frequencies ("probabilities"), component
>> 'density', are plotted. Defaults to 'TRUE' _iff_ 'breaks'
>> are equidistant (and 'probability' is not specified).
>>
>> Unless I'm missing something, the 'density' component is NOT relative
>> frequency or 'probability' in any reasonable sense, country-specific
>> biases notwithstanding, except in the very special case where
>> all(diff(breaks) == 1). Thus, the above description is confusing and
>> probably even wrong.
>
> I agree.
>
>> Also, it seems to me that hist cannot draw a relative frequency histogram at all (which is not a bad thing, but it's of course very important to the undergrads we're teaching intro stats and R to). This should be explicitly mentioned.
>
> I'm not sure about this. Is it really worth mentioning something if you can't do it? Are you thinking of just giving a reference to barplot?

Not mentioning it is fine.

>> (2)
>>
>> breaks: one of:
>> ...
>> * a single number giving the number of cells for the
>> histogram,
>> ...
>>
>> This is not quite true. 'breaks' is used in 'pretty', so it's more a
>> suggestion than an exact specification. I'm not sure whether or not
>> the behaviour should be changed (what's the point of having ``pretty''
>> breakpoints anyway?), but if not, the documentation should be
>> clarified.
>
> I like the pretty breakpoints. It is good to label the breakpoints, and ugly to have labels at other than pretty points. I'd clarify by changing "giving" to "suggesting".

Actually, I missed the remark just below this:

    In the last three cases the number is a suggestion only.

so this is fine as it is.

>> I'll be happy to provide a patch if these changes are considered reasonable.
>
> Please do.

Here's the output of svn diff. Is this a reasonable way of providing a patch?
Index: hist.Rd
===================================================================
--- hist.Rd (revision 34748)
+++ hist.Rd (working copy)
@@ -28,9 +28,9 @@
}
\item{freq}{logical; if \code{TRUE}, the histogram graphic is a
representation of frequencies, the \code{counts} component of
- the result; if \code{FALSE}, \emph{relative} frequencies
- (\dQuote{probabilities}), component \code{density},
- are plotted. Defaults to \code{TRUE} \emph{iff} \code{breaks} are
+ the result; if \code{FALSE}, probability densities, component
+ \code{density}, are plotted (so that the histogram has a total area
+ of one). Defaults to \code{TRUE} \emph{iff} \code{breaks} are
equidistant (and \code{probability} is not specified).}
\item{probability}{an \emph{alias} for \code{!freq}, for S compatibility.}
\item{include.lowest}{logical; if \code{TRUE}, an \code{x[i]} equal to
Deepayan
Thanks, I've committed the change. Duncan Murdoch