Hi everyone, I need help. I want to have a "uniform" kind of distribution. When I used the sample function I got almost twice as many zeros as other numbers. What's wrong with my command?

  temp <- sample(0:12, 2000, replace=T, prob=(rep(1/13,13)))
  hist(temp)

Thanks in advance, Taka
sample function
6 messages · mirage sell, Martin C. Martin, David Scott, Marc Schwartz, Ted Harding
"hist" is lumping things together. Try sum(temp == 0) and compare it to the height of the leftmost bar. Is this a bug in hist? - Martin
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
On Thu, 10 Mar 2005, mirage sell wrote:
Hi everyone, I need help. I want to have a "uniform" kind of distribution. When I used the sample function I got almost twice as many zeros as other numbers. What's wrong with my command?
Nothing is wrong with your sampling; it is the display in the histogram. Try

  temp <- sample(0:12, 2000, replace=T, prob=(rep(1/13,13)))
  table(temp)

David Scott
_________________________________________________________________
David Scott
Department of Statistics, Tamaki Campus
The University of Auckland, PB 92019
Auckland NEW ZEALAND
Phone: +64 9 373 7599 ext 86830
Fax: +64 9 373 7000
Email: d.scott at auckland.ac.nz
Graduate Officer, Department of Statistics
On Thu, 10 Mar 2005, Martin C. Martin wrote:
"hist" is lumping things together. Try sum(temp == 0) and compare it to the height of the leftmost bar. Is this a bug in hist?
No, hist is the wrong thing to use to display this data. Try

  temp <- sample(0:12, 2000, replace=T, prob=(rep(1/13,13)))
  barplot(table(temp))

David Scott
On Thu, 2005-03-10 at 20:54 -0600, mirage sell wrote:
Hi everyone, I need help. I want to have a "uniform" kind of distribution. When I used the sample function I got almost twice as many zeros as other numbers. What's wrong with my command?

  temp <- sample(0:12, 2000, replace=T, prob=(rep(1/13,13)))
  hist(temp)

Thanks in advance,
Hint: take note that there are only 12 cells in the plot, not 13. The frequencies of the 13 values themselves, however, are as expected:
  table(sample(0:12, 2000, replace=T))

    0   1   2   3   4   5   6   7   8   9  10  11  12
  158 156 151 163 156 158 146 154 134 158 146 147 173

Review the details of how the breaks are selected in ?hist. BTW, you do not need to specify the 'prob' argument if you want equal probabilities, as in my example above.

HTH,

Marc Schwartz
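A minimal sketch (not code from the thread) of how to see the 12 cells without drawing anything: hist(..., plot = FALSE) returns the breaks and counts it would plot. The data are simulated afresh, and the seed and the object name h are arbitrary choices for illustration.

```r
# Sketch: inspect the cells hist() builds for integer data, without plotting.
set.seed(1)  # arbitrary seed, only so the run is reproducible
temp <- sample(0:12, 2000, replace = TRUE)

h <- hist(temp, plot = FALSE)  # default breaks = "Sturges"
print(h$breaks)          # 13 break points at 0, 1, ..., 12, hence 12 cells
print(length(h$counts))  # 12

# With the defaults right = TRUE and include.lowest = TRUE the first cell
# is the closed interval [0, 1], so it holds both the 0s and the 1s.
print(h$counts[1] == sum(temp %in% c(0, 1)))  # TRUE
```

Every other cell is (k-1, k] and so holds a single integer, which is exactly why only the leftmost bar looks doubled.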
On 11-Mar-05 Martin C. Martin wrote:
"hist" is lumping things together. Try: sum(temp == 0) compare to the height of the left most bar. Is this a bug in hist? - Martin
Well, not a bug strictly speaking since "it works as documented",
but I do think it's not necessarily a happy choice.
The unsuspecting (like Martin) will step into holes even after
reading "?hist", since the truths are rather deeply (and I think
somewhat obliquely) hidden ("?hist" leads you to look up
"?nclass.Sturges" which in turn only mentions "Sturges' formula"
and invites you to read V&R's MASS book and other references
in the hope of further clarification -- all a bit much when
you just want to draw a histogram, which ought to be kid's
stuff! Not to mention the things to do with parameters
"include.lowest" and "right" whose combined effect is not
too obvious).
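To make the parameters above concrete, a sketch on simulated data (arbitrary seed; the names h_right and h_left are just for illustration): nclass.Sturges() gives the number of classes that hist() passes to pretty() by default, and flipping "right" moves the doubled cell from the left end of the range to the right end.

```r
set.seed(1)  # arbitrary seed
temp <- sample(0:12, 2000, replace = TRUE)

# Sturges' formula: ceiling(log2(n) + 1) classes; for n = 2000 that is 12
print(nclass.Sturges(temp))

# Default right = TRUE: cells are (a, b], the first one closed as [0, 1]
h_right <- hist(temp, plot = FALSE)
# right = FALSE: cells are [a, b), the last one closed as [11, 12]
h_left <- hist(temp, right = FALSE, plot = FALSE)

# Same breaks, but the "lumped" pair moves from the left end to the right end
print(h_right$counts[1] == sum(temp %in% c(0, 1)))    # TRUE
print(h_left$counts[1]  == sum(temp == 0))            # TRUE
print(h_left$counts[12] == sum(temp %in% c(11, 12)))  # TRUE
```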
I'd like to repeat the sort of hint I occasionally give:
In using R, if there's any doubt it is best to spell out exactly
what you want rather than expecting the functions to agree with
what you want. R functions are often more complex and subtle
than you might suspect.
In this particular case,
hist(temp,breaks= -0.5+(0:13) )
will produce the sort of thing which is wanted. One could
interpret the results which Martin reported as due to a
sort of "confusion" (but on whose part -- R or Martin?)
over the fact that "hist" is designed to deal with
"continuous" values, while his sample consists of integers.
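A sketch (simulated data, arbitrary seed) checking that half-integer breaks of this kind put each integer at the centre of its own cell, so that the counts agree with table(). It writes the breaks as seq(-0.5, 12.5, by = 1); the factor(..., levels = 0:12) wrapper is only there so that table() keeps all 13 values in order.

```r
set.seed(1)  # arbitrary seed
temp <- sample(0:12, 2000, replace = TRUE)

# Breaks at -0.5, 0.5, ..., 12.5: every integer lies strictly inside one cell,
# so the right/include.lowest conventions no longer matter
h <- hist(temp, breaks = seq(-0.5, 12.5, by = 1), plot = FALSE)

print(h$mids)            # cell midpoints 0, 1, ..., 12
print(length(h$counts))  # 13

# One cell per value, with exactly the counts table() reports
tab <- as.vector(table(factor(temp, levels = 0:12)))
print(all(h$counts == tab))  # TRUE
```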
For that particular case, one could also use "table" or
"barplot", as has been suggested by David Scott, which
would produce a plot of similar appearance; but this is
not in the "histogram family" despite appearances, since
it is not primarily a "quantitative" plot (i.e. respecting
the numerical values and their numerical comparisons), but
more a "category count". In particular, natural variants
of the above "hist" command such as
hist(temp,breaks= -0.5+2*(0:7) )
(which corresponds to binning by different intervals) do
not lie so easily in the "table" or "barchart" domain.
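A sketch of that point on simulated data (arbitrary seed): with hist() the width-2 grouping only needs new breaks, whereas the table route means building the grouping by hand. Integer division and tabulate() are simply one way to do that, chosen for this illustration.

```r
set.seed(1)  # arbitrary seed
temp <- sample(0:12, 2000, replace = TRUE)

# Cells [-0.5, 1.5], (1.5, 3.5], ..., (11.5, 13.5]: pairs {0,1}, {2,3}, ..., {12,13}
h2 <- hist(temp, breaks = -0.5 + 2 * (0:7), plot = FALSE)

# The same grouping by hand: value %/% 2 maps 0,1 -> 0; 2,3 -> 1; ...; 12 -> 6
by_hand <- tabulate(temp %/% 2 + 1, nbins = 7)

print(h2$counts)
print(all(h2$counts == by_hand))  # TRUE
```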
And I don't agree with David's comment that "No, hist
is the wrong thing to use to display this data."
In so far as these data are considered to be numerical
values of which one wants a view of their distribution,
then "hist" is entirely appropriate, as for any other
numerical variable. The only question is how to get
this to happen appropriately.
Would David make the same comment about data sampled
from (0:5000) instead of (0:12)?
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 11-Mar-05 Time: 10:59:55
------------------------------ XFMail ------------------------------