Dear list,

I'm new to R, so please bear with my silly questions. I'm trying to understand why the results I get from a call to hist() are not what I expected. When I use the parameter freq=FALSE, I expect that no bar in the plot will be taller than 1, because the bars are probabilities. But with my code, some bars exceed 1.

The actual data seem immaterial. I tried with dummy data:

> hist(runif(1000), freq=FALSE)

and the histogram includes bars well over 1 in height. The man page says that freq=FALSE produces densities, so that the total area is 1. Clearly, if all the values are between 0 and 1, as is the case here, some of the bars must rise above 1 for the area to be 1. I thought it was the sum of the bar heights that would be 1, so that the bars reflect probabilities for each interval rather than densities. So the answer to my question would be "because it's densities, not probabilities", but then the question is: why densities and not probabilities?

Regards,
L.
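The difference between the two scalings in the question can be checked numerically. The following is only a minimal sketch: the seed and the variable names (widths, area, probs, hp) are illustrative assumptions, and it relies on the breaks, counts and density components of the object that hist() returns. Each bar's height times its width gives the probability for that bin, so the areas sum to 1 while the heights need not.

```r
# Sketch: compare bar heights (densities) with per-bin probabilities.
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- runif(1000)
h <- hist(x, plot = FALSE)       # compute the bins without drawing anything

widths <- diff(h$breaks)             # width of each bin
area   <- sum(h$density * widths)    # total area of the density bars
probs  <- h$counts / sum(h$counts)   # relative frequency of each bin

area              # 1, up to floating-point error
sum(probs)        # 1, up to floating-point error
max(h$density)    # may exceed 1 when the bins are narrower than 1

# One way to draw bars whose heights are the per-bin probabilities:
# overwrite the counts in a copy of the object and plot it as frequencies.
hp <- h
hp$counts <- probs
plot(hp, freq = TRUE, ylab = "Probability", main = "Per-bin probabilities")
```

With equal-width bins the two pictures differ only by a constant factor (the bin width), so the shape is the same and only the y-axis labels change.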
Question about histogram
4 messages · Longe, Jonathan P Daily, Greg Snow +1 more
Because a histogram is descriptive and makes no assumptions about what it
describes? Attaching a probability to the bars assumes that some random
draw is being made. Suppose my data is a count of computers running a
particular OS. What would be the value in reporting this as a probability
that a randomly chosen computer is running Ubuntu? Density is more
universal, IMO.
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
the thing itself have purpose? Or do we, what's the word... imbue it."
- Jubal Early, Firefly
(In reply to Longe's message to r-help of 01/13/2011, "[R] Question about histogram".)
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Densities let you plot a reference distribution, the result of a call to density(), or other density-based lines on top of your histogram. Everything is then already on the same scale, which makes the overlay fairly easy.
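As a rough illustration of that overlay, here is a small sketch. The normal sample, the seed, and the names y, m and s are assumptions made up for the example; hist(), density(), lines(), curve() and dnorm() are standard R functions.

```r
# Sketch: overlay density-based curves on a histogram drawn with freq = FALSE.
set.seed(1)                # arbitrary seed, only for reproducibility
y <- rnorm(500)            # hypothetical sample data

h <- hist(y, freq = FALSE, main = "Histogram with density overlays")

d <- density(y)            # kernel density estimate of the sample
lines(d, lwd = 2)          # same scale as the bars, so no rescaling is needed

# Reference distribution: a normal curve with the sample mean and sd.
# The parameters are computed first because curve() rebinds x internally.
m <- mean(y)
s <- sd(y)
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)
```

With freq = TRUE the bars would be counts, and both curves would have to be multiplied by the sample size times the bin width before they lined up with the bars.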
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111