Question about histogram

4 messages · Longe, Jonathan P Daily, Greg Snow +1 more

#
Dear list,

I'm new to R, so please bear with my silly questions.  I'm trying to 
understand why the results I get from a call to hist() are not what I 
expected.  When I use the parameter freq=FALSE, I expect that none of 
the bars in the plot will be taller than 1, because they're 
probabilities.  But with my code, some bars exceed 1.

The actual data seems immaterial.  I tried with dummy data:

 > hist(runif(1000), freq=FALSE)

and the histogram includes bars well over 1 in height.  The man page 
says that freq=FALSE produces densities, so that the total area is 1.  
Clearly, if all the values are between 0 and 1, as is the case here, 
some of the bars must rise above 1 for the area to be 1 (unless every 
bar has height exactly 1).  I thought that it was the sum of the bar 
heights that would be 1, so that the bars reflect probabilities for 
each interval, rather than densities.  So the answer to my question 
would be "because it's densities, not probabilities", but then the 
question is, why densities and not probabilities?

Regards,
L.
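
To make the distinction in this question concrete, here is a minimal sketch of how bar heights, bin widths and per-bin probabilities relate. It assumes the default breaks chosen by hist(); the seed and the variable names (x, h, widths, area, probs) are only illustrative and are not from the original post.

```r
set.seed(1)                   # arbitrary seed, only for reproducibility
x <- runif(1000)              # dummy data, as in the post
h <- hist(x, plot = FALSE)    # compute the bins without drawing anything

widths <- diff(h$breaks)              # width of each bin
area   <- sum(h$density * widths)     # total area of the density bars
probs  <- h$counts / sum(h$counts)    # proportion of observations per bin

all.equal(area, 1)                    # the areas sum to 1, not the heights
all.equal(sum(probs), 1)              # the proportions sum to 1
all.equal(probs, h$density * widths)  # proportion = density * bin width
```

With bins narrower than 1, a density bar is taller than the corresponding proportion, which is why heights above 1 appear for data on [0, 1]. If bars showing proportions are what you want, something like barplot(probs, names.arg = h$mids) should draw them.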
#
Because a histogram is descriptive and makes no assumptions about what it 
describes? Attaching a probability to the bars assumes that some random 
draw is being made. Suppose my data is a count of computers running a 
particular OS. What would be the value in reporting this as a probability 
that a randomly chosen computer is running Ubuntu? Density is more 
universal, IMO.
--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
 the thing itself have purpose? Or do we, what's the word... imbue it."
     - Jubal Early, Firefly

r-help-bounces at r-project.org wrote on 01/13/2011 01:37:01 PM:
#
Densities let you plot a reference distribution, the result of a call to density(), or other density-based lines on top of your histogram, with everything appropriately scaled and little extra effort.
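
A minimal sketch of that kind of overlay, assuming normally distributed sample data; the data, the seed and the names samp, m, s and d are hypothetical and chosen only for illustration.

```r
set.seed(1)                   # arbitrary seed, only for reproducibility
samp <- rnorm(1000)           # hypothetical sample data
m <- mean(samp)               # parameters for the reference distribution
s <- sd(samp)

hist(samp, freq = FALSE)      # density scale: total bar area is 1

# Reference normal curve; inside curve(), x is the plotting variable,
# so the sample statistics are computed beforehand.
curve(dnorm(x, mean = m, sd = s), add = TRUE, lty = 2)

d <- density(samp)            # kernel density estimate of the same data
lines(d)                      # on the same scale as the histogram bars
```

With the default freq = TRUE, the bars would be counts and both curves would have to be rescaled by the sample size times the bin width before they lined up.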