Unexpected behavior from hist()
Density means that the AREAS of the bars add to 1, not the HEIGHTS of the bars. Your bins are probably narrower than 1, so individual bars can be taller than 1. E.g.:
set.seed(42)
x <- rpois(1000, 5)/100
info <- hist(x, prob=TRUE)
info
$breaks
 [1] 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13

$counts
 [1]  42  88 151 177 178 131  97  70  43  14   6   2   1

$density
 [1]  4.2  8.8 15.1 17.7 17.8 13.1  9.7  7.0  4.3  1.4  0.6  0.2  0.1

$mids
 [1] 0.005 0.015 0.025 0.035 0.045 0.055 0.065 0.075 0.085 0.095 0.105 0.115
[13] 0.125

$xname
[1] "x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
diff(info$breaks)*info$density # Areas of each bar
 [1] 0.042 0.088 0.151 0.177 0.178 0.131 0.097 0.070 0.043 0.014 0.006 0.002
[13] 0.001
sum(diff(info$breaks)*info$density) # Sum of the areas
[1] 1

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sarah Goslee
Sent: Thursday, June 13, 2013 10:36 AM
To: Mohamed Badawy
Cc: r-help at r-project.org
Subject: Re: [R] Unexpected behavior from hist()

Hi,

On Thu, Jun 13, 2013 at 11:13 AM, Mohamed Badawy <mbadawy at pm-engr.com> wrote:
> Hi... I'm still a beginner in R. While doing some curve-fitting
> with a raw data set of length 22,000, here is what I had:
> hist(y,col="red")
> gives me the frequency histogram, 13 total rectangles, highest is
> near 5000.
You don't provide a reproducible example, so here's some fake data:

somedata <- runif(1000)
> Now
> hist(y,prob=TRUE,col="red",ylim=c(0,1.5))
> gives me the density (probability?) histogram, same number of
> rectangles, but the highest rectangle is obviously higher than 1,
> how can this be?!!!
Because you misread the help. Using freq=FALSE (equivalent to
prob=TRUE, which is a legacy option), you are getting:
freq: logical; if 'TRUE', the histogram graphic is a representation
      of frequencies, the 'counts' component of the result; if
      'FALSE', probability densities, component 'density', are
      plotted (so that the histogram has a total area of one).
      Defaults to 'TRUE' _if and only if_ 'breaks' are
      equidistant (and 'probability' is not specified).
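To make the "total area of one" wording concrete, here is a minimal sketch with fake data. The names fake, h, widths and manual_density are illustrative only and do not come from the original posts. It assumes the data sit in bins much narrower than 1, and checks that each density is just count / (n * bin width), which is why the bar heights end up well above 1 while the areas still sum to 1.

```r
set.seed(1)
fake <- runif(1000) / 10          # fake data in [0, 0.1], so bins are far narrower than 1
h <- hist(fake, plot = FALSE)     # compute the histogram without drawing it

widths <- diff(h$breaks)          # width of each bin
manual_density <- h$counts / (sum(h$counts) * widths)

all.equal(manual_density, h$density)  # density is count / (n * width)
max(h$density) > 1                    # narrow bins give heights above 1
sum(h$density * widths)               # total area of the bars
```

With wider bins (width greater than 1) the same formula gives heights below 1, so the height alone says nothing about probability until it is multiplied by the bin width.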
It sounds like what you actually want is:
somehist <- hist(somedata, plot=FALSE)
somehist$counts <- somehist$counts/sum(somehist$counts)
plot(somehist)
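A slightly fuller sketch of the same idea, again with fake data: after rescaling, the bar heights are proportions that sum to 1, but the default y-axis label would still read "Frequency", so it may be worth setting ylab explicitly. The label text and the seed here are my own choices, not part of the suggestion above.

```r
set.seed(1)
somedata <- runif(1000)                   # fake data, as in the reply above

somehist <- hist(somedata, plot = FALSE)  # compute bins without plotting
somehist$counts <- somehist$counts / sum(somehist$counts)  # counts -> proportions

sum(somehist$counts)                      # bar heights now sum to 1
all(somehist$counts <= 1)                 # no single bar can exceed 1

# freq = TRUE makes plot() draw the (rescaled) 'counts' component
plot(somehist, freq = TRUE, ylab = "Proportion of observations")
```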
> P.S. I had to post this thread via email as it got rejected when I
> posted it from Nabble, reason was "Message rejected by filter rule match"

Nabble is not the R-help mailing list. Posting via email is the correct thing to do.

Sarah
Sarah Goslee
http://www.functionaldiversity.org

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.