Binning of integers with hist() function odd results (PR#14046)
gug at fnal.gov wrote:
Full_Name: Gerald Guglielmo
Version: 2.8.1 (2008-12-22)
OS: OSX Leopard
Submission from: (NULL) (131.225.103.35)

When I attempt to use the hist() function to bin integers the behavior seems very odd as the bin boundary seems inconsistent across the various bins. For some bins the upper boundary includes the next integer value, while in others it does not. If I add 0.1 to every value, then the hist() binning behavior is what I would normally expect.
h1<-hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))
h1$mids
[1] 1.5 2.5 3.5 4.5
h1$counts
[1] 3 3 4 5
h2<-hist(c(1.1,2.1,2.1,3.1,3.1,3.1,4.1,4.1,4.1,4.1,5.1,5.1,5.1,5.1,5.1))
h2$mids
[1] 1.5 2.5 3.5 4.5 5.5
h2$counts
[1] 1 2 3 4 5

Naively I would have expected the same distribution of counts in the two cases, but clearly that is not happening. This is a simple example to illustrate the behavior; I originally noticed it while binning a large data sample where I had set the breaks=c(0,24,1).
This is as documented. See the include.lowest argument. Annoying, but not a bug. (It is arguably a design error that hist() is looking for "pretty" breakpoints rather than pretty midpoints, or maybe something more advanced to handle cases where the data are effectively tied to a lattice. It's been around "forever", though.)
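
A minimal sketch of what is going on, assuming the default breaks for this data come out as 1:5 (consistent with the mids reported above). With the defaults right = TRUE and include.lowest = TRUE, the bins are [1,2], (2,3], (3,4], (4,5], so the value 1 lands in the same bin as the 2s. The names h_default, h_left and h_centered are only illustrative, and the half-integer breaks are one possible workaround, not the only one.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# Defaults: right = TRUE, include.lowest = TRUE.
# Bins are [1,2], (2,3], (3,4], (4,5]; the lowest break is included,
# so the single 1 is counted together with the two 2s.
h_default <- hist(x, plot = FALSE)
h_default$counts

# right = FALSE makes the bins left-closed: [1,2), [2,3), [3,4), [4,5].
# Now include.lowest applies to the highest break, so the 4s and 5s merge.
h_left <- hist(x, right = FALSE, plot = FALSE)
h_left$counts

# Workaround sketch: put the breaks halfway between the integers, so each
# integer sits at a bin midpoint and never on a boundary.
h_centered <- hist(x, breaks = seq(0.5, 5.5, by = 1), plot = FALSE)
h_centered$mids
h_centered$counts
```

With breaks on the half-integers, neither right nor include.lowest affects the counts, because no data value coincides with a break.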
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907