
Binning of integers with hist() function odd results (PR#14047)

On 06-Nov-09 23:30:12, gug at fnal.gov wrote:
This is the correct, intended behaviour. By default, a value lying
exactly on the boundary between two bins is counted in the bin just
below that boundary. The one exception is the bottom-most break: a
value lying exactly on it is counted in the bin just above it.

Hence 1,2,2 all go into the [1,2] bin; 3,3,3 into (2,3];
4,4,4,4 into (3,4]; and 5,5,5,5,5 into (4,5]. Hence the counts 3,3,4,5.
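The same rule can be reproduced outside hist() with base R's cut(),
which takes the same 'right' and 'include.lowest' arguments. A minimal
sketch, assuming the breaks 1:5 that hist() chose by default for this
data; 'x' and 'cells' are just illustrative names:

```r
# The data from the original report
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# Assign each value to a cell using the same conventions as hist()'s
# defaults: right-closed cells, with the lowest break included
cells <- cut(x, breaks = 1:5, right = TRUE, include.lowest = TRUE)

levels(cells)  # "[1,2]" "(2,3]" "(3,4]" "(4,5]"
table(cells)   # counts 3 3 4 5
```

Note how only the first cell is written with a square bracket on the
left: that is the effect of include.lowest = TRUE.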

Since you did not set breaks in

  h1 <- hist(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,5))

they were set using the default method, and you can see what they are
with

  h1$breaks
  [1] 1 2 3 4 5

When you add 0.1 to each value, you push the values that were on the
boundaries up into the next bin, and the default breaks become
1,2,...,6. Now each value lies strictly inside its bin, not on any
boundary. Hence 1.1 is in [1,2]; 2.1,2.1 in (2,3];
3.1,3.1,3.1 in (3,4]; 4.1,4.1,4.1,4.1 in (4,5]; and
5.1,5.1,5.1,5.1,5.1 in (5,6], giving counts 1,2,3,4,5 as you observe.
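Both cases can be checked without drawing anything, since hist() with
plot = FALSE just returns the computed breaks and counts. A small
sketch; the names 'x', 'h1' and 'h2' are only for illustration:

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# plot = FALSE returns the "histogram" object without plotting it
h1 <- hist(x, plot = FALSE)        # original integer data
h2 <- hist(x + 0.1, plot = FALSE)  # every value shifted off the breaks

h1$breaks  # 1 2 3 4 5
h1$counts  # 3 3 4 5
h2$breaks  # 1 2 3 4 5 6
h2$counts  # 1 2 3 4 5
```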

The default behaviour described above is defined by the default options

  include.lowest = TRUE, right = TRUE

where:

include.lowest: logical; if 'TRUE', an 'x[i]' equal to the 'breaks'
          value will be included in the first (or last, for 'right =
          FALSE') bar.  This will be ignored (with a warning) unless
          'breaks' is a vector.

   right: logical; if 'TRUE', the histogram cells are right-closed
          (left open) intervals.
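To see what 'right' controls, here is a sketch with right = FALSE on
the same data. The cells become left-closed, [1,2), [2,3), [3,4), and
because include.lowest = TRUE now applies to the top break, the last
cell is closed at both ends, [4,5]. So the 4s and the 5s end up
together in the last bar:

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# Left-closed cells; include.lowest = TRUE now closes the *last* cell
h3 <- hist(x, right = FALSE, plot = FALSE)
h3$breaks  # 1 2 3 4 5 (the default breaks do not depend on 'right')
h3$counts  # 1 2 3 9

# The same cells built with cut(): "[1,2)" "[2,3)" "[3,4)" "[4,5]"
table(cut(x, breaks = 1:5, right = FALSE, include.lowest = TRUE))
```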

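See '?hist'. You can change this behaviour by changing these options;
for example, right = FALSE makes the cells left-closed, so a value on
a boundary is counted in the bin above it instead.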
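For integer-valued data, one common alternative (a suggestion, not
something the options above provide) is to supply your own breaks
half-way between the integers. Then no value can lie on a boundary, so
'right' and 'include.lowest' no longer affect the counts. A sketch,
with 'br', 'h4' and 'h5' as illustrative names:

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5)

# Breaks at 0.5, 1.5, ..., 5.5: each integer sits in the middle of its own cell
br <- seq(0.5, 5.5, by = 1)
h4 <- hist(x, breaks = br, plot = FALSE)
h5 <- hist(x, breaks = br, right = FALSE, plot = FALSE)

h4$counts  # 1 2 3 4 5
h5$counts  # identical: no value lies on a break
```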

Hoping this helps,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Nov-09                                       Time: 13:57:07
------------------------------ XFMail ------------------------------