I have two large datasets (156K and 2.06M records). Each row has the
hour that an event happened, represented by an integer from 0 to 23.
R's hist() appears to be combining two of the hours into a single bin.
Here's the command I ran to get the histogram:
histinfo <- hist(crashes$hour, right=FALSE)
$breaks
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

$counts
 [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
[16]  9758 11301 11745  9943  7494  6272  6220 11669

$intensities
 [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
 [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
[17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911

$density
 [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
 [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
[17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911

$mids
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5
[19] 18.5 19.5 20.5 21.5 22.5

$xname
[1] "crashes$hour"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
Note that the last value in counts is 11669. Compare it with the
output of table(crashes$hour):
    0     1     2     3     4     5     6     7     8     9    10    11    12    13    14
 4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
   15    16    17    18    19    20    21    22    23
 9758 11301 11745  9943  7494  6272  6220  6000  5669
Notice that the counts for hours 22 and 23 from table(crashes$hour)
sum to 11669 (6000 + 5669). Is it correct for the histogram to combine
hours 22 and 23? Since I specified right=FALSE, I figured there was no
way 23 would be combined with 22.
Adding breaks=24 to the hist call makes no difference; it's still stuck
at 23 bins. I also tried breaks=25, breaks=23, and several other values,
in case I am misinterpreting what breaks means, but none of them makes
a difference.
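For what it's worth, here is a minimal sketch that rebuilds the data from
the table counts above and compares two explicit sets of break points. The
names hour_counts and hour are my own stand-ins for crashes$hour. My reading
of ?hist is that a single number for breaks is only a suggestion, and that
the default include.lowest=TRUE closes the last cell on both sides when
right=FALSE, which would explain the merged 22/23 count:

```r
# Per-hour counts copied from the table(crashes$hour) output above.
hour_counts <- c(4755, 4618, 5959, 3292, 2378, 2715, 4592, 6144, 6860, 5598,
                 5601, 6596, 7152, 7490, 8166, 9758, 11301, 11745, 9943, 7494,
                 6272, 6220, 6000, 5669)

# Stand-in for crashes$hour (hypothetical name), rebuilt from those counts.
hour <- rep(0:23, times = hour_counts)

# Break points 0:23, as in the output above: 23 cells. With right = FALSE the
# cells are [0,1), [1,2), ..., but the last one is [22,23] because
# include.lowest defaults to TRUE, so hour 23 lands in it.
h23 <- hist(hour, breaks = 0:23, right = FALSE, plot = FALSE)
length(h23$counts)     # 23
tail(h23$counts, 1)    # 11669

# Break points 0:24 given as a vector: 24 cells, and hour 23 gets its own
# cell [23,24].
h24 <- hist(hour, breaks = 0:24, right = FALSE, plot = FALSE)
length(h24$counts)     # 24
tail(h24$counts, 2)    # 6000 5669
```

If that reading is right, passing the break points as a vector
(breaks=0:24) rather than as a single number should give one bin per hour.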
I imagine this is a n00b question, so my apologies if this is obvious.
Aren