Histogram omitting/collapsing groups
Hi, I think you're not quite understanding what hist() is doing with its bins. Reread the help (?hist, in particular the breaks, right and include.lowest arguments), and take a look at this small example. The solution I'd use is the last item.
x <- rep(1:10, times=1:10)
table(x)
x
 1  2  3  4  5  6  7  8  9 10
 1  2  3  4  5  6  7  8  9 10
hist(x, plot=FALSE, right=TRUE)$counts
[1] 3 3 4 5 6 7 8 9 10
hist(x, plot=FALSE, right=TRUE)$breaks
[1] 1 2 3 4 5 6 7 8 9 10
hist(x, plot=FALSE, right=TRUE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
hist(x, plot=FALSE, right=FALSE)$counts
[1] 1 2 3 4 5 6 7 8 19
hist(x, plot=FALSE, right=FALSE)$breaks
[1] 1 2 3 4 5 6 7 8 9 10
hist(x, plot=FALSE, right=FALSE)$mids
[1] 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5
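The two results above come from hist()'s default include.lowest=TRUE: the outermost bin is closed on both sides, so with right=TRUE the first bin holds both 1 and 2, and with right=FALSE the last bin holds both 9 and 10. As a rough sketch (not part of the original reply), cut() can make this visible, since it labels each interval with its brackets. The variable names left_closed, right_closed, h_left and h_right are only illustrative.

```r
# Same toy data as in the example above.
x <- rep(1:10, times = 1:10)

# cut() labels every interval, so the closed ends are visible in the names.
left_closed  <- table(cut(x, breaks = 1:10, right = FALSE, include.lowest = TRUE))
right_closed <- table(cut(x, breaks = 1:10, right = TRUE,  include.lowest = TRUE))

print(left_closed)   # the last interval is labelled [9,10]
print(right_closed)  # the first interval is labelled [1,2]

# hist() with the same breaks and settings, for comparison.
h_left  <- hist(x, breaks = 1:10, plot = FALSE, right = FALSE)
h_right <- hist(x, breaks = 1:10, plot = FALSE, right = TRUE)

# Both comparisons should be TRUE if cut() and hist() bin the data alike.
all(as.vector(left_closed)  == h_left$counts)
all(as.vector(right_closed) == h_right$counts)
```

With nine bins between the breaks 1..10 and ten distinct values, one bin has to take two values whichever side is closed, which is why changing right alone cannot separate them.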
hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$counts
[1] 1 2 3 4 5 6 7 8 9 10
hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$breaks
[1] 0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5 10.5
hist(x, plot=FALSE, breaks=seq(.5, 10.5, by=1))$mids
[1] 1 2 3 4 5 6 7 8 9 10

Sarah
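For integer hours 0 to 23, the same half-integer breaks idea might look like the sketch below. The vector hour is made-up data standing in for crashes$hour, which is not available here, and hour_breaks is just an illustrative name. Note also that a single number such as breaks=24 is only a suggestion to hist(), which picks "pretty" breakpoints from it, so an explicit vector of breaks is the way to force the bin edges.

```r
# Made-up stand-in for crashes$hour: integer hours 0..23,
# with hour h repeated h + 1 times.
hour <- rep(0:23, times = 1:24)

# One bin per hour, centred on the integer: breaks at -0.5, 0.5, ..., 23.5.
hour_breaks <- seq(-0.5, 23.5, by = 1)
h <- hist(hour, breaks = hour_breaks, plot = FALSE)

length(h$counts)  # number of bins
h$mids            # bin midpoints fall on the hours themselves

# Compare with a plain tabulation of the same data.
tab <- table(factor(hour, levels = 0:23))
all(h$counts == as.vector(tab))
```

Because no data value sits on a bin edge, the right and include.lowest settings no longer matter. For discrete values like these, barplot(table(hour)) is another option worth considering, since it draws one bar per distinct value.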
On Sat, Dec 31, 2011 at 10:25 AM, Aren Cambre <aren at arencambre.com> wrote:
I have two large datasets (156K and 2.06M records). Each row has the hour that an event happened, represented by an integer from 0 to 23. R's histogram is combining some data. Here's the command I ran to get the histogram:
histinfo <- hist(crashes$hour, right=FALSE)
Here's histinfo:
histinfo
$breaks
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

$counts
 [1]  4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
[16]  9758 11301 11745  9943  7494  6272  6220 11669

$intensities
 [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
 [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
[17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911

$density
 [1] 0.03041876 0.02954234 0.03812101 0.02105963 0.01521258 0.01736844 0.02937602 0.03930449
 [9] 0.04388490 0.03581161 0.03583081 0.04219604 0.04575289 0.04791515 0.05223967 0.06242403
[17] 0.07229494 0.07513530 0.06360752 0.04794074 0.04012334 0.03979068 0.07464911

$mids
 [1]  0.5  1.5  2.5  3.5  4.5  5.5  6.5  7.5  8.5  9.5 10.5 11.5 12.5 13.5 14.5 15.5 16.5 17.5
[19] 18.5 19.5 20.5 21.5 22.5

$xname
[1] "crashes$hour"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

Note how the last value in counts is 11669. It's relevant to the output of table(crashes$hour):

    0     1     2     3     4     5     6     7     8     9    10    11    12    13    14
 4755  4618  5959  3292  2378  2715  4592  6144  6860  5598  5601  6596  7152  7490  8166
   15    16    17    18    19    20    21    22    23
 9758 11301 11745  9943  7494  6272  6220  6000  5669

Notice how the sum of 22 and 23 from table(crashes$hour) is 11669? Is that correct for the histogram to combine hours 22 and 23? Since I specified right = FALSE, I figured there's no way 23 would be combined with 22?

Adding breaks=24 to the hist makes no difference; it's still stuck at 23 breaks. I also tried breaks=25 and 23 and several other values, in case I am misinterpreting breaks's meaning, but none of them make a difference.

I imagine this is a n00b question, so my apologies if this is obvious.

Aren
Sarah Goslee http://www.functionaldiversity.org