frequencies of a discrete numeric variable, including zeros

10 messages · Michael Friendly, Marc Schwartz, Richard M. Heiberger +5 more

#
The data vector, art, given below using dput(), gives a set of discrete
numeric values for 915 observations, in the range 0:19. I want to make
some plots of the frequency distribution, but the standard tools (hist,
barplot, table) don't give me what I want for a custom plot, because
some of the 0:19 counts have zero frequency.

table() excludes the values of art that occur with zero frequency, and
so they are also excluded in barplot():
 > table(art)
art
   0   1   2   3   4   5   6   7   8   9  10  11  12  16  19
275 246 178  84  67  27  17  12   1   2   1   1   2   1   1
 > barplot(table(art))


A direct calculation, using colSums of outer(), gives me the values I
want, but this seems unnecessarily complicated for this simple task.

 > (art.freq <- colSums(outer(art, 0:19, `==`)))
  [1] 275 246 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0   0   1
 >  barplot(art.freq, names.arg=0:19)


Moreover, I was surprised by the result of hist() on this data, because
the 0 and 1 counts from the above were combined in this call:

 > art.hist <- hist(art, breaks=0:19, plot=FALSE)
 > art.hist$breaks
  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
 > art.hist$counts
  [1] 521 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0   0   1

Is there some option I missed here?

The data:

 > dput(art)
c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 9L, 9L, 10L,
11L, 12L, 12L, 16L, 19L)
#
On Sep 2, 2014, at 12:29 PM, Michael Friendly <friendly at yorku.ca> wrote:

            
Michael,

Coerce the vector to be tabulated to a factor that contains all of the levels 0:19, then use barplot():

art.fac <- factor(art, levels = 0:19)
table(art.fac)
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
275 246 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0 
 18  19 
  0   1 


barplot(table(art.fac), cex.names = 0.5)


Thanks for providing the data above.

Regards,

Marc Schwartz
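
A minimal sketch of the factor approach above, for readers who want to run it without pasting the full dput() output. It assumes art can be rebuilt from the counts posted in the original table(art) output (the rep() call is that reconstruction, not part of the original message), and it cross-checks the result against tabulate(), which counts the integers 1..nbins directly.

```r
# Rebuild art from the table(art) counts posted in the question
# (an assumption made for brevity; the thread gives the full vector via dput()).
art <- rep(c(0:12, 16L, 19L),
           times = c(275L, 246L, 178L, 84L, 67L, 27L, 17L, 12L,
                     1L, 2L, 1L, 1L, 2L, 1L, 1L))

# Declaring all 20 levels makes table() keep the empty cells.
art.fac <- factor(art, levels = 0:19)
art.tab <- table(art.fac)

# Values that never occur now appear with a count of zero.
zero.cells <- names(art.tab)[art.tab == 0]

# Cross-check: tabulate() counts 1..nbins, so shift the data by one.
art.check <- tabulate(art + 1L, nbins = 20)
```

Here art.tab can be passed straight to barplot(), and zero.cells lists the levels ("13", "14", "15", "17", "18") that a plain table(art) would drop.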
#
I like Marc's answer, and I occasionally have need for a different idiom.

old <-
structure(list(`0` = 275L, `1` = 246L, `2` = 178L, `3` = 84L,
    `4` = 67L, `5` = 27L, `6` = 17L, `7` = 12L, `8` = 1L, `9` = 2L,
    `10` = 1L, `11` = 1L, `12` = 2L, `16` = 1L, `19` = 1L), .Names = c("0",
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"16", "19"), row.names = 2L, class = "data.frame")

new <- rep(0,20)
names(new) <- 0:19
new[names(old)] <- as.numeric(old)
new

Rich
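
The same fill-by-name idiom can probably be applied to the output of table() directly, without going through a data frame. This is only a sketch: art is rebuilt from the counts posted in the question, and the object name filled is made up for illustration.

```r
# Rebuild art from the posted table(art) counts (assumption for brevity).
art <- rep(c(0:12, 16L, 19L),
           times = c(275L, 246L, 178L, 84L, 67L, 27L, 17L, 12L,
                     1L, 2L, 1L, 1L, 2L, 1L, 1L))

tab <- table(art)                      # only the 15 observed values

# Start from a zero-filled vector named "0".."19" ...
filled <- setNames(integer(20), 0:19)

# ... and overwrite the entries whose names occur in the table.
filled[names(tab)] <- as.integer(tab)
filled
```

Indexing by name is what makes this work: the observed values land in their own slots and every other slot keeps its initial zero.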
On Tue, Sep 2, 2014 at 1:36 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
#
On 02/09/2014, 1:29 PM, Michael Friendly wrote:
hist() is mainly aimed at continuous data, where values generally don't
show up on the boundaries.  Since you have integer data and integer
breaks, all values show up on the boundaries, and since you didn't
override the include.lowest argument, the bars are for intervals [0,1],
(1,2], (2,3], etc, i.e. the leftmost one includes its left end, but none
of the others do.

As Marc said, barplot is what you want, but you need to declare your
data to be a factor to include all the levels.

Duncan Murdoch
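
The interval behaviour described above can be seen on a tiny vector. This sketch uses a made-up vector x rather than art, and also shows one common workaround that is not from the thread: putting the breaks halfway between the integers so that no value falls on a boundary.

```r
x <- c(0L, 0L, 1L, 2L, 2L, 2L)   # toy data: two 0s, one 1, three 2s

# Defaults (right = TRUE, include.lowest = TRUE): bins are [0,1] and (1,2],
# so the 0s and the 1 end up in the same bar.
default_counts <- hist(x, breaks = 0:2, plot = FALSE)$counts

# Breaks at the half-integers: bins are (-0.5,0.5], (0.5,1.5], (1.5,2.5],
# one per integer value.
centred_counts <- hist(x, breaks = seq(-0.5, 2.5, by = 1), plot = FALSE)$counts
```

For the art data the analogous call would use breaks = seq(-0.5, 19.5, by = 1), which gives one bin for each value in 0:19.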
#
Hello,

As for table, the help page says that "It is best to supply factors 
rather than rely on coercion." So if you want to include elements in 
the range 0:19 with a count of zero, try

table(factor(art, levels = 0:19))


As for hist, use option right = FALSE.

art.hist <- hist(art, breaks=0:19, plot=FALSE, right = FALSE)
art.hist$breaks
  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
art.hist$counts
  [1] 275 246 178  84  67  27  17  12   1   2   1   1   2   0   0   0   1   0   1


Hope this helps,

Rui Barradas
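
One detail worth noting about the right = FALSE result: with breaks = 0:19 there are only 19 bins, and the last one is [18,19], so the counts for 18 and 19 share a bar. A sketch of one way around that, assuming it is acceptable to extend the breaks to 0:20 (this extension is not from the thread), with art rebuilt from the posted counts:

```r
# Rebuild art from the posted table(art) counts (assumption for brevity).
art <- rep(c(0:12, 16L, 19L),
           times = c(275L, 246L, 178L, 84L, 67L, 27L, 17L, 12L,
                     1L, 2L, 1L, 1L, 2L, 1L, 1L))

# 19 bins: [0,1), [1,2), ..., [17,18), [18,19]
h19 <- hist(art, breaks = 0:19, right = FALSE, plot = FALSE)$counts

# 20 bins: [0,1), [1,2), ..., [18,19), [19,20]
h20 <- hist(art, breaks = 0:20, right = FALSE, plot = FALSE)$counts

# Reference counts from the factor approach.
tab <- table(factor(art, levels = 0:19))
```

With the extra break, h20 has one count per value in 0:19 and agrees with the table() result.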

On 02-09-2014 18:29, Michael Friendly wrote:
#
Another approach using barplot:

barplot(table(cut(art, breaks= -1:19, labels=0:19)))

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
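
A short sketch of why the cut() call starts its breaks at -1: cut() uses right-closed intervals by default, so breaks = -1:19 produces (-1,0], (0,1], ..., (18,19], and each integer in 0:19 falls in exactly one interval. As in the other sketches, art is rebuilt from the posted counts, which is an assumption made for brevity.

```r
# Rebuild art from the posted table(art) counts (assumption for brevity).
art <- rep(c(0:12, 16L, 19L),
           times = c(275L, 246L, 178L, 84L, 67L, 27L, 17L, 12L,
                     1L, 2L, 1L, 1L, 2L, 1L, 1L))

# 21 breaks define 20 intervals, labelled by the integer each one contains.
binned <- cut(art, breaks = -1:19, labels = 0:19)

# Empty intervals are kept as factor levels, so table() reports them as 0.
cut.tab <- table(binned)
fac.tab <- table(factor(art, levels = 0:19))
```

The two tables carry the same counts; the cut() route is mainly useful when the bins are wider than a single value.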

#
Hi Michael,

I think that histograms are intrinsically misleading for discrete data, and
that while bar graphs are an improvement, they also invite
misinterpretation. I generally do something like this:

f <- table(factor(art, levels=0:19))
plot(as.numeric(names(f)), as.numeric(f), type="h",
    xlab="art", ylab="frequency", axes=FALSE)
axis(1, pos=0, at=0:19)
axis(2)
points(as.numeric(names(f)), f, pch=16)
abline(h=0)


Actually, I prefer omitting the points corresponding to 0 counts, which is
even simpler:

f <- table(art)
plot(as.numeric(names(f)), as.numeric(f), type="h",
    xlab="art", ylab="frequency", axes=FALSE)
axis(1, pos=0, at=min(art):max(art))
axis(2)
points(as.numeric(names(f)), f, pch=16)
abline(h=0)


Best,
 John

-----------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/
#
The built-in table method for plot() makes a decent looking plot as
well.  Look at
  plot(table(art), ylab="Count")
  plot(table(factor(art, levels=0:19)), ylab="Count")
  plot(table(LETTERS[art+1]), ylab="Count")
  plot(table(factor(LETTERS[art+1], levels=LETTERS[1:20])), ylab="Count")
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Tue, Sep 2, 2014 at 12:49 PM, John Fox <jfox at mcmaster.ca> wrote:
#
Dear Bill,

Yes, that's better -- essentially similar to what I suggested but much less work. I wasn't aware of it. You could even add the points at the tops of the spikes via a follow-up points() command.

Thanks,
 John
#
Thanks to all who replied to this thread.

To summarize, John Fox and William Dunlap's suggestion amounts to this
plot, where it becomes *crucial* to eliminate the zeros (otherwise they
would not be distinguishable from the counts of 1, with points()):

# Fox/Dunlap plot, using plot.table method
art.tab0 <- table(art)
plot(art.tab0, ylab="Frequency", xlab="Number of articles")
points(as.numeric(names(art.tab0)), art.tab0, pch=16)

Here, I actually prefer the barplot, using factor() to retain the zeros:

# coerce to a factor, then use table()
art.fac <- factor(art, levels = 0:19)
art.tab <- table(art.fac)
barplot(art.tab, ylab="Frequency", xlab="Number of articles")

However, the frequencies for small values of art dominate the display,
and I'm contemplating a Poisson regression anyway, so why not plot on a
log scale:

# plot on log scale, but start at 1 to avoid log(0)
barplot(art.tab+1, ylab="log(Frequency+1)", xlab="Number of articles", 
log="y")

# plot on log scale, directly
barplot(log(art.tab+1), ylab="log(Frequency+1)", xlab="Number of articles")

The first method, using log="y", gives axis labels on the frequency scale (here, of Frequency+1) rather than on the log scale.

best,
-Michael
On 9/2/2014 3:49 PM, John Fox wrote: