Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street Web:http://www.datavis.ca
Toronto, ONT M3J 1P3 CANADA
On Sep 2, 2014, at 12:29 PM, Michael Friendly <friendly at yorku.ca> wrote:
The data vector, art, given below using dput(), gives a set of discrete numeric values for 915 observations,
in the range of 0:19. I want to make some plots of the frequency distribution, but the standard
tools (hist, barplot, table) don't give me what I need for a custom plot, because some of the
values in 0:19 occur with zero frequency.
table() excludes the values of art that occur with zero frequency, and so these are also excluded in
barplot().
Michael,
Coerce the vector to be tabulated to a factor that contains all of the levels 0:19, then use barplot():
art.fac <- factor(art, levels = 0:19)
barplot(table(art.fac))
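Here is a minimal sketch of why Marc's coercion works. It assumes a stand-in for art rebuilt from the frequencies quoted elsewhere in this thread, because the dput() output itself is not reproduced here and the original ordering does not matter for counting. table() makes one cell per factor level, so fixing levels = 0:19 keeps a zero cell for every value that never occurs.

```r
# Stand-in for `art` (assumption): rebuilt from the frequencies quoted in this
# thread, since the original dput() output is not shown. Order is irrelevant here.
art <- rep(c(0:12, 16, 19),
           times = c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))

tab.raw <- table(art)                          # only the values that occur
tab.all <- table(factor(art, levels = 0:19))   # all 20 levels, zeros included

length(tab.raw)   # 15
length(tab.all)   # 20
tab.all[c("13", "14", "15", "17", "18")]       # the zero-frequency cells

barplot(tab.all, xlab = "art", ylab = "Frequency")  # one slot per value 0:19
```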
I like Marc's answer, and I occasionally have need for a different idiom.
# 'old': observed counts, one column per value of art that occurs
old <-
structure(list(`0` = 275L, `1` = 246L, `2` = 178L, `3` = 84L,
`4` = 67L, `5` = 27L, `6` = 17L, `7` = 12L, `8` = 1L, `9` = 2L,
`10` = 1L, `11` = 1L, `12` = 2L, `16` = 1L, `19` = 1L), .Names = c("0",
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"16", "19"), row.names = 2L, class = "data.frame")
# start with a zero count for every possible value 0:19 ...
new <- rep(0,20)
names(new) <- 0:19
# ... then overwrite the counts of the observed values, matched by name
new[names(old)] <- as.numeric(old)
new
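The same fill-by-name idiom also works without the data-frame detour. This sketch assumes the same stand-in art rebuilt from the thread's counts and starts from table(art) directly. The result should agree with Marc's factor-based table.

```r
# Stand-in for `art` (assumption): rebuilt from the frequencies quoted in this thread.
art <- rep(c(0:12, 16, 19),
           times = c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))

obs <- table(art)                     # counts for the observed values only
new <- setNames(numeric(20), 0:19)    # one zero slot for each value 0:19
new[names(obs)] <- as.vector(obs)     # overwrite observed counts, matched by name
new
```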
Rich
On Tue, Sep 2, 2014 at 1:36 PM, Marc Schwartz <marc_schwartz at me.com> wrote:
On Sep 2, 2014, at 12:29 PM, Michael Friendly <friendly at yorku.ca> wrote:
Moreover, I was surprised by the result of hist() on these data, because
the counts of 0 and 1 shown above were combined in this call:
hist() is mainly aimed at continuous data, where values generally don't
show up on the boundaries. Since you have integer data and integer
breaks, all values show up on the boundaries, and since you didn't
override the include.lowest argument, the bars are for intervals [0,1],
(1,2], (2,3], etc., i.e., the leftmost one includes its left end, but none
of the others do.
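Duncan's explanation is easy to check. This sketch again assumes the stand-in art rebuilt from the thread's counts. With the default right = TRUE and include.lowest = TRUE, the first bin is the closed interval [0,1] and takes in both the 0s and the 1s. The half-integer breaks at the end are an alternative that follows from his explanation rather than something proposed in the thread: when no integer sits on a boundary, right and include.lowest no longer matter.

```r
# Stand-in for `art` (assumption): rebuilt from the frequencies quoted in this thread.
art <- rep(c(0:12, 16, 19),
           times = c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))

# Defaults right = TRUE, include.lowest = TRUE: bins [0,1], (1,2], ..., (18,19]
h.def <- hist(art, breaks = 0:19, plot = FALSE)
h.def$counts[1]          # 521: the 275 zeros and 246 ones share the bin [0,1]

# Alternative (not from the thread): breaks halfway between the integers,
# so each bin is centred on one integer and contains only that integer
h.mid <- hist(art, breaks = -0.5:19.5, plot = FALSE)
h.mid$counts
```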
As Marc said, barplot is what you want, but you need to declare your
data to be a factor to include all the levels.
Duncan Murdoch
Hello,
As for table, the help page says that "It is best to supply factors
rather than rely on coercion." So, if you want to include elements in
the range 0:19 with a count of zero, try
table(factor(art, levels = 0:19))
As for hist, use option right = FALSE.
art.hist <- hist(art, breaks=0:19, plot=FALSE, right = FALSE)
art.hist$breaks
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
art.hist$counts
[1] 275 246 178 84 67 27 17 12 1 2 1 1 2 0 0 0 1 0 1
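One detail in the output above deserves attention: breaks = 0:19 yields 19 bins, not 20. With right = FALSE, include.lowest applies to the highest break, so the last bin is [18,19] and pools the values 18 and 19. That is harmless here only because 18 never occurs. This sketch assumes the same stand-in art and shows that breaks = 0:20 gives one bin per value and reproduces the factor-based table.

```r
# Stand-in for `art` (assumption): rebuilt from the frequencies quoted in this thread.
art <- rep(c(0:12, 16, 19),
           times = c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))

h19 <- hist(art, breaks = 0:19, plot = FALSE, right = FALSE)
length(h19$counts)   # 19; the last bin [18,19] pools 18 and 19

h20 <- hist(art, breaks = 0:20, plot = FALSE, right = FALSE)
length(h20$counts)   # 20; bins [0,1), [1,2), ..., [19,20]
all(h20$counts == as.vector(table(factor(art, levels = 0:19))))  # TRUE
```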
Hope this helps,
Rui Barradas
Another approach using barplot:
barplot(table(cut(art, breaks= -1:19, labels=0:19)))
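David's version works because cut() always returns a factor that carries every interval as a level, so table() keeps the empty ones. With the default right = TRUE, breaks = -1:19 give the 20 intervals (-1,0], (0,1], ..., (18,19]. Each holds exactly one integer, and labels = 0:19 names each interval after that integer. This sketch compares it with the factor() approach and assumes the stand-in art rebuilt from the thread's counts.

```r
# Stand-in for `art` (assumption): rebuilt from the frequencies quoted in this thread.
art <- rep(c(0:12, 16, 19),
           times = c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))

art.cut <- cut(art, breaks = -1:19, labels = 0:19)  # factor with 20 levels "0".."19"
tab.cut <- table(art.cut)
tab.fac <- table(factor(art, levels = 0:19))

all(as.vector(tab.cut) == as.vector(tab.fac))  # TRUE
identical(names(tab.cut), names(tab.fac))      # TRUE
```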
-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Rui Barradas
Sent: Tuesday, September 2, 2014 12:59 PM
To: Michael Friendly; R-help
Subject: Re: [R] frequencies of a discrete numeric variable, including zeros
Hi Michael,
I think that histograms are intrinsically misleading for discrete data, and
that while bar graphs are an improvement, they also invite
misinterpretation. I generally do something like this:
f <- table(factor(art, levels=0:19))
plot(as.numeric(names(f)), as.numeric(f), type="h",
xlab="art", ylab="frequency", axes=FALSE)
axis(1, pos=0, at=0:19)
axis(2)
points(as.numeric(names(f)), f, pch=16)
abline(h=0)
Actually, I prefer omitting the points corresponding to 0 counts, which is
even simpler:
f <- table(art)
plot(as.numeric(names(f)), as.numeric(f), type="h",
xlab="art", ylab="frequency", axes=FALSE)
axis(1, pos=0, at=min(art):max(art))
axis(2)
points(as.numeric(names(f)), f, pch=16)
abline(h=0)
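John's two variants can be combined into one reusable function that keeps the full 0:19 axis but draws points only where the count is non-zero. This sketch assumes the stand-in art rebuilt from the thread's counts. spike_plot() and its arguments are hypothetical names, not an existing function.

```r
# Stand-in for `art` (assumption): rebuilt from the frequencies quoted in this thread.
art <- rep(c(0:12, 16, 19),
           times = c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))

# Hypothetical helper (not from the thread): spikes at every value in `values`,
# with points drawn only for non-zero counts.
spike_plot <- function(x, values = min(x):max(x), xlab = "x", ylab = "frequency") {
  f <- table(factor(x, levels = values))   # keeps zero-count values
  v <- as.numeric(names(f))                # numeric x positions
  n <- as.vector(f)                        # counts as a plain vector
  plot(v, n, type = "h", xlab = xlab, ylab = ylab, axes = FALSE)
  axis(1, pos = 0, at = values)
  axis(2)
  points(v[n > 0], n[n > 0], pch = 16)     # omit points for zero counts
  abline(h = 0)
  invisible(f)                             # return the table for reuse
}

f <- spike_plot(art, values = 0:19, xlab = "art")
```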
Best,
John
-----------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.socsci.mcmaster.ca/jfox/
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Michael Friendly
Sent: Tuesday, September 02, 2014 1:29 PM
To: R-help
Subject: [R] frequencies of a discrete numeric variable, including zeros
The built-in table method for plot() makes a decent-looking plot as
well. Look at
plot(table(art), ylab="Count")
plot(table(factor(art, levels=0:19)), ylab="Count")
plot(table(LETTERS[art+1]), ylab="Count")
plot(table(factor(LETTERS[art+1], levels=LETTERS[1:20])), ylab="Count")
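The four calls differ in how plot()'s table method places the spikes. As far as I can tell from its behaviour, when every name of a one-way table parses as a number, the spikes go at those numeric positions, so plot(table(art)) already leaves gaps at 13 to 15, 17 and 18. Otherwise they go at 1, 2, 3, ..., so absent letters simply vanish unless factor() levels keep them. The sketch below mimics that rule with a hypothetical helper, x_positions(), and assumes the stand-in art rebuilt from the thread's counts. It is not R's own code.

```r
# Stand-in for `art` (assumption): rebuilt from the frequencies quoted in this thread.
art <- rep(c(0:12, 16, 19),
           times = c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))

# Hypothetical helper: mimics (does not reproduce) how plot()'s table method
# appears to choose x positions for a one-way table.
x_positions <- function(tab) {
  xx <- suppressWarnings(as.numeric(names(tab)))
  if (!anyNA(xx)) xx else seq_along(tab)
}

x_positions(table(art))                  # 0 1 ... 12 16 19: gaps where counts are 0
x_positions(table(LETTERS[art + 1]))     # 1 2 ... 15: absent letters collapse
x_positions(table(factor(LETTERS[art + 1], levels = LETTERS[1:20])))  # 1 2 ... 20
```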
Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Tue, Sep 2, 2014 at 12:49 PM, John Fox <jfox at mcmaster.ca> wrote:
Dear Bill,
Yes, that's better -- essentially similar to what I suggested but much less work. I wasn't aware of it. You could even add the points at the tops of the spikes via a follow-up points() command.
Thanks,
John
-----Original Message-----
From: William Dunlap [mailto:wdunlap at tibco.com]
Sent: Tuesday, September 02, 2014 4:14 PM
To: John Fox
Cc: Michael Friendly; R-help
Subject: Re: [R] frequencies of a discrete numeric variable, including
zeros
Thanks to all who replied to this thread.
To summarize, John Fox's and William Dunlap's suggestions amount to this
plot, where it becomes *crucial* to omit the zeros (otherwise, with
points(), the zero counts would not be distinguishable from the counts of 1):
# Fox/Dunlap plot, using plot.table method
art.tab0 <- table(art)
plot(art.tab0, ylab="Frequency", xlab="Number of articles")
points(as.numeric(names(art.tab0)), art.tab0, pch=16)
Here, I actually prefer the barplot, using factor() to retain the zeros:
# coerce to a factor, then use table()
art.fac <- factor(art, levels = 0:19)
art.tab <- table(art.fac)
barplot(art.tab, ylab="Frequency", xlab="Number of articles")
However, the frequencies for small values of art dominate the display,
and I'm contemplating a Poisson regression anyway, so why not plot on a
log scale?
# plot on log scale, but start at 1 to avoid log(0)
barplot(art.tab+1, ylab="log(Frequency+1)", xlab="Number of articles",
log="y")
# plot on log scale, directly
barplot(log(art.tab+1), ylab="log(Frequency+1)", xlab="Number of articles")
The first method, using log="y", gives axis labels on the scale of frequency (strictly, Frequency+1) rather than log frequency.
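Since a Poisson model is in view, one way to use the log-scale barplot is to overlay Poisson expected frequencies. This sketch assumes the stand-in art rebuilt from the thread's counts and a single Poisson rate estimated by mean(art). A Poisson regression with covariates would give different fitted frequencies. barplot() returns the bar midpoints, which lets points() line up with the bars. Both series are shifted by 1 so the log axis never sees a zero, which is also why the tick labels read as Frequency+1.

```r
# Stand-in for `art` (assumption): rebuilt from the frequencies quoted in this thread.
art <- rep(c(0:12, 16, 19),
           times = c(275, 246, 178, 84, 67, 27, 17, 12, 1, 2, 1, 1, 2, 1, 1))
art.tab <- table(factor(art, levels = 0:19))

# Single-rate Poisson fit (assumption): lambda estimated by the sample mean
lambda   <- mean(art)
expected <- length(art) * dpois(0:19, lambda)

# barplot() invisibly returns the bar midpoints; reuse them to place the points.
# Both series are shifted by 1 so that no value is zero on the log axis.
mids <- barplot(art.tab + 1, ylab = "Frequency + 1 (log scale)",
                xlab = "Number of articles", log = "y")
points(mids, expected + 1, type = "b", pch = 16)
```

Where the observed bars rise well above the points (for example at 0), the single-rate Poisson fit underpredicts those counts.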
best,
-Michael