How to select a subset of data to do a barplot in ggplot2

Hi:

The simplest way to do it is to modify the input data frame by taking
out the records not having status live or dead and then redefining the
factor in the new data frame to get rid of the removed levels. Calling
your input data frame DF rather than data,

DF <- structure(list(FID = c(1L, 1L, 2L, 2L, 2L, 2L, 6L, 6L, 10L, 10L,
10L, 11L, 11L, 11L, 12L, 17L, 17L, 17L), IID = c(4621L, 4628L,
4631L, 4632L, 4633L, 4634L, 4675L, 4679L, 4716L, 4719L, 4721L,
4726L, 4728L, 4730L, 4732L, 4783L, 4783L, 4784L), STATUS = structure(c(2L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 3L, 2L, 2L, 2L,
2L), .Label = c("dead", "live", "nosperm"), class = "factor")),
.Names = c("FID", "IID", "STATUS"), class = "data.frame",
row.names = c(NA, -18L))

# The right hand side above came from dput(DF), where DF was created by
# DF <- read.table(textConnection("<your posted data>"), header = TRUE)
# Consider using dput() to represent your data in the future.

# Retain the records with status live or dead only
DF2 <- DF[DF$STATUS %in% c("live", "dead"), ]

# This does not get rid of the original levels...
levels(DF2$STATUS)
# ...so redefine the factor
DF2$STATUS <- factor(DF2$STATUS)
str(DF2)
# 'data.frame':   16 obs. of  3 variables:
#  $ FID   : int  1 1 2 2 2 2 6 6 10 10 ...
#  $ IID   : int  4621 4628 4631 4632 4633 4634 4675 4679 4716 4719 ...
#  $ STATUS: Factor w/ 2 levels "dead","live": 2 1 2 2 2 2 2 1 1 2 ...
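
As an alternative sketch of the same two steps, base R's subset() and droplevels() can do the filtering and the level cleanup in one expression. The name DF3 is only for illustration, and the data frame is rebuilt here from the values in DF so that the snippet runs on its own.

```r
# Rebuild the example data (same FID and STATUS values as DF) so this runs standalone
DF <- data.frame(
  FID = rep(c(1L, 2L, 6L, 10L, 11L, 12L, 17L), times = c(2, 4, 2, 3, 3, 1, 3)),
  STATUS = factor(c("live", "dead", "live", "live", "live", "live",
                    "live", "dead", "dead", "live", "dead", "live",
                    "nosperm", "nosperm", "live", "live", "live", "live"))
)

# Keep only live/dead rows, then drop factor levels that no longer occur.
# DF3 is a hypothetical name used for this illustration only.
DF3 <- droplevels(subset(DF, STATUS %in% c("live", "dead")))

levels(DF3$STATUS)   # "nosperm" should no longer be listed
nrow(DF3)            # rows remaining after the filter
```

droplevels() on a data frame acts on every factor column, so it may be the more convenient choice when several factors need cleaning at once; factor(DF2$STATUS) touches only the one column.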

# now plot:
library(ggplot2)

# (1) FID numeric
ggplot(DF2, aes(x = FID, fill = STATUS)) + geom_bar()

# (2) FID factor
ggplot(DF2, aes(x = factor(FID), fill = STATUS)) + geom_bar()

The second one makes more sense to me, but you may have reasons to
prefer the first.
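
To check what either plot should show, the counts that geom_bar() stacks by default (one count per FID and STATUS combination) can be tabulated in base R. This is only a rough sketch: the data are rebuilt from the values in DF so the snippet runs on its own, and the name counts is just for illustration.

```r
# Rebuild the filtered data (same FID and STATUS values as DF2) so this runs standalone
DF <- data.frame(
  FID = rep(c(1L, 2L, 6L, 10L, 11L, 12L, 17L), times = c(2, 4, 2, 3, 3, 1, 3)),
  STATUS = factor(c("live", "dead", "live", "live", "live", "live",
                    "live", "dead", "dead", "live", "dead", "live",
                    "nosperm", "nosperm", "live", "live", "live", "live"))
)
DF2 <- DF[DF$STATUS %in% c("live", "dead"), ]
DF2$STATUS <- factor(DF2$STATUS)

# Cross-tabulate: each cell is the height of one stacked bar segment
counts <- table(FID = DF2$FID, STATUS = DF2$STATUS)
print(counts)

# Row sums give the total bar height for each FID
rowSums(counts)
```

With FID numeric the x axis is continuous, so unused values such as 3 to 5 leave gaps between the bars; with factor(FID) the seven FID values are evenly spaced.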

Dennis
On Thu, Dec 13, 2012 at 4:38 AM, Yao He <yao.h.1988 at gmail.com> wrote: