"hadley wickham" <h.wickham at gmail.com> writes:
On 7/12/07, Pete Kazmier <pete-expires-20070910 at kazmier.com> wrote:
I'm an R newbie, but I recently discovered the ggplot2 and reshape
packages, which seem incredibly useful and much easier for a beginner
to use. Using data from IMDB, I'm trying to see how the average movie
rating varies by year. Here is what my data looks like:
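# Pipe-delimited file with a header row; disable comment-character handling
ratings <- read.delim("groomed.list", header = TRUE, sep = "|", comment.char = "")
# Keep only titles with more than 100 votes
ratings <- subset(ratings, VoteCount > 100)
head(ratings)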
   Title                           Histogram  VoteCount VoteMean Year
1  !Huff (2004) (TV)               0000000016       299      8.4 2004
8  'Allo 'Allo! (1982)             0000000125       829      8.6 1982
50 .hack//SIGN (2002)              0000001113       150      7.0 2002
56 1-800-Missing (2003)            0000000103       118      5.4 2003
66 Greatest Artists (2000) (mini)  00..000016       110      7.8 2000
77 00 Scariest Movie (2004) (mini) 00..000115       256      8.6 2004
Have you tried using the movies dataset included in ggplot? Or is
there some data you want that is not in that dataset?
It's funny that you mention this, because I had intended to write this
email about a month ago but was delayed for other reasons. In any
case, when I was typing this up last night, I wanted to recreate my
steps, but I could not find the IMDB movie data I had used originally.
I searched everywhere to no avail, so I downloaded the data myself and
groomed it. Only now do I remember that I had used the movies dataset
included in ggplot.
How do 'byYear' and 'byYear2' differ? I am trying to use 'typeof', but
both seem to be lists. However, they are clearly different in some
way, because 'qplot' graphs them differently.
Try using str: it's much more helpful, and you should see the
difference quickly.
Thanks! This is the function I've been looking for in my quest to
learn about R's internal data types. Too bad it has such a terrible
name!
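Since the code that built 'byYear' and 'byYear2' isn't shown, here is a small sketch with hypothetical stand-ins: a plain list and a data frame holding the same columns. It illustrates why typeof() can't tell them apart (a data frame is stored as a list of columns) while class() and str() can.

```r
# Hypothetical stand-ins for byYear and byYear2 (the original code
# that built them is not shown); both hold the same two columns.
byYear  <- list(year = c(2000, 2001), rating = c(6.1, 6.3))
byYear2 <- data.frame(year = c(2000, 2001), rating = c(6.1, 6.3))

typeof(byYear)   # "list"
typeof(byYear2)  # also "list": a data frame is stored as a list of columns
class(byYear)    # "list"
class(byYear2)   # "data.frame"

# str() shows the difference directly: "List of 2" for the list,
# and "'data.frame':" with obs./variables counts for the data frame
str(byYear)
str(byYear2)
```

In short, typeof() reports the storage mode, while class() and str() report what the object is meant to be.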
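Using the built-in movies data:
library(ggplot2)  # qplot() and the movies data
library(reshape)  # melt() and cast()
# Long format: title and year as ids, rating and votes as measured variables
mm <- melt(movies, id.vars = 1:2, measure.vars = c("rating", "votes"))
# One row per year with rating_mean, rating_sum, votes_mean, votes_sum
msum <- cast(mm, year ~ variable, c(mean, sum))
qplot(year, rating_mean, data = msum, colour = votes_sum)
qplot(year, rating_mean, data = msum, colour = votes_sum, geom = "line")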
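To make it concrete what the cast() call computes, here is a base-R sketch of the same per-year summary using tapply() on a tiny hypothetical data frame. This is not how reshape works internally. It just reproduces the rating_mean and votes_sum columns so the result can be checked by hand.

```r
# Hypothetical toy stand-in for the movies data: two films per year
toy <- data.frame(
  title  = c("A", "B", "C", "D"),
  year   = c(2000, 2000, 2001, 2001),
  rating = c(6, 8, 5, 7),
  votes  = c(100, 300, 50, 150)
)

# Per-year mean rating and total votes, analogous to the rating_mean
# and votes_sum columns from cast(mm, year ~ variable, c(mean, sum))
rating_mean <- tapply(toy$rating, toy$year, mean)
votes_sum   <- tapply(toy$votes, toy$year, sum)

# Both tapply() results share the same (sorted) year grouping
msum_base <- data.frame(
  year        = as.numeric(names(rating_mean)),
  rating_mean = as.vector(rating_mean),
  votes_sum   = as.vector(votes_sum)
)
print(msum_base)
```

For this toy data, 2000 gives a mean rating of 7 from 400 votes and 2001 gives 6 from 200 votes.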
Great! This is exactly what I was looking to do. By the way, does
any of your documentation use the movies dataset as an example? I'm
curious what else I can do with it. For example, how can I use
ggplot's facets to see the same information by type of movie? I'm
unsure how to combine the binary type variables (Action, Comedy,
Drama, etc.) into a single variable whose levels are the movie types.
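One possible way to turn the 0/1 type columns into a single factor, sketched in base R on a small hypothetical data frame whose column names mirror the movies data's Action, Comedy and Drama indicators. The idea is to stack the indicator columns so there is one row per (movie, type) pair, then keep only the rows flagged 1.

```r
# Hypothetical toy rows mimicking the movies data's 0/1 type columns
toy <- data.frame(
  title  = c("A", "B", "C"),
  year   = c(2000, 2000, 2001),
  rating = c(6, 8, 5),
  Action = c(1, 0, 1),
  Comedy = c(0, 1, 1),
  Drama  = c(0, 0, 1)
)
genres <- c("Action", "Comedy", "Drama")

# Stack the type columns: one row per (movie, type) pair.
# unlist() concatenates the columns in the order of `genres`,
# which matches rep(genres, each = nrow(toy)).
long <- data.frame(
  title  = rep(toy$title, times = length(genres)),
  year   = rep(toy$year, times = length(genres)),
  rating = rep(toy$rating, times = length(genres)),
  genre  = factor(rep(genres, each = nrow(toy)), levels = genres),
  flag   = unlist(toy[genres], use.names = FALSE)
)

# Keep only the types each movie belongs to; a movie flagged
# in several types appears once per type.
long <- long[long$flag == 1, c("title", "year", "rating", "genre")]
print(long)
```

The resulting genre factor could then be used for faceting, e.g. qplot(year, rating, data = long, facets = genre ~ .). The same stacking could probably also be done with reshape's melt() on the type columns followed by subsetting on value == 1. Counting multi-type films once per type, as here, is a design choice; dropping or separately labelling them would give different panels.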