# Hello,
# I have a data set that looks something like the following:
site<-c(rep('a',5),rep('b',2),rep('c',4),rep('d',11))
year<-c(1980, 1981, 1982, 1993, 1995, 1980, 1983, 1981, 1993, 1995,
1999, c(1980:1990))
count<-c(60,35,36,12,8,112,98,20,13,15,15,65,43,49,51,34,33,33,33,40,11,0)
data<-data.frame(site, year, count)
# > site year count
# 1 a 1980 60
# 2 a 1981 35
# 3 a 1982 36
# 4 a 1993 12
# .
# .
# .
# I ran a negative binomial glm using:
library(MASS)
model_a<-glm.nb(count~year, data=data)
# then extracted predicted values using:
py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')
f1<-p1$fit
plot(count~year, data=data)
lines(py$year, f1)
# Works perfectly.
# Now, I want to add site as a factor:
model_b<-glm.nb(count~year+as.factor(site), data=data)
# I have tried a couple different ways of predicting the values based
on model_b, but am having trouble.
py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
p1<-predict(model_b, newdata=py, se.fit=TRUE, type='response')
# gives:
# >Error in model.frame.default(Terms, newdata, na.action = na.action,
xlev = object$xlevels) :
# variable lengths differ (found for 'as.factor(site)')
# In addition: Warning message:
# 'newdata' had 20 rows but variable(s) found have 22 rows
py<-expand.grid(data$site, data$year)
names(py)<-c('site','year')
p1<-predict(model_b, newdata=py)
# This works, but results in 484 values, and I can't plot a line over
my points.
# There is probably a simple solution, but I'm having trouble wrapping
my mind around it. Mind you, this is also a last
# minute change to my thesis, and I haven't slept in about three days.
# Any suggestions? I would be extremely grateful...
# Banging my head against the wall,
# A stressed out grad student
Using predict() After Adding a Factor to a glm.nb() Model
2 messages · 077315q at acadiau.ca, David L Carlson
Actually the first one did not work. You left out the warning message:
py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1)) p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')
Warning message: 'newdata' had 20 rows but variable(s) found have 22 rows Since py did not contain the explanatory variable "year," predict() threw a warning and then just used the values in data giving you the same results that you already have stored in model_a. Compare
model_a$fitted.values
1 2 3 4 5 6 7
8
61.444110 55.163093 49.524141 15.124074 12.190050 61.444110 44.461622
55.163093
9 10 11 12 13 14 15
16
15.124074 12.190050 7.919156 61.444110 55.163093 49.524141 44.461622
39.916610
17 18 19 20 21 22
35.836204 32.172911 28.884091 25.931465 23.280666 20.900841
p1$fit
1 2 3 4 5 6 7
8
61.444110 55.163093 49.524141 15.124074 12.190050 61.444110 44.461622
55.163093
9 10 11 12 13 14 15
16
15.124074 12.190050 7.919156 61.444110 55.163093 49.524141 44.461622
39.916610
17 18 19 20 21 22
35.836204 32.172911 28.884091 25.931465 23.280666 20.900841
Set the variable name to year in py and then use predict():
py<-data.frame(year=seq(from=min(data$year), to=max(data$year), by=1)) p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')
Now run your plotting commands. The plot may not look much different, but now it is based on the 20 evenly spaced values in py instead of the 22 original data values in data. Now you have the right idea for adding site to py but instead of using the irregularly spaced data, use evenly spaced values. Then you will have to plot separate lines for each site:
model_b<-glm.nb(count~year+site, data=data)
py2 <- expand.grid(site=letters[1:4], year=1980:1999)
p2<-predict(model_b, newdata=py2, se.fit=TRUE, type='response')
py2 <- data.frame(py2, p2)
plot(count~year, data=data, pch=as.character(site))
x <- unstack(py2, py2$year~py2$site)
y <- unstack(py2, py2$fit~py2$site)
matlines(x, y)
legend("topright", letters[1:4], col=1:4, lty=1:4)
---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-
project.org] On Behalf Of 077315q at acadiau.ca
Sent: Saturday, September 08, 2012 2:47 PM
To: r-help at r-project.org
Subject: [R] Using predict() After Adding a Factor to a glm.nb() Model
# Hello,
# I have a data set that looks something like the following:
site<-c(rep('a',5),rep('b',2),rep('c',4),rep('d',11))
year<-c(1980, 1981, 1982, 1993, 1995, 1980, 1983, 1981, 1993, 1995,
1999, c(1980:1990))
count<-
c(60,35,36,12,8,112,98,20,13,15,15,65,43,49,51,34,33,33,33,40,11,0)
data<-data.frame(site, year, count)
# > site year count
# 1 a 1980 60
# 2 a 1981 35
# 3 a 1982 36
# 4 a 1993 12
# .
# .
# .
# I ran a negative binomial glm using:
library(MASS)
model_a<-glm.nb(count~year, data=data)
# then extracted predicted values using:
py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')
f1<-p1$fit
plot(count~year, data=data)
lines(py$year, f1)
# Works perfectly.
# Now, I want to add site as a factor:
model_b<-glm.nb(count~year+as.factor(site), data=data)
# I have tried a couple different ways of predicting the values based
on model_b, but am having trouble.
py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
p1<-predict(model_b, newdata=py, se.fit=TRUE, type='response')
# gives:
# >Error in model.frame.default(Terms, newdata, na.action = na.action,
xlev = object$xlevels) :
# variable lengths differ (found for 'as.factor(site)')
# In addition: Warning message:
# 'newdata' had 20 rows but variable(s) found have 22 rows
py<-expand.grid(data$site, data$year)
names(py)<-c('site','year')
p1<-predict(model_b, newdata=py)
# This works, but results in 484 values, and I can't plot a line over
my points.
# There is probably a simple solution, but I'm having trouble wrapping
my mind around it. Mind you, this is also a last
# minute change to my thesis, and I haven't slept in about three days.
# Any suggestions? I would be extremely grateful...
# Banging my head against the wall,
# A stressed out grad student
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting- guide.html and provide commented, minimal, self-contained, reproducible code.