Skip to content

Using predict() After Adding a Factor to a glm.nb() Model

2 messages · 077315q at acadiau.ca, David L Carlson

#
# Hello,

# I have a data set that looks something like the following:

site<-c(rep('a',5),rep('b',2),rep('c',4),rep('d',11))
year<-c(1980, 1981, 1982, 1993, 1995, 1980, 1983, 1981, 1993, 1995,  
1999, c(1980:1990))
count<-c(60,35,36,12,8,112,98,20,13,15,15,65,43,49,51,34,33,33,33,40,11,0)

data<-data.frame(site, year, count)

# > site year count
# 1     a 1980    60
# 2     a 1981    35
# 3     a 1982    36
# 4     a 1993    12
# .
# .
# .

# I ran a negative binomial glm using:

library(MASS)
model_a<-glm.nb(count~year, data=data)

# then extracted predicted values using:

py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
p1<-predict(model_a, newdata=py, se.fit=TRUE, type='response')
f1<-p1$fit

plot(count~year, data=data)
lines(py$year, f1)

# Works perfectly.

# Now, I want to add site as a factor:
model_b<-glm.nb(count~year+as.factor(site), data=data)

# I have tried a couple different ways of predicting the values based  
on model_b, but am having trouble.

py<-data.frame(seq(from=min(data$year), to=max(data$year), by=1))
p1<-predict(model_b, newdata=py, se.fit=TRUE, type='response')

# gives:
# >Error in model.frame.default(Terms, newdata, na.action = na.action,  
xlev = object$xlevels) :
#   variable lengths differ (found for 'as.factor(site)')
# In addition: Warning message:
# 'newdata' had 20 rows but variable(s) found have 22 rows

py<-expand.grid(data$site, data$year)
names(py)<-c('site','year')
p1<-predict(model_b, newdata=py)

# This works, but results in 484 values, and I can't plot a line over  
my points.

# There is probably a simple solution, but I'm having trouble wrapping  
my mind around it. Mind you, this is also a last
# minute change to my thesis, and I haven't slept in about three days.

# Any suggestions? I would be extremely grateful...

# Banging my head against the wall,
# A stressed out grad student
#
Actually the first one did not work. You left out the warning message:
Warning message:
'newdata' had 20 rows but variable(s) found have 22 rows

Since py did not contain the explanatory variable "year," predict() threw a
warning and then just used the values in data giving you the same results
that you already have stored in model_a. Compare
1         2         3         4         5         6         7
8 
61.444110 55.163093 49.524141 15.124074 12.190050 61.444110 44.461622
55.163093 
        9        10        11        12        13        14        15
16 
15.124074 12.190050  7.919156 61.444110 55.163093 49.524141 44.461622
39.916610 
       17        18        19        20        21        22 
35.836204 32.172911 28.884091 25.931465 23.280666 20.900841
1         2         3         4         5         6         7
8 
61.444110 55.163093 49.524141 15.124074 12.190050 61.444110 44.461622
55.163093 
        9        10        11        12        13        14        15
16 
15.124074 12.190050  7.919156 61.444110 55.163093 49.524141 44.461622
39.916610 
       17        18        19        20        21        22 
35.836204 32.172911 28.884091 25.931465 23.280666 20.900841

Set the variable name to year in py and then use predict():
Now run your plotting commands. The plot may not look much different, but
now it is based on the 20 evenly spaced values in py instead of the 22
original data values in data.

Now you have the right idea for adding site to py but instead of using the
irregularly spaced data, use evenly spaced values. Then you will have to
plot separate lines for each site:
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352