predict nbinomial glm

3 messages · K. Steinmann, Brian Ripley, Sundar Dorai-Raj

#
Dear R-helpers,

Let us assume that I have the following dataset:

a <- rnbinom(200, 1, 0.5)
b <- (1:200)
c <- (30:229)
d <- rep(c("q", "r", "s", "t"), rep(50,4))
data_frame <- data.frame(a,b,c,d)

In a first step I fit a model with glm.nb (full code is given at the end of this
mail) and predict my response variable a.
In a second step, I fit glm.nb on a subset of data_frame. As soon as I try to
predict the response variable a from that second model, I get the following
error message:
"Error in model.frame.default(Terms, newdata, na.action = na.action, xlev =
object$xlevels) :
        factor d has new level(s) q"

Does anybody have a solution to this problem?

Thank you in advance,
K. Steinmann (working with R 2.0.0)


Code:

library(MASS)

a <- rnbinom(200, 1, 0.5)
b <- (1:200)
c <- (30:229)
d <- rep(c("q", "r", "s", "t"), rep(50,4))


data_frame <- data.frame(a,b,c,d)


model_1 = glm.nb(a ~ b + d , data = data_frame)


pred_model_1 = predict(model_1, newdata = data_frame, type = "response",
                       se.fit = FALSE, dispersion = NULL, terms = NULL)


subset_of_dataframe = subset(data_frame, (b > 80 & c < 190 ))


model_2 = glm.nb(a ~ b + d , data = subset_of_dataframe)
pred_model_2 = predict(model_2, newdata = subset_of_dataframe, type = "response",
                       se.fit = FALSE, dispersion = NULL, terms = NULL)
#
This seems to be an unstated repeat of much of an earlier and 
unanswered post

 	https://stat.ethz.ch/pipermail/r-help/2005-August/075914.html

entitled

 	[R] error in predict glm (new levels cause problems)

It is nothing to do with `nbinomial glm' (sic): all model fitting 
functions including lm and glm do this.  The reason you did not get at 
least one reply from your first post is that you seemed not to have done 
your homework.  (One thing the posting guide does ask is for you to try 
the current version of R, and yours is three versions old.)

The code is protecting you from an attempt at statistical nonsense. 
(Indeed, the check was added to catch such misuses.)  Your email address 
seems to be that of a student, so please seek the help of your advisor. 
You seem surprised that you are not allowed to make predictions about 
levels for which you have supplied no relevant data.
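
For illustration, here is a minimal sketch of the same check triggered with plain lm rather than glm.nb. It is not the original poster's code: the data and the names train, new_data and res are made up for this example. A model fitted on levels r, s and t has no coefficient for q, so predict() refuses a newdata that actually contains q. The exact wording of the message may differ between R versions.

```r
# Hypothetical toy data: the factor g only has levels r, s, t.
set.seed(1)
train <- data.frame(
  y = rnorm(6),
  g = factor(c("r", "r", "s", "s", "t", "t"))
)
fit <- lm(y ~ g, data = train)

# newdata contains a level the model has never seen.
new_data <- data.frame(g = factor("q"))

# Capture the error message instead of stopping the script.
res <- tryCatch(
  predict(fit, newdata = new_data),
  error = function(e) conditionMessage(e)
)
print(res)
```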
(In reply to K. Steinmann's message of Tue, 16 Aug 2005.)
#
Katharina,

I agree with Prof. Ripley's assessment. But, perhaps one thing you may 
have overlooked is that subset.data.frame does not remove unused levels. So,

 > subset_of_dataframe = subset(data_frame, (b > 80 & c < 190))
 > levels(subset_of_dataframe$d)
[1] "q" "r" "s" "t"
 > table(subset_of_dataframe$d)
  q  r  s  t
  0 20 50 10

Even though the level "q" does not appear in the subset, it is still a level 
of "d". Perhaps you need to do the following after the subset:

subset_of_dataframe[] <- lapply(subset_of_dataframe, "[", drop = TRUE)

which drops all unused levels from factors.
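
As a rough sketch of how one might check and fix this, the code below compares the levels stored in the fitted model (the xlevels component) with the levels of the subset, then drops the unused ones in two ways. Several things here are assumptions for illustration only: lm is swapped in for glm.nb to avoid depending on MASS, d is built explicitly with factor(), and the names sub, sub_a, sub_b and unused are made up. droplevels() is a base R function that appeared in releases later than the one discussed in this thread, and newer R versions may be more lenient about unused levels in newdata.

```r
set.seed(1)
d <- factor(rep(c("q", "r", "s", "t"), rep(50, 4)))
data_frame <- data.frame(a = rnbinom(200, 1, 0.5), b = 1:200, c = 30:229, d = d)

# subset() keeps every level of d, including the now-empty "q".
sub <- subset(data_frame, b > 80 & c < 190)

# The fit drops unused levels, so the model only records r, s and t.
fit <- lm(a ~ b + d, data = sub)
unused <- setdiff(levels(sub$d), fit$xlevels$d)
print(unused)

# Option 1: the idiom from this message, applied column by column.
sub_a <- sub
sub_a[] <- lapply(sub_a, "[", drop = TRUE)

# Option 2: droplevels(), available in more recent versions of R.
sub_b <- droplevels(sub)

print(levels(sub_a$d))
print(levels(sub_b$d))

# With the unused level gone, newdata matches what the model knows.
pred <- predict(fit, newdata = sub_b)
```

Either option leaves the rows untouched and only changes the factor's level set.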

I'm not sure if your problem is statistical in nature or simply a 
misunderstanding of the software. I'm only attempting to answer the 
latter. As Prof. Ripley suggests, discuss any statistical problem (i.e. 
predicting on missing levels) with your advisor.

HTH,

--sundar

P.S. Also, update R. It's free.