
Problem with categorical variable coefficients and se in glm

Hi all

I'm hoping that this is something that people deal with regularly and you
can help me out quickly even though it is a bit more of a stats question
than R.

I have a dataset where data$Resp is a count response variable (lots of 0s)
so I used a negative binomial glm with a categorical explanatory variable. The
categories are the types of vegetation that I stratified my sampling by - so
they are not an arbitrary post hoc decision.

The UM category has only 0s for a response and produces a large negative
coefficient (-18.3) with a huge standard error (4215.7); see the first output
below. So I added a small number (1)
to one row of the UM category to explore what was happening and get a better
result. With a continuous response variable you can add a very small number
(say 0.001) so that it is still representative of 0, but with this count
data, 1 is the minimum.
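
As a rough check on what the coefficients represent, the sketch below (base R
only, with the counts retyped from the data listing in this post) compares
each category's mean count with the mean of the reference category D. It
assumes the factor levels are ordered D, F, S, DS, UB, UM, as the coefficient
table suggests. With a single categorical predictor and a log link, each
coefficient should equal the log of that ratio, so a category whose mean is
exactly 0 has no finite estimate.

```r
# Counts and categories retyped from the 40-row listing in this post
Resp <- c(0, 0, 0, 0, 3, 0, 0, 0,         # D  (rows 1-8)
          11, 11, 3, 14, 19, 41,          # F  (rows 9-14)
          12, 55, 3, 0, 0,                # S  (rows 15-19)
          30, 4, 10,                      # F  (rows 20-22)
          99, 3, 1, 7, 4, 0, 2, 1,        # DS (rows 23-30)
          0, 0, 0, 0, 1,                  # UB (rows 31-35)
          0, 0, 0, 0, 0)                  # UM (rows 36-40)

# Assumption: level order follows the coefficient table, with D as reference
Cat <- factor(c(rep("D", 8), rep("F", 6), rep("S", 5), rep("F", 3),
                rep("DS", 8), rep("UB", 5), rep("UM", 5)),
              levels = c("D", "F", "S", "DS", "UB", "UM"))

# Mean count per category, as a plain named vector
grp_mean <- sapply(split(Resp, Cat), mean)

# Log of each category mean relative to the reference category D
log_ratio <- log(grp_mean) - log(grp_mean[["D"]])

print(round(grp_mean, 3))
print(round(log_ratio, 4))
```

For UM the log ratio is -Inf, which would explain why the fit reports a very
large negative estimate with an enormous standard error instead of a usable
number.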

I get a better estimate, but is there some better way of dealing with this
type of situation? I could possibly combine UM and UB categories, but I did
want to keep them separate.
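
For reference, a minimal sketch of what combining the two categories could
look like. The merged label "UB_UM" and the object names Cat2 and fit2 are
made up for illustration, the data are retyped from the listing in this post,
and the refit is attempted only if the MASS package is installed. This only
shows the mechanics; it is not an argument that the strata should be merged.

```r
# Counts and categories retyped from the 40-row listing in this post
Resp <- c(0, 0, 0, 0, 3, 0, 0, 0,  11, 11, 3, 14, 19, 41,
          12, 55, 3, 0, 0,  30, 4, 10,
          99, 3, 1, 7, 4, 0, 2, 1,  0, 0, 0, 0, 1,  0, 0, 0, 0, 0)
Cat <- factor(c(rep("D", 8), rep("F", 6), rep("S", 5), rep("F", 3),
                rep("DS", 8), rep("UB", 5), rep("UM", 5)),
              levels = c("D", "F", "S", "DS", "UB", "UM"))

# Give UB and UM the same (hypothetical) label; R merges levels that share one
Cat2 <- Cat
levels(Cat2)[levels(Cat2) %in% c("UB", "UM")] <- "UB_UM"

# The pooled category now has one non-zero count, so its mean is above 0
pooled_mean <- sapply(split(Resp, Cat2), mean)
print(table(Cat2))
print(round(log(pooled_mean) - log(pooled_mean[["D"]]), 4))

# Refit with the merged factor, only when MASS is available
if (requireNamespace("MASS", quietly = TRUE)) {
  fit2 <- MASS::glm.nb(Resp ~ Cat2)
  print(summary(fit2)$coefficients)
}
```

The pooled category contributes a single coefficient, so UB and UM can no
longer be distinguished in the model.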

Thanks a lot :)
   Resp Cat
1     0   D
2     0   D
3     0   D
4     0   D
5     3   D
6     0   D
7     0   D
8     0   D
9    11   F
10   11   F
11    3   F
12   14   F
13   19   F
14   41   F
15   12   S
16   55   S
17    3   S
18    0   S
19    0   S
20   30   F
21    4   F
22   10   F
23   99  DS
24    3  DS
25    1  DS
26    7  DS
27    4  DS
28    0  DS
29    2  DS
30    1  DS
31    0  UB
32    0  UB
33    0  UB
34    0  UB
35    1  UB
36    0  UM
37    0  UM
38    0  UM
39    0  UM
40    0  UM
Call:
glm.nb(formula = data$Resp ~ data$Cat, data = data, init.theta = 0.5087557508, 
    link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.85799  -0.75714  -0.58082  -0.00009   1.95946  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -0.9808     0.7609  -1.289 0.197409    
data$CatF      3.7464     0.8969   4.177 2.95e-05 ***
data$CatS      3.6199     0.9932   3.645 0.000268 ***
data$CatDS     3.6636     0.9128   4.013 5.99e-05 ***
data$CatUB    -0.6286     1.4043  -0.448 0.654427    
data$CatUM   -18.3218  4215.7113  -0.004 0.996532    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for Negative Binomial(0.5088) family taken to be 1)

    Null deviance: 77.163  on 39  degrees of freedom
Residual deviance: 33.700  on 34  degrees of freedom
AIC: 191.12

Number of Fisher Scoring iterations: 1


              Theta:  0.509 
          Std. Err.:  0.152 

 2 x log-likelihood:  -177.120
Call:
glm.nb(formula = data1$Resp ~ data1$Cat, data = data1, init.theta = 0.515774723, 
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8671  -0.7593  -0.5814  -0.2098   1.9726  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.9808     0.7587  -1.293 0.196112    
data1$CatF    3.7464     0.8934   4.194 2.75e-05 ***
data1$CatS    3.6199     0.9888   3.661 0.000251 ***
data1$CatDS   3.6636     0.9092   4.030 5.59e-05 ***
data1$CatUB  -0.6286     1.4012  -0.449 0.653712    
data1$CatUM  -0.6286     1.4012  -0.449 0.653712    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for Negative Binomial(0.5158) family taken to be 1)

    Null deviance: 76.133  on 39  degrees of freedom
Residual deviance: 36.328  on 34  degrees of freedom
AIC: 196.69

Number of Fisher Scoring iterations: 1


              Theta:  0.516 
          Std. Err.:  0.152 

 2 x log-likelihood:  -182.686 


Thanks! 


