Problem with categorical variable coefficients and se in glm

WillM

Sat, Nov 3, 2012 4:15 PM

Hi all

I'm hoping that this is something that people deal with regularly and you
can help me out quickly even though it is a bit more of a stats question
than R.

I have a dataset where data$Resp is a count response variable (lots of 0s),
so I used a negative binomial glm with a categorical explanatory variable.
The categories are the types of vegetation that I stratified my sampling by,
so they are not an arbitrary post hoc decision.

The UM category has only 0s for the response, which produces a very large
negative coefficient (-18.3) and a huge standard error (4215.7); see the
first output below. To explore what was happening I added the smallest
possible amount (1) to one row of the UM category and refitted. With a
continuous response you could add a very small number (say 0.001) that is
still representative of 0, but with count data 1 is the minimum.
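
To see why the UM coefficient runs off toward minus infinity, here is a minimal sketch. It assumes the data are typed in exactly as listed later in this post and that the factor levels follow the order of the coefficient table (D as the reference level). With a single categorical predictor and a log link, each fitted group mean equals the group's sample mean, so the coefficients are just differences of log means. The names `dat`, `group_means`, `log_means` and `coef_equiv` are illustrative only.

```r
# Data transcribed from the listing in this post.
# Assumption: level order D, F, S, DS, UB, UM, as in the coefficient table.
Resp <- c(0, 0, 0, 0, 3, 0, 0, 0,          # D
          11, 11, 3, 14, 19, 41,           # F
          12, 55, 3, 0, 0,                 # S
          30, 4, 10,                       # F
          99, 3, 1, 7, 4, 0, 2, 1,         # DS
          0, 0, 0, 0, 1,                   # UB
          0, 0, 0, 0, 0)                   # UM
Cat <- factor(rep(c("D", "F", "S", "F", "DS", "UB", "UM"),
                  times = c(8, 6, 5, 3, 8, 5, 5)),
              levels = c("D", "F", "S", "DS", "UB", "UM"))
dat <- data.frame(Resp, Cat)

# Sample mean of the counts in each vegetation category.
group_means <- tapply(dat$Resp, dat$Cat, mean)

# On the log link scale, a mean of 0 corresponds to -Inf.
log_means <- log(group_means)

# Treatment-contrast coefficients are log means relative to the reference D.
coef_equiv <- log_means - log_means["D"]
print(round(coef_equiv, 4))
```

The UM entry is `-Inf` because its sample mean is 0, so the fitting routine can only stop at some large negative value, and the likelihood is nearly flat there, which is what inflates the standard error.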

I get a better estimate, but is there some better way of dealing with this
type of situation? I could possibly combine UM and UB categories, but I did
want to keep them separate.
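
For comparison, combining UM and UB could be sketched as follows. This is only an illustration under assumptions: the data are entered as listed in this post, the level order follows the coefficient table, and `Cat2`, `UB_UM` and `fit_merged` are made-up names.

```r
library(MASS)  # provides glm.nb()

# Data transcribed from the listing in this post.
Resp <- c(0, 0, 0, 0, 3, 0, 0, 0,
          11, 11, 3, 14, 19, 41,
          12, 55, 3, 0, 0,
          30, 4, 10,
          99, 3, 1, 7, 4, 0, 2, 1,
          0, 0, 0, 0, 1,
          0, 0, 0, 0, 0)
Cat <- factor(rep(c("D", "F", "S", "F", "DS", "UB", "UM"),
                  times = c(8, 6, 5, 3, 8, 5, 5)),
              levels = c("D", "F", "S", "DS", "UB", "UM"))
dat <- data.frame(Resp, Cat)

# Copy the factor and give UB and UM the same level name, which merges them.
dat$Cat2 <- dat$Cat
levels(dat$Cat2)[levels(dat$Cat2) %in% c("UB", "UM")] <- "UB_UM"

# Refit with the merged factor; the variables are looked up in `dat`.
fit_merged <- glm.nb(Resp ~ Cat2, data = dat)
print(summary(fit_merged)$coefficients)
```

The merged category has one non-zero count in ten rows, so its coefficient is finite, at the cost of no longer estimating UM and UB separately.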

Thanks a lot :)

Resp Cat
1     0   D
2     0   D
3     0   D
4     0   D
5     3   D
6     0   D
7     0   D
8     0   D
9    11   F
10   11   F
11    3   F
12   14   F
13   19   F
14   41   F
15   12   S
16   55   S
17    3   S
18    0   S
19    0   S
20   30   F
21    4   F
22   10   F
23   99  DS
24    3  DS
25    1  DS
26    7  DS
27    4  DS
28    0  DS
29    2  DS
30    1  DS
31    0  UB
32    0  UB
33    0  UB
34    0  UB
35    1  UB
36    0  UM
37    0  UM
38    0  UM
39    0  UM
40    0  UM

Output for the original data:

Call:
glm.nb(formula = data$Resp ~ data$Cat, data = data, init.theta = 0.5087557508,
    link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.85799  -0.75714  -0.58082  -0.00009   1.95946  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -0.9808     0.7609  -1.289 0.197409    
data$CatF      3.7464     0.8969   4.177 2.95e-05 ***
data$CatS      3.6199     0.9932   3.645 0.000268 ***
data$CatDS     3.6636     0.9128   4.013 5.99e-05 ***
data$CatUB    -0.6286     1.4043  -0.448 0.654427    
data$CatUM   -18.3218  4215.7113  -0.004 0.996532    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(0.5088) family taken to be 1)

    Null deviance: 77.163  on 39  degrees of freedom
Residual deviance: 33.700  on 34  degrees of freedom
AIC: 191.12

Number of Fisher Scoring iterations: 1


              Theta:  0.509 
          Std. Err.:  0.152 

 2 x log-likelihood:  -177.120
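
A fit like the one above can be reproduced with a formula that names the columns directly and lets `data =` supply them, rather than writing `data$Resp ~ data$Cat`. This is a minimal sketch, assuming the data are entered as listed in this post and that D is the reference level; `dat`, `fit` and `um_row` are illustrative names, and the exact UM estimate may differ slightly from the output above depending on where the iterations stop.

```r
library(MASS)  # provides glm.nb()

# Data transcribed from the listing in this post.
Resp <- c(0, 0, 0, 0, 3, 0, 0, 0,
          11, 11, 3, 14, 19, 41,
          12, 55, 3, 0, 0,
          30, 4, 10,
          99, 3, 1, 7, 4, 0, 2, 1,
          0, 0, 0, 0, 1,
          0, 0, 0, 0, 0)
Cat <- factor(rep(c("D", "F", "S", "F", "DS", "UB", "UM"),
                  times = c(8, 6, 5, 3, 8, 5, 5)),
              levels = c("D", "F", "S", "DS", "UB", "UM"))
dat <- data.frame(Resp, Cat)

# Column names in the formula are resolved inside `dat`,
# which also gives shorter coefficient labels (CatF, CatUM, ...).
fit <- glm.nb(Resp ~ Cat, data = dat)

# Estimate and standard error for the all-zero category.
um_row <- summary(fit)$coefficients["CatUM", c("Estimate", "Std. Error")]
print(um_row)
```

The Wald z value for UM is not informative here: the estimate sits on a nearly flat part of the likelihood, so the standard error is enormous even though the data clearly show fewer counts in UM than in most other categories.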

Output after adding 1 to one row of the UM category (data1):

Call:
glm.nb(formula = data1$Resp ~ data1$Cat, data = data1, init.theta = 0.515774723,
    link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8671  -0.7593  -0.5814  -0.2098   1.9726  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -0.9808     0.7587  -1.293 0.196112    
data1$CatF    3.7464     0.8934   4.194 2.75e-05 ***
data1$CatS    3.6199     0.9888   3.661 0.000251 ***
data1$CatDS   3.6636     0.9092   4.030 5.59e-05 ***
data1$CatUB  -0.6286     1.4012  -0.449 0.653712    
data1$CatUM  -0.6286     1.4012  -0.449 0.653712    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(0.5158) family taken to be 1)

    Null deviance: 76.133  on 39  degrees of freedom
Residual deviance: 36.328  on 34  degrees of freedom
AIC: 196.69

Number of Fisher Scoring iterations: 1


              Theta:  0.516 
          Std. Err.:  0.152 

 2 x log-likelihood:  -182.686 
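
The modified fit can be sketched the same way. Which UM row received the extra count is not stated, so row 36 is an arbitrary assumption here, as are the names `dat1` and `fit1`. After the change UM has the same counts as UB (four 0s and one 1), which is why the two coefficients and standard errors in the output above are identical.

```r
library(MASS)  # provides glm.nb()

# Data transcribed from the listing in this post.
Resp <- c(0, 0, 0, 0, 3, 0, 0, 0,
          11, 11, 3, 14, 19, 41,
          12, 55, 3, 0, 0,
          30, 4, 10,
          99, 3, 1, 7, 4, 0, 2, 1,
          0, 0, 0, 0, 1,
          0, 0, 0, 0, 0)
Cat <- factor(rep(c("D", "F", "S", "F", "DS", "UB", "UM"),
                  times = c(8, 6, 5, 3, 8, 5, 5)),
              levels = c("D", "F", "S", "DS", "UB", "UM"))

# Copy of the data with 1 added to a single UM row (row 36: an assumption).
dat1 <- data.frame(Resp, Cat)
dat1$Resp[36] <- dat1$Resp[36] + 1

fit1 <- glm.nb(Resp ~ Cat, data = dat1)

# UB and UM now have identical counts, so their rows match.
print(summary(fit1)$coefficients[c("CatUB", "CatUM"), ])
```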


Thanks! 



--
View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Problem-with-categorical-variable-coefficients-and-se-in-glm-tp7581579.html
Sent from the R-sig-geo mailing list archive at Nabble.com.
