Poisson regression in R

glmstat

Sat, Mar 1, 2008 10:36 PM

I have these questions:
(1) Use Poisson regression to estimate the main effects of car, age, and
dist (each treated as categorical and modelled using indicator variables)
and interaction terms.  
(2) One study determined that all the interactions were unimportant and
that age and car could be treated as though they were continuous
variables. Fit a model incorporating these features and compare it with
the best model obtained in (1).

n is the number of insurance policies
y is the number of claims
car is the insurance category of the car
age is the age of policy holder
dist is the district where the policy holder lived (1 for London and other
major cities, and 0 otherwise)

Data:

car	age	dist	y	n
1	1	0	65	317
1	2	0	65	476
1	3	0	52	486
1	4	0	310	3259
2	1	0	98	486
2	2	0	159	1004
2	3	0	175	1355
2	4	0	877	7660
3	1	0	41	223
3	2	0	117	539
3	3	0	137	697
3	4	0	477	3442
4	1	0	11	40
4	2	0	35	148
4	3	0	39	214
4	4	0	167	1019
1	1	1	2	20
1	2	1	5	33
1	3	1	4	40
1	4	1	36	316
2	1	1	7	31
2	2	1	10	81
2	3	1	22	122
2	4	1	102	724
3	1	1	5	18
3	2	1	7	39
3	3	1	16	68
3	4	1	63	344
4	1	1	0	3
4	2	1	6	16
4	3	1	8	25
4	4	1	33	114


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I need help finding the correct R code to construct models. According to the
previous study, the model in (2) "is simpler than (1), fits well (deviance =
53.11, d.f. = 60, p-value = 0.72) and gives coefficients (standard errors):
AGE, -0.177 (0.018); CAR, 0.198 (0.021); DIST, 0.210 (0.059)."
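
The quoted p-value appears to come from comparing the residual deviance with a chi-squared distribution on the residual degrees of freedom. A quick check in R, assuming that is how the study computed it (the variable names are only illustrative):

```r
# Values quoted from the previous study.
dev <- 53.11
df  <- 60

# Upper-tail chi-squared probability: the chance of seeing a deviance at
# least this large if the fitted model is adequate.
p_value <- pchisq(dev, df = df, lower.tail = FALSE)
round(p_value, 2)
```

A large p-value here means the deviance is not unusually big for its degrees of freedom, which is what "fits well" refers to.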

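As for the first model, I think I should use this code, but I am not sure:

firstmodel<-glm(y~factor(age)*factor(car)*factor(dist),family=poisson)

As for the second model, I used this code, but it produces results that
contradict what the previous study says (and deleting the intercept does not
help):

secondmodel<-glm(y~age+car+factor(dist),family=poisson)
summary(secondmodel)
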
Call:
glm(formula = y ~ age + car + factor(dist), family = poisson)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-14.0258   -3.3200   -0.6296    2.0575   18.1442  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    3.08222    0.08127   37.92   <2e-16 ***
age            0.83664    0.02067   40.48   <2e-16 ***
car           -0.16723    0.01612  -10.37   <2e-16 ***
factor(dist)1 -2.15937    0.05849  -36.92   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 5660.6  on 31  degrees of freedom
Residual deviance: 1154.5  on 28  degrees of freedom
AIC: 1330.8

Number of Fisher Scoring iterations: 5

View this message in context: http://www.nabble.com/Poisson-regression-in-R-tp15784126p15784126.html
Sent from the R help mailing list archive at Nabble.com.

Peter Dalgaard

Sun, Mar 2, 2008 12:27 AM

glmstat wrote:

This looks like homework, so only hints are offered.

You don't seem to be using n; consider incorporating an offset (I would
expect most texts on Poisson regression to discuss this).
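
To illustrate the hint: with an offset of log(n), the model becomes log E[y] = log(n) + linear predictor, so the coefficients describe the claim rate per policy rather than the raw claim count. The following is a minimal sketch of that idea, not a worked solution. The names `claims`, `no_offset` and `with_offset` are hypothetical, and the data frame is rebuilt from the table in the question on the assumption that its rows run through age within car within dist.

```r
# Rebuild the 32-row table from the question. expand.grid() varies its
# first argument fastest, matching the listing's row order
# (age within car within dist). `claims` is a hypothetical name.
claims <- expand.grid(age = 1:4, car = 1:4, dist = 0:1)
claims$y <- c(65, 65, 52, 310, 98, 159, 175, 877,
              41, 117, 137, 477, 11, 35, 39, 167,
              2, 5, 4, 36, 7, 10, 22, 102,
              5, 7, 16, 63, 0, 6, 8, 33)
claims$n <- c(317, 476, 486, 3259, 486, 1004, 1355, 7660,
              223, 539, 697, 3442, 40, 148, 214, 1019,
              20, 33, 40, 316, 31, 81, 122, 724,
              18, 39, 68, 344, 3, 16, 25, 114)

# The model as posted: raw counts only, so the number of policies n
# (the exposure) is ignored.
no_offset <- glm(y ~ age + car + factor(dist),
                 family = poisson, data = claims)

# Same linear predictor plus offset(log(n)), so the model describes
# claims per policy.
with_offset <- glm(y ~ age + car + factor(dist) + offset(log(n)),
                   family = poisson, data = claims)

# Compare residual deviances and inspect the offset model's coefficients.
deviance(no_offset)
deviance(with_offset)
coef(summary(with_offset))
```

The same offset term can be added to the categorical model from (1), so that both models are on the rate scale before they are compared.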

As for the first model, I think I should use this code, but I am not sure:

firstmodel<-glm(y~factor(age)*factor(car)*factor(dist),family=poisson)

As for the second model, I used this code, but it produces results that
contradict what the previous study says (and deleting the intercept does not
help):

secondmodel<-glm(y~age+car+factor(dist),family=poisson)
summary(secondmodel)

   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907