n is the number of insurance policies
y is the number of claims
car is the insurance category of the car
age is the age of policy holder
dist is the district where the policy holder lived (1 for London and other
major cities, and 0 otherwise)
Data:
car age dist y n
1 1 0 65 317
1 2 0 65 476
1 3 0 52 486
1 4 0 310 3259
2 1 0 98 486
2 2 0 159 1004
2 3 0 175 1355
2 4 0 877 7660
3 1 0 41 223
3 2 0 117 539
3 3 0 137 697
3 4 0 477 3442
4 1 0 11 40
4 2 0 35 148
4 3 0 39 214
4 4 0 167 1019
1 1 1 2 20
1 2 1 5 33
1 3 1 4 40
1 4 1 36 316
2 1 1 7 31
2 2 1 10 81
2 3 1 22 122
2 4 1 102 724
3 1 1 5 18
3 2 1 7 39
3 3 1 16 68
3 4 1 63 344
4 1 1 0 3
4 2 1 6 16
4 3 1 8 25
4 4 1 33 114
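For reference, here is a minimal sketch of how the table above could be read into R. The data frame name `insurance` is only an illustrative choice, not something taken from the previous study.

```r
# Read the 32 rows of the table into a data frame.
# The name `insurance` is an assumption made for this sketch.
insurance <- read.table(header = TRUE, text = "
car age dist y n
1 1 0 65 317
1 2 0 65 476
1 3 0 52 486
1 4 0 310 3259
2 1 0 98 486
2 2 0 159 1004
2 3 0 175 1355
2 4 0 877 7660
3 1 0 41 223
3 2 0 117 539
3 3 0 137 697
3 4 0 477 3442
4 1 0 11 40
4 2 0 35 148
4 3 0 39 214
4 4 0 167 1019
1 1 1 2 20
1 2 1 5 33
1 3 1 4 40
1 4 1 36 316
2 1 1 7 31
2 2 1 10 81
2 3 1 22 122
2 4 1 102 724
3 1 1 5 18
3 2 1 7 39
3 3 1 16 68
3 4 1 63 344
4 1 1 0 3
4 2 1 6 16
4 3 1 8 25
4 4 1 33 114
")

# Quick structural check: 32 observations of 5 integer variables.
str(insurance)
```

With the data in a data frame, the models can be fitted with `data = insurance` instead of relying on free-standing vectors.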
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I need help finding the correct R code to construct models. According to the
previous study, the model in (2) "is simpler than (1), fits well (deviance =
53.11, d.f. = 60, p-value = 0.72) and gives coefficients (standard errors):
AGE, -0.177 (0.018); CAR, 0.198 (0.021); DIST, 0.210 (0.059)."
As for the first model, I think I should use this code, but I am not sure:
firstmodel <- glm(y ~ factor(age) * factor(car) * factor(dist), family = poisson)
As for the second model, I used this code, but it produces results that
contradict the previous study (and removing the intercept does not help):
secondmodel <- glm(y ~ age + car + factor(dist), family = poisson)
summary(secondmodel)
Call:
glm(formula = y ~ age + car + factor(dist), family = poisson)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-14.0258   -3.3200   -0.6296    2.0575   18.1442

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    3.08222    0.08127   37.92   <2e-16 ***
age            0.83664    0.02067   40.48   <2e-16 ***
car           -0.16723    0.01612  -10.37   <2e-16 ***
factor(dist)1 -2.15937    0.05849  -36.92   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 5660.6  on 31  degrees of freedom
Residual deviance: 1154.5  on 28  degrees of freedom
AIC: 1330.8

Number of Fisher Scoring iterations: 5
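
One possible reason for the discrepancy, stated here as an assumption rather than a confirmed diagnosis: the code above models the raw claim counts y and never uses n, whereas a study of insurance claims would usually model the claim rate y/n. A minimal sketch of such a Poisson rate model adds offset(log(n)) to the formula. The names `insurance` and `ratemodel` are illustrative. Entering dist as numeric 0/1 is equivalent to factor(dist) here because it has only two levels. I have not checked that this reproduces the published estimates.

```r
# Rebuild the data compactly, in the same row order as the table above.
# `insurance` and `ratemodel` are hypothetical names used for this sketch.
insurance <- data.frame(
  car  = rep(rep(1:4, each = 4), times = 2),
  age  = rep(1:4, times = 8),
  dist = rep(0:1, each = 16),
  y = c(65, 65, 52, 310, 98, 159, 175, 877,
        41, 117, 137, 477, 11, 35, 39, 167,
        2, 5, 4, 36, 7, 10, 22, 102,
        5, 7, 16, 63, 0, 6, 8, 33),
  n = c(317, 476, 486, 3259, 486, 1004, 1355, 7660,
        223, 539, 697, 3442, 40, 148, 214, 1019,
        20, 33, 40, 316, 31, 81, 122, 724,
        18, 39, 68, 344, 3, 16, 25, 114)
)

# Poisson model for the claim rate: log E(y) = log(n) + b0 + b1*age + ...
# offset(log(n)) fixes the coefficient of log(n) at 1, so the remaining
# coefficients describe claims per policy rather than raw counts.
# age and car enter as numeric scores, as in the second model above.
ratemodel <- glm(y ~ age + car + dist + offset(log(n)),
                 family = poisson, data = insurance)

summary(ratemodel)
```

The same offset term could be added to the first model if the three-factor model is also meant to describe claim rates.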