Simplifying a GLM: removing categorical variables
mariannej <marianne.james <at> abdn.ac.uk> writes:
I have created a GLM (using the quasipoisson family) and am now trying to simplify it. One of my explanatory variables is categorical (vegetation type, with 6 different levels). In the model, 5 of the 6 levels are significant and one is not. How should I simplify my model? Do I need to take out the whole category (i.e. all of vegetation type), or just the level that is not significant (but how would I explain this biologically)? Please spell out any answers simply, as I am new to R.
This is really a statistical rather than an R question,
but the short answer is: you probably shouldn't try to
remove the "non-significant" level. The details depend on
your model, but the "significance" of the parameters,
which I assume you're gleaning from summary(), refers
to the difference of each level from the baseline (first)
level. If 5 out of the 6 levels are significantly different
from the baseline, then the factor belongs in the model.
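To make the baseline comparison concrete, here is a minimal sketch on simulated data. Everything in it (the data frame d, the variables veg, x and count, the effect sizes) is made up for illustration and is not the original poster's data. It shows that summary() reports one row per non-baseline level, while drop1() with an F test (a reasonable choice for quasi families, where the dispersion is estimated) gives a single test for the factor as a whole.

```r
# Simulated data: all names and effect sizes here are hypothetical.
set.seed(1)
d <- data.frame(
  veg = factor(rep(c("A", "B", "C", "D", "E", "F"), each = 20)),
  x   = rnorm(120)
)

# Level B gets the same mean as the baseline A, so its row in
# summary() will usually look "non-significant".
log_mean <- c(A = 1, B = 1, C = 1.8, D = 2.1, E = 0.2, F = 2.5)
mu <- exp(log_mean[as.character(d$veg)] + 0.3 * d$x)
d$count <- rpois(120, mu)

fit <- glm(count ~ veg + x, family = quasipoisson, data = d)

# Per-level rows: each one compares a single level with the baseline (A).
print(summary(fit)$coefficients)

# One test for vegetation type as a whole (5 degrees of freedom).
whole_factor <- drop1(fit, test = "F")
print(whole_factor)
```

The row for veg in the drop1() table is the one that answers "does vegetation type belong in the model?"; the individual rows in summary() only answer "does this level differ from the baseline?".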
(You could _conceivably_ try to lump the "non-significant"
level together with the baseline level, but this really
goes in the direction of data-dredging.)
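For completeness, this is roughly what lumping two levels would look like mechanically, again on simulated data with hypothetical names (d, veg, veg_lumped, count). It is only a sketch of the mechanics, not a recommendation: choosing which levels to merge after looking at the p-values is exactly the data-dredging risk mentioned above.

```r
# Simulated data: all names and effect sizes here are hypothetical.
set.seed(2)
d <- data.frame(veg = factor(rep(c("A", "B", "C", "D", "E", "F"), each = 20)))
log_mean <- c(A = 1, B = 1, C = 1.8, D = 2.1, E = 0.2, F = 2.5)
d$count <- rpois(120, exp(log_mean[as.character(d$veg)]))

# Merge level B into the baseline A by giving both the same label.
d$veg_lumped <- d$veg
levels(d$veg_lumped)[levels(d$veg_lumped) %in% c("A", "B")] <- "AB"

fit_full   <- glm(count ~ veg,        family = quasipoisson, data = d)
fit_lumped <- glm(count ~ veg_lumped, family = quasipoisson, data = d)

# The lumped model is nested in the full one, so an F test compares them.
print(anova(fit_lumped, fit_full, test = "F"))
```

A large p-value here only says the data cannot distinguish the two levels; it does not by itself justify treating them as one vegetation type biologically.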
I would strongly recommend that you consult a good
general text on generalized linear models for strategies
of model simplification and interpretation -- to repeat,
this is really a statistical question and not an
R-specific one ...
good luck,
Ben Bolker