Hi Thierry,
Thanks for your quick answer. The problem is not so much the LABOUR
variable, however, but the AGE variable, which consists of about 5
categories for which I indeed do not create separate dummy variables.
But R does not behave as expected when deciding on which dummy to use
as reference category ...
Jos
On Mon, Jan 19, 2009 at 2:37 PM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:
Dear Jos,
In R you don't need to create your own dummy variables. Just create a
factor variable LABOUR (with two levels) and rerun your model. Then you
should be able to calculate all coefficients.
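
A minimal sketch of this suggestion, on simulated data. The data values,
the level labels ("no"/"yes", the age bands) and the data frame name `d`
are assumptions made for illustration; only the variable names come from
the thread. It uses the nested form `0 + LABOUR + LABOUR:AGE`, which is
one common way to get a separate intercept and a separate set of AGE
coefficients for each LABOUR level.

```r
# Simulated data only: variable names follow the thread, values are made up.
set.seed(1)
n <- 1000
d <- data.frame(
  LABOUR = factor(sample(c("no", "yes"), n, replace = TRUE)),
  AGE    = factor(sample(c("18-29", "30-44", "45-59", "60-74", "75+"),
                         n, replace = TRUE)),
  VOTE   = rbinom(n, 1, 0.5)
)

# LABOUR is a factor, so no hand-made LABOUR / NONLABOUR dummies are needed.
# "0 + LABOUR"  : one intercept per LABOUR level (no global intercept).
# "LABOUR:AGE"  : a separate set of AGE coefficients within each LABOUR level,
#                 each measured against the first AGE level.
fit <- glm(VOTE ~ 0 + LABOUR + LABOUR:AGE,
           data = d, family = binomial(link = "logit"))

coef(fit)
```

With five AGE levels this should yield two intercepts plus four AGE
coefficients per LABOUR level, all relative to the same first AGE level.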
HTH,
Thierry
----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data.
~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey
-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On behalf of Jos Elkink
Sent: Monday, 19 January 2009 15:16
To: r-help at r-project.org
Subject: [R] reference category for factor in regression
Hi all,
I am struggling with a strange issue in R that I have not encountered
before and I am not sure how to resolve this.
The model looks like this, with all irrelevant variables left out:
LABOUR - a dummy variable
NONLABOUR = 1 - LABOUR
AGE - a categorical variable / factor
VOTE - a dummy variable
glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR : AGE + NONLABOUR : AGE,
family=binomial(link="logit"))
In other words, a standard interaction model, but I want to know the
intercepts and coefficients for each of the two cases (LABOUR and
NONLABOUR), instead of getting coefficients for the differences as in
a normal interaction model.
But the strange thing is, for the two occurrences of the AGE variable,
it makes a different choice as to which AGE category to leave out of
the regression. The cross-table of AGE with LABOUR does not have empty
cells.
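
One way to check which AGE category is being left out, sketched on
simulated data (the values, the age-band labels and the data frame name
`d` are assumptions, not taken from the actual data set): inspect the
column names of the design matrix, list any coefficients that glm()
reports as NA, and set the reference level explicitly with relevel().

```r
# Simulated stand-in for the real data; only the variable names are from the thread.
set.seed(2)
n <- 500
d <- data.frame(
  LABOUR = rbinom(n, 1, 0.5),
  AGE    = factor(sample(c("18-29", "30-44", "45-59", "60-74", "75+"),
                         n, replace = TRUE)),
  VOTE   = rbinom(n, 1, 0.5)
)
d$NONLABOUR <- 1 - d$LABOUR

# Make the reference category explicit instead of relying on the default
# (the first level in alphabetical order). "30-44" is an arbitrary choice.
d$AGE <- relevel(d$AGE, ref = "30-44")

# Columns R actually builds for the model; AGE levels that do not appear
# in an interaction are the ones treated as the reference there.
f <- VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR:AGE + NONLABOUR:AGE
X <- model.matrix(f, data = d)
colnames(X)

# Coefficients reported as NA were dropped by glm() because of collinearity,
# which can look like a different reference category being chosen.
fit <- glm(f, data = d, family = binomial(link = "logit"))
names(coef(fit))[is.na(coef(fit))]
```

Comparing table(d$LABOUR, d$AGE) with the column names above can help
tell a contrast choice apart from a column dropped for collinearity.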
Does anyone have any idea what might be going wrong, or what I could do
about this?
Thanks in advance for any help!
Regards,
Jos
--
Johan A. Elkink
Lecturer
School of Politics and International Relations & CHS Graduate School
University College Dublin
Ph. +353 1 716 7026 | Library Building, Rm 512
http://jaeweb.cantr.net