reference category for factor in regression

Mon, Jan 19, 2009 8:51 AM

Jos,

See ?relevel for information on how to reorder the levels of a factor,
while being able to specify the reference level.

Basically, with the default treatment contrasts, the first level of the
factor is taken as the reference. If you want a different ordering, as
an alternative to the above, simply use:

  AGE <- factor(AGE, levels = c(FirstLevel, SecondLevel, ...))
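
A minimal sketch of both approaches. The age bands and the names
AGE_relevel and AGE_reordered are made up for illustration and are not
taken from Jos's data:

```r
# Hypothetical age bands, for illustration only.
AGE <- factor(c("18-29", "30-44", "45-59", "60+", "30-44", "18-29"))

# By default the levels are sorted, so "18-29" is the first level and
# hence the reference under treatment contrasts.
levels(AGE)

# Option 1: relevel() moves one level to the front and keeps the rest
# in their existing order.
AGE_relevel <- relevel(AGE, ref = "45-59")
levels(AGE_relevel)

# Option 2: factor() with an explicit 'levels' argument sets the whole
# ordering; the first entry becomes the reference.
AGE_reordered <- factor(AGE, levels = c("60+", "45-59", "30-44", "18-29"))
levels(AGE_reordered)
```

With the second form, any value not listed in levels is turned into NA,
so the full set of levels has to be spelled out.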

BTW, you might want to review Frank Harrell's page on why categorizing a
continuous variable is not a good idea:

  http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

HTH,

Marc Schwartz

on 01/19/2009 09:52 AM Jos Elkink wrote:

Hi Thierry,

Thanks for your quick answer. The problem is not so much the LABOUR
variable, however, but the AGE variable, which consists of about 5
categories for which I do indeed not create separate dummy variables.
But R does not behave as expected when deciding on which dummy to use
as reference category ...

Jos

On Mon, Jan 19, 2009 at 2:37 PM, ONKELINX, Thierry
<Thierry.ONKELINX at inbo.be> wrote:

Dear Jos,

In R you don't need to create your own dummy variables. Just create a
factor variable LABOUR (with two levels) and rerun your model. Then you
should be able to calculate all coefficients.
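
As a rough illustration of this suggestion, the sketch below fits the
model with LABOUR as a factor. The variable names follow Jos's post,
but the data are simulated, the age bands are invented, and the nested
formula 0 + LABOUR + LABOUR:AGE is an assumption about what is wanted
(one intercept and one set of AGE effects per LABOUR group), not
something stated in the thread:

```r
set.seed(1)  # simulated data, for illustration only

# Build a data set in which every LABOUR x AGE cell has 40 rows.
d <- expand.grid(
  LABOUR = c(0, 1),
  AGE    = c("18-29", "30-44", "45-59", "60-74", "75+"),
  rep    = 1:40
)
d$VOTE <- rbinom(nrow(d), size = 1, prob = 0.5)

# Turn the 0/1 dummy into a two-level factor; make AGE a factor explicitly.
d$LABOUR <- factor(d$LABOUR, levels = c(0, 1), labels = c("no", "yes"))
d$AGE    <- factor(d$AGE)

# Nested form: one intercept per LABOUR group, plus AGE effects within
# each group, all measured against the first AGE level.
fit <- glm(VOTE ~ 0 + LABOUR + LABOUR:AGE,
           family = binomial(link = "logit"), data = d)
names(coef(fit))
```

Because a single factor replaces the two hand-made dummies, both sets
of AGE coefficients use the same reference level, namely the first
level of AGE.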

HTH,

Thierry

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature
and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics,
methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On behalf of Jos Elkink
Sent: Monday, 19 January 2009 15:16
To: r-help at r-project.org
Subject: [R] reference category for factor in regression

Hi all,

I am struggling with a strange issue in R that I have not encountered
before and I am not sure how to resolve this.

The model looks like this, with all irrelevant variables left out:

LABOUR - a dummy variable
NONLABOUR = 1 - LABOUR
AGE - a categorical variable / factor
VOTE - a dummy variable

glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR : AGE + NONLABOUR : AGE,
family=binomial(link="logit"))

In other words, a standard interaction model, but I want to know the
intercepts and coefficients for each of the two cases (LABOUR and
NONLABOUR), instead of getting coefficients for the differences as in
a normal interaction model.

But the strange thing is, for the two occurrences of the AGE variable,
it makes a different choice as to which AGE category to leave out of
the regression. The cross-table of AGE with LABOUR does not have empty
cells.
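
Checks of this kind can be sketched as follows. The data frame d is a
made-up stand-in (not the real data), and the age bands are invented;
the point is only to show where to look for the cell counts, the
reference level, and the columns R actually builds for the formula:

```r
# Deterministic stand-in data: 5 age bands x 2 LABOUR values, 30 rows each.
ages <- c("18-29", "30-44", "45-59", "60-74", "75+")
d <- data.frame(
  AGE    = factor(rep(ages, each = 60)),
  LABOUR = rep(c(0, 1), times = 150)
)
d$NONLABOUR <- 1 - d$LABOUR

# Cross-table of AGE by LABOUR: look for empty cells.
tab <- table(d$AGE, d$LABOUR)
tab

# The first level is the reference under the default treatment contrasts.
levels(d$AGE)
contrasts(d$AGE)

# The column names of the design matrix show which AGE levels enter
# each interaction term.
X <- model.matrix(~ 0 + LABOUR + NONLABOUR + LABOUR:AGE + NONLABOUR:AGE,
                  data = d)
colnames(X)
```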

Anyone any idea what might be going wrong? Or what I could do about
this?

Thanks in advance for any help!

Regards,

Jos
