reference category for factor in regression
Jos,

See ?relevel for how to reorder the levels of a factor so that a level you specify becomes the reference level. With the default treatment contrasts, the first level of the factor is taken as the reference. If you want to set a different ordering for all levels instead, simply use:

  AGE <- factor(AGE, levels = c(FirstLevel, SecondLevel, ...))

BTW, you might want to review Frank Harrell's page on why categorizing a continuous variable is not a good idea: http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/CatContinuous

HTH,

Marc Schwartz
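A minimal sketch of both options, using a hypothetical five-category AGE factor (the category labels below are assumptions for illustration, not taken from the original post). It shows how relevel() and factor(..., levels = ) change which level comes first, and how model.matrix() reveals which level R treats as the reference.

```r
# Hypothetical five-category AGE factor; the labels are illustrative only.
AGE <- factor(c("18-29", "30-44", "45-59", "60-74", "75+", "30-44"))
levels(AGE)    # default order is sorted, so "18-29" is the reference level

# relevel() moves one chosen level to the front and keeps the others in order.
AGE_r <- relevel(AGE, ref = "45-59")
levels(AGE_r)

# factor(..., levels = ) sets the complete order explicitly.
AGE_f <- factor(AGE, levels = c("75+", "60-74", "45-59", "30-44", "18-29"))
levels(AGE_f)[1]

# Under the default treatment contrasts the first level gets no column:
# it is absorbed into the intercept.
colnames(model.matrix(~ AGE_r))
```

The column names of the design matrix are a quick way to confirm which reference category R actually used, since the missing level is the reference.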
On 01/19/2009 09:52 AM, Jos Elkink wrote:
Hi Thierry,

Thanks for your quick answer. The problem is not so much the LABOUR variable, however, but the AGE variable, which has about five categories and for which I indeed do not create separate dummy variables. But R does not behave as expected when deciding which dummy to use as the reference category...

Jos

On Mon, Jan 19, 2009 at 2:37 PM, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be> wrote:
Dear Jos,

In R you don't need to create your own dummy variables. Just create a factor variable LABOUR (with two levels) and rerun your model. Then you should be able to estimate all coefficients.

HTH,

Thierry

ir. Thierry Onkelinx
Research Institute for Nature and Forest (Instituut voor natuur- en bosonderzoek)
Section biometrics, methodology and quality assurance

-----Original message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On behalf of Jos Elkink
Sent: Monday, 19 January 2009 15:16
To: r-help at r-project.org
Subject: [R] reference category for factor in regression

Hi all,

I am struggling with a strange issue in R that I have not encountered before, and I am not sure how to resolve it. The model looks like this, with all irrelevant variables left out:

LABOUR - a dummy variable
NONLABOUR = 1 - LABOUR
AGE - a categorical variable / factor
VOTE - a dummy variable

glm(VOTE ~ 0 + LABOUR + NONLABOUR + LABOUR:AGE + NONLABOUR:AGE, family = binomial(link = "logit"))

In other words, a standard interaction model, but I want the intercept and coefficients for each of the two groups (LABOUR and NONLABOUR) separately, instead of coefficients for the differences as in a normal interaction model.
But the strange thing is that for the two occurrences of the AGE variable, R makes a different choice of which AGE category to leave out of the regression. The cross-table of AGE with LABOUR has no empty cells. Does anyone have an idea what might be going wrong, or what I could do about it?

Thanks in advance for any help!

Regards,

Jos
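A minimal sketch of the approach suggested in the thread: code LABOUR as a two-level factor instead of the numeric LABOUR/NONLABOUR pair. The data are simulated, and the level labels ("no"/"yes", "age1" to "age5") are hypothetical. With `0 + LABOUR + LABOUR:AGE`, R gives each LABOUR group its own intercept and codes AGE with treatment contrasts within each group, so the same AGE level (the first one) is the reference in both groups.

```r
# Simulated stand-in for the poster's data: variable names follow the thread,
# but the values and level labels are invented for illustration.
set.seed(1)
n <- 400
d <- data.frame(
  LABOUR = factor(sample(c("no", "yes"), n, replace = TRUE)),
  AGE    = factor(sample(paste0("age", 1:5), n, replace = TRUE))
)
d$VOTE <- rbinom(n, 1, 0.5)

# No global intercept: LABOUR gets one indicator column per level (one
# intercept per group). Within LABOUR:AGE, AGE uses treatment contrasts,
# so its first level ("age1") is the reference in BOTH groups.
fit <- glm(VOTE ~ 0 + LABOUR + LABOUR:AGE,
           family = binomial(link = "logit"), data = d)

# Inspect the columns R actually built; no AGEage1 column should appear.
colnames(model.matrix(fit))
coef(fit)
```

The formula `VOTE ~ 0 + LABOUR/AGE` expands to the same terms. To use a different shared reference category, apply relevel() to AGE before fitting.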