Dummy variables in Factors with more than 2 levels
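By default, R uses the 'opposite' approach: the first level is the reference. The intercept is the mean of the first level (on the link scale, with any other factors also at their first levels), and each of the other parameters is the difference between that level and the first level. See ?contrasts

Hank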
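The default coding described above can be inspected directly. The following is a minimal sketch, assuming R's factory-default contrast options; the factor `stress` and its labels are made up here to mirror the variable in the question and are not the poster's data.

```r
# Hypothetical three-level factor mirroring the 'Stress' coding in the question
stress <- factor(c(1, 2, 3, 1, 2, 3),
                 levels = 1:3,
                 labels = c("pre-tonic", "tonic", "post-tonic"))

# For unordered factors the default is contr.treatment:
# the first level is the reference and gets a row of zeros
print(options("contrasts"))
print(contrasts(stress))

# The design matrix R builds behind the scenes: an intercept plus
# one 0/1 dummy column for each non-reference level
X <- model.matrix(~ stress)
print(X)
```

The column names of `X` carry the level labels of the non-reference levels, which is why a model summary shows terms such as Stress2 and Stress3 but no Stress1.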
On May 21, 2008, at 5:59 AM, carlos ramirez wrote:
Hi All,

Sorry to bother you with a basic question.

I was wondering how R manages dummy variables when fitting models with factors that have more than 2 levels. For instance, in my study I have the variable "stress" with 3 levels ("pre-tonic", "tonic", and "post-tonic", coded "1", "2" and "3" respectively).

Programs such as SPSS transform nominal and ordinal categories into sets of dichotomies (dummy variables) in such a way that dummy variable 1 (dummy pre-tonic) assigns "1" to all pre-tonic stress and "0" to all the others. Dummy variable 2 (dummy tonic) assigns "1" to all tonic data and "0" to the rest. By default SPSS leaves the last level as the "reference category" (in this case post-tonic) for comparison, using what is called the "indicator contrast". Thus, the coding ends up being something like the example below.
------------------------------------------------
Dummy variables coding

Stress value    (1)      (2)
1               1.000     .000
2                .000    1.000
3                .000     .000
------------------------------------------------
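For comparison, the same last-level-as-reference coding can be requested in R. This is a sketch under assumptions: the factor below is illustrative only, and `contr.SAS()` and `relevel()` are shown as two common ways to change the reference level, not as what SPSS itself does internally.

```r
# Hypothetical factor with the three stress levels from the question
stress_raw <- factor(c(1, 2, 3), levels = 1:3,
                     labels = c("pre-tonic", "tonic", "post-tonic"))

# Option 1: contr.SAS() is treatment coding with the LAST level as reference,
# which reproduces the indicator table above
stress_sas <- stress_raw
contrasts(stress_sas) <- contr.SAS(nlevels(stress_sas))
print(contrasts(stress_sas))

# Option 2: keep R's default treatment contrasts but move the desired
# reference level to the front
stress_rel <- relevel(stress_raw, ref = "post-tonic")
print(levels(stress_rel))
```

Either way the fitted model is the same; only the comparison that each coefficient expresses changes.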
Thus, in the output, Beta (B) and Exp(B) do not present the odds ratio of the dependent variable in relation to the independent variable, but the odds ratio of each dummy variable with respect to the reference category (post-tonic in this case).

When I run the mixed logit model in R I get output like the following.
Generalized linear mixed model fit using Laplace
Formula: Identif ~ (1 | Subj) + (1 | Item) + Place + Stress + Voicing
   Data: idcrg1
 Family: binomial(logit link)
  AIC  BIC logLik deviance
 1163 1211 -572.6     1145
Random effects:
 Groups Name        Variance Std.Dev.
 Subj   (Intercept) 0.63178  0.79485
 Item   (Intercept) 0.88192  0.93910
number of obs: 1476, groups: Subj, 41; Item, 36

Estimated scale (compare to 1 ) 0.888108

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   1.9920     0.4948   4.026 5.67e-05 ***
Place2       -0.7253     0.4376  -1.658   0.0974 .
Place3       -0.1389     0.4478  -0.310   0.7565
Stress2       0.8765     0.4493   1.951   0.0511 .
Stress3      -0.2386     0.4298  -0.555   0.5788
Voicing2      0.6937     0.3601   1.927   0.0540 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
         (Intr) Place2 Place3 Strss2 Strss3
Place2   -0.466
Place3   -0.447  0.511
Stress2  -0.426 -0.035 -0.002  0.026
Stress3  -0.451  0.004 -0.006  0.017  0.485
Voicing2 -0.356  0.004 -0.023  0.008  0.020  0.011
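If, as the Stress2 and Stress3 labels suggest, level 1 is the reference, the Stress estimates in the fixed-effects table are log-odds differences from level 1, and exponentiating them gives the analogue of SPSS's Exp(B). The sketch below only re-uses the estimates and standard errors printed above; the Wald intervals are a rough approximation added for illustration, not part of the original output.

```r
# Estimates and standard errors copied from the fixed-effects table (log-odds scale)
est <- c(Stress2 = 0.8765, Stress3 = -0.2386)
se  <- c(Stress2 = 0.4493, Stress3 = 0.4298)

# Odds ratios relative to Stress level 1 (the assumed reference level)
odds_ratio <- exp(est)

# Approximate 95% Wald intervals, computed on the log-odds scale
# and then transformed to the odds-ratio scale
ci <- exp(cbind(lower = est - 1.96 * se, upper = est + 1.96 * se))

print(round(cbind(odds_ratio, ci), 3))
```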
Based on the index that appears on Stress in the fixed-effects output (Stress2 and Stress3; likewise Place2 and Place3), am I correct to assume that the reference category in this case was the first level, and not the last as in SPSS?

Does R create dummy variables to calculate the regression?

Thanks for your time. I'd appreciate any help you could provide.

Sincerely,
Carlos
Dr. Hank Stevens, Associate Professor
338 Pearson Hall
Botany Department
Miami University
Oxford, OH 45056

Office: (513) 529-4206
Lab: (513) 529-4262
FAX: (513) 529-4243
http://www.cas.muohio.edu/~stevenmh/
http://www.users.muohio.edu/harkesae/
http://www.cas.muohio.edu/ecology
http://www.muohio.edu/botany/

"E Pluribus Unum"

"I love deadlines. I love the whooshing noise they make as they go by." (Douglas Adams)

If you send an attachment, please try to send it in a format anyone can read, such as PDF, text, Open Document Format, HTML, or RTF. Why? See: http://www.gnu.org/philosophy/no-word-attachments.html