Skip to content
Prev 303025 / 398503 Next

Simple question about formulae in R!?

On Fri, Aug 10, 2012 at 7:39 AM, Henrik Singmann
<henrik.singmann at psychologie.uni-freiburg.de> wrote:
This is because for factor or character variables, R automatically
dummy codes them (or whatever contrasts were set, dummies are the
default, but others could be set globally which would then propogate
to the specific model.matrix() call).
One way to think about this is the regression versus ANOVA style of
parameterization.  Consider these three simple models

Here we have a traditional regression.  am is a two level factor, so
it gets one dummy code with the intercept.  The model matrix looks
something like:

i a
1 0
1 0
1 0
1 1
1 1
1 1

what does that mean for the expected values?  For a = 0,

yhat = b0 * 1 + b1 * 0 = b0 * 1, the intercept

for a = 1

yhat = b0 * 1 + b1 * 1

what is the difference between these two?  Merely b1.  Hence under
this parameterization, b0 is the expected value for the group a = 0,
and b1 is the difference in expected values between the two groups a =
0 and a = 1.
(Intercept) factor(am)1
  17.147368    7.244939

We can find the expected values for each group.  The first is just b0,
the intercept term.  The secondn is the intercept plus b1, the single
other coefficient.  We can get this just by adding (or summing all the
coefficients) like so:
[1] 24.39231

Now, if we explicitly force out the intercept, R uses a different
coding scheme.  The intercept normally captures some reference or
baseline group, which every other group is compared to, but if we
force it out, the default design matrix is something like:

1 0
1 0
1 0
0 1
0 1
0 1

this looks very similar to what we had before, only now the two
columns are opposites.  Let's think again what our expected values
are.  For a = 0

yhat = b0 * 1 + b1 * 0 = b0 * 1

for a = 1

yhat = b0 * 0 + b1 * 1 = b1 * 1

now b0 encodes the expected value of the group a = 0 and b1 encodes
the expected value of the group a = 1.  By default these coefficients
are tested against 0.  There is no more explicit test if the two
groups are different from each other because instead of creating
differences, we have created the overall group expectation.  In lm():
factor(am)0 factor(am)1
   17.14737    24.39231

both those numbers should be familiar from before.  In this case, both
models included two parameters, the only difference is whether they
encode group expectations or an expectation and expected difference.
These are equivalenet models, and their likelihood etc. will be
identical.   Note however that if you suppress the intercept, R will
use a different formula for the R squared so those will not look
identical (although if you got predicted values from both models,
yhat, you would find those identical).

This concept generalizes to the interaction of categorical variables
without their main effects.  Normally, the interaction terms are
expected differences from the reference group, but if the "main
effect" is dropped, then instead, the interactions become the expected
values of the various cells (if you think of a 2 x 2 table of group
membership).

For example here it is as usual:
(Intercept)             factor(vs)1             factor(am)1
              15.050000                5.692857                4.700000
factor(vs)1:factor(am)1
               2.928571
[1] 20.74286
[1] 19.75
[1] 28.37143

the intercept is the expected value for the group vs = 0 & am = 0,
then you get the difference between expectations for am = 0 & vs = 0
and am = 0 & vs = 1, then likewise for am, and finally, the additional
bit of being am = 1 and vs = 1.  I show the by hand calculations to
get the expected group values.  Right now, we get significance tests
of whether these differences are statistically significantly different
from zero.

If we drop the main effects and intercept, we get:
factor(vs)0:factor(am)0 factor(vs)1:factor(am)0 factor(vs)0:factor(am)1
               15.05000                20.74286                19.75000
factor(vs)1:factor(am)1
               28.37143

which you can see directly gets all the group expectations.

Cheers,

Josh
With few exceptions, I'm in the habit of not giving people type 3
p-values.  People like, but generally do not actually understand them.
 I would also like to point out that I am in psychology, so yes I know
psychology likes them.  Further, I am a doctoral student so it is not
like I am so established that people bow to my every whim, it is work
to explain why I am not sometimes and I have had discussions in
various lab meetings and with advisors about this, but my point is
that it is possible.  I would urge you to do the same and take heart
that there are others working for change---you are not the only one
even if it feels like it at your university.