Skip to content

Multivariate multiple regression

3 messages · Iain Pardoe, Henric Nilsson, Arne Henningsen

#
I'd like to model the relationship between m responses Y1, ..., Ym and a
single set of predictor variables X1, ..., Xr.  Each response is assumed
to follow its own regression model, and the error terms in each model
can be correlated.  My understanding is that although lm() handles
vector Y's on the left-hand side of the model formula, it really just
fits m separate lm models.  What should I use to do a full multivariate
analysis (as in section 7.7 of Johnson/Wichern)?  Thanks.

Iain Pardoe <ipardoe at lcbmail.uoregon.edu>
University of Oregon
#
Iain Pardoe said the following on 2005-05-04 20:08:
Have you tried using the `lm' function? Note that R 2.1.0 added several 
useful functions for multivariate responses.

Let's replicate Johnson & Wichern's Example 7.8 (p. 413, in the 4th Ed.) 
using `lm':

 > ex7.8 <- data.frame(z1 = c(0, 1, 2, 3, 4),
+                     y1 = c(1, 4, 3, 8, 9),
+                     y2 = c(-1, -1, 2, 3, 2))
 >
 > f.mlm <- lm(cbind(y1, y2) ~ z1, data = ex7.8)
 > summary(f.mlm)
Response y1 :

Call:
lm(formula = y1 ~ z1, data = ex7.8)

Residuals:
          1          2          3          4          5
  6.880e-17  1.000e+00 -2.000e+00  1.000e+00 -1.326e-16

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.0000     1.0954   0.913   0.4286
z1            2.0000     0.4472   4.472   0.0208 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.414 on 3 degrees of freedom
Multiple R-Squared: 0.8696,     Adjusted R-squared: 0.8261
F-statistic:    20 on 1 and 3 DF,  p-value: 0.02084


Response y2 :

Call:
lm(formula = y2 ~ z1, data = ex7.8)

Residuals:
          1          2          3          4          5
  9.931e-17 -1.000e+00  1.000e+00  1.000e+00 -1.000e+00

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -1.0000     0.8944  -1.118   0.3450
z1            1.0000     0.3651   2.739   0.0714 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.155 on 3 degrees of freedom
Multiple R-Squared: 0.7143,     Adjusted R-squared: 0.619
F-statistic:   7.5 on 1 and 3 DF,  p-value: 0.07142


 > SSD(f.mlm)
$SSD
    y1 y2
y1  6 -2
y2 -2  4

$call
lm(formula = cbind(y1, y2) ~ z1, data = ex7.8)

$df
[1] 3

attr(,"class")
[1] "SSD"
 > f.mlm1 <- update(f.mlm, ~ 1)
 > anova(f.mlm, f.mlm1)
Analysis of Variance Table

Model 1: cbind(y1, y2) ~ z1
Model 2: cbind(y1, y2) ~ 1
   Res.Df Df Gen.var.  Pillai approx F num Df den Df Pr(>F)
1      3      1.4907
2      4  1   4.4721  0.9375  15.0000      2      2 0.0625 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


HTH,
Henric
1 day later
#
On Wednesday 04 May 2005 20:08, Iain Pardoe wrote:
I don't know Johnson/Wichern. However, it seems to me that you want to do a 
"seemingly unrelated regression" (SUR). You can do this with the package 
"systemfit".

Arne