
Problem With Model.Tables Function

2 messages · Gary Whysong, Brian Ripley

#
I am using R for the first time in one of my classes.  My students have
alerted me to a problem for which we have not found an answer.  We find
that some means returned by the model.tables function are not correct when
missing data is present in analysis of variance problems.  We have
duplicated the problem using R 1.2.0, 1.2.1, and 1.2.2 under Windows 98
and several distributions of Linux (Redhat 7.0, Mandrake 7.2, SuSE 7.0,
and 7.1). 

The situation is best illustrated with a small example of a randomized
block design having three treatments and four blocks.
            Df  Sum Sq Mean Sq F value    Pr(>F)
blocks       3  28.250   9.417  10.273  0.008868 **
trtmnts      2 147.167  73.583  80.273 4.676e-05 ***
Residuals    6   5.500   0.917
---
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1
Tables of means
Grand mean
 
14.41667
 
 blocks
     1      2      3      4
13.667 16.333 12.333 15.333
 
 trtmnts
    1     2     3
10.50 13.75 19.00
 
Entering the data again and dropping the observations for treatment 2 in
block 3 and treatment 3 in block 4, we have:
            Df  Sum Sq Mean Sq F value    Pr(>F)
blocks2      3  18.267   6.089  7.4341 0.0410993 *
trtmts2      2 126.557  63.279 77.2587 0.0006367 ***
Residuals    4   3.276   0.819
---
Signif. codes:  0  `***'  0.001  `**'  0.01  `*'  0.05  `.'  0.1  ` '  1
Tables of means
Grand mean
 
14.3
 
 blocks2
        1     2  3    4
    13.67 16.33 13 13.5
rep  3.00  3.00  2  2.0
 
 trtmts2
        1     2     3
    10.68 14.47 18.97
rep  4.00  3.00  3.00
 
We find that the treatment means (trtmts2) are incorrect, although the
numbers of replications shown are correct. The block means (blocks2) are
correct.
 
The treatment means should be: 10.5, 14.67, and 19.0, respectively.

Further investigation reveals that we encounter this problem whenever
dealing with unequal replications or missing data, for example with
unequal subsamples or with missing data in factorial experiments.  We can get
the correct means by using regression techniques (lm) to solve the
analysis of variance problems and extracting the fitted values from the
appropriate lm model. 
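
As a rough illustration of the lm approach described above, the sketch below uses made-up data (not the data from this thread; the names d, trt, block and y are assumptions) with the same two cells dropped. A one-way fit on the treatment factor alone returns fitted values equal to the plain per-treatment averages, which is presumably what "the appropriate lm model" refers to.

```r
# Hypothetical data, NOT the poster's: 3 treatments x 4 blocks.
d <- expand.grid(trt = factor(1:3), block = factor(1:4))
d$y <- c(10, 14, 18,   # block 1
         12, 16, 21,   # block 2
          9, 13, 17,   # block 3
         11, 15, 20)   # block 4

# Drop (treatment 2, block 3) and (treatment 3, block 4), as in the post.
d <- subset(d, !(trt == "2" & block == "3") & !(trt == "3" & block == "4"))

# Plain (unadjusted) treatment averages of the observed responses.
raw_means <- tapply(d$y, d$trt, mean)

# One-way regression on treatment only: each fitted value is the
# average of that observation's treatment group.
fit_trt  <- lm(y ~ trt, data = d)
lm_means <- tapply(fitted(fit_trt), d$trt, mean)

print(raw_means)
print(lm_means)
```

These are averages of the data that ignore the blocks entirely, so they answer a different question from a table that is adjusted for block effects.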

Since I am learning R, perhaps I have missed something?  Is this possibly
a bug in the model.tables function?

------------------------------------------------------------
   Gary Whysong, Associate Professor, Environmental Resources
   Morrison School of Agribusiness & Resource Management
   Arizona State University East
   Phone: (480) 727-1263, E-mail: gwhysong at Cactus.east.asu.edu
------------------------------------------------------------


#
I believe that these are correct, but you haven't told us why you think
they are wrong.  In the unbalanced case you need to understand carefully
what to expect: your expected answer is definitely wrong, which suggests
that R's might well be right.

For the record, these R results agree with S-PLUS on your examples.
But they are not much documented in the unbalanced case, so perhaps
`for experts only'.

The FAQ says:

9.1 What is a bug?

....

If a command does the wrong thing, that is a bug.  But be sure you know for
certain what it ought to have done.  If you aren't familiar with the
command, or don't know for certain how the command is supposed to work,
then it might actually be working right.  Rather than jumping to
conclusions, show the problem to someone who knows for certain.

On Sat, 10 Mar 2001, Gary Whysong wrote:
[...]
Why?  These are *model*.tables not *data*.tables. You have to
adjust for block effects, and they are unbalanced.  Blocks 3 and 4 have
lower responses than 1 and 2, and they are missing for treatments 2 and 3.
Seems to adjust correctly to me.
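
One way to see the adjustment, offered as an interpretation and not as something stated in the thread: with blocks fitted first, the reported treatment means can be reproduced by taking each raw treatment mean and subtracting how far the blocks that treatment was observed in sit from the grand mean. The sketch uses only the rounded figures quoted in the original post; the variable names are made up for illustration.

```r
# Figures as printed in the original post (rounded).
grand       <- 14.3
block_means <- c(13.67, 16.33, 13, 13.5)   # blocks2 table
raw_trt     <- c(10.5, 14.67, 19.0)        # means the poster expected
reported    <- c(10.68, 14.47, 18.97)      # trtmts2 table from model.tables

# Blocks in which each treatment was observed after the deletions:
# treatment 2 is missing from block 3, treatment 3 from block 4.
seen_in <- list(1:4, c(1, 2, 4), c(1, 2, 3))

# Average block mean over the blocks where each treatment appears.
avg_block <- sapply(seen_in, function(b) mean(block_means[b]))

# Raw treatment mean, shifted by how far its blocks sit from the grand mean.
adjusted <- raw_trt - (avg_block - grand)

print(round(cbind(raw_trt, avg_block, adjusted, reported), 3))
```

Treatment 1 appears in every block, yet it still shifts slightly: blocks 3 and 4 now hold two observations each, so the simple average of the four block means is no longer the grand mean.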

Note the order of terms matters in unbalanced models.
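
A small sketch of the point about term order, again on made-up data (not the poster's; d, trt, block and y are assumed names). It fits the same two factors in both orders and pulls the treatment table out of each model.tables result. On this example the treatment table agrees with the raw averages only when treatment is entered first.

```r
# Hypothetical data, NOT the poster's: 3 treatments x 4 blocks,
# with (treatment 2, block 3) and (treatment 3, block 4) removed.
d <- expand.grid(trt = factor(1:3), block = factor(1:4))
d$y <- c(10, 14, 18,  12, 16, 21,  9, 13, 17,  11, 15, 20)
d <- subset(d, !(trt == "2" & block == "3") & !(trt == "3" & block == "4"))

# Same terms, two orders.
fit_block_first <- aov(y ~ block + trt, data = d)
fit_trt_first   <- aov(y ~ trt + block, data = d)

# Treatment tables of means from each fit.
trt_after_block <- as.numeric(unclass(
  model.tables(fit_block_first, type = "means")$tables$trt))
trt_first <- as.numeric(unclass(
  model.tables(fit_trt_first, type = "means")$tables$trt))

# Unadjusted treatment averages for comparison.
raw_trt <- as.numeric(tapply(d$y, d$trt, mean))

print(rbind(raw_trt, trt_first, trt_after_block))
```

With balanced data the two orders would give the same treatment table; the disagreement appears only once cells are missing.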