
Lack of independence in anova()

27 messages · Phillip Good, Douglas Bates, Thomas Lumley +6 more


#
If the observations are normally distributed and the 2xk design is
balanced,  theory requires that the tests for interaction and row effects be
independent.  In my program, appended below, this would translate to cntT
(approx)= cntR*cntI/N if all R routines were functioning correctly.  They
aren't.

sim2=function(size,N,p){
  cntR=0
  cntC=0
  cntI=0
  cntT=0
  cntP=0
  for(i in 1:N){
    #generate data
    v=gendata(size)
    #analyze after build(ing) design containing data
    lm.out=lm(yield~c*r,build(size,v))
    av.out=anova(lm.out)
    #if column effect is significant, increment cntC
    if (av.out[[5]][1]<=p) cntC=cntC+1
    #if row effect is significant, increment cntR
    if (av.out[[5]][2]<=p){
      cntR=cntR+1
      tmp=1
    }
    else tmp=0
    if (av.out[[5]][3]<=p){
      #if interaction is significant, increment cntI
      cntI=cntI+1
      #if both interaction and row effect are significant, increment cntT
      cntT=cntT+tmp
    }
  }
  list(cntC=cntC, cntR=cntR, cntI=cntI, cntT=cntT)
}

build=function(size,v){
#size is a vector containing the sample sizes
col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
+size[7]+size[8]))
return(data.frame(c=factor(col), r=factor(row),yield=v))
}

gendata=function(size){
  ssize=sum(size);
  return (rnorm(ssize))
}

#Example
 size=c(3,3,3,0,3,3,3,0)
 sim2(size,10000,10,.16)



Phillip Good
Huntington Beach CA
#
I have already had email exchanges off-list with Phillip Good pointing
out that the independence property that he wishes to establish by
simulation is a consequence of orthogonality of the column span of the
row contrasts and the interaction contrasts.  If you set the contrasts
option to a set of orthogonal contrasts such as contr.helmert or
contr.sum, which has no effect on the results of the anova, this is
easily established.
The `build' function, with the data argument defaulting to random
normals, is

function(size, v = rnorm(sum(size))) {
    col=c(rep(0,size[1]),rep(1,size[2]),rep(2,size[3]),rep(3,size[4]),
    rep(0,size[5]),rep(1,size[6]),rep(2,size[7]),rep(3,size[8]))
    row=c(rep(0,size[1]+size[2]+size[3]+size[4]),rep(1,size[5]+size[6]
    +size[7]+size[8]))
    return(data.frame(c=factor(col), r=factor(row),yield=v))
}

For size

[1] 3 3 3 0 3 3 3 0

the data frame produced is

   c r      yield
1  0 0  1.2369081
2  0 0  1.5616230
3  0 0  1.8396185
4  1 0  0.3173245
5  1 0  1.0715115
6  1 0 -1.1459955
7  2 0  0.2488894
8  2 0  0.1158625
9  2 0  2.6200816
10 0 1  1.2624048
11 0 1 -0.9862578
12 0 1 -0.3235653
13 1 1  0.2039706
14 1 1 -1.4574576
15 1 1  1.9158713
16 2 1 -2.0333909
17 2 1  1.0050272
18 2 1  0.6789184

and the cross-product of the model matrix (under Helmert contrasts) is
diagonal, which establishes the orthogonality:

            (Intercept) c1 c2 r1 c1:r1 c2:r1
(Intercept)          18  0  0  0     0     0
c1                    0 12  0  0     0     0
c2                    0  0 36  0     0     0
r1                    0  0  0 18     0     0
c1:r1                 0  0  0  0    12     0
c2:r1                 0  0  0  0     0    36
#
A couple more comments on this simulation.  Notice that the sim2
function is defined with arguments size, N and p but is being called
with four arguments.  It appears as if the value of p will be 10 in
that call.

If you decide to do such a simulation yourself you can save a lot of
time by just building the model matrix once and using lm.fit in
subsequent calls.

Also, there is no need to do the counting in the body of the sim2
function.  Just save the 3 p-values from each replication.  The test
of independence is equivalent to showing that the distribution of the
p-values is uniform over the unit cube.
#
To my surprise, the R functions I employed did NOT verify the independence
property.  I've no quarrel with the theory--it's the R functions that worry
me.

pG

-----Original Message-----
From: Douglas Bates [mailto:dmbates at gmail.com]
Sent: Monday, July 04, 2005 9:13 AM
To: pigood at verizon.net; r-help
Subject: Re: [R] Lack of independence in anova()


#
My bad.  The example of a call should read, sim2(size,10000,.16).
Originally, the intent of the program was to compare ANOV (the gold
standard) with synchronized permutation tests when data is from contaminated
normal distributions or the design is unbalanced.  The tests for main
effects and interactions are always independent with synchronized
permutations and ought to be for ANOV with normal data and balanced designs.

-----Original Message-----
From: Douglas Bates [mailto:dmbates at gmail.com]
Sent: Monday, July 04, 2005 9:24 AM
To: pigood at verizon.net
Cc: r-help at r-project.org
Subject: Re: [R] Lack of independence in anova()


#
I wrote up a note on how to do a more efficient simulation of the
p-values from anova, then discovered to my surprise that the chi-square
test for independence of the significance of the F-tests indicated
that they were not independent.  I was stumped by this but fortunately
Thomas Lumley came to my rescue with an explanation.  There is no
reason why the results of the F tests should be independent.  The
numerators are independent but the denominator is the same for both
tests.  When, due to random variation, the denominator is small, then
the p-values for both tests will tend to be small.  If, instead of
F-tests you use chi-square tests then you do see independence.

Here is the note on the simulation.

There are several things that could be done to speed the simulation of
the p-values of an anova like this under the null distribution.

If you examine the structure of a fitted lm object (use the function
str()) you will see that there are components called `qr', `effects'
and `assign'.  You can verify by experimentation that `qr' and
`assign' depend only on the experimental design.  Furthermore, the
`effects' vector can be reproduced as qr.qty(qrstr, y).

The sums of squares for the different terms in the model are the sums
of squares of elements of the effects vector as indexed by the assign
vector.  The residual sum of squares is the sum of squares of the
remaining elements of the assign vector.  You can generate 10000
replications of the calculations of the relevant sums of squares.
The design used here is

   c r
1  1 1
2  1 1
3  1 1
4  2 1
5  2 1
6  2 1
7  3 1
8  3 1
9  3 1
10 1 2
11 1 2
12 1 2
13 2 2
14 2 2
15 2 2
16 3 2
17 3 2
18 3 2

the assign vector is

[1] 0 1 1 2 3 3

a check of the calculation gives

[1] 20.61  0.01 20.61  0.00  0.00

and the first six columns of the simulated results are

       [,1]      [,2]     [,3]      [,4]       [,5]        [,6]
0 0.4783121 0.3048634 0.713689 0.6937838 0.03649023  2.63392426
1 0.5825213 1.4756395 1.127018 0.5209751 1.18697199  3.32972093
2 0.2612723 3.6396106 0.547506 1.1641910 0.37843963  0.03411672
3 2.6259806 3.5504584 1.645215 0.1197238 0.85361018  4.53895212
4 9.1942755 8.2122693 4.863392 5.4413571 2.03715439 22.94815118

The rows of that array correspond to the sum of squares for the
Intercept (which we generally ignore), the factor 'c', the factor 'r',
their interaction and the residuals.
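A sketch of the simulation described above; the object names (`qrstr',
`asgn', `res') and the seed are assumptions, not the originals:

```r
set.seed(1234)  # arbitrary seed; the original value is not shown in the message
## the 2 x 3 balanced design with 3 replicates used above
dat <- data.frame(c = factor(rep(rep(1:3, each = 3), 2)),
                  r = factor(rep(1:2, each = 9)))
fm1 <- lm(rnorm(18) ~ c * r, dat)  # one fit, kept only for its design information
qrstr <- fm1$qr                    # depends only on the experimental design
asgn  <- fm1$assign                # 0 1 1 2 3 3: term index of each coefficient
ssq <- function(y) {
  eff <- qr.qty(qrstr, y)                  # reproduces the `effects' vector
  c(sapply(split(eff[1:6]^2, asgn), sum),  # SS for intercept, c, r, c:r
    sum(eff[7:18]^2))                      # residual SS
}
res <- replicate(10000, ssq(rnorm(18)))    # 5 x 10000 array of sums of squares
```

The `pf' and `pchisq' computations later in the message operate on the
rows of such a `res' array.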

As you can see that took about 21 seconds on my system, which I expect
is a bit faster than your simulation ran.

Because I set the seed to a known value I can reproduce the results
for the first few simulations to check that the sums of squares are
correct.  Remember that the original fit (fm1) is not included in the
table.
Analysis of Variance Table

Response: rnorm(18)
          Df Sum Sq Mean Sq F value Pr(>F)
c          2 0.5825  0.2913  0.3801 0.6917
r          1 0.2613  0.2613  0.3410 0.5701
c:r        2 2.6260  1.3130  1.7137 0.2215
Residuals 12 9.1943  0.7662               

You can continue this process if you wish further verification that
the results correspond to the fitted models.

You can get the p-values for the F-tests as

pvalsF <- data.frame(pc = pf((res[2,]/2)/(res[5,]/12), 2, 12, low = FALSE),
                     pr = pf((res[3,]/1)/(res[5,]/12), 1, 12, low = FALSE),
                     pint = pf((res[4,]/2)/(res[5,]/12), 2, 12, low = FALSE))

Again you can check this for the first few by hand.
          pc         pr      pint
1 0.69171238 0.57006574 0.2214847
2 0.37102129 0.03975286 0.1158059
3 0.28634939 0.26771167 0.1740633
4 0.57775850 0.13506561 0.8775828
5 0.06363138 0.16124100 0.1224806

If you wish you could then check marginal distributions using
techniques like an empirical density plot.
At this point I would recommend checking the joint distribution but if
you want to choose a specific level and check the contingency table
that could be done as

xtabs(~ I(pr < 0.16) + I(pint < 0.16), data = pvalsF)

            I(pint < 0.16)
I(pr < 0.16) FALSE TRUE
       FALSE  7204 1240
       TRUE   1215  341

The summary method for an xtabs object provides a test of independence
Call: xtabs(formula = ~I(pr < 0.16) + I(pint < 0.16), data = pvalsF)
Number of cases in table: 10000 
Number of factors: 2 
Test for independence of all factors:
	Chisq = 51.6, df = 1, p-value = 6.798e-13

for which you can see the puzzling result.  However, if you use the
chi-squared test based on the known residual variance of 1, as in

pvalsC <- data.frame(pc = pchisq(res[2,], 2, low = FALSE),
                     pr = pchisq(res[3,], 1, low = FALSE),
                     pint = pchisq(res[4,], 2, low = FALSE))

you get

          pc         pr      pint
1 0.7473209 0.60924741 0.2690144
2 0.4781553 0.05642013 0.1694446
3 0.5692081 0.45933855 0.4392846
4 0.7706757 0.28059805 0.9418946
5 0.5523983 0.53843951 0.6525907
xtabs(~ I(pr < 0.16) + I(pint < 0.16), data = pvalsC)

            I(pint < 0.16)
I(pr < 0.16) FALSE TRUE
       FALSE  7121 1319
       TRUE   1324  236
Call: xtabs(formula = ~I(pr < 0.16) + I(pint < 0.16), data = pvalsC)
Number of cases in table: 10000 
Number of factors: 2 
Test for independence of all factors:
	Chisq = 0.25041, df = 1, p-value = 0.6168
#
On 7/4/05, Douglas Bates <dmbates at gmail.com> wrote:
...
I meant to write "effects vector" not "assign vector" in that last sentence.
1 day later
#
Do you or Lumley have a citation for this conclusion?   Most people go
forward with the ANOV on the basis that the various tests are independent.

Phillip Good
P.S.  Tests based on the method of synchronized permutations are
independent.

-----Original Message-----
From: Douglas Bates [mailto:dmbates at gmail.com]
Sent: Monday, July 04, 2005 1:44 PM
To: pigood at verizon.net
Cc: r-help
Subject: Re: [R] Lack of independence in anova()


#
On Wed, 6 Jul 2005, Phillip Good wrote:
I don't have a citation. I would have thought that the claim that they 
were independent was more in need of a citation.  I'd expect that books on 
the classical theory of linear models would show that the row, 
interaction, and residual mean squares are independent, and the lack of 
independence of the tests is then basic probability.  If X, Y, and Z are 
independent and Z takes on more than one value then X/Z and Y/Z can't be 
independent.
Quite possibly.  If so, they presumably condition on the observed residual 
mean square.
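A quick numerical illustration of the shared-denominator effect (a
sketch; the degrees of freedom match the 2 x 3, n = 18 example
discussed in this thread):

```r
set.seed(42)
X <- rchisq(1e5, df = 1)   # numerator SS for the row effect
Y <- rchisq(1e5, df = 2)   # numerator SS for the interaction
Z <- rchisq(1e5, df = 12)  # the common residual SS
## X, Y, Z are mutually independent, but the ratios share Z:
cor(X, Y)      # essentially zero
cor(X/Z, Y/Z)  # clearly positive: a small Z inflates both ratios at once
```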

 	-thomas
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
I'm confused:  I understood Doug to be describing a traditional, 
normal theory ANOVA with k rows in the table for different effects plus 
a (k+1)st for residuals, where the mean squares (MS) in that column are all 
independent chi-squares scaled by the same unknown sigma^2.  The k 
statistics in the F column are ratios of independent chi-squares over 
the same denominator chi-square.  How can they be independent?

	  spencer graves
p.s.  How is the method of synchronized permutations relevant to a 
traditional, normal theory ANOVA?
#
On 7/6/05, Phillip Good <pigood at verizon.net> wrote:
Perhaps we could review the sequence of events here.  This exchange
began with your sending me a message claiming that there is a bug in
lm or anova in R because the results of your simulation were what you
expected.  I responded saying that it is unlikely that a serious bug
in such a fundamental part of R would remain undetected for such a
long time and then went further and took apart your simulation and
showed how it could be done much more effectively and also showed,
with help from Thomas Lumley and Peter Dalgaard, that your original
assumption is incorrect.

You have now made another blanket statement, that "Most people go
forward with the ANOV on the basis that the various tests are
independent" and indicated that Thomas or I should provide a citation
to validate our claim.  Perhaps instead of claiming that it is
necessary for us to produce evidence in support of our claim that they
are not independent because they are based on the same denominator
mean square, you could produce a citation to back up your claim.  To
date none of your claims of the independence of the tests or the
supposed bugs in R have been substantiated.  If indeed "most people"
do as you claim then "most people" are suffering from a misconception,
a not-uncommon situation in statistics.
#
On 7/6/05, Douglas Bates <dmbates at gmail.com> wrote:
...
I meant to write "were not what you expected".  I've got to learn to
read the email messages before posting them.
#
spencer graves asks:
How is the method of synchronized permutations relevant to a traditional,
normal theory ANOVA?

One ought now to ask, as I am currently doing, whether traditional,
normal-theory ANOVA is applicable to the analysis of experimental
designs.  If, in fact, the results of the various ANOV tests are not
independent, then:
   i) shouldn't we be teaching this in courses on the analysis of
experimental designs? (Can anyone point to an FDA submission or applied
statistics publication which concedes this result?)
  ii) shouldn't we be using tests whose results are independent?
#
On 7/6/2005 1:43 PM, Phillip Good wrote:
I think it's relatively infrequent that we make inferences where 
independence matters, so it makes sense to use tests that are marginally 
well-behaved.  When do you care about the simultaneous behaviour of two 
tests?

The nice thing about ANOVA tests is that each one is valid regardless of 
the presence or absence of effects in other rows.  This doesn't require 
independence.

Where you might use independence is to try to construct a combined 
p-value looking for a violation of any of several hypotheses, but if 
that's the test you're interested in, you should just have pooled the 
terms in the numerator.
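For instance, pooling all the numerator terms amounts to a single
overall F-test against the intercept-only model; a minimal sketch (data
and object names assumed):

```r
set.seed(7)
dat <- data.frame(c = factor(rep(rep(1:3, each = 3), 2)),
                  r = factor(rep(1:2, each = 9)),
                  yield = rnorm(18))
fit0 <- lm(yield ~ 1, dat)       # null model
fit1 <- lm(yield ~ c * r, dat)   # full model: c, r and their interaction
anova(fit0, fit1)                # one F-test on the pooled 5 numerator df
```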

Duncan Murdoch
#
On 06-Jul-05 Phillip Good wrote:
This hardly needs a citation -- Thomas Lumley's explanation (excerpted
below from Doug Bates's mail) is justification enough, and it is
succinct and elementary.

However, a place one naturally looks for caveats of this kind is
in Kendall & Stuart, and I duly found it (Vol 3, section 35.46
of my 1983 edition). It is essentially exactly the same explanation:

  "However, the tests in the AV tables which we have considered
   are never independent tests, for although the various SS in a
   table may be independent of each other, all the tests we have
   derived use the Residual SS as denominator of the test statistic,
   and the various tests must therefore be statistically dependent,
   since, e.g., a Residual SS which is (by chance) large will
   depress all the values of the test statistics simultaneously."

(And K&S, thorough as they are with citations, do not cite any
primary reference for this either!)

However, if the "degrees of freedom" for Residual SS is large,
then the amount of random variation in the denominator will be
small and it will be effectively constant. Then, of course,
with independent numerators, the tests will be effectively
independent (and equivalent to chi-squared) and also, therefore,
the p-values.

The fact that "most people go forward with the ANOV on the basis
that the various tests are independent" possibly reflects the
wide-spread act of faith that one has "a large sample", whatever
the value of n may really be. One wonders how often people check
the p-value for their F on (n1:n2) d.f. against the p-value for
(n1:Inf) d.f.? The 5% point decreases quite perceptibly as n2
increases up to about 20, and more slowly thereafter; but still
the difference between F(n1:20) and F(n1:Inf) is substantial
for any n1 (being about 0.5 for n1 up to about 10, increasing
thereafter up to 0.84):

n1<-c(1+2*(0:50),5000);cbind(n1,qf(0.95,n1,20) - qf(0.95,n1,Inf))

F(Inf,Inf) = 1 ; F(20:Inf) = qf(0.95,Inf,20) = 1.841

Conversely (e.g.):

  > 1-pf(qf(0.95,20,20),20,20)
  [1] 0.05
  > 1-pf(qf(0.95,20,20),20,Inf)
  [1] 0.002391189

Such differences are related to the degree of non-independence of
several tests on the same data.
But surely this amounts to assuming n2 = Inf? If that's an adequate
approximation, then fine; but if not (see e.g. above) then not!

Best wishes to all,
Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 06-Jul-05                                       Time: 19:29:18
------------------------------ XFMail ------------------------------
#
On 6 Jul 2005 at 12:30, Douglas Bates wrote:

At the risk of further roiling the waters ...

As several have already pointed out, the "usual" F-tests in a balanced ANOVA have independent numerators but a common denominator, 
and hence the F-statistics cannot be independent.  Is this not the basis of Kimball's Inequality, which states that the effect of 
the common denominator is that the simultaneous error rate cannot exceed what it would be if the tests *were* in fact independent?

In other words, you should get a simultaneous error rate for the F-tests that is lower than that under independence of test 
statistics. Are you?

---JRG

John R. Gleason
#
As Duncan Murdoch indicated, the primary issue is not that the 
different tests are (not) statistically independent but that they are 
sensitive to different alternative hypotheses.

	  spencer graves
#
On Wed, Jul 06, 2005 at 10:06:45AM -0700, Thomas Lumley wrote:
(...)
Not really true.  I can produce a counterexample on request (admittedly
quite trivial, though).

Göran Broström
#
On 06-Jul-05 Göran Broström wrote:
But true if both X and Y have positive probability of being
non-zero, n'est-ce pas?

Tut, tut, Göran!

Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 07-Jul-05                                       Time: 00:51:24
------------------------------ XFMail ------------------------------
#
Hi, Göran:  I'll bite:

	  (a) I'd like to see your counterexample.

	  (b) I'd like to know what is wrong with the following, apparently 
defective, proof that they can't be independent:  First consider 
indicator functions of independent events A, B, and C.

	  P{(AC)&(BC)} = P{ABC} = PA*PB*PC.

	  But P(AC)*P(BC) = PA*PB*(PC)^2.  Thus, AC and BC can be independent 
only if PC = 0 or 1, i.e., the indicator of C is constant almost surely.

	  Is there a flaw in this?  If not, is there some reason this case 
cannot be extended to the product of arbitrary random variables X, Y, and 
W=1/Z?

	  Thanks,
	  spencer graves
#
My thanks to Douglas Bates and others for pointing out ANOV's limitations.
Now the question is how to put my new won knowledge into practice.
Before my eyes were opened
   if row effect and interaction were both statistically significant
   I would report the row effects separately for each column.

Now
   if row effect and interaction are both statistically significant
   I'll realize this may simply be a matter of the test statistics being
interdependent (though I might still report the column effects separately
if, say, one column was different--in a practical sense--from the rest).
   if neither row nor column nor interaction effects are significant, I'll
attribute the result to too small a sample.

(As far as higher-order interactions go, I never really had a clue as to
what they were about anyway.)

Phillip Good
#
(Ted Harding) wrote:
If X and Y are independent with symmetric distributions about zero, and 
Z, independent of them, is supported on +/- A for some non-zero constant 
A, then X/Z and Y/Z are still independent.  There are probably other 
special cases too.

Duncan Murdoch
#
Spencer Graves wrote:
I don't see one.

 > If not, is there some reason this case
Because you can't?  The situations are different?

If C is a non-trivial event independent of A, then AC is strictly a 
subset of A.  However, as the example I just posted shows (with constant 
1), you can have a non-trivial random variable W where XW has exactly 
the same distribution as X.

Duncan Murdoch
#
Hi, Duncan & Göran:

	  Consider the following:  X, Y, Z have symmetric distributions with 
the following restrictions:

	  P(X=1)=P(X=-1)=x with P(|X|<1)=0 so P(|X|>1)=1-2x.
	  P(Y=1)=P(Y=-1)=y with P(|Y|<1)=0 so P(|Y|>1)=1-2y.
	  P(Z=1)=P(Z=-1)=z with P(|Z|>1)=0 so P(|Z|<1)=1-2z.

	  Then

	  P(X/Z=1)=2xz, P(Y/Z=1)=2yz, and
	  P{(X/Z=1)&(Y/Z=1)}=2xyz.

	  Independence requires that this last probability is 4xyz^2.  This is 
true only if z=0.5.  If z<0.5, then X/Z and Y/Z are clearly dependent.
	
	  How's this?
	  spencer graves
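Spencer's arithmetic can be checked by simulation; a sketch with the
illustrative choices x = y = 0.3 and z = 0.2 (the remaining mass placed
at ±2 for X, Y and at ±0.5 for Z — all of these values are assumptions):

```r
set.seed(1)
n <- 2e5
rXY <- function(n) sample(c(-1, 1, -2, 2), n, replace = TRUE,
                          prob = c(0.3, 0.3, 0.2, 0.2))  # P(X = 1) = P(X = -1) = 0.3
rZ  <- function(n) sample(c(-1, 1, -0.5, 0.5), n, replace = TRUE,
                          prob = c(0.2, 0.2, 0.3, 0.3))  # P(Z = 1) = P(Z = -1) = 0.2
X <- rXY(n); Y <- rXY(n); Z <- rZ(n)
A <- X/Z == 1
B <- Y/Z == 1
mean(A & B)        # near 2xyz   = 0.036
mean(A) * mean(B)  # near 4xyz^2 = 0.0144, so A and B are clearly dependent
```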
#
Spencer Graves wrote:

As far as I can see, this is correct.
Yes. Random variables are independent if all events which can be defined 
in terms of them are independent.
If Z is non-constant, there must be some event defined by Z with 
probability strictly between 0 and 1,
and the above argument cannot be used.

Kjetil
-- 

Kjetil Halvorsen.

Peace is the most effective weapon of mass construction.
               --  Mahdi Elmandjra