Skip to content

non-parametric permutation and signed paired-difference distributions

6 messages · Weidong Gu, Jean V Adams, Michael Friendly +1 more

#
Hi all
Consider the classic data below from Darwin on the heights of 15 pairs 
of zea mays (corn) plants
either cross-fertilized or self-fertilized, where the goal is to see if 
it makes a difference.

 > head(ZeaMays)
   pair pot  cross   self   diff
1    1   1 23.500 17.375  6.125
2    2   1 12.000 20.375 -8.375
3    3   1 21.000 20.000  1.000
4    4   2 22.000 20.000  2.000
5    5   2 19.125 18.375  0.750
6    6   2 21.500 18.625  2.875
...

I'd like to illustrate two types of non-parametric tests of whether the 
mean(diff) = 0.

(a) Permutation test, where the values of, say self are permuted and 
diff=cross - self
is calculated for each permutation.  There are 15! permutations, but a 
reasonably
large number of random permutations would suffice.

(b) Test based on assigning each abs(diff) a + or - sign, and 
calculating the mean(diff).
There are 2^15 such possible values, but again, a reasonably large 
number of random
samples would do.

This is obviously a case for apply and friends, but I can't quite see 
how to set it up.

The complete data:

 > dput(ZeaMays)
structure(list(pair = 1:15, pot = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1",
"2", "3", "4"), class = "factor"), cross = c(23.5, 12, 21, 22,
19.125, 21.5, 22.125, 20.375, 18.25, 21.625, 23.25, 21, 22.125,
23, 12), self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625,
15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18), diff = c(6.125,
-8.375, 1, 2, 0.75, 2.875, 3.5, 5.125, 1.75, 3.625, 7, 3, 9.375,
7.5, -6)), row.names = c(NA, -15L), .Names = c("pair", "pot",
"cross", "self", "diff"), class = "data.frame")



-- Michael Friendly Email: friendly AT yorku DOT ca Professor, 
Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 
736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 
1P3 CANADA
#
On Fri, Oct 14, 2011 at 11:38 AM, Michael Friendly <friendly at yorku.ca> wrote:
What do you mean by 'assigning each abs(diff) a + or - sign, and
calculating the mean(diff)'?
abs(diff) should be all positive, right?
#
On 10/14/2011 1:20 PM, Weidong Gu wrote:
I mean to calculate
mean (sign * abs(diff))
where sign is a vector of length 15 composed of some combination of (-1, 
+1).

This is actually what Fisher did.  It corresponds to assigning one 
member of each pair to cross / self
#
On 10/14/2011 4:10 PM, Bert Gunter wrote:
That's beautiful, Bert.  Thanks!  Here is my fleshed-out example

mean(ZeaMays$diff)
# complete permutation distribution of diff, for all 2^15 ways of assigning
# one value to cross and the other to self
allmeans <- as.matrix(expand.grid(as.data.frame(matrix(rep(c(-1,1),15), 
nr =2)))) %*% abs(ZeaMays$diff) / 15

# upper-tail p-value
sum(allmeans > mean(ZeaMays$diff)) / 2^15
# two-tailed p-value
sum(abs(allmeans) > mean(ZeaMays$diff)) / 2^15

hist(allmeans, breaks=64, xlab="Mean difference, cross-self",
     main="Histogram of all mean differences")
abline(v=mean(ZeaMays$diff), col="red", lwd=2)
abline(v=-mean(ZeaMays$diff), col="red", lwd=2, lty=2)