Hi all
Consider the classic data below from Darwin on the heights of 15 pairs
of zea mays (corn) plants
either cross-fertilized or self-fertilized, where the goal is to see if
it makes a difference.
> head(ZeaMays)
pair pot cross self diff
1 1 1 23.500 17.375 6.125
2 2 1 12.000 20.375 -8.375
3 3 1 21.000 20.000 1.000
4 4 2 22.000 20.000 2.000
5 5 2 19.125 18.375 0.750
6 6 2 21.500 18.625 2.875
...
I'd like to illustrate two types of non-parametric tests of whether the
mean(diff) = 0.
(a) Permutation test, where the values of, say self are permuted and
diff=cross - self
is calculated for each permutation. There are 15! permutations, but a
reasonably
large number of random permutations would suffice.
(b) Test based on assigning each abs(diff) a + or - sign, and
calculating the mean(diff).
There are 2^15 such possible values, but again, a reasonably large
number of random
samples would do.
This is obviously a case for apply and friends, but I can't quite see
how to set it up.
The complete data:
> dput(ZeaMays)
structure(list(pair = 1:15, pot = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1",
"2", "3", "4"), class = "factor"), cross = c(23.5, 12, 21, 22,
19.125, 21.5, 22.125, 20.375, 18.25, 21.625, 23.25, 21, 22.125,
23, 12), self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625,
15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18), diff = c(6.125,
-8.375, 1, 2, 0.75, 2.875, 3.5, 5.125, 1.75, 3.625, 7, 3, 9.375,
7.5, -6)), row.names = c(NA, -15L), .Names = c("pair", "pot",
"cross", "self", "diff"), class = "data.frame")
-- Michael Friendly Email: friendly AT yorku DOT ca Professor,
Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416
736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J
1P3 CANADA
non-parametric permutation and signed paired-difference distributions
6 messages · Weidong Gu, Jean V Adams, Michael Friendly +1 more
On Fri, Oct 14, 2011 at 11:38 AM, Michael Friendly <friendly at yorku.ca> wrote:
Hi all Consider the classic data below from Darwin on the heights of 15 pairs of zea mays (corn) plants either cross-fertilized or self-fertilized, where the goal is to see if it makes a difference.
head(ZeaMays)
?pair pot ?cross ? self ? diff 1 ? ?1 ? 1 23.500 17.375 ?6.125 2 ? ?2 ? 1 12.000 20.375 -8.375 3 ? ?3 ? 1 21.000 20.000 ?1.000 4 ? ?4 ? 2 22.000 20.000 ?2.000 5 ? ?5 ? 2 19.125 18.375 ?0.750 6 ? ?6 ? 2 21.500 18.625 ?2.875 ... I'd like to illustrate two types of non-parametric tests of whether the mean(diff) = 0. (a) Permutation test, where the values of, say self are permuted and diff=cross - self is calculated for each permutation. ?There are 15! permutations, but a reasonably large number of random permutations would suffice. (b) Test based on assigning each abs(diff) a + or - sign, and calculating the mean(diff). There are 2^15 such possible values, but again, a reasonably large number of random samples would do.
What do you mean by 'assigning each abs(diff) a + or - sign, and calculating the mean(diff)'? abs(diff) should be all positive, right?
This is obviously a case for apply and friends, but I can't quite see how to set it up. The complete data:
dput(ZeaMays)
structure(list(pair = 1:15, pot = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1",
"2", "3", "4"), class = "factor"), cross = c(23.5, 12, 21, 22,
19.125, 21.5, 22.125, 20.375, 18.25, 21.625, 23.25, 21, 22.125,
23, 12), self = c(17.375, 20.375, 20, 20, 18.375, 18.625, 18.625,
15.25, 16.5, 18, 16.25, 18, 12.75, 15.5, 18), diff = c(6.125,
-8.375, 1, 2, 0.75, 2.875, 3.5, 5.125, 1.75, 3.625, 7, 3, 9.375,
7.5, -6)), row.names = c(NA, -15L), .Names = c("pair", "pot",
"cross", "self", "diff"), class = "data.frame")
-- Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology
Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700
Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111014/817c12ac/attachment.pl>
On 10/14/2011 1:20 PM, Weidong Gu wrote:
On Fri, Oct 14, 2011 at 11:38 AM, Michael Friendly<friendly at yorku.ca> wrote:
Hi all Consider the classic data below from Darwin on the heights of 15 pairs of zea mays (corn) plants either cross-fertilized or self-fertilized, where the goal is to see if it makes a difference.
head(ZeaMays)
pair pot cross self diff 1 1 1 23.500 17.375 6.125 2 2 1 12.000 20.375 -8.375 3 3 1 21.000 20.000 1.000 4 4 2 22.000 20.000 2.000 5 5 2 19.125 18.375 0.750 6 6 2 21.500 18.625 2.875 ... I'd like to illustrate two types of non-parametric tests of whether the mean(diff) = 0. (a) Permutation test, where the values of, say self are permuted and diff=cross - self is calculated for each permutation. There are 15! permutations, but a reasonably large number of random permutations would suffice. (b) Test based on assigning each abs(diff) a + or - sign, and calculating the mean(diff). There are 2^15 such possible values, but again, a reasonably large number of random samples would do.
What do you mean by 'assigning each abs(diff) a + or - sign, and calculating the mean(diff)'? abs(diff) should be all positive, right?
I mean to calculate mean (sign * abs(diff)) where sign is a vector of length 15 composed of some combination of (-1, +1). This is actually what Fisher did. It corresponds to assigning one member of each pair to cross / self
Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20111014/8f88e40c/attachment.pl>
On 10/14/2011 4:10 PM, Bert Gunter wrote:
If I understand what you want to do, it's simple. 2^15 is small (only about 33000), so you can generate all the possible means (sums, actually) and and find the population quantile for your result. If avals is the vector of 15 absolute values, the complete distribution is: allsums <- as.matrix(expand.grid(as.data.frame(matrix(rep(c(-1,1),15), nr =2)))) %*% avals This was instantaneous on my machine.
That's beautiful, Bert. Thanks! Here is my fleshed-out example
mean(ZeaMays$diff)
# complete permutation distribution of diff, for all 2^15 ways of assigning
# one value to cross and the other to self
allmeans <- as.matrix(expand.grid(as.data.frame(matrix(rep(c(-1,1),15),
nr =2)))) %*% abs(ZeaMays$diff) / 15
# upper-tail p-value
sum(allmeans > mean(ZeaMays$diff)) / 2^15
# two-tailed p-value
sum(abs(allmeans) > mean(ZeaMays$diff)) / 2^15
hist(allmeans, breaks=64, xlab="Mean difference, cross-self",
main="Histogram of all mean differences")
abline(v=mean(ZeaMays$diff), col="red", lwd=2)
abline(v=-mean(ZeaMays$diff), col="red", lwd=2, lty=2)
Michael Friendly Email: friendly AT yorku DOT ca Professor, Psychology Dept. York University Voice: 416 736-5115 x66249 Fax: 416 736-5814 4700 Keele Street Web: http://www.datavis.ca Toronto, ONT M3J 1P3 CANADA