On Sep 20, 2012, at 02:43, Thomas Lumley wrote:
On Thu, Sep 20, 2012 at 5:46 AM, Mohamed Radhouane Aniba
<aradwen at gmail.com> wrote:
Hello All,
I am writing to ask your opinion on how to interpret this case. I have
two vectors "a" and "b" that I am trying to compare.
The Wilcoxon test gives me a p-value of 5.139217e-303 for a over b with
alternative "greater". Yet if I run summary() on each of them, I get
the following:
> summary(a)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0000000 0.0001411 0.0002381 0.0002671 0.0003623 0.0012910
> summary(b)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
0.0000000 0.0000000 0.0000000 0.0004947 0.0002972 1.0000000
The ratio of the means (a over b) is then about 0.5399031, which naively
points in the opposite direction from the Wilcoxon test (I was expecting a
ratio >> 1).
There's nothing conceptually strange about the Wilcoxon test showing a
difference in the opposite direction to the difference in means. It's
probably easiest to think about this in terms of the Mann-Whitney
version of the same test, which is based on the proportion of pairs
(one observation from each group) in which the 'a' observation is higher.
Your 'b' vector has a lot more zeros, so a randomly chosen observation
from 'b' is likely to be smaller than one from 'a', but its non-zero
observations seem to be larger, so the mean of 'b' is higher.
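A minimal base-R sketch of the pair-counting view, using hypothetical toy vectors a and b (not the poster's data) in which b is mostly zeros plus one large value. It computes the proportion of pairs with a_i > b_j (ties counted as 1/2), compares it with the ratio of means, and checks it against the W statistic from wilcox.test, which is the same pair count rather than a proportion.

```r
# Hypothetical toy data: 'b' is mostly zeros plus one large value,
# loosely mimicking the zero-heavy second vector in the question.
a <- c(2, 3, 4, 5, 6)
b <- c(0, 0, 0, 0, 30)

# Proportion of (a_i, b_j) pairs with a_i > b_j, counting ties as 1/2.
# This is the quantity the Mann-Whitney form of the test is built on.
pair_prop <- function(a, b) {
  cmp <- outer(a, b, ">") + 0.5 * outer(a, b, "==")
  mean(cmp)
}

p_ab <- pair_prop(a, b)          # 0.8: a usually beats b
mean_ratio <- mean(a) / mean(b)  # 4 / 6: b nevertheless has the larger mean

# wilcox.test's W is the pair count itself (ties as 1/2), not a proportion.
# suppressWarnings() hides the "cannot compute exact p-value with ties" note.
W <- suppressWarnings(wilcox.test(a, b)$statistic)
W / (length(a) * length(b))      # equals p_ab
```

The many zeros in b push P(a > b) well above 1/2 even though b's single large value pulls its mean above a's, which is the same tension as in the question.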
The Wilcoxon test probably isn't very useful in a setting like this,
since its results really make sense only under 'stochastic ordering',
where the shift is in the same direction across the whole
distribution.
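One informal way to look at the 'stochastic ordering' condition is to compare the two empirical CDFs on the pooled values: a is stochastically larger than b if F_a(t) <= F_b(t) at every t. The sketch below uses base R's ecdf() on the same hypothetical toy vectors; it is a descriptive check on samples, not a formal test.

```r
# Hypothetical toy vectors: 'b' is mostly zeros plus one large value.
a <- c(2, 3, 4, 5, 6)
b <- c(0, 0, 0, 0, 30)

# Evaluate both empirical CDFs at every observed value.
grid <- sort(unique(c(a, b)))
Fa <- ecdf(a)(grid)
Fb <- ecdf(b)(grid)

# a is (empirically) stochastically larger than b if its ECDF never lies above b's.
a_larger <- all(Fa <= Fb)
b_larger <- all(Fb <= Fa)

# Both are FALSE here: the ECDFs cross, so the shift is not in the
# same direction across the whole distribution.
c(a_larger = a_larger, b_larger = b_larger)
```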
-thomas
I was sure I had seen a definition where X was "larger than" Y if P(X>Y) >
P(X<Y), but that's obviously not the usual definition. Anyway, it is worth
emphasizing that this is what the Wilcoxon test tests for, not whether the
means differ, nor whether the medians do. As a counterexample to the latter,
try
x <- rep(0:1, c(60,40))   # 60 zeros, 40 ones
y <- rep(0:1, c(80,20))   # 80 zeros, 20 ones
wilcox.test(x,y)          # p-value well below 0.05
median(x)                 # 0
median(y)                 # 0
(and the "location shift" reference in wilcox.test output is a bit of a red
herring.)
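For the counterexample above, the pair probabilities can be worked out by hand: P(X > Y) = 0.4 * 0.8 = 0.32 and P(X < Y) = 0.6 * 0.2 = 0.12, while both medians are 0. A short base-R check that counts all 100 x 100 pairs directly:

```r
x <- rep(0:1, c(60, 40))  # 60 zeros, 40 ones
y <- rep(0:1, c(80, 20))  # 80 zeros, 20 ones

# Proportion of all pairs (x_i, y_j) in each ordering.
p_gt <- mean(outer(x, y, ">"))   # x = 1 and y = 0: 0.4 * 0.8
p_lt <- mean(outer(x, y, "<"))   # x = 0 and y = 1: 0.6 * 0.2
p_eq <- mean(outer(x, y, "=="))  # the remaining pairs are ties

c(P_gt = p_gt, P_lt = p_lt, P_eq = p_eq)       # 0.32, 0.12 and 0.56
c(median_x = median(x), median_y = median(y))  # both 0
```

So P(X > Y) > P(X < Y), which is what drives the Wilcoxon result, even though the medians are identical.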
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com