Skip to content

stratified Wilcoxon available?

12 messages · Frank E Harrell Jr, Thomas Lumley, Heinz Tuechler +2 more

#
Dear All,

is there a stratified version of the Wilcoxon test (also known as van
Elteren test) available in R?
I could find it in the survdiff function of the survival package for
censored data. I think, it should be possible to use this function creating
a dummy censoring indicator and setting it to not censored, but may be
there is a better way to perform the test.

Thanks,

Heinz T??chler
#
Heinz Tuechler <tuechler at gmx.at> writes:
Not easily, I think. I played with the stratified Kruskal Wallis test
(which is the same thing for larger values of 2...) with a grad
student some years ago, but we never got it integrated as an "official"
R function. 

It was not massively hard to code, as I recall it. Basically, you
convert observations to within-stratum ranks, scaled so that the
scores have similar variance (this is crucial: just adding the
per-stratum rank sums won't work). You can then get the relevant SSD
from lm(), by comparing the models "r ~ group + strata" and "r ~
strata". This SSD can be looked up as a chi-square statistic, possibly
after applying a scale factor which I have forgotten.... (I.e. do your
own math, don't trust me!)
#
Peter Dalgaard wrote:
You might think of such a stratified test as part of a proportional odds 
model with adjustment for strata as main effects.  The Wilcoxon tests is 
  a special case of the PO model.  You can fit it with polr or lrm.
#
Thanks to Peter Dalgaard and Frank Harrell for your answers. Fortunately I
don't have an urgent need for this test, but it may be in the future.
Still I would be grateful if someone could comment on my opinion that using
survdiff and regarding all the measures as events would lead to an
equivalent test.

Thanks,

Heinz T??chler
At 15:18 28.08.2005 -0500, Frank E Harrell Jr wrote:
#
On Sun, 28 Aug 2005, Heinz Tuechler wrote:

            
In the absence of ties, yes.   In the presence of ties I think survdiff() 
does something slightly different from what would be usual for the 
Wilcoxon test.  This would matter only with many tied observations.

 	-thomas
Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle
#
At 19:02 28.08.2005 -0700, Thomas Lumley wrote:
Thank you, Thomas, for this information.

Heinz
creating
http://www.R-project.org/posting-guide.html
#
you can plug it together using the `coin' infrastructure (see the
examples in the manual and vignette).

Torsten
#
An embedded and charset-unspecified text was scrubbed...
Name: KW.strat.2005.R
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20050829/f72aa6c4/KW.strat.2005.pl
-------------- next part --------------
#
On Mon, 29 Aug 2005, Peter Dalgaard wrote:

            
the conditional version of the above test can be computed as follows:

R> library("coin")
R> set.seed(290875)
R> mydf <- data.frame(y = rnorm(90), x = gl(3, 30)[sample(1:90)], b = gl(3, 30))
R>
R> ### with global ranks, i.e., ranking without using the blocks
R> kruskal_test(y ~ x | b, data = mydf)

        Asymptotic Kruskal-Wallis Test

data:  y by groups 1, 2, 3
         stratified by b
T = 0.5242, df = 2, p-value = 0.7694

R>
R> ### with _blockwise_ ranking and quadratic form of the test statistic
R> independence_test(y ~ x | b, data = mydf, ytrafo = function(data) trafo(data,
+                   numeric_trafo = rank, block = mydf$b), teststat = "quad")

        Asymptotic General Independence Test

data:  y by groups 1, 2, 3
         stratified by b
T = 0.3824, df = 2, p-value = 0.826

R>
R> ### the same
R> SKruskal.test(y ~ x | b, data = mydf)

        Kruskal-Wallis stratified rank sum test

data:  y , group:  x , strata:  b
K = 0.3824, df = 2, p-value = 0.826

R>
R>
R>
R> ### trend test (using scores 1, 2, 3)
R> independence_test(y ~ x | b, data = mydf, ytrafo = function(data) trafo(data,
+                   numeric_trafo = rank, block = mydf$b), teststat = "quad",
+                   scores = list(x = 1:3))

        Asymptotic General Independence Test

data:  y by groups 1 < 2 < 3
         stratified by b
T = 0.0264, df = 1, p-value = 0.871

R>
R> ### hm.
R> SKruskal.test(y ~ x | b, data = mydf, trend = TRUE)
Error in SKruskal.test(y ~ x | b, data = mydf, trend = TRUE) :
        subscript out of bounds

Best,

Torsten
#
Torsten Hothorn <Torsten.Hothorn at rzmail.uni-erlangen.de> writes:
Nice to see that the old code made sense. A bit surprising that it
gives _exactly_ the same result as the blockwise ranking in coin...
Perhaps introduce some ties? (round(y,1) is usually effective).

The trend test is easily fixed: just spell "t value" without capital V
as we do nowadays. This gives
Kruskal-Wallis stratified rank sum trend test

data:  y , group:  x , strata:  b , trend:  as.numeric(group)
Z = -0.1624, df = 1, p-value = 0.871
Kruskal-Wallis stratified rank sum trend test

data:  y , group:  x , strata:  b , trend:  1 2 3
Z = -0.1624, df = 1, p-value = 0.871

(The df=1 is a bit misleading in this case...)
#
On Tue, 30 Aug 2005, Peter Dalgaard wrote:

            
Nice to see that the new code is working correctly :-)
why? Without ties, the conditional and unconditional versions of the tests
should have exactly the same result.
this should generate differences, yes.
maybe report 0.1624^2 = 0.0264 as test statistic (same as with teststat
= "quad" in `coin')?

Best,

Torsten
#
torsten at hothorn.de writes:
Oh, just that there are a number of little things that could have been
done in slightly different but asymptotically equivalent ways. The
scale factor for the within-block ranks e.g.
Yes, but there was a point in getting a signed statistic. Otherwise, I
might as well have reused the code that used the SSD from the anova
table.