Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R? I could find it in the survdiff function of the survival package for censored data. I think, it should be possible to use this function creating a dummy censoring indicator and setting it to not censored, but may be there is a better way to perform the test. Thanks, Heinz T??chler
stratified Wilcoxon available?
12 messages · Frank E Harrell Jr, Thomas Lumley, Heinz Tuechler +2 more
Heinz Tuechler <tuechler at gmx.at> writes:
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R? I could find it in the survdiff function of the survival package for censored data. I think, it should be possible to use this function creating a dummy censoring indicator and setting it to not censored, but may be there is a better way to perform the test.
Not easily, I think. I played with the stratified Kruskal Wallis test (which is the same thing for larger values of 2...) with a grad student some years ago, but we never got it integrated as an "official" R function. It was not massively hard to code, as I recall it. Basically, you convert observations to within-stratum ranks, scaled so that the scores have similar variance (this is crucial: just adding the per-stratum rank sums won't work). You can then get the relevant SSD from lm(), by comparing the models "r ~ group + strata" and "r ~ strata". This SSD can be looked up as a chi-square statistic, possibly after applying a scale factor which I have forgotten.... (I.e. do your own math, don't trust me!)
O__ ---- Peter Dalgaard ??ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
Peter Dalgaard wrote:
Heinz Tuechler <tuechler at gmx.at> writes:
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R? I could find it in the survdiff function of the survival package for censored data. I think, it should be possible to use this function creating a dummy censoring indicator and setting it to not censored, but may be there is a better way to perform the test.
Not easily, I think. I played with the stratified Kruskal Wallis test (which is the same thing for larger values of 2...) with a grad student some years ago, but we never got it integrated as an "official" R function. It was not massively hard to code, as I recall it. Basically, you convert observations to within-stratum ranks, scaled so that the scores have similar variance (this is crucial: just adding the per-stratum rank sums won't work). You can then get the relevant SSD from lm(), by comparing the models "r ~ group + strata" and "r ~ strata". This SSD can be looked up as a chi-square statistic, possibly after applying a scale factor which I have forgotten.... (I.e. do your own math, don't trust me!)
You might think of such a stratified test as part of a proportional odds model with adjustment for strata as main effects. The Wilcoxon tests is a special case of the PO model. You can fit it with polr or lrm.
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
Thanks to Peter Dalgaard and Frank Harrell for your answers. Fortunately I don't have an urgent need for this test, but it may be in the future. Still I would be grateful if someone could comment on my opinion that using survdiff and regarding all the measures as events would lead to an equivalent test. Thanks, Heinz T??chler
At 15:18 28.08.2005 -0500, Frank E Harrell Jr wrote:
Peter Dalgaard wrote:
Heinz Tuechler <tuechler at gmx.at> writes:
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R? I could find it in the survdiff function of the survival package for censored data. I think, it should be possible to use this function creating a dummy censoring indicator and setting it to not censored, but may be there is a better way to perform the test.
Not easily, I think. I played with the stratified Kruskal Wallis test (which is the same thing for larger values of 2...) with a grad student some years ago, but we never got it integrated as an "official" R function. It was not massively hard to code, as I recall it. Basically, you convert observations to within-stratum ranks, scaled so that the scores have similar variance (this is crucial: just adding the per-stratum rank sums won't work). You can then get the relevant SSD from lm(), by comparing the models "r ~ group + strata" and "r ~ strata". This SSD can be looked up as a chi-square statistic, possibly after applying a scale factor which I have forgotten.... (I.e. do your own math, don't trust me!)
You might think of such a stratified test as part of a proportional odds
model with adjustment for strata as main effects. The Wilcoxon tests is
a special case of the PO model. You can fit it with polr or lrm.
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
On Sun, 28 Aug 2005, Heinz Tuechler wrote:
Thanks to Peter Dalgaard and Frank Harrell for your answers. Fortunately I don't have an urgent need for this test, but it may be in the future. Still I would be grateful if someone could comment on my opinion that using survdiff and regarding all the measures as events would lead to an equivalent test.
In the absence of ties, yes. In the presence of ties I think survdiff() does something slightly different from what would be usual for the Wilcoxon test. This would matter only with many tied observations. -thomas
Thanks, Heinz T?chler At 15:18 28.08.2005 -0500, Frank E Harrell Jr wrote:
Peter Dalgaard wrote:
Heinz Tuechler <tuechler at gmx.at> writes:
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R? I could find it in the survdiff function of the survival package for censored data. I think, it should be possible to use this function creating a dummy censoring indicator and setting it to not censored, but may be there is a better way to perform the test.
Not easily, I think. I played with the stratified Kruskal Wallis test (which is the same thing for larger values of 2...) with a grad student some years ago, but we never got it integrated as an "official" R function. It was not massively hard to code, as I recall it. Basically, you convert observations to within-stratum ranks, scaled so that the scores have similar variance (this is crucial: just adding the per-stratum rank sums won't work). You can then get the relevant SSD from lm(), by comparing the models "r ~ group + strata" and "r ~ strata". This SSD can be looked up as a chi-square statistic, possibly after applying a scale factor which I have forgotten.... (I.e. do your own math, don't trust me!)
You might think of such a stratified test as part of a proportional odds
model with adjustment for strata as main effects. The Wilcoxon tests is
a special case of the PO model. You can fit it with polr or lrm.
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle
At 19:02 28.08.2005 -0700, Thomas Lumley wrote:
On Sun, 28 Aug 2005, Heinz Tuechler wrote:
Thanks to Peter Dalgaard and Frank Harrell for your answers. Fortunately I don't have an urgent need for this test, but it may be in the future. Still I would be grateful if someone could comment on my opinion that using survdiff and regarding all the measures as events would lead to an equivalent test.
In the absence of ties, yes. In the presence of ties I think survdiff() does something slightly different from what would be usual for the Wilcoxon test. This would matter only with many tied observations. -thomas
Thank you, Thomas, for this information. Heinz
Thanks, Heinz T??chler At 15:18 28.08.2005 -0500, Frank E Harrell Jr wrote:
Peter Dalgaard wrote:
Heinz Tuechler <tuechler at gmx.at> writes:
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R? I could find it in the survdiff function of the survival package for censored data. I think, it should be possible to use this function
creating
a dummy censoring indicator and setting it to not censored, but may be there is a better way to perform the test.
Not easily, I think. I played with the stratified Kruskal Wallis test (which is the same thing for larger values of 2...) with a grad student some years ago, but we never got it integrated as an "official" R function. It was not massively hard to code, as I recall it. Basically, you convert observations to within-stratum ranks, scaled so that the scores have similar variance (this is crucial: just adding the per-stratum rank sums won't work). You can then get the relevant SSD from lm(), by comparing the models "r ~ group + strata" and "r ~ strata". This SSD can be looked up as a chi-square statistic, possibly after applying a scale factor which I have forgotten.... (I.e. do your own math, don't trust me!)
You might think of such a stratified test as part of a proportional odds
model with adjustment for strata as main effects. The Wilcoxon tests is
a special case of the PO model. You can fit it with polr or lrm.
--
Frank E Harrell Jr Professor and Chair School of Medicine
Department of Biostatistics Vanderbilt University
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
Thomas Lumley Assoc. Professor, Biostatistics tlumley at u.washington.edu University of Washington, Seattle
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R?
you can plug it together using the `coin' infrastructure (see the examples in the manual and vignette). Torsten
Thanks, Heinz T??chler
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
An embedded and charset-unspecified text was scrubbed... Name: KW.strat.2005.R Url: https://stat.ethz.ch/pipermail/r-help/attachments/20050829/f72aa6c4/KW.strat.2005.pl -------------- next part --------------
O__ ---- Peter Dalgaard ??ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
On Mon, 29 Aug 2005, Peter Dalgaard wrote:
Torsten Hothorn <Torsten.Hothorn at rzmail.uni-erlangen.de> writes:
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R?
you can plug it together using the `coin' infrastructure (see the examples in the manual and vignette).
I managed to dig out our old code, and patched it up loosely to fit versions of R later than 0.62 (the trend test code still seems broken). Use at own risk. The usage is fairly straightforward:
SKruskal.test(y~g1|g2)
Kruskal-Wallis stratified rank sum test
data: y , group: g1 , strata: g2
K = 3.1486, df = 3, p-value = 0.3693
the conditional version of the above test can be computed as follows:
R> library("coin")
R> set.seed(290875)
R> mydf <- data.frame(y = rnorm(90), x = gl(3, 30)[sample(1:90)], b = gl(3, 30))
R>
R> ### with global ranks, i.e., ranking without using the blocks
R> kruskal_test(y ~ x | b, data = mydf)
Asymptotic Kruskal-Wallis Test
data: y by groups 1, 2, 3
stratified by b
T = 0.5242, df = 2, p-value = 0.7694
R>
R> ### with _blockwise_ ranking and quadratic form of the test statistic
R> independence_test(y ~ x | b, data = mydf, ytrafo = function(data) trafo(data,
+ numeric_trafo = rank, block = mydf$b), teststat = "quad")
Asymptotic General Independence Test
data: y by groups 1, 2, 3
stratified by b
T = 0.3824, df = 2, p-value = 0.826
R>
R> ### the same
R> SKruskal.test(y ~ x | b, data = mydf)
Kruskal-Wallis stratified rank sum test
data: y , group: x , strata: b
K = 0.3824, df = 2, p-value = 0.826
R>
R>
R>
R> ### trend test (using scores 1, 2, 3)
R> independence_test(y ~ x | b, data = mydf, ytrafo = function(data) trafo(data,
+ numeric_trafo = rank, block = mydf$b), teststat = "quad",
+ scores = list(x = 1:3))
Asymptotic General Independence Test
data: y by groups 1 < 2 < 3
stratified by b
T = 0.0264, df = 1, p-value = 0.871
R>
R> ### hm.
R> SKruskal.test(y ~ x | b, data = mydf, trend = TRUE)
Error in SKruskal.test(y ~ x | b, data = mydf, trend = TRUE) :
subscript out of bounds
Best,
Torsten
Torsten Hothorn <Torsten.Hothorn at rzmail.uni-erlangen.de> writes:
On Mon, 29 Aug 2005, Peter Dalgaard wrote:
Torsten Hothorn <Torsten.Hothorn at rzmail.uni-erlangen.de> writes:
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R?
you can plug it together using the `coin' infrastructure (see the examples in the manual and vignette).
I managed to dig out our old code, and patched it up loosely to fit versions of R later than 0.62 (the trend test code still seems broken). Use at own risk. The usage is fairly straightforward:
SKruskal.test(y~g1|g2)
Kruskal-Wallis stratified rank sum test
data: y , group: g1 , strata: g2
K = 3.1486, df = 3, p-value = 0.3693
the conditional version of the above test can be computed as follows:
R> library("coin")
R> set.seed(290875)
R> mydf <- data.frame(y = rnorm(90), x = gl(3, 30)[sample(1:90)], b = gl(3, 30))
R>
R> ### with global ranks, i.e., ranking without using the blocks
R> kruskal_test(y ~ x | b, data = mydf)
Asymptotic Kruskal-Wallis Test
data: y by groups 1, 2, 3
stratified by b
T = 0.5242, df = 2, p-value = 0.7694
R>
R> ### with _blockwise_ ranking and quadratic form of the test statistic
R> independence_test(y ~ x | b, data = mydf, ytrafo = function(data) trafo(data,
+ numeric_trafo = rank, block = mydf$b), teststat = "quad")
Asymptotic General Independence Test
data: y by groups 1, 2, 3
stratified by b
T = 0.3824, df = 2, p-value = 0.826
R>
R> ### the same
R> SKruskal.test(y ~ x | b, data = mydf)
Kruskal-Wallis stratified rank sum test
data: y , group: x , strata: b
K = 0.3824, df = 2, p-value = 0.826
R>
R>
R>
R> ### trend test (using scores 1, 2, 3)
R> independence_test(y ~ x | b, data = mydf, ytrafo = function(data) trafo(data,
+ numeric_trafo = rank, block = mydf$b), teststat = "quad",
+ scores = list(x = 1:3))
Asymptotic General Independence Test
data: y by groups 1 < 2 < 3
stratified by b
T = 0.0264, df = 1, p-value = 0.871
R>
R> ### hm.
R> SKruskal.test(y ~ x | b, data = mydf, trend = TRUE)
Error in SKruskal.test(y ~ x | b, data = mydf, trend = TRUE) :
subscript out of bounds
Best,
Torsten
Nice to see that the old code made sense. A bit surprising that it gives _exactly_ the same result as the blockwise ranking in coin... Perhaps introduce some ties? (round(y,1) is usually effective). The trend test is easily fixed: just spell "t value" without capital V as we do nowadays. This gives
SKruskal.test(y ~ x | b, data = mydf, trend = TRUE)
Kruskal-Wallis stratified rank sum trend test data: y , group: x , strata: b , trend: as.numeric(group) Z = -0.1624, df = 1, p-value = 0.871
SKruskal.test(y ~ x | b, data = mydf, trend = 1:3)
Kruskal-Wallis stratified rank sum trend test data: y , group: x , strata: b , trend: 1 2 3 Z = -0.1624, df = 1, p-value = 0.871 (The df=1 is a bit misleading in this case...)
O__ ---- Peter Dalgaard ??ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
On Tue, 30 Aug 2005, Peter Dalgaard wrote:
Torsten Hothorn <Torsten.Hothorn at rzmail.uni-erlangen.de> writes:
On Mon, 29 Aug 2005, Peter Dalgaard wrote:
Torsten Hothorn <Torsten.Hothorn at rzmail.uni-erlangen.de> writes:
Dear All, is there a stratified version of the Wilcoxon test (also known as van Elteren test) available in R?
you can plug it together using the `coin' infrastructure (see the examples in the manual and vignette).
I managed to dig out our old code, and patched it up loosely to fit versions of R later than 0.62 (the trend test code still seems broken). Use at own risk. The usage is fairly straightforward:
SKruskal.test(y~g1|g2)
Kruskal-Wallis stratified rank sum test
data: y , group: g1 , strata: g2
K = 3.1486, df = 3, p-value = 0.3693
the conditional version of the above test can be computed as follows:
R> library("coin")
R> set.seed(290875)
R> mydf <- data.frame(y = rnorm(90), x = gl(3, 30)[sample(1:90)], b = gl(3, 30))
R>
R> ### with global ranks, i.e., ranking without using the blocks
R> kruskal_test(y ~ x | b, data = mydf)
Asymptotic Kruskal-Wallis Test
data: y by groups 1, 2, 3
stratified by b
T = 0.5242, df = 2, p-value = 0.7694
R>
R> ### with _blockwise_ ranking and quadratic form of the test statistic
R> independence_test(y ~ x | b, data = mydf, ytrafo = function(data) trafo(data,
+ numeric_trafo = rank, block = mydf$b), teststat = "quad")
Asymptotic General Independence Test
data: y by groups 1, 2, 3
stratified by b
T = 0.3824, df = 2, p-value = 0.826
R>
R> ### the same
R> SKruskal.test(y ~ x | b, data = mydf)
Kruskal-Wallis stratified rank sum test
data: y , group: x , strata: b
K = 0.3824, df = 2, p-value = 0.826
R>
R>
R>
R> ### trend test (using scores 1, 2, 3)
R> independence_test(y ~ x | b, data = mydf, ytrafo = function(data) trafo(data,
+ numeric_trafo = rank, block = mydf$b), teststat = "quad",
+ scores = list(x = 1:3))
Asymptotic General Independence Test
data: y by groups 1 < 2 < 3
stratified by b
T = 0.0264, df = 1, p-value = 0.871
R>
R> ### hm.
R> SKruskal.test(y ~ x | b, data = mydf, trend = TRUE)
Error in SKruskal.test(y ~ x | b, data = mydf, trend = TRUE) :
subscript out of bounds
Best,
Torsten
Nice to see that the old code made sense.
Nice to see that the new code is working correctly :-)
A bit surprising that it gives _exactly_ the same result as the blockwise ranking in coin...
why? Without ties, the conditional and unconditional versions of the tests should have exactly the same result.
Perhaps introduce some ties? (round(y,1) is usually effective).
this should generate differences, yes.
The trend test is easily fixed: just spell "t value" without capital V as we do nowadays. This gives
SKruskal.test(y ~ x | b, data = mydf, trend = TRUE)
Kruskal-Wallis stratified rank sum trend test
data: y , group: x , strata: b , trend: as.numeric(group)
Z = -0.1624, df = 1, p-value = 0.871
SKruskal.test(y ~ x | b, data = mydf, trend = 1:3)
Kruskal-Wallis stratified rank sum trend test
data: y , group: x , strata: b , trend: 1 2 3
Z = -0.1624, df = 1, p-value = 0.871
(The df=1 is a bit misleading in this case...)
maybe report 0.1624^2 = 0.0264 as test statistic (same as with teststat = "quad" in `coin')? Best, Torsten
-- O__ ---- Peter Dalgaard ??ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
torsten at hothorn.de writes:
Nice to see that the old code made sense.
Nice to see that the new code is working correctly :-)
A bit surprising that it gives _exactly_ the same result as the blockwise ranking in coin...
why? Without ties, the conditional and unconditional versions of the tests should have exactly the same result.
Oh, just that there are a number of little things that could have been done in slightly different but asymptotically equivalent ways. The scale factor for the within-block ranks e.g.
Perhaps introduce some ties? (round(y,1) is usually effective).
this should generate differences, yes.
The trend test is easily fixed: just spell "t value" without capital V as we do nowadays. This gives
SKruskal.test(y ~ x | b, data = mydf, trend = TRUE)
Kruskal-Wallis stratified rank sum trend test
data: y , group: x , strata: b , trend: as.numeric(group)
Z = -0.1624, df = 1, p-value = 0.871
SKruskal.test(y ~ x | b, data = mydf, trend = 1:3)
Kruskal-Wallis stratified rank sum trend test
data: y , group: x , strata: b , trend: 1 2 3
Z = -0.1624, df = 1, p-value = 0.871
(The df=1 is a bit misleading in this case...)
maybe report 0.1624^2 = 0.0264 as test statistic (same as with teststat = "quad" in `coin')?
Yes, but there was a point in getting a signed statistic. Otherwise, I might as well have reused the code that used the SSD from the anova table.
O__ ---- Peter Dalgaard ??ster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907