Different results on running Wilcoxon Rank Sum test in R and SPSS
Dear Professor John,

Thank you very much for your reply!

I agree with you that the non-parametric tests I mentioned in my previous email (Mood's median test and the median test) do not make sense in this situation, as they treat PFD_n and drug_code as different groups. As you correctly said, I want to use PFD_n as a vector of scores and drug_code to split it into two groups. This is exactly what the Independent-Samples Median Test does in SPSS. I wish to perform the same test in R and have been unable to do so. Simply put, I am asking how to perform the independent-samples median test in R just as it is performed in SPSS.

Secondly, regarding your question about the test statistic: I have not performed the Wilcoxon rank sum test in SPSS for the PFD_n and drug_code data. I said something to the contrary in my first email; I apologize for that.

Thank you very much for your time!

Yours sincerely,
Bharat Rawlley

On Wednesday, 20 January, 2021, 04:47:21 am IST, John Fox <jfox at mcmaster.ca> wrote:
Dear Bharat Rawlley,

What you tried to do appears to be nonsense. That is, you're treating PFD_n and drug_code as if they were scores for two different groups. I assume that what you really want to do is to treat PFD_n as a vector of scores and drug_code as defining two groups. If that's correct, and with your data read into Data, you can try the following:

------snip ------

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE)

	Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.000014e+00  5.037654e-05
sample estimates:
difference in location
             -1.000019

Warning messages:
1: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact p-value with ties
2: In wilcox.test.default(x = c(27, 26, 20, 24, 28, 28, 27, 27, 26,  :
  cannot compute exact confidence intervals with ties

------snip ------

You can get an approximate confidence interval by specifying exact=FALSE:

------snip ------

> wilcox.test(PFD_n ~ drug_code, data=Data, conf.int=TRUE, exact=FALSE)

	Wilcoxon rank sum test with continuity correction

data:  PFD_n by drug_code
W = 197, p-value = 0.05563
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
 -2.000014e+00  5.037654e-05
sample estimates:
difference in location
             -1.000019

------snip ------

As it turns out, your data are highly discrete and have a lot of ties (see in particular PFD_n = 28):

------snip ------

> xtabs(~ PFD_n + drug_code, data=Data)
     drug_code
PFD_n  0  1
   0   2  0
   16  1  1
   18  0  1
   19  0  1
   20  2  0
   22  0  1
   24  2  0
   25  1  2
   26  5  2
   27  4  2
   28  5 13
   30  1  2

------snip ------

I'm no expert in nonparametric inference, but I doubt whether the approximate p-value will be very accurate for data like these.

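One way to avoid relying on the normal approximation when the data are heavily tied is a Monte Carlo permutation version of the rank sum test. What follows is only a minimal sketch in base R and was not run in this thread: the function name perm_ranksum_p, the toy vectors, and the choice of 9999 permutations are assumptions made for illustration.

```r
# Monte Carlo permutation p-value for the rank sum statistic.
# perm_ranksum_p is a hypothetical helper, not part of base R.
perm_ranksum_p <- function(scores, group, n_perm = 9999) {
  # Drop incomplete cases, as the formula interface of wilcox.test() would
  keep   <- !is.na(scores) & !is.na(group)
  scores <- scores[keep]
  group  <- group[keep]

  r        <- rank(scores)                      # ties receive midranks
  in_first <- group == sort(unique(group))[1]   # membership in the first group
  observed <- sum(r[in_first])                  # rank sum of the first group
  expected <- sum(in_first) * (length(r) + 1) / 2

  # Reshuffle the group labels and recompute the rank sum each time
  perm <- replicate(n_perm, sum(r[sample(in_first)]))

  # Two-sided p-value; the +1 terms count the observed arrangement itself
  (1 + sum(abs(perm - expected) >= abs(observed - expected))) / (n_perm + 1)
}

# Hypothetical toy data with many ties (NOT the data posted in this thread)
set.seed(1)
toy_scores <- c(28, 28, 27, 26, 26, 24, 20, 28, 28, 28, 30, 25, 28, 27)
toy_group  <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
p_toy <- perm_ranksum_p(toy_scores, toy_group)
print(p_toy)
```

Because the labels are resampled at random, the p-value changes slightly from run to run unless the seed is fixed, and its resolution is limited to 1 / (n_perm + 1).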
I don't know why wilcox.test() (correctly used) and SPSS are giving you slightly different results -- assuming that you're actually doing the same thing in both cases. I couldn't help but notice that most of your data are missing. Are you getting the same value of the test statistic and different p-values, or is the test statistic different as well?

I hope this helps,
 John

John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/

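The most recent message in this thread asks how to run an independent-samples median test in R. A minimal sketch in base R is given here: dichotomize the scores at the pooled median, cross-tabulate against the grouping variable, and test the resulting 2 x 2 table. The function name median_test_sketch and the toy data are hypothetical. Whether SPSS cuts at "> median" versus "<= median" and whether it applies a continuity correction are assumptions that should be checked against the SPSS documentation before expecting identical numbers.

```r
# Median test sketch; median_test_sketch is a hypothetical helper,
# not a function from base R or from any package.
median_test_sketch <- function(scores, group) {
  # Drop incomplete cases
  keep   <- !is.na(scores) & !is.na(group)
  scores <- scores[keep]
  group  <- factor(group[keep])

  # Pooled (grand) median over both groups
  grand_median <- median(scores)

  # Assumption: observations equal to the median are counted as "not above"
  above <- factor(scores > grand_median,
                  levels = c(FALSE, TRUE),
                  labels = c("<= median", "> median"))
  tab <- table(above = above, group = group)

  list(median = grand_median,
       table  = tab,
       # chisq.test() applies Yates' continuity correction to 2 x 2 tables
       chisq  = suppressWarnings(chisq.test(tab)),
       # Fisher's exact test is an alternative when counts are small
       fisher = fisher.test(tab))
}

# Hypothetical toy data (NOT the data posted in this thread)
toy_scores <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
toy_group  <- rep(c("a", "b"), each = 6)
res <- median_test_sketch(toy_scores, toy_group)
print(res$table)
print(res$fisher$p.value)
```

With the thread's data the call would be median_test_sketch(Data$PFD_n, Data$drug_code), assuming Data holds the posted data frame.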
On 2021-01-19 5:46 a.m., bharat rawlley via R-help wrote:
Thank you for the reply and suggestion, Michael!
I used dput() and this is the output I can share with you. Simply explained, I have 3 columns namely, drug_code, freq4w_n and PFD_n. Each column has 132 values (including NA). The problem with the Wilcoxon Rank Sum test has been described in my first email.
Please do let me know if you need any further clarification from my side! Thanks a lot for your time!
structure(list(drug_code = c(0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0,
1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1,
0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1,
1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1,
0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0,
1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1,
1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0), freq4w_n = c(1,
NA, NA, 0, NA, 4, NA, 10, NA, 0, 6, NA, NA, NA, NA, NA, 10, NA,
0, NA, NA, NA, NA, 0, NA, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA,
NA, NA, NA, NA, NA, 0, 0, 12, 0, NA, 1, 2, 1, 2, 2, NA, 28, 0,
NA, 4, NA, 1, NA, NA, NA, NA, NA, 0, 3, 1, NA, NA, NA, NA, 4,
28, NA, NA, 0, 2, 12, 0, NA, NA, NA, 0, NA, 0, NA, NA, NA, NA,
NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA, NA, 6, 1, NA, NA,
NA, 0, NA, NA, NA, 0, 0, NA, 0, NA, 2, 8, 3, NA, NA, NA, 0, NA,
NA, NA, 9, NA, NA, NA, NA, NA, NA, NA, NA), PFD_n = c(27, NA,
NA, 28, NA, 26, NA, 20, NA, 30, 24, NA, NA, NA, NA, NA, 18, NA,
28, NA, NA, NA, NA, 28, NA, 28, NA, NA, NA, 28, NA, 28, NA, NA,
NA, NA, NA, NA, NA, NA, 28, 28, 16, 28, NA, 27, 26, 27, 26, 26,
NA, 0, 30, NA, 24, NA, 27, NA, NA, NA, NA, NA, 28, 25, 27, NA,
NA, NA, NA, 26, 0, NA, NA, 28, 26, 16, 28, NA, NA, NA, 28, NA,
28, NA, NA, NA, NA, NA, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA,
NA, 22, 27, NA, NA, NA, 28, NA, NA, NA, 28, 28, NA, 28, NA, 26,
20, 25, NA, NA, NA, 30, NA, NA, NA, 19, NA, NA, NA, NA, NA, NA,
NA, NA)), row.names = c(NA, -132L), class = c("tbl_df", "tbl",
"data.frame"))
Yours sincerely,
Bharat Rawlley

On Tuesday, 19 January, 2021, 03:53:27 pm IST, Michael Dewey <lists at dewey.myzen.co.uk> wrote:
Unfortunately your data did not come through. Try using dput() and then
pasting that into the body of your e-mail message.
On 18/01/2021 17:26, bharat rawlley via R-help wrote:
Hello,

On running the Wilcoxon Rank Sum test in R and SPSS, I am getting the following discrepancies, which I am unable to explain.

Q1. In the attached data set, I was trying to compare freq4w_n in those with drug_code 0 vs 1. SPSS gives a P value of 0.031, whereas R gives a P value of 0.001779. The code I used in R is as follows:

wilcox.test(freq4w_n, drug_code, conf.int = T)

Q2. Similarly, in the same data set, when trying to compare PFD_n in those with drug_code 0 vs 1, SPSS gives a P value of 0.038, whereas R gives a P value < 2.2e-16. The code I used in R is as follows:

wilcox.test(PFD_n, drug_code, mu = 0, alternative = "two.sided", correct = TRUE, paired = FALSE, conf.int = TRUE)

I have tried searching on Google and watching some YouTube tutorials, but I cannot find an answer. Any help will be really appreciated.

Thank you!
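As the reply earlier in this thread points out, calling wilcox.test() with two vectors treats them as two samples, so the scores end up being compared with the 0/1 codes themselves. The sketch below contrasts that call with the formula interface, which uses the second variable to define the groups. The toy vectors and the object names (toy, two_vector, by_formula, by_hand) are hypothetical and chosen for illustration; this is not the data set from the thread.

```r
# Hypothetical toy data (NOT the data posted in this thread)
scores <- c(27, 26, 20, 24, 28, 30, 18, 25, 22, 19)
group  <- c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1)
toy    <- data.frame(scores = scores, group = group)

# Two-vector form: the 0/1 codes are treated as a second sample of scores.
# Warnings about ties are suppressed here because the codes are all 0 or 1.
two_vector <- suppressWarnings(wilcox.test(scores, group))

# Formula form: scores are compared between group 0 and group 1
by_formula <- wilcox.test(scores ~ group, data = toy)

# Equivalent to splitting the scores by hand
by_hand <- wilcox.test(scores[group == 0], scores[group == 1])

print(two_vector$p.value)
print(by_formula$p.value)
print(by_hand$p.value)
```

In the two-vector call every score exceeds every code, so the test reports a very small p-value no matter how the two groups actually compare.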
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.