
wilcox.test; data type conversion?

4 messages · Steven McKinney, Peter Dalgaard, Par Leijonhufvud

#
You can set up the data as
[1] MVG VG  VG  G   MVG G   VG  G   VG 
Levels: G < VG < MVG
[1] male   male   female male   female male   female male   male  
Levels: male female
grade    sex
1   MVG   male
2    VG   male
3    VG female
4     G   male
5   MVG female
6     G   male
7    VG female
8     G   male
9    VG   male
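
The commands that produced the printout above did not survive in the archive. A minimal reconstruction, assuming the variable names grade and sex (taken from the column headers) and a data frame name df that is purely hypothetical, might look like this:

```r
# Ordered factor with G < VG < MVG, matching the printed Levels line
grade <- factor(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                levels = c("G", "VG", "MVG"), ordered = TRUE)

# Plain (unordered) factor with male as the first level
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))

# 'df' is an assumed name; the original name is not shown in the thread
df <- data.frame(grade, sex)

print(grade)
print(sex)
print(df)
```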

Now for the Wilcoxon-Mann-Whitney test. Passing the ordered factor to wilcox.test() directly fails:
Error in wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L,  : 
  'x' must be numeric

I'm not sure if anyone has written a version that works on ordered factor variables,
but you can coerce the ordered factor to its underlying integer representation with e.g. as.integer(grade):
Wilcoxon rank sum test with continuity correction

data:  as.integer(grade) by sex 
W = 4.5, p-value = 0.2695
alternative hypothesis: true location shift is not equal to 0 

Warning message:
In wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L,  :
  cannot compute exact p-value with ties
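
For reference, here is a sketch of calls that would give the error and the test output shown above. The exact calls were not preserved; the formula as.integer(grade) ~ sex is inferred from the "data:" line of the output, and the data are retyped from the printout.

```r
grade <- factor(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                levels = c("G", "VG", "MVG"), ordered = TRUE)
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))

# Passing the ordered factor directly: wilcox.test() rejects non-numeric input.
# tryCatch() keeps the error message instead of stopping the script.
err <- tryCatch(wilcox.test(grade ~ sex),
                error = function(e) conditionMessage(e))
print(err)

# Coerce to the integer codes (G = 1, VG = 2, MVG = 3) and test again.
# Ties make the exact p-value unavailable, so R warns and falls back to the
# normal approximation; suppressWarnings() only hides that warning here.
res <- suppressWarnings(wilcox.test(as.integer(grade) ~ sex))
print(res)
```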

You can break the ties by jittering the data.  Each jitter will of course
produce different tie breakers.  A few repeats of the test, or a loop and
some summaries of the outcomes, will give you an idea of the
"average" result.
Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 4, p-value = 0.2619
alternative hypothesis: true location shift is not equal to 0
Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 3, p-value = 0.1667
alternative hypothesis: true location shift is not equal to 0
Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 7, p-value = 0.7143
alternative hypothesis: true location shift is not equal to 0
Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 6, p-value = 0.5476
alternative hypothesis: true location shift is not equal to 0 
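
The "loop and some summaries" idea could be sketched roughly as follows. The number of repeats (1000) and the seed are arbitrary choices for illustration, not something from the original post.

```r
grade <- factor(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                levels = c("G", "VG", "MVG"), ordered = TRUE)
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))

set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Repeat the jittered test many times and keep only the p-value of each run.
# Each call to jitter() breaks the ties in a different random way.
pvals <- replicate(1000,
                   wilcox.test(jitter(as.integer(grade)) ~ sex)$p.value)

# Spread of p-values caused purely by the random tie-breaking
print(summary(pvals))
```

The spread in the summary shows how much of the result depends on the random tie-breaking rather than on the data.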


I'll let you judge elegance.


As for the barplots, I think all you need to do is specify the row and column order you'd like.

Try this example
Substitute your data, use beside=FALSE to stack, etc.
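
The barplot example itself is missing from the archive. A sketch of the general idea, assuming the goal is a bar chart of grade counts by sex (the original question is not shown, so the chosen ordering and labels are guesses):

```r
grade <- factor(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                levels = c("G", "VG", "MVG"), ordered = TRUE)
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))

# Cross-tabulate: rows are sex, columns are grade
tab <- table(sex, grade)

# Index by name to put rows and columns in whatever order is wanted
tab <- tab[c("female", "male"), c("G", "VG", "MVG")]

# beside = TRUE draws side-by-side bars; beside = FALSE stacks them
barplot(tab, beside = TRUE, legend.text = TRUE,
        xlab = "grade", ylab = "count")
```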

Steven McKinney
#
Steven McKinney <smckinney at bccrc.ca> [2010.10.29] wrote:
Thanks, this solved my problems. I'll just explain the problem with
ties, which is easier to understand than running jitter and comparing.
Ahh, that simple. I'll fiddle on with that. 

Trying to wean the students away from using Excel for all number-related
work...
1 day later
#
On 10/29/2010 06:24 AM, Steven McKinney wrote:
I wouldn't bother with that. The p-value is based on the correct
covariance matrix of the rank sums; tie-breaking just adds noise to the
analysis. If you really want an exact p-value, package exactRankTests is
the ticket. (Or, if there are really only 3 females in 9 students, you
can get ambitious and set up the permutation distribution by enumerating
the choose(9,3)=84 possible outcomes.)
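
The enumeration could be sketched in base R along these lines. It uses the rank sum of the female group on midranks as the statistic and defines the two-sided p-value as the share of assignments at least as far from the permutation mean as the observed one. Both choices are assumptions made for illustration, not necessarily what was intended above.

```r
grade <- factor(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                levels = c("G", "VG", "MVG"), ordered = TRUE)
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))

r   <- rank(as.integer(grade))     # midranks, so tied grades share a rank
n_f <- sum(sex == "female")        # 3 females among 9 students
obs <- sum(r[sex == "female"])     # observed rank sum for the females

# Rank sum for every possible way of picking which students are female
perm <- combn(length(r), n_f, function(idx) sum(r[idx]))
print(length(perm))                # number of enumerated outcomes

# Two-sided p-value: outcomes at least as extreme as the observed one.
# The small tolerance guards against floating-point comparison issues.
centre  <- mean(perm)
p_exact <- mean(abs(perm - centre) >= abs(obs - centre) - 1e-9)
print(p_exact)
```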
#
Peter Dalgaard <pdalgd at gmail.com> [2010.10.30] wrote:
Good to know.
In this case I suspect that for upper secondary students we won't get
that fancy. This is the first time they deal with statistics beyond
mean, median and possibly standard deviation (and simple graphs, of
course).  Thus the request for simple and elegant solutions; arcane
incantations won't make them leave the comforts of Excel.
Well, this being totally fake data I created for a student exercise,
there can be as many male and female students in the class as I think
they can be bothered typing in... :-)

/Par