wilcox.test; data type conversion?

You can set up the data as
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"), levels = c("G", "VG", "MVG"))
grade
[1] MVG VG  VG  G   MVG G   VG  G   VG 
Levels: G < VG < MVG
sex <- factor(c( "male", "male", "female", "male", "female", "male", "female", "male", "male"), levels = c("male", "female"))
sex
[1] male   male   female male   female male   female male   male  
Levels: male female
gradesbysex <- data.frame(grade, sex)

gradesbysex
  grade    sex
1   MVG   male
2    VG   male
3    VG female
4     G   male
5   MVG female
6     G   male
7    VG female
8     G   male
9    VG   male

Now for the Wilcoxon-Mann-Whitney test
wilcox.test(grade ~ sex, data = gradesbysex)
Error in wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L,  : 
  'x' must be numeric

I'm not sure if anyone has written a version that will work on ordered factor variables,
but you can coerce the ordered factor to its underlying integer representation with e.g.
wilcox.test(as.integer(grade) ~ sex, data = gradesbysex)
Wilcoxon rank sum test with continuity correction

data:  as.integer(grade) by sex 
W = 4.5, p-value = 0.2695
alternative hypothesis: true location shift is not equal to 0 

Warning message:
In wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L,  :
  cannot compute exact p-value with ties
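
The coercion only gives sensible numbers because the levels were listed in grade order when the ordered factor was created. A minimal sketch of the difference follows; the names grade_chr, alphabetical and by_rank are made up for this illustration.

```r
# Same fake grades as above, as plain character strings
grade_chr <- c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG")

# Default factor(): levels are sorted alphabetically (G, MVG, VG),
# so the integer codes do NOT follow the grade order
alphabetical <- factor(grade_chr)
levels(alphabetical)
as.integer(alphabetical)

# ordered() with explicit levels: codes follow G < VG < MVG
by_rank <- ordered(grade_chr, levels = c("G", "VG", "MVG"))
levels(by_rank)
as.integer(by_rank)
```

With the alphabetical levels, VG would be ranked above MVG, so the test would be run on the wrong ordering without any error or warning.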

You can break the ties by jittering the data.  Each jitter will of course
produce different tie breakers.  A few repeats of the test, or a loop and
some summaries of the outcomes, will give you an idea of the
"average" result.
wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 4, p-value = 0.2619
alternative hypothesis: true location shift is not equal to 0
wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 3, p-value = 0.1667
alternative hypothesis: true location shift is not equal to 0
wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 7, p-value = 0.7143
alternative hypothesis: true location shift is not equal to 0
wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)
Wilcoxon rank sum test

data:  jitter(as.integer(grade)) by sex 
W = 6, p-value = 0.5476
alternative hypothesis: true location shift is not equal to 0 
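
The "loop and some summaries" idea can be sketched roughly as below. This is only an illustration: n_reps, p_values and the seed are arbitrary choices, not anything prescribed by wilcox.test or jitter.

```r
# Rebuild the fake data so the sketch runs on its own
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))
gradesbysex <- data.frame(grade, sex)

set.seed(1)      # arbitrary seed, only so a rerun gives the same draws
n_reps <- 1000   # arbitrary number of jittered repeats

# One p-value per jittered copy of the data
p_values <- replicate(
  n_reps,
  wilcox.test(jitter(as.integer(grade)) ~ sex, data = gradesbysex)$p.value
)

summary(p_values)  # spread of p-values across the random tie-breakers
```

The spread of p_values shows how much the answer depends on the random tie-breaking alone.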

I'll let you judge elegance.

As for the barplots, I think all you need to do is specify the row and column order you'd like.

Try this example
barplot(VADeaths, beside = TRUE)
barplot(VADeaths[5:1,c(4, 2, 3, 1)], beside = TRUE)
Substitute your data, use beside=FALSE to stack, etc.
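
Applied to the grade data, the same idea might look like the sketch below. It assumes grade is stored as an ordered factor as set up earlier; counts is just an illustrative name.

```r
# Rebuild the fake data so the sketch runs on its own
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))
gradesbysex <- data.frame(grade, sex)

# Rows follow the factor levels (G, VG, MVG), not alphabetical order
counts <- table(gradesbysex$grade, gradesbysex$sex)

# Stacked bars: the first row (G) is drawn at the bottom of each bar
barplot(counts, legend.text = TRUE)

# Reorder the rows by name to stack in a different order (MVG at the bottom)
barplot(counts[c("MVG", "VG", "G"), ], legend.text = TRUE)
```

Because table() follows the factor levels, setting the levels once takes care of both the test and the default stacking order.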

Steven McKinney

I'm working on a quick tutorial for my students, and was planning on
using Mann-Whitney U as one of the tests.

I have the following (fake) data

 grade <- c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG")
 sex <- c( "male", "male", "female", "male", "female", "male", "female", "male", "male")
 gradesbysex <- data.frame(grade, sex)

The grades are in the Swedish system, where the order is G < VG < MVG.

The idea is that they will investigate if they can show a grade
difference by sex (i.e. that the teacher gives better grades to boys or
girls).

Since wilcox.test needs the order of the grades, it wants a numeric
vector for the data. Is there a good and simple (i.e.
student-compatible) way to handle this? I could tell them to enter the
data as numbers instead, but an elegant way to do this inside R would be
preferable.

On the same theme, is there a way to tell barplot, when making
stacked barplots, to stack the data in a particular order (the default
appears to be alphabetical)?

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
You can set up the data as

grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"), levels = c("G", "VG", "MVG"))
wilcox.test(as.integer(grade) ~ sex, data = gradesbysex)
Thanks, this solved my problems. I'll just explain the problem with
ties; that is easier to understand than running jitter and comparing.
As for the barplots, I think all you need to do is specify the row and column order you'd like.

Try this example

barplot(VADeaths, beside = TRUE)
barplot(VADeaths[5:1,c(4, 2, 3, 1)], beside = TRUE)
Substitute your data, use beside=FALSE to stack, etc.
Ahh, that simple. I'll fiddle on with that. 

Trying to wean the students away from using Excel for all number-related
work...
wilcox.test(as.integer(grade) ~ sex, data = gradesbysex)
	Wilcoxon rank sum test with continuity correction

data:  as.integer(grade) by sex 
W = 4.5, p-value = 0.2695
alternative hypothesis: true location shift is not equal to 0 

Warning message:
In wilcox.test.default(x = c(3L, 2L, 1L, 1L, 1L, 2L), y = c(2L,  :
  cannot compute exact p-value with ties

You can break the ties by jittering the data.  Each jitter will of course
produce different tie breakers.  A few repeats of the test, or a loop and
some summaries of the outcomes, will give you an idea of the
"average" result.
I wouldn't bother with that. The p-value is based on the correct
covariance matrix of the rank sums, tie-breaking just adds noise to the
analysis. If you really want an exact p-value, package exactRankTests is
the ticket. (Or, if there are really only 3 females in 9 students, you
can get ambitious and set up the permutation distribution by enumerating
the choose(9,3)=84 possible outcomes.)
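
A rough base-R sketch of that enumeration, for anyone who wants to try it. The variable names are illustrative, and the two-sided p-value here is defined as the share of assignments at least as far from the null mean as the observed one, which is one common convention and an assumption of this sketch.

```r
# Same fake data as in the thread
grade <- ordered(c("MVG", "VG", "VG", "G", "MVG", "G", "VG", "G", "VG"),
                 levels = c("G", "VG", "MVG"))
sex <- factor(c("male", "male", "female", "male", "female",
                "male", "female", "male", "male"),
              levels = c("male", "female"))

# Midranks of the grades (tied grades share the average rank)
r <- rank(as.integer(grade))

# Observed rank sum for the 3 females
obs <- sum(r[sex == "female"])

# Rank sum for every way of picking 3 of the 9 students as "female"
perm <- combn(length(r), 3, function(idx) sum(r[idx]))
length(perm)   # number of enumerated outcomes, choose(9, 3)

# Two-sided p-value: assignments at least as extreme as the observed one
mu <- mean(perm)
p_exact <- mean(abs(perm - mu) >= abs(obs - mu))
p_exact
```

Using midranks keeps the ties in the data as they are, so no random tie-breaking is involved.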
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
I wouldn't bother with that. The p-value is based on the correct
covariance matrix of the rank sums, tie-breaking just adds noise to the
analysis. 
Good to know.
If you really want an exact p-value, package exactRankTests is
the ticket.
In this case I suspect that for upper secondary students we won't get
that fancy. This is the first time they deal with statistics beyond
mean, median and possibly standard deviation (and simple graphs, of
course).  Thus the request for simple and elegant solutions; arcane
incantations won't make them leave the comforts of Excel.
(Or, if there are really only 3 females in 9 students, you
can get ambitious and set up the permutation distribution by enumerating
the choose(9,3)=84 possible outcomes.)
Well, this being totally fake data I created for a student exercise,
there can be as many male and female students in the class as I think
they can be bothered typing in... :-)

/Par