Kruskal-Wallace power calculations. - R-help

Thu, Apr 2, 2015 7:25 AM

Greetings, I am working on a project where we are applying the
Kruskal-Wallace test to some factor data to evaluate their correlation with
existing grade data.  I know that the grade data is non-normal, so we
cannot rely on ANOVA or a similar parametric test.  What I would like to
find is a mechanism for making power calculations for the KW test given the
nonparametric assumptions.  My perusal of the literature has suggested that
a simulation would be the best method.

Can anyone point me to good examples of such simulations for KW in R?  And
does anyone have a favourite package for generating simulated data or
conducting such tests?

    Thank you,
    Collin.

Jeff Newmiller

Thu, Apr 2, 2015 8:23 AM

Please stop... you are acting like a broken record, and are also posting in HTML format. Please read the Posting Guide and demonstrate that you have used a search engine on this topic before posting again.
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
--------------------------------------------------------------------------- 
Sent from my phone. Please excuse my brevity.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Jim Lemon

Thu, Apr 2, 2015 3:35 PM

Hi Collin,
Have a look at this:

http://stats.stackexchange.com/questions/70643/power-analysis-for-kruskal-wallis-or-mann-whitney-u-test-using-r

Although, thinking about it, this might have constituted your "perusal of
the literature".

Plus it always looks better when you spell the names properly.

Jim


Collin Lynch

Thu, Apr 2, 2015 7:19 PM

Thank you, Jim. I did see those (though not my typo :) and am still
pondering the warning about post-hoc analyses.

The situation that I am in is that I have a set of individuals who
have been assigned a course grade. We have then clustered these
individuals into about 50 communities using standard community
detection algorithms with the goal of determining whether community
membership affects one of their grades. We are using the KW test as
the grade data is strongly non-normal and my coauthors preferred KW as
an alternative.

The two issues that I am struggling with are: 1) whether the post-hoc
power analysis would be useful; and 2) how to code the simulation
studies that are described in:
http://onlinelibrary.wiley.com/doi/10.1002/bimj.4710380510/abstract

Problem #1 is of course beyond the scope of this e-mail list though I
would welcome anyone's suggestions on that point. I am not sure that
I buy the arguments against it offered here:

http://graphpad.com/support/faq/why-it-is-not-helpful-to-compute-the-power-of-an-experiment-to-detect-the-difference-actually-observed-why-is-post-hoc-power-analysis-futile/

It seems that the rationale boils down to "you didn't find it so you
couldn't find it" but that does not tell me how far off I was from the
goal. I am still perusing the articles the author cites, however.

With respect to question #2, I am trying to lay my hands on the article
and did find this old r-help discussion:
http://r.789695.n4.nabble.com/Power-of-Kruskal-Wallis-Test-td4671188.html
However, I am not sure how to adapt the simulation studies that it
links to for my current problem. The links it leads to focus on
mixed-effects models. This may be more of a pure stats question and
not suited for this list but I thought I'd ask in the hopes that
anyone had any more specific KW code or knew of a good tutorial for
the right kinds of simulation studies.

Thank you,
Collin.

Greg Snow

Fri, Apr 3, 2015 10:43 AM

Here is some sample code:

## Simulation function to create data, analyze it using
## kruskal.test, and return the p-value
## change rexp to change the simulation distribution

simfun <- function(means, k=length(means), n=rep(50,k)) {
  mydata <- lapply( seq_len(k), function(i) {
    rexp(n[i], 1) - 1 + means[i]
  })
  kruskal.test(mydata)$p.value
}

# simulate under the null to check proper sizing
B <- 10000
out1 <- replicate(B, simfun(rep(3,4)))
hist(out1)                                  # should look roughly uniform
mean( out1 <= 0.05 )                        # rejection rate, should be near 0.05
binom.test( sum(out1 <= 0.05), B, p=0.05)

### Now simulate for power

B <- 10000
out2 <- replicate(B, simfun( c(3,3,3.2,3.3)))
hist(out2)
mean( out2 <= 0.05 )                        # estimated power
binom.test( sum(out2 <= 0.05), B, p=0.05 )  # read the confidence interval, not the p-value

This simulates from a continuous, skewed exponential distribution and
shifts each group to get the desired mean (a pure location shift is a
common assumption, though the test itself does not require it).
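
As a rough sketch of how the code above might be adapted to a setting
like the one Collin describes (about 50 groups, discrete grade-like data
with ties), the sampler can be passed in as a function and the power
estimated for several group sizes. Everything specific here is an
assumption for illustration and not from the thread: the names
sim_pvalue, est_power and rgrade, the 0-100 grade scale, the number of
shifted groups, and the size of the shift.

```r
## Same idea as simfun, but the data generator is an argument.
## rgroup is a hypothetical function(n, shift) returning n simulated values.
sim_pvalue <- function(shifts, n, rgroup) {
  groups <- lapply(seq_along(shifts), function(i) rgroup(n[i], shifts[i]))
  kruskal.test(groups)$p.value
}

## Estimated power: share of simulated p-values at or below alpha
est_power <- function(shifts, n, rgroup, B = 500, alpha = 0.05) {
  pvals <- replicate(B, sim_pvalue(shifts, n, rgroup))
  mean(pvals <= alpha)
}

## Made-up grade generator: left-skewed scores on 0..100, rounded to
## whole points so that ties occur (kruskal.test corrects for ties).
rgrade <- function(n, shift) {
  score <- 90 - rexp(n, rate = 1/15) + shift
  round(pmin(pmax(score, 0), 100))
}

set.seed(1)
k <- 50                                   # number of communities
null_shifts <- rep(0, k)                  # no group differences
alt_shifts  <- c(rep(0, 40), rep(5, 10))  # assumed: 10 groups score 5 points higher

## Size check under the null, then power for several per-group sizes
null_rate <- est_power(null_shifts, rep(20, k), rgrade)
sizes <- c(10, 20, 40)
power_by_n <- sapply(sizes, function(m) est_power(alt_shifts, rep(m, k), rgrade))
names(power_by_n) <- sizes

print(null_rate)
print(power_by_n)
```

B is kept small here so the sketch runs quickly; a larger B, as in the
original code, tightens the estimates. Unequal group sizes, which are
likely with detected communities, can be passed directly through n.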

Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com

Collin Lynch

Fri, Apr 3, 2015 2:17 PM

Thank you very much Greg, I will give that a try.

    Best,
    Collin.
