multiple hypothesis testing

Tue, Mar 17, 2009 4:47 AM

Vijaykumar Muley wrote:

Dear all,

I am Vijaykumar Muley, working as a senior research fellow. By training
I am a computational biologist without a strong knowledge of statistics.
I have done some analysis, which is explained as follows.

I have 10340 (X) binary vectors, all of the same length (N=845). I will
call them "gene profiles".
For example:

    v1  v2  v3  v4 ..... vN
a   1   0   1   0        1
b   0   0   1   0        0
c   1   0   1   1        1
d   0   1   1   1        1
e   0   0   1   1        1
.   .   .   .   ........
.   .   .   .   ........
.   .   .   .   ........
up to
10340


Then I have some other binary profiles of the same length (N=845), which
I will call "expression profiles":

    v1  v2  v3  v4 ..... vN
f1  1   0   1   0        1
f2  0   0   1   0        0
f3  1   0   1   1        1


Now I am comparing profile f1 with all X gene profiles using the
hypergeometric distribution function. What I get is the p-value, i.e. the
probability that the similarity between profile f1 and each of the 10340
gene profiles arises by random chance alone.
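
The comparison described above can be sketched as follows. This is only
a minimal illustration in plain Python, not the code actually used for
the analysis. It assumes the p-value is the upper tail of the
hypergeometric distribution (the probability of seeing at least the
observed number of shared 1s), which the post does not state explicitly.
The function name hypergeom_upper_tail is hypothetical, and the
5-position vectors are just the columns shown in the toy tables.

```python
from math import comb

def hypergeom_upper_tail(gene, expr):
    """P(overlap >= observed) for two equal-length 0/1 profiles."""
    if len(gene) != len(expr):
        raise ValueError("profiles must have the same length")
    N = len(gene)   # number of positions (845 in the real data)
    K = sum(gene)   # number of 1s in the gene profile
    n = sum(expr)   # number of 1s in the expression profile
    k = sum(g & e for g, e in zip(gene, expr))  # positions where both are 1
    # Add up exact integer counts first, then divide once.
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(K, n) + 1))
    return favourable / comb(N, n)

# Toy profiles: only the five columns displayed in the example tables.
f1 = [1, 0, 1, 0, 1]
a = [1, 0, 1, 0, 1]
b = [0, 0, 1, 0, 0]

p_f1_a = hypergeom_upper_tail(a, f1)
p_f1_b = hypergeom_upper_tail(b, f1)
```

With only five positions the p-values are necessarily large; values as
small as 1e-20 need the full length of 845.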

for example,

#pair   p-value

f1,a    1e-20
f1,b    0.01
.
.
up to
f1,10340 0.05

I am doing the same thing with f2 and f3.

If we arrange this output in a more readable format, it looks like

      f1       f2      f3
a     1e-20    0.01    0.10
b     0.01     1e-9    0.02
c     1e-3     0.1     0.30
d     0.03     0.07    1e-5
e     1e-1     0.01    1e-9
.     .        .       .
.     .        .       .
.     .        .       .
up to
10340


I hope everyone understood what type of output I am getting.

Now I want to perform a multiple hypothesis comparison (p-value
adjustment) on this data, so that I get the statistically significant
associations between the various "expression profiles" and "gene
profiles" at a specific alpha level.

The most conservative method for p-value adjustment is Bonferroni, and
there are many others that are less conservative. I don't care which
method I use, but the problem here is:

Which number of tests should I use to correct or adjust the p-values?

So, in the case of the Bonferroni correction, should I multiply each
p-value by 10340 or, as I have compared 3 expression profiles against
10340 gene profiles, should I multiply each p-value by 3*10340?
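
To make the two options concrete, here is a small sketch of the
Bonferroni arithmetic in plain Python. The general convention is that
the multiplier is the number of tests in the family over which the
error rate should be controlled, so the choice depends on whether the
conclusions are drawn per expression profile or across all three at
once. The function name bonferroni and the variable names are
illustrative only, and the p-values are the example row for gene a.

```python
def bonferroni(p_values, m):
    """Multiply each p-value by the family size m, capping at 1."""
    return [min(1.0, p * m) for p in p_values]

n_genes = 10340
n_expr = 3

# Example p-values for gene a against f1, f2 and f3.
row_a = [1e-20, 0.01, 0.10]

# Family = the 10340 tests of a single expression profile.
per_profile = bonferroni(row_a, n_genes)

# Family = all 3 * 10340 = 31020 tests taken together.
pooled = bonferroni(row_a, n_genes * n_expr)
```

The pooled version is three times stricter, which only changes the
verdict for p-values close to the cutoff.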

I am asking this for my own understanding. What I want to do is this:
from the above table of genes and p-values, I want to calculate the
percentage of false positives at each p-value cutoff from 0.0001 to
0.05, so that I can use a good cutoff as the significance level (alpha)
to exclude the gene profiles which are weakly associated with all
expression profiles. (If I am correct, to do this I need to use other
p-value correction methods, either simulation based, resampling, or
Benjamini and Hochberg (B&H).)
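
One simple way to get such a percentage at a given cutoff is to compare
the number of p-values expected below the cutoff if every null
hypothesis were true (m * cutoff) with the number actually observed
below it. The sketch below does this in plain Python. It assumes the
p-values are uniform under the null and that all hypotheses are null
when estimating the expected count, so it is a conservative estimate
rather than an exact rate. The function name estimated_fdr is
hypothetical and the 15 p-values are just the example table.

```python
def estimated_fdr(p_values, cutoff):
    """Estimate the fraction of false positives among p-values <= cutoff."""
    m = len(p_values)
    called = sum(1 for p in p_values if p <= cutoff)
    if called == 0:
        return 0.0  # convention: nothing called significant, nothing false
    # Expected null hits (m * cutoff) divided by observed hits, capped at 1.
    return min(1.0, m * cutoff / called)

# The 15 example p-values (genes a to e against f1, f2, f3), pooled.
p_values = [
    1e-20, 0.01, 1e-3, 0.03, 1e-1,   # f1
    0.01, 1e-9, 0.1, 0.07, 0.01,     # f2
    0.10, 0.02, 0.30, 1e-5, 1e-9,    # f3
]

cutoffs = [0.0001, 0.001, 0.01, 0.05]
fdr_by_cutoff = {c: estimated_fdr(p_values, c) for c in cutoffs}
```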

Can anyone please advise me about p-value adjustment or p-value
correction? I mean, statistically or technically, which number should I
use for the correction, 10340 or 3*10340, given that I have three
features to associate with the same set of 10340 genes? Or, if I am
wrong, can anyone tell me which protocol I should follow to get a fair
number of significant associations between genes and expression
profiles?

I am using the package "multtest" for p-value adjustment, but I do not
understand what to supply for the correction: should I give the p-values
for each expression profile alone, or give it all the p-values, i.e.
3*10340?
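
The difference between the two inputs can be seen with a hand-written
Benjamini and Hochberg step-up adjustment. This is a plain Python
sketch for illustration, not the multtest interface. The function name
benjamini_hochberg and the dictionary layout are assumptions, and the
p-values are the example table. Pooling controls the false discovery
rate over all tests together, whereas adjusting each column separately
controls it within each expression profile.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Example table: rows are genes a to e, one list per expression profile.
table = {
    "f1": [1e-20, 0.01, 1e-3, 0.03, 1e-1],
    "f2": [0.01, 1e-9, 0.1, 0.07, 0.01],
    "f3": [0.10, 0.02, 0.30, 1e-5, 1e-9],
}

# Option 1: all p-values in one family.
pooled = benjamini_hochberg([p for col in table.values() for p in col])

# Option 2: one family per expression profile.
separate = {name: benjamini_hochberg(col) for name, col in table.items()}
```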

I have gone through many tutorials and articles on multiple hypothesis
testing, but I really couldn't work out exactly how it applies here.

Please give me some clues. Some of you may be actively working on
p-value adjustment / multiple hypothesis testing, so I hope for some
suggestions.

I will be grateful for your kind help.

Sincerely,

Please do NOT reply to a digest when posting to the list; you should
start a new thread (or at the very least delete the digest to which you
are replying from your email).

You may be interested in the False Discovery Rate (FDR) methods proposed
by Benjamini & Hochberg [1] and various related work/papers/software
[2][3].

Neil 


[1] Benjamini Y, Hochberg Y (1995). Controlling the false discovery
rate: a practical and powerful approach to multiple testing. J. R.
Statist. Soc. B 57:289-300.
[2] http://genomics.princeton.edu/storeylab/qvalue/

