Qvalue package: I am getting back 1,000 q values when I only want 1 q value.

4 messages · Jim Lemon, Thomas Ryan, Jay Tanzman

#
Hi all, I'm wondering if someone could put me on the right path to using
the "qvalue" package correctly.

I have an original p value from an analysis, and I've done 1,000
randomisations of the data set. So I now have an original P value and 1,000
random p values. I want to work out the false discovery rate (FDR) (Q; as
described by Storey and Tibshirani in 2003) for my original p value,
defined as the number of expected false positives over the number of
significant results for my original P value.

So, for my original P value, I want one Q value, that has been calculated
as described above based on the 1,000 random p values.

I wrote this code:

library(qvalue)
pvals <- c(list_of_p_values_obtained_from_randomisations)
qobj <- qvalue(p = pvals)
r_output1 <- qobj$pvalues
r_output2 <- qobj$qvalues

r_output1 is the list of 1,000 p values that I put in, and r_output2 is a q
value for each of those p values (i.e. so there are 1,000 q values).

The problem is I don't want there to be 1,000 Q values (i.e one for each
random p value). The Q value should be the false discovery rate (FDR) (Q),
defined as the number of expected false positives over the number of
significant results. So I want one Q value for my original P value, and to
calculate that one Q value using the 1,000 random P values I have generated.

Could someone please tell me where I'm going wrong.

Thanks
Tom
#
Hi Tom,
The vector qobj$qvalue seems to be the local false discovery rate for
each of your randomizations. Note that the manual implies that the p
values are those of multiple comparisons within a data set, not
randomizations of the data, so I'm not sure that your usage is valid
for the function.

Jim
On Fri, Jan 13, 2017 at 4:12 AM, Thomas Ryan <tombernardryan at gmail.com> wrote:
#
Jim,

Thanks for the reply. Yes I'm just playing around with the data at the
minute, but regardless of where the p values actually come from, I can't
seem to get a Q value that makes sense.

For example, in one case, I have an actual P value of 0.05.  I have a list
of 1,000 randomised p values: range of these randomised p values is 0.002
to 0.795, average of the randomised p values is 0.399 and the median of the
randomised p values is 0.45.

So I thought it would be reasonable to expect the FDR Q Value (i.e the
number of expected false positives over the number of significant results) to
be at least over 0.05, given that 869 of the randomised p values are >
0.05?

When I run the code:

library(qvalue)
list1 <- scan("ListOfPValues")

qobj <- qvalue(p = list1)

qobj$pi0


The answer is 0.0062. That's why I thought qobj$pi0 isn't the right
variable to be looking at? So my problem (or my misunderstanding) is that
I have an actual P value of 0.05, but then a Q value that is lower, 0.006?
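
A hedged aside on the number above: in a qvalue object, pi0 is the estimated
proportion of true null hypotheses across the whole set of p values. It is a
single number and is not a q value. The sketch below uses Storey's
fixed-lambda estimate of pi0 as a simplified stand-in for what qvalue() does
(by default the package smooths over a grid of lambda values rather than
using one). The p values are simulated to match only the reported range, and
pi0_at is a hypothetical helper name. It suggests one possible reason why a
set with no p values near 1 can produce a pi0 close to zero.

```r
set.seed(1)
# Simulated stand-in for the 1,000 randomised p values
# (assumption: uniform over the reported range, 0.002 to 0.795).
p <- runif(1000, min = 0.002, max = 0.795)

# Storey's fixed-lambda estimate of pi0: the share of p values above
# lambda, rescaled by the width of the interval (lambda, 1].
# pi0_at is a hypothetical helper, not part of the qvalue package.
pi0_at <- function(p, lambda) mean(p > lambda) / (1 - lambda)

pi0_mid  <- pi0_at(p, 0.5)  # based on p values above 0.5
pi0_high <- pi0_at(p, 0.9)  # no p value exceeds 0.795, so this is 0
```

Because the estimate leans on the p values near 1, a set that stops at about
0.8 leaves nothing to count there, and the estimate collapses towards zero.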


Thanks again for your help,

Tom
On Thu, Jan 12, 2017 at 9:27 PM, Jim Lemon <drjimlemon at gmail.com> wrote:

3 days later
#
What you're doing makes no sense.  Given p-values p_i, i=1...n, resulting
from hypothesis tests t_i, i=1...n, the q-value of p_i is the expected
proportion of false positives among the tests called significant if the significance level
of each test is α=p_i. Thus a q-value is only defined for an observed
p-value.  Assuming that you have stored n observed p-values in an R vector
P, and the ith p-value P[i]==.05, then the R syntax to obtain the q-value
for P[i] is qvalue(P)$qvalues[i].
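
A minimal sketch of the indexing described above, written in base R so that
it runs without the qvalue package. It swaps in Benjamini-Hochberg adjusted
p values from p.adjust(), which are a conservative stand-in for q-values
(the case where the proportion of true nulls is taken to be 1), so the
numbers will generally differ from qvalue(P)$qvalues. The vector P is
simulated and purely illustrative.

```r
set.seed(1)
# Simulated p values standing in for n = 100 observed tests;
# P[1] plays the role of the observed p value of .05.
P <- c(0.05, runif(99))

# Benjamini-Hochberg adjusted p values: one value per input p value.
# With the qvalue package installed, the thread's equivalent is
# qvalue(P)$qvalues.
q_bh <- p.adjust(P, method = "BH")

i <- 1
q_i <- q_bh[i]  # the single q-value belonging to P[i]
```

The full vector q_bh has one entry per p value, which is why 1,000 inputs
give 1,000 outputs. Picking out the entry for the p value of interest is
what yields a single number.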

If, instead (as I suspect), that .05 is not among your observed p-values,
but you want to know what the FDR would be, given your sequence of
p-values, if the significance level of every test were .05, then the R
syntax would be
max(qvalue(P)$qvalues[P<=.05]).
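
The same idea as a runnable base R sketch, again using Benjamini-Hochberg
adjusted p values from p.adjust() as a conservative stand-in for
qvalue(P)$qvalues. The p values are simulated, and the names alpha, called
and fdr_at_alpha are illustrative. The guard against an empty selection is
an addition, since max() of an empty vector returns -Inf with a warning.

```r
set.seed(1)
# Simulated p values: 80 drawn as nulls plus 20 that tend to be small.
P <- c(runif(80), rbeta(20, 1, 50))

alpha <- 0.05
q_bh <- p.adjust(P, method = "BH")  # stand-in for qvalue(P)$qvalues

# Tests that would be called significant at level alpha.
called <- P <= alpha

# Largest q-value among the called tests; NA if nothing is called.
fdr_at_alpha <- if (any(called)) max(q_bh[called]) else NA_real_
```

Since the adjusted values never decrease as the p values increase, this
maximum is simply the q-value of the largest p value that is still at or
below alpha.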

On Fri, Jan 13, 2017 at 2:08 AM, Thomas Ryan <tombernardryan at gmail.com>
wrote: