I'm going to jump in here on the GSEA direction-of-change issue as a molecular biologist. I can think of examples where it would be relevant, and important to test for, and examples where it would be totally irrelevant.

I am currently involved in research searching for genes involved with preference for alcohol. We may want to use a geneSet that consists of all the genes known to be directly involved in dopamine transmission in the brain. Because feedback loops may cause these genes to be up- or down-regulated in some unknown pattern, direction of change would be irrelevant. We just want to know if, on average, dopamine-related genes are differentially expressed. It may be that one receptor subunit is up-regulated and another down-regulated; we wouldn't know any of this a priori.

On the other hand, suppose we wanted to construct a geneSet from a set of genes found significant in a mouse experiment and then see if this geneSet was significant in a GSEA analysis of our rats. Although it may not be, here the pattern of up/down regulation could matter if the pathways differ in similar ways between the preferring and non-preferring lines. In this situation it would be nice to be able to test with both options, i.e. with directionality taken into account and without it.

One way to do this would be for the geneSet object to contain a slot (attribute) that indicates whether the geneSet has directional information in it or not. Another slot would be a string describing how the phenotypes used to construct the geneSet are related to the directionality of individual genes (example: "Direction of change is with respect to alcohol preferring vs. non-preferring, with +1 correlating with increased expression in the preferring phenotype and -1 to decreased expression."). To make the use of this directionality optional, the GSEA algorithms would look at the slot for directional information (TRUE/FALSE).
If FALSE, a non-directional test would be applied; if TRUE, a function argument would have to state whether to use the directional information or not (TRUE/FALSE).

Certainly you developers would have a better handle than I on how to implement this, but I thought I would jump in and give an idea of how an end-user might use this information. And sorry if I messed up the thread on this discussion; I get the digest of the developer newsgroup.

Mark
Mark W. Kimpel MD
Neuroinformatics
Department of Psychiatry
Indiana University School of Medicine
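
P.S. In case it helps make the idea concrete, here is a rough sketch of the slot-plus-argument logic. It is written in Python purely for illustration and is not existing package code. The `GeneSet` class, its field names, the `set_score` function and the `use_direction` argument are all hypothetical, and the score is a toy average of per-gene statistics, not the actual GSEA running-sum statistic. The gene names and numbers are made up for the example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GeneSet:
    """Hypothetical gene set container (illustration only)."""
    name: str
    genes: Dict[str, int]        # gene id -> +1 / -1; ignored when not directional
    directional: bool = False    # the TRUE/FALSE slot described in the post
    direction_note: str = ""     # free-text description of what +1 / -1 mean


def set_score(gene_set: GeneSet, stats: Dict[str, float],
              use_direction: bool = False) -> float:
    """Toy set-level score (an assumption, not the GSEA statistic).

    Direction is used only if the set carries directional information
    AND the caller asks for it; otherwise absolute values are averaged.
    """
    members = [g for g in gene_set.genes if g in stats]
    if not members:
        raise ValueError("no gene in the set has a statistic")
    if gene_set.directional and use_direction:
        # Agreement with the expected sign counts for, disagreement against.
        total = sum(gene_set.genes[g] * stats[g] for g in members)
    else:
        # Non-directional: only the size of the change matters.
        total = sum(abs(stats[g]) for g in members)
    return total / len(members)


# Made-up example: a directional set from a mouse experiment,
# scored against made-up per-gene statistics from rats.
mouse_hits = GeneSet(
    name="mouse_hits",
    genes={"Drd2": +1, "Th": -1},
    directional=True,
    direction_note="+1 = higher expression in the preferring phenotype",
)
rat_stats = {"Drd2": 2.0, "Th": 1.0, "Slc6a3": 0.5}

with_direction = set_score(mouse_hits, rat_stats, use_direction=True)
without_direction = set_score(mouse_hits, rat_stats, use_direction=False)
print(with_direction, without_direction)
```

With these numbers the directional score is lower than the non-directional one, because one gene changes in the opposite direction to what the mouse data predicted. A set built with `directional=False` always gets the non-directional score, whatever `use_direction` says.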