[Bioc-devel] Making GOstats pathway analysis using data from both a and b chip s in one go.
Hi,
Basically the problem with combining data from multiple chips is that
you need to be pretty careful about duplicate probes for single genes
(since GO maps at the EntrezGene level that is the level of resolution
you have to work with and double counting some genes and not others is a
pretty bad idea) and next it is getting the Universe right. This is
also incredibly important, and is straightforward for a single chip (and
that is what has been done in GOstats) and less straightforward for
anything else. You will need to do some programming to create your own
Universe and solve a few other problems (the GOHyper has a universe
argument that should work, let us know if it does not).
Finally as Wolfgang recently said, this approach (hypergeometric
testing) is probably not as good as Gene Set Enrichment (there are two
Bioc packages for doing this, one is called Category the other is called
sigPathway
best wishes
Robert
Peder.Worning at astrazeneca.com wrote:
Dear Bioconductor list,
I have been using the GOstats package to make pathway analysis of Affymetrix
data from the mouse MOE430a chips, and that works fine. Now I want to make
an analysis which includes the data from both the a and b chips in one go is
that possible?
My problem comes when I use the GOHyperG function where I have to make
either lib="moe430a" or lib="moe430b". Is there a way to incude annotation
from both chips in the same analysis?
Here is the R code I have used:
****************************************************************************
**************************************
library(GOstats)
library(moe430a)
library(moe430b)
library
indat <- read.delim("ASM+CSM+12X2+LPS_A.SAMROC.ann.molcl.logP.tab.txt",
header = T, sep = "\t", blank.lines.skip = T)
indata <- indat[!(substr(as.character(indat[,1]),1,3) %in% "AFF"),] # remove
AFF probesets
newdata_ASMup_01_pr02 <- indata[indata[,10] >= 2 & indata[,2] >=0.2,]
# GO pathway analysis from here
############################################################################
#############################################
newdata <- newdata_ASMup_01_pr02 # Here is the dataset chosen everything
from here must be repeated for a new dataset ####
newprobeset <- as.character(newdata[,1])
newprobeset_a <- as.character(newdata[newdata[,61]=="A" ,1])
newprobeset_b <- as.character(newdata[newdata[,61]=="B" ,1])
gNll_a = as.character(unlist(mget(newprobeset_a, moe430aLOCUSID)))
gNsLL_a <- unique(unlist(mget(newprobeset_a, env=moe430aLOCUSID,
ifnotfound=NA)))
gNll_b = as.character(unlist(mget(newprobeset_b, moe430bLOCUSID)))
gNsLL_b <- unique(unlist(mget(newprobeset_b, env=moe430bLOCUSID,
ifnotfound=NA)))
gNll <- c(gNll_a,gNll_b)
gNsLL <- unique(c(gNsLL_a,gNsLL_b))
# The three GO ontologies using only data from the a-chip
gGhyp_BP <- GOHyperG(gNsLL_a, lib="moe430a", what="BP")
gGO_BP <- makeGOGraph(gNll_a, Ontology="BP")
gGhyp_CC <- GOHyperG(gNsLL_a, lib="moe430a", what="CC")
gGO_CC <- makeGOGraph(gNll_a, Ontology="CC")
gGhyp_MF <- GOHyperG(gNsLL_a, lib="moe430a", what="MF")
gGO_MF <- makeGOGraph(gNll_a, Ontology="MF")
gGhyp_BP.pv <- gGhyp_BP$pv[nodes(gGO_BP)]
gg_BP = sort(gGhyp_BP.pv[gGhyp_BP.pv < 0.05])
gGhyp_CC.pv <- gGhyp_CC$pv[nodes(gGO_CC)]
gg_CC = sort(gGhyp_CC.pv[gGhyp_CC.pv < 0.05])
gGhyp_MF.pv <- gGhyp_MF$pv[nodes(gGO_MF)]
gg_MF = sort(gGhyp_MF.pv[gGhyp_MF.pv < 0.05])
gg_BP.terms <- getGOTerm(names(gg_BP))[["BP"]]
gg_CC.terms <- getGOTerm(names(gg_CC))[["CC"]]
gg_MF.terms <- getGOTerm(names(gg_MF))[["MF"]]
ggt_BP = as.character(unlist(gg_BP.terms))
numCh_BP = nchar(ggt_BP)
ggt2_BP = substr(ggt_BP, 1, 50)
ggt3_BP = paste("BP: ",ggt2_BP, ifelse(numCh_BP > 50, "..", ""), sep="")
ggt_CC = as.character(unlist(gg_CC.terms))
numCh_CC = nchar(ggt_CC)
ggt2_CC = substr(ggt_CC, 1, 50)
ggt3_CC = paste("CC: ",ggt2_CC, ifelse(numCh_CC > 50, "..", ""), sep="")
ggt_MF = as.character(unlist(gg_MF.terms))
numCh_MF = nchar(ggt_MF)
ggt2_MF = substr(ggt_MF, 1, 50)
ggt3_MF = paste("MF: ",ggt2_MF, ifelse(numCh_MF > 50, "..", ""), sep="")
##get counts
gg_BP.counts <- gGhyp_BP$intCounts[names(gg_BP)]
gg_CC.counts <- gGhyp_CC$intCounts[names(gg_CC)]
gg_MF.counts <- gGhyp_MF$intCounts[names(gg_MF)]
gg_BP.allcounts <- gGhyp_BP$goCounts[names(gg_BP)]
gg_CC.allcounts <- gGhyp_CC$goCounts[names(gg_CC)]
gg_MF.allcounts <- gGhyp_MF$goCounts[names(gg_MF)]
ggMat_BP = matrix(c(names(gg_BP.terms), ggt3_BP, signif(gg_BP,3),
gg_BP.counts, gg_BP.allcounts),
byrow=FALSE, nc=5, dimnames=list(1:length(gg_BP), c("GO ID",
"Term", "p-value","No of Genes","Total number")))
ggMat_CC = matrix(c(names(gg_CC.terms), ggt3_CC, signif(gg_CC,3),
gg_CC.counts, gg_CC.allcounts),
byrow=FALSE, nc=5, dimnames=list(1:length(gg_CC), c("GO ID",
"Term", "p-value","No of Genes","Total number")))
ggMat_MF = matrix(c(names(gg_MF.terms), ggt3_MF, signif(gg_MF,3),
gg_MF.counts, gg_MF.allcounts),
byrow=FALSE, nc=5, dimnames=list(1:length(gg_MF), c("GO ID",
"Term", "p-value","No of Genes","Total number")))
newMat <- rbind(ggMat_BP,ggMat_CC,ggMat_MF)
This is what I have done and it works beautifully, but how do I make the
analysis using both chips, because making the analysis seaprate of the A and
the B chip data doesn't make sense. It means that the B chip data is somehow
useless in this context. I have the same problem with my human data.
Some technical details: R is version 2.3.1 and GOstats is version 1.6.0 and
the annotation files are moe430a_1.12.0 and moeb430_1.12.0
Sincerly yours
Peder
________________________________________________ Peder Worning, PhD Senior Scientist, Bioinformatics AstraZeneca R&D Lund S-221 87 Lund, Sweden Tel: +46 46 33 72 93 Fax: +46 46 33 71 64 Email: peder.worning at astrazeneca.com _______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel
Robert Gentleman, PhD Program in Computational Biology Division of Public Health Sciences Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N, M2-B876 PO Box 19024 Seattle, Washington 98109-1024 206-667-7700 rgentlem at fhcrc.org