[Bioc-devel] homolog.db package
Thank you Marc,

I really appreciate the time you spent answering me and the effort of creating the packages. Since all this involves a lot of work from you and the other maintainers, I just think that a few methods and a vignette would bring these packages to life.

I think that these methods should be in the annotate package (or maybe in AnnotationDbi), rather than in a package created ad hoc. I will definitely include them in my package, which I'll submit to bioC as soon as it is ready; however, adding one method to annotate would have a greater "impact" on the users' end.

Let's discuss this off-line.

Ciao
Luigi

On Nov 6, 2009, at 1:47 PM, Marc Carlson wrote:
Hi Luigi,

I am the one responsible for trying to maintain order with the annotations in this project. To date, there has not been a lot of external interest in cross-species mappings. Hong built the homolog.db package that you found, but he didn't update these packages for the most recent release, so unless Hong speaks up rather soon, it really seems that it might be more or less abandoned at this point.

I have built the inparanoid packages in anticipation that someone like you might come along some day, but it is difficult to predict the future, let alone write software for it, which is why there are not a lot of functions to make use of these packages yet. I have experimented with some of these mapping problems, but they are not simple problems, as the mappings frequently tend to be many to one or many to many. Layered on top of that is the fact that the inparanoid project uses inconsistent labels for protein sequences, which means that you have additional mapping steps to do once you find a match.

Anyhow, the way that you want to handle this mapping will depend almost entirely on the context of the questions you are asking. If you have a particular use case in mind, or a specific need, let's discuss it offline and see what can be done about it. It might be that we can add some methods to improve things.

The inparanoid packages themselves are due for a major overhaul very soon, as the sources have very recently undergone a major revision.

Marc

Tony Chiang wrote:
Let's keep this onlist so others can also respond. I don't know about conference calls... I am certainly not a part of any if there are.

I completely agree with you about efficiency and usability. It might be worthwhile to talk to the maintainers of the packages to see if methods already exist. You could always create your own package that depends on the .db packages and submit it to Bioc.

On Fri, Nov 6, 2009 at 12:45 AM, Luigi Marchionni <marchion at jhu.edu> wrote:

Is there any conference call or something like that where developers talk and discuss this? I do not want to annoy anyone, though I think I have a perspective about annotation that can contribute overall. I am a professional annotator, so to speak, from sequences, to ids, to ontologies, and I am a biologist and an R end user.

My perspective is that something must be computationally efficient AND usable. I am not saying there is a need to change the packages, but that there is a need to provide methods. Maybe I am not aware of them if they exist (that's why I ask), but I can definitely tell you what a biologist-not-scared-of-R needs: one line of code to do a bunch of tasks, like:
ReadAffy()
imagine:
mapEntrez(ids,'Mmu','Hsa')
The latter is what my code does. If I have to recode everything to make it work with maintained bioc metadata packages, I'll do it. However, there still remains the problem of where such methods should sit. A new package? annotate? AnnotationDbi?

Thanks, and now I go to bed.

Luigi

On Nov 6, 2009, at 3:10 AM, Tony Chiang wrote:

Hi Luigi,

I was not criticizing... I just thought I might point you to some other packages. I believe that the annotation packages do have methods that allow for the translation fairly easily, as these are now SQL databases (please correct me if I am wrong, anyone). I don't maintain these packages. I certainly would not mind if someone were to write some simple methods or functions that wrapped around these data packages. There is always the problem of the many-to-many mapping though... and this might be why the annotation package is used rather than a flat file.

Cheers,
--Tony

On Fri, Nov 6, 2009 at 1:02 AM, Luigi Marchionni <marchion at jhu.edu> wrote:
Thanks. I am a good citizen overall. I see now where all I needed is sitting. Metadata packages come with a drawback, though (I am just expecting to be wrong again): they do not contain methods to do stuff. I really would like to provide people with bioC-compliant ways to run one line of code and get the mapping done. This could go in any "software" package I am not aware of; however, this is what is going to make the difference.

I do not want to change the way things are, I want to make them work easily for the end user (a non-scared-by-R biologist). Anything maintained is fine by me, I mean it: I will recode everything needed in my software, but methods are needed for the average user. If they exist I apologize; otherwise I just say let's add them to "annotate", "AnnotationDbi", or wherever you think they should sit.

Out of my ignorance, still, I ask: do we need species metadata packages for what is in 1 flat file in Homologene? Since it is ignorance, be nice.

Luigi

On Nov 6, 2009, at 2:41 AM, Tony Chiang wrote:

Hi Luigi,

You might also want to have a look at the homologue annotation packages that can be found in Bioc. They are based upon Inparanoid. For instance, the package for human would be hom.Hs.inp.db

Cheers,
--Tony

On Thu, Nov 5, 2009 at 11:10 PM, Luigi Marchionni <marchion at jhu.edu> wrote:
Dear All,

As I wrote to the list a couple of weeks ago, I took on the endeavor of creating an S4 package for storing genomics results data and further analyzing them. I already had code working to compare results across experiments, platforms and species. To be a good citizen I started using S4, and I started relying on all the classes already existing in Bioc.

Now I have come to the issue of mapping genes (and features) across species. I see that Hong Li maintains a package (homolog.db) containing such information, which depends on several other packages. I installed them and found it difficult to use. I will give you a few examples.

This retrieves the mapping between the Homologene ID and the Entrez Gene ID. Obviously each list element has a different length; however, there is no easy way to tell the correspondence between organism and Entrez Gene ID. I can say that the first one in both elements below is Human, then... If this has to be the structure, then each element in xx below should be named with the corresponding taxonomy id. See the chunk of code below:
################################################################################
xx <- as.list(homologHOMOLOG2GENEID)
xx[1]
$`3`
[1] 34 469356 490207 505968 11364 24158 406283
[8] 38864 1276346 181757 173979 181758
################################################################################
By using the code below I can, however, retrieve the mapping from Entrez Gene identifiers to Homologene identifiers. Let's consider the first two elements of xx[1] above:
################################################################################
yy <- as.list(homologHOMOLOG)
yy["34"]
$`34`
[1] 3
yy["469356"]
$`469356`
[1] 3
################################################################################
Using a little coding I can now map from one Entrez ID to another across species, although without knowing which species. So I can use species information:
################################################################################
zz["34"]
$`34`
[1] 9606
zz["469356"]
$`469356`
[1] 9598
################################################################################
OK. Now I know that Entrez ID "34" in Taxonomy "9606" (human) corresponds to Entrez ID "469356" in Taxonomy "9598" (which I do not know by heart), through the Homologene id "3". To learn the second taxonomy I can do:
################################################################################
ff <- as.list(homologORGANISM)
ff["9598"]
$`9598`
[1] "Pan troglodytes"
################################################################################
Good! I had to play around a little with the code, but I could map the human Entrez ID "34" to the chimpanzee one, "469356". However, I think this is a little too complicated. To install homolog.db (with dependencies=TRUE) I also had to install:

org.Hs.ipi.db_1.1.1.tar.gz
org.Hs.sp.db_1.1.1.tar.gz
PAnnBuilder_1.9.0.tar.gz

And the package does not point to a library that implements the chunks of code above to map Entrez ids across species.

Look at the code below: I load my mapping library (where the cross-mapping homologene table takes 3.2 Mb), I load this object, and the taxonomy information:
################################################################################
library(moreFGS)
data(homol)
data(tax)
ls()
[1] "ff" "homol" "tax" "xx" "yy" "zz"
################################################################################
Finally I load a library containing the taxSwitch() function:
################################################################################
library(funcBox)
args(taxSwitch)
function (IDs, org1, org2, whatIn = "EGID", whatOut = "EGID")
NULL
################################################################################
Now look at this, for one ID:
################################################################################
taxSwitch("34","Homo","Pan","EGID","EGID")
[1] "469356"
taxSwitch("34","Homo","Pan","EGID","EGID")
[1] "469356"
taxSwitch("469356","Pan","Homo","EGID","EGID")
[1] "34"
taxSwitch("469356","Pan","Homo","EGID","symbol")
[1] "ACADM"
taxSwitch("34","Homo","Mus","EGID","symbol")
[1] "Acadm"
taxSwitch("Acadm","Mus","Homo","symbol","EGID")
[1] "34"
taxSwitch("Acadm","Mus","Pan","symbol","EGID")
[1] "469356"
taxSwitch("Acadm","Mus","Bos","symbol","EGID")
[1] "505968"
taxSwitch("Acadm","Mus","Bos","symbol","Acc")
[1] "NP_001068703"
taxSwitch("NP_001068703","Bos","Rattus","Acc","symbol")
[1] "Acadm"
################################################################################
Or more than one ID:
################################################################################
taxSwitch(c("34","37","3211"),"Homo","Mus","EGID","Acc")
[1] "NP_031408" "NP_059062" "NP_032292"
taxSwitch(c("34","37","3211"),"Homo","Mus","EGID","symbol")
[1] "Acadm" "Acadvl" "Hoxb1"
################################################################################
and so on.

I would be very happy to provide Bioconductor with the code to make the moreFGS library and with the taxSwitch() function.

Luigi

PS: the session info is below
################################################################################
sessionInfo()
R version 2.11.0 Under development (unstable) (2009-10-01 r49916)
i386-apple-darwin9.8.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base

other attached packages:
[1] moreFGS_1.0.2 homolog.db_1.1.1
[3] PAnnBuilder_1.9.0 RSQLite_0.7-3
[5] DBI_0.2-4 funcBox_0.0.3
[7] annotate_1.25.0 AnnotationDbi_1.9.0
[9] Biobase_2.7.0 limma_3.3.1

loaded via a namespace (and not attached):
[1] tools_2.11.0 xtable_1.5-5
################################################################################
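The lookup chain walked through in this message (Entrez Gene ID to Homologene group, then to the Entrez Gene ID in another species) could be wrapped in a single call. What follows is only a minimal sketch under stated assumptions: `homolTable` is a three-row toy data frame built from the IDs quoted in this message, not the contents of homolog.db or moreFGS; 9913 is assumed as the taxonomy ID for Bos taurus; and `mapEntrez()` is a hypothetical helper, not an existing function in annotate or AnnotationDbi.

```r
# Toy stand-in for a Homologene-style flat table: one row per gene, with the
# homology group id, the NCBI taxonomy id and the Entrez Gene id.
# ASSUMPTION: only these three rows (group 3) are used; 9913 = Bos taurus.
homolTable <- data.frame(
  hid   = c(3L, 3L, 3L),
  taxid = c(9606L, 9598L, 9913L),
  egid  = c("34", "469356", "505968"),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL helper (not part of any Bioconductor package):
# map Entrez Gene ids from one taxonomy id to another through the shared
# homology group id. A named list is returned so that one-to-many matches
# and unmapped ids (zero-length elements) stay visible.
mapEntrez <- function(ids, from, to, table = homolTable) {
  src <- table[table$taxid == from, ]
  dst <- table[table$taxid == to, ]
  out <- lapply(ids, function(id) {
    groups <- src$hid[src$egid == id]   # group(s) the source gene belongs to
    dst$egid[dst$hid %in% groups]       # genes of the target species in them
  })
  names(out) <- ids
  out
}

# Human Entrez ID "34" mapped to chimpanzee (taxonomy id 9598)
mapEntrez("34", from = 9606, to = 9598)
```

Returning a list rather than a plain vector is a deliberate choice in this sketch: it keeps many-to-one and many-to-many matches explicit instead of silently dropping or recycling them.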
_______________________________________________ Bioc-devel at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel