[Bioc-devel] new package: annotate function interaction from Reactome DB
As we discussed in separate email, I think that the SIZE of this data does NOT require that you produce an annotation package. The package guidelines (http://bioconductor.org/developers/package-guidelines/#correctness) say that the package should occupy less than 4MB on disk. Your package has

    your-pkg$ du -sh
    134M    .

The biggest files are

    your-pkg$ find -type f -size +1M | xargs ls -sh
     52M ./inst/extdata/all_gene_disease_associations.txt
    7.4M ./inst/extdata/FIsInGene_121514_with_annotations.txt
    5.0M ./inst/extdata/imgs/demoReactomeCmp.gif
     17M ./inst/extdata/ListProfData.RData

The file that you wish to turn into an annotation package is about 7 MB. When stored as RDS rather than text, it is only 478k. Saving as RDS (or RData) also means that input is fast.

The largest file is used as a data.frame, and when saved as RDS it is only 3.5M. It might be that THIS data is a candidate for an annotation package, but likely the better solution here is to develop the R script at http://www.disgenet.org/web/DisGeNET/menu/downloads#r into a function that returns web service requests as R objects for interactive use (see https://gist.github.com/mtmorgan/ea10d0d424bf7e414d8e064d903f026d).

The gif can be stored as png and is then 156K.

The ListProfData file seems to contain an environment with function definitions etc. Probably this file contains much more information than you intended; it is hard to know what its actual size would be.

I know your package also contained MathJax, at about 33M on disk. As you have discovered, it is not necessary to include MathJax.

It seems that by representing the data appropriately, you will have a package that is close to the guidelines, and at the same time faster when accessing the data.

It might be argued that the file FIsInGene_121514_with_annotations.txt is useful in general, and for that reason it should be an annotation package.
But it is so easy and quick to obtain

    download.file("http://reactomews.oicr.on.ca:8080/caBigR3WebApp2014/FIsInGene_121514_with_annotations.txt.zip",
                  tmp <- tempfile())
    xx = read.delim(unzip(tmp))

that it doesn't seem to justify the additional package infrastructure.

Finally, for the benefit of other package authors, we also mentioned in our off-list email the importance of appropriate attribution of data sources (clearly, in the DESCRIPTION file and/or in man pages describing the data) and of ensuring that your use is consistent with how the data is licensed (via the License: field in the DESCRIPTION file and/or the LICENSE file).

So please reconsider the need for an annotation package for this data. Your reviewer recognized that your package was much too large; make the changes above and it will not be much too large, so you will not need to make an annotation package.

Martin
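The claim that RDS is both smaller on disk and quick to read back can be checked with a small sketch. This is a minimal illustration, not code from either package: the data.frame is synthetic, with hypothetical gene names, and only its five-column layout is borrowed from the ld_reactomeFI() output quoted below. The sizes it prints will not reproduce the 478k figure for the real file.

```r
# Minimal sketch (assumption): a synthetic table standing in for
# FIsInGene_121514_with_annotations.txt; gene names are hypothetical.
set.seed(1)
n <- 50000
genes <- sprintf("GENE%04d", 1:2000)
fi <- data.frame(
    Gene1 = sample(genes, n, replace = TRUE),
    Gene2 = sample(genes, n, replace = TRUE),
    Annotation = sample(c("predicted", "complex", "activate", "complex; input"),
                        n, replace = TRUE),
    Direction = sample(c("-", "->"), n, replace = TRUE),
    Score = round(runif(n), 2),
    stringsAsFactors = FALSE
)

txt <- tempfile(fileext = ".txt")
rds <- tempfile(fileext = ".rds")

# Plain tab-delimited text, as shipped in inst/extdata
write.table(fi, txt, sep = "\t", quote = FALSE, row.names = FALSE)
# RDS: serialized R object, gzip-compressed by default
saveRDS(fi, rds)

sizes <- c(text = file.size(txt), rds = file.size(rds))
print(sizes)

# Both formats round-trip; readRDS() returns the object exactly as saved
fi_txt <- read.delim(txt, stringsAsFactors = FALSE)
fi_rds <- readRDS(rds)
```

Because readRDS() restores the object directly, no parsing or type guessing is needed, which is where the faster input comes from.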
On 04/07/2016 02:19 AM, Karim Mezhoud wrote:
Dear BioC devel,

I wrote an annotation package named reactomeFI to avoid the large files in the /extdata folder. In the end, by converting the txt file to RDS format I reduced the file sizes enough.

reactomeFI provides annotation that does not exist in any other package (to my knowledge). Neither reactome.db nor PSICQUIC provides the arrow direction and the type of interaction.

library(reactomeFI)
dim(ld_reactomeFI(2014))
[1] 217249      5
dim(ld_reactomeFI(2015))
[1] 229300 5
head(ld_reactomeFI(version= 2015))
                  Gene1  Gene2        Annotation Direction Score
1                16-5-5  CDC42         predicted         -  0.82
2                16-5-5   RHOJ         predicted         -  0.82
3                16-5-5   RHOQ         predicted         -  0.82
4 <DELTA>FAS/APO-1/CD95    BID          activate        ->  1.00
5 <DELTA>FAS/APO-1/CD95 CASP10           complex         -  1.00
6 <DELTA>FAS/APO-1/CD95   DAXX complex; reaction         -  1.00
tail(ld_reactomeFI(2015))
        Gene1  Gene2     Annotation Direction Score
229295    ZP3    ZP4        complex         -  1.00
229296   ZPR1    ZYX      predicted         -  0.59
229297   ZW10 ZWILCH complex; input         -  1.00
229298   ZW10  ZWINT complex; input         -  1.00
229299 ZWILCH  ZWINT complex; input         -  1.00
229300   ZXDA   ZXDC      predicted         -  0.59

I can add other arguments to specify the type of interaction or the direction, for example:

ld_reactomeFI(version=2014, type=c("activate", "complex"), direction="arrowhead")

I am ready to submit this package if you consider it new annotation information.

Thank you,
Karim
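The proposed type and direction arguments amount to row filters on the returned data.frame. A minimal sketch, assuming the five-column layout shown in the head()/tail() output above and that multiple annotations in one cell are separated by "; " (as in "complex; input"). filter_fi() is a hypothetical helper, not part of reactomeFI; the example rows are copied from that output.

```r
# Hypothetical helper (not part of reactomeFI): filter a functional-interaction
# table with columns Gene1, Gene2, Annotation, Direction, Score.
filter_fi <- function(fi, type = NULL, direction = NULL) {
    keep <- rep(TRUE, nrow(fi))
    if (!is.null(type)) {
        # An Annotation cell may hold several terms ("complex; reaction"),
        # so split it and keep rows where any term is one of 'type'.
        terms <- strsplit(fi$Annotation, ";\\s*")
        keep <- keep & vapply(terms, function(t) any(t %in% type), logical(1))
    }
    if (!is.null(direction)) {
        # Exact match on the Direction symbol, e.g. "-" or "->"
        keep <- keep & fi$Direction %in% direction
    }
    fi[keep, , drop = FALSE]
}

# Example rows taken from the head()/tail() output in this thread
fi <- data.frame(
    Gene1 = c("16-5-5", "<DELTA>FAS/APO-1/CD95", "<DELTA>FAS/APO-1/CD95",
              "ZW10", "ZXDA"),
    Gene2 = c("CDC42", "BID", "DAXX", "ZWILCH", "ZXDC"),
    Annotation = c("predicted", "activate", "complex; reaction",
                   "complex; input", "predicted"),
    Direction = c("-", "->", "-", "-", "-"),
    Score = c(0.82, 1.00, 1.00, 1.00, 0.59),
    stringsAsFactors = FALSE
)

filter_fi(fi, type = c("activate", "complex"))  # 3 rows
filter_fi(fi, direction = "->")                 # the BID row only
```

Splitting the Annotation field means a query for "complex" also matches compound entries such as "complex; input", which a plain equality test would miss.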
_______________________________________________ Bioc-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel