
[Bioc-devel] new package: annotate function interaction from Reactome DB

As we discussed in a separate email, I think that the SIZE of this data 
does NOT require that you produce an annotation package.

The package guidelines 
http://bioconductor.org/developers/package-guidelines/#correctness say 
that the package should occupy less than 4MB on disk. Your package has

your-pkg$ du -sh
134M	.

The biggest files are

your-pkg$ find -type f -size +1M|xargs ls -sh
52M ./inst/extdata/all_gene_disease_associations.txt
7.4M ./inst/extdata/FIsInGene_121514_with_annotations.txt
5.0M ./inst/extdata/imgs/demoReactomeCmp.gif
17M ./inst/extdata/ListProfData.RData
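For authors who prefer to run the same two checks (total size, and files over 1MB) from within R, here is a minimal sketch. The helper names are hypothetical. Note that du reports disk blocks while file.size() reports apparent size, so the numbers will differ slightly.

```r
## Sketch: R equivalents of `du -sh` and `find -type f -size +1M`, for
## checking a package directory against the 4MB guideline.
## Helper names are hypothetical, not part of any package.
pkg_files <- function(pkg_dir = ".") {
    files <- list.files(pkg_dir, recursive = TRUE, all.files = TRUE,
                        full.names = TRUE)
    data.frame(file = files, bytes = file.size(files),
               stringsAsFactors = FALSE)
}

## Total apparent size of all files, in MB
dir_size_mb <- function(pkg_dir = ".") {
    sum(pkg_files(pkg_dir)$bytes) / 2^20
}

## Files larger than `min_mb`, biggest first
large_files <- function(pkg_dir = ".", min_mb = 1) {
    info <- pkg_files(pkg_dir)
    info$MB <- round(info$bytes / 2^20, 1)
    big <- info[info$bytes > min_mb * 2^20, c("file", "MB")]
    big[order(big$MB, decreasing = TRUE), , drop = FALSE]
}
```

Usage would be dir_size_mb("your-pkg") and large_files("your-pkg").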

The file that you wish to turn into an annotation package 
(FIsInGene_121514_with_annotations.txt) is about 7 MB. When stored as 
RDS rather than text, it is only 478K. Saving as RDS (or RData) also 
makes input fast.
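A minimal sketch of the text-versus-RDS comparison follows. It uses synthetic data as a stand-in for the real file. The column names and value ranges are assumptions, not taken from FIsInGene_121514_with_annotations.txt, so the absolute sizes will not match the figures above.

```r
## Synthetic stand-in for a gene-interaction table (columns are assumptions)
set.seed(1)
n <- 20000
genes <- sprintf("GENE%04d", 1:500)
fi <- data.frame(
    Gene1 = sample(genes, n, replace = TRUE),
    Gene2 = sample(genes, n, replace = TRUE),
    Annotation = sample(c("catalyze", "complex", "input", "reaction"),
                        n, replace = TRUE),
    Direction = sample(c("-", "->", "<-"), n, replace = TRUE),
    Score = round(runif(n), 2),
    stringsAsFactors = FALSE
)

txt <- tempfile(fileext = ".txt")
rds <- tempfile(fileext = ".rds")
write.table(fi, txt, sep = "\t", quote = FALSE, row.names = FALSE)
saveRDS(fi, rds)   # saveRDS() gzip-compresses by default

sizes <- c(text = file.size(txt), rds = file.size(rds))
print(sizes)

## readRDS() restores the data.frame, column types included, with no parsing
fi2 <- readRDS(rds)
```

In a package, the .rds file would typically live in inst/extdata and be read with readRDS(system.file("extdata", "<file>.rds", package = "<pkg>")).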

The largest file (all_gene_disease_associations.txt) is used as a 
data.frame, and when saved as RDS it is only 3.5M. It might be that THIS 
data is a candidate for an annotation package, but likely the solution 
here is instead to develop the R script at 
http://www.disgenet.org/web/DisGeNET/menu/downloads#r into a function 
that returns web service requests as R objects for interactive use (see 
https://gist.github.com/mtmorgan/ea10d0d424bf7e414d8e064d903f026d).

The gif can be stored as png and is then 156K.

The ListProfData file seems to contain an environment with function 
definitions etc. Probably this file contains much more information than 
you intended; it is hard to know what its actual size should be.
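One way to see what an .RData file really holds is to load it into a fresh environment and measure each object's serialized size. The sketch below does this; the helper name rdata_contents() is hypothetical. It also illustrates a common source of bloat, offered as an assumption about what may have happened rather than a diagnosis. A function created inside another function keeps that function's local variables, and save() writes them out along with it.

```r
## Sketch: list objects in an .RData file with their serialized sizes.
## The helper name is hypothetical.
rdata_contents <- function(path) {
    e <- new.env(parent = emptyenv())
    nms <- load(path, envir = e)   # load() returns the names it restored
    objs <- mget(nms, envir = e)
    data.frame(
        object = nms,
        class = vapply(objs, function(x) class(x)[1], character(1)),
        ## object.size() ignores a function's enclosing environment, so
        ## measure the serialized size, which is what save() writes
        bytes = vapply(objs, function(x) length(serialize(x, NULL)),
                       numeric(1)),
        row.names = NULL, stringsAsFactors = FALSE
    )
}

## Illustration: a tiny function that drags ~8 MB of local data with it
make_fun <- function() {
    big <- rnorm(1e6)     # never used by the returned function
    function(x) x + 1
}
f <- make_fun()
profile <- list(values = 1:10)
path <- tempfile(fileext = ".RData")
save(f, profile, file = path)
res <- rdata_contents(path)
print(res)
```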

I know your package also contained MathJax, at about 33M on disk. As you 
have discovered, it is not necessary to include MathJax.

It seems that by appropriately representing the data, you will have a 
package that is close to the guidelines, and at the same time faster 
when accessing the data.


It might be argued that the file FIsInGene_121514_with_annotations.txt 
is useful in general, and for that reason it should be an annotation 
package. But it is so easy and quick to obtain

 
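    url <- "http://reactomews.oicr.on.ca:8080/caBigR3WebApp2014/FIsInGene_121514_with_annotations.txt.zip"
    tmp <- tempfile(fileext = ".zip")
    download.file(url, tmp, mode = "wb")  # binary mode, needed for zip files on Windows
    xx <- read.delim(unzip(tmp, exdir = tempdir()))  # extract to a temp dir, not the working dir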

that it doesn't seem to justify the additional package infrastructure.


Finally, for the benefit of other package authors, we also mentioned in 
our off-list email the importance of appropriate attribution of data 
sources (clearly, in the DESCRIPTION file and / or in man pages 
describing the data) and ensuring that your use is consistent with how 
the data is licensed (via the License: field in the DESCRIPTION file, 
and / or the LICENSE file).


So please reconsider the need for an annotation package for this data. 
Your reviewer recognized that your package was much too large; make the 
changes above and it will no longer be much too large, so you will not 
need to make an annotation package.

Martin
On 04/07/2016 02:19 AM, Karim Mezhoud wrote: