[Bioc-devel] [devteam-bioc] Use Imports instead of Depends in the DESCRIPTION files of bioconductor packages.
On 12/31/2014 08:47 AM, Peng Yu wrote:
On Wed, Dec 31, 2014 at 9:41 AM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
On 12/24/2014 07:31 PM, Maintainer wrote:
Hi,

Many Bioconductor packages list other packages under Depends rather than Imports (e.g., IRanges Depends on BiocGenerics). Imports is usually preferred to Depends.

http://stackoverflow.com/questions/8637993/better-explanation-of-when-to-use-imports-depends
http://obeautifulcode.com/R/How-R-Searches-And-Finds-Stuff/

Could the unnecessary Depends be forced to be replaced by Imports? This should improve the package load time significantly.
R package symbols and other objects are collated at build time into a 'name space'. When used,

- Imports: loads the name space from disk.
- Depends: loads the name space from disk, and attaches it to the search() path.

Attaching is very inexpensive compared to loading, so there is no speed improvement gained by Import'ing instead of Depend'ing.
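The difference between loading and attaching can be checked in a fresh R session. The sketch below is only an illustration: it uses the 'tools' package, which ships with R, as a stand-in for an arbitrary dependency (an assumption; any installed package that is not already attached would do), and the helper on_search_path() is a hypothetical name introduced here.

```r
pkg <- "tools"   # stand-in dependency (assumption); ships with R

# Hypothetical helper: is the package attached, i.e. on the search() path?
on_search_path <- function(p) paste0("package:", p) %in% search()

# Roughly what Imports: does for a dependency: load the name space only.
loadNamespace(pkg)
loaded_after_load   <- pkg %in% loadedNamespaces()
attached_after_load <- on_search_path(pkg)

# What Depends: adds on top: attach the (already loaded) name space.
t_attach <- system.time(library(pkg, character.only = TRUE))
attached_after_library <- on_search_path(pkg)

print(c(loaded              = loaded_after_load,
        attached_by_load    = attached_after_load,
        attached_by_library = attached_after_library))
print(t_attach)  # cost of the attach step alone, since loading already happened
```

In a session where the package was not attached beforehand, attached_by_load should come out FALSE and attached_by_library TRUE; the timing printed at the end covers only the attach step.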
Yes. For example, changing Depends to Imports does not improve the package load time much. But loading a package in 4 sec seems to be too long.
Generally, yes, it seems like this should at least give the illusion of fast load.

4 seconds is not long in comparison to the time spent in an interactive analysis session or processing sequence-scale data. Recognizing that package load times can be substantial may influence some approaches, e.g., avoiding unnecessary (re)loading of packages during development, preferring multi-core to socket or other parallelization strategies, using persistent R sessions when responding to web service requests.

In MBASED, the DESCRIPTION file has

    Depends: RUnit, BiocGenerics, BiocParallel, GenomicRanges

RUnit almost certainly belongs in Suggests: (no use to the end user; not used by R code except during package build / check) but this likely has minimal impact on load time; the major cost is the S4-heavy GenomicRanges and its dependencies.

During start-up a reasonable (e.g., 25%) performance benefit can be realized by telling R to allocate additional memory up-front; on my Linux box I have

    $ alias Rdev
    alias Rdev='R_LIBS_USER=/home/mtmorgan/R/x86_64-unknown-linux-gnu-library/devel /home/mtmorgan/bin/R-devel/bin/R --no-save --quiet --min-vsize=2048M --min-nsize=45M'

Martin
system.time(suppressPackageStartupMessages(library(MBASED)))
   user  system elapsed
  4.404   0.100   4.553

For example, loading ggplot2 takes only about 10% of that time. It seems that many Bioconductor packages have similar problems.
system.time(suppressPackageStartupMessages(library(ggplot2)))
   user  system elapsed
  0.394   0.036   0.460
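Timings like the ones above depend on what is already loaded in the session, so a comparison is easier to repeat when each package is loaded in a new R process. The following is a minimal sketch under that assumption; time_fresh_load() is a hypothetical helper, not something from this thread, and 'tools' is used only because it ships with R (substitute MBASED or ggplot2 if they are installed).

```r
# Hypothetical helper: time library(pkg) in a new, clean R process.
# The elapsed time includes the start-up of R itself, so differences
# between packages are more informative than the absolute values.
time_fresh_load <- function(pkg) {
  rscript <- file.path(R.home("bin"), "Rscript")
  expr <- sprintf("suppressPackageStartupMessages(library(%s))", pkg)
  status <- NA_integer_
  elapsed <- system.time(
    status <- system2(rscript, c("--vanilla", "-e", shQuote(expr)),
                      stdout = FALSE, stderr = FALSE)
  )[["elapsed"]]
  # A non-zero exit status usually means the package is not installed.
  if (status != 0) stop("could not load package: ", pkg)
  elapsed
}

# Example with a package that ships with R.
fresh_seconds <- time_fresh_load("tools")
print(fresh_seconds)
```

Calling the helper once with a base package gives a rough baseline for interpreter start-up, which can be subtracted from the time measured for a heavier package.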
The main reason to Depend: on a package is that the symbols defined by the package are needed by the end-user. Import'ing a package is appropriate when the package provides functionality only relevant to the package author.
What causes the load time to be so long? Is it because too many functions are exported from all the dependent packages to the global namespace?
There are likely to be specific packages that mis-use Depends; packages such as IRanges, GenomicRanges, etc. use Depends: as intended, to provide functions that are useful to the end user. Maintainers are certainly encouraged to think carefully about adding packages providing functionality irrelevant to the end-user to the Depends: field.

The codetoolsBioC package (available from svn, see http://bioconductor.org/developers/how-to/source-control/) provides some mostly reliable hints to package authors about correctly formulating a NAMESPACE file to facilitate using Imports: instead of Depends:.

General questions about Bioconductor packages should be addressed to the support forum https://support.bioconductor.org. Questions about Bioconductor development (such as this) should be addressed to the bioc-devel mailing list (subscription required) https://stat.ethz.ch/mailman/listinfo/bioc-devel.

I have cc'd the bioc-devel mailing list; I hope that is ok.
Computational Biology / Fred Hutchinson Cancer Research Center 1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109 Location: Arnold Building M1 B861 Phone: (206) 667-2793