
[Bioc-devel] NAMESPACE best practices

5 messages · Alexander Blume, Hervé Pagès, Kasper Daniel Hansen

#
Dear All,

I recently took over maintenance of the 'fastseg' package (http://bioconductor.org/packages/3.16/bioc/html/fastseg.html) and, after fixing the issues reported by `R CMD check`, I wanted to optimize the package's NAMESPACE file and the Depends/Imports fields in the DESCRIPTION file.

Replacing the blanket `import` of dependencies with more fine-grained `importFrom` calls seems rather obvious.
However, I was wondering whether there are any reasons against doing so.
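For illustration, the two NAMESPACE styles look roughly like this. The specific names imported below are assumptions chosen only because they are real exports of these packages; they are not fastseg's actual import list.

```r
## NAMESPACE, blanket style: every export of each package is imported
# import(Biobase)
# import(GenomicRanges)

## NAMESPACE, selective style: only the names the code actually calls.
## Hypothetical list: exprs(), GRanges() and IRanges() exist in these
## packages, but which names fastseg needs is not stated in this thread.
importFrom(Biobase, exprs)
importFrom(GenomicRanges, GRanges)
importFrom(IRanges, IRanges)
```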

Concerning the DESCRIPTION file: given that the functions used are already specified in the NAMESPACE, I was planning to move the 'GenomicRanges' and 'Biobase' dependencies from Depends to Imports.
In the package, the Biobase functions are used to query supported ExpressionSet objects, while GenomicRanges is used to support GRanges objects and to create the final output as a GRanges object.
Is it legitimate to have GenomicRanges "only" in Imports, even if the main function's output is a GRanges object?

I want to keep the 'Depends' field as small as possible, to avoid forcing downstream packages to attach everything and mask other functions. Is this reasonable, or should I just import 'GenomicRanges' plus all required packages from the beginning and live with it? I hope there are some general guidelines to follow.

Best
Alex
#
Hi Alex,
On 24/05/2022 03:56, Alexander Blume wrote:
In my experience doing selective imports for core packages like methods, 
BiocGenerics, S4Vectors, IRanges, and GenomicRanges, is almost never 
worth it. It's just one more maintenance burden for virtually zero benefits.
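In NAMESPACE terms, that advice amounts to something like the following sketch. The package list is taken from the paragraph above; whether fastseg needs every one of them is an assumption.

```r
## Blanket imports for the core S4 infrastructure packages
## (list taken from the advice above; trim to what the package really uses)
import(methods)
import(BiocGenerics)
import(S4Vectors)
import(IRanges)
import(GenomicRanges)
```

Each package imported this way must also be declared in the Imports (or Depends) field of DESCRIPTION, otherwise `R CMD check` complains.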

However, the following 'R CMD check' NOTES:

  Namespace in Imports field not imported from: 'stats'

and

  Consider adding
    importFrom("grDevices", "dev.cur", "dev.interactive", "dev.new")

reveal real problems that should be addressed.
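A sketch of NAMESPACE lines that would address both NOTES. The grDevices line is the one the NOTE itself suggests; the stats function names are an assumption (median, quantile and sd are the ones that surface in the install warnings later in this thread).

```r
## Suggested verbatim by the second NOTE
importFrom("grDevices", "dev.cur", "dev.interactive", "dev.new")

## Resolves "Namespace in Imports field not imported from: 'stats'".
## Function names are assumed: list whatever stats functions the code calls.
importFrom(stats, median, quantile, sd)
```

If nothing from stats is actually used, removing it from Imports in DESCRIPTION clears the first NOTE as well.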
The consequence of moving GenomicRanges from Depends to Imports is that 
the basic GRanges functionalities would no longer be available to your 
users so it would feel like you're returning objects that "don't work". 
Unfortunately I see many Bioconductor packages doing similar things e.g. 
some packages return SummarizedExperiment derivatives but don't depend 
on the SummarizedExperiment package (they only import it). As a 
consequence basic things like assay() or colData() don't work on the object.

Here is a concrete example:

  library(AUCell)
  exprMatrix <- cbind(cell1=100*4:0, cell2=c(500, 0, 90, 0, 750))
  rownames(exprMatrix) <- sprintf("gene%02d", seq_len(nrow(exprMatrix)))
  rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE, verbose=FALSE)  # a SummarizedExperiment derivative

  assay(rankings)
  # Error in assay(rankings) : could not find function "assay"

  colData(rankings)
  # Error in colData(rankings) : could not find function "colData"

  library(SummarizedExperiment)
  assay(rankings)
  #         cells
  # genes    cell1 cell2
  #   gene01     1     2
  #   gene02     2     4
  #   gene03     3     3
  #   gene04     4     5
  #   gene05     5     1
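The same load-versus-attach distinction can be reproduced without any Bioconductor package. In this minimal sketch, stats4 (a recommended package shipped with R) stands in for SummarizedExperiment or GenomicRanges: loading a namespace, which is what an Imports-only dependency amounts to, does not put its exports on the search path, while attaching it, which is what Depends or an explicit `library()` call does, makes them visible to the user.

```r
## loadNamespace() mimics an Imports-only dependency: loaded, not attached
invisible(loadNamespace("stats4"))
attached_before <- "package:stats4" %in% search()
found_before    <- exists("mle")   # mle() is exported by stats4

## library() mimics Depends: the package is attached to the search path
library(stats4)
attached_after <- "package:stats4" %in% search()
found_after    <- exists("mle")

print(c(attached_before = attached_before, found_before = found_before,
        attached_after  = attached_after,  found_after  = found_after))
```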
Keeping Depends as small as possible is definitely something to aim for, 
as long as your users can still "operate" on the objects that you expose 
to them. For example, your users should not need to guess which package to
load before they can use the accessor functions defined for the object
you returned to them.
Definitely keep GenomicRanges in Depends.

Cheers,

H.

  
    
#
Hi Hervé,

Thank you so much for your detailed response! This is really helpful
advice.
I will take care of the missing imports and leave the Depends field as is.
You are right, in the end, the usability is most important.

Best,
Alex

Sent from mobile.

On Tue, 24 May 2022, 19:43, Hervé Pagès <hpages.on.github at gmail.com>
wrote:

  
  
#
I agree with Hervé: packages that define objects the user actually
interacts with should, IMO, be in Depends.

import vs. importFrom depends a bit on the package and on how many of its functions
I use. There is a limit where I'm just like, screw it, I'll import everything.

codetoolsBioC has a useful function writeNamespace().

Best,
Kasper

On Wed, May 25, 2022 at 5:56 AM Alexander Blume <alex.gos90 at gmail.com>
wrote:

  
    
1 day later
#
Dear Kasper,

Yes, I will keep Depends as is, since it was working fine before.

However, I guess I have to be a bit more selective with imports from the core packages, since I get warnings when I import them wholesale with `import`:

W  checking whether package 'fastseg' can be installed (16.8s)
   Found the following significant warnings:
     Warning: replacing previous import 'IRanges::median' by 'stats::median' when loading 'fastseg'
     Warning: replacing previous import 'IRanges::quantile' by 'stats::quantile' when loading 'fastseg'
     Warning: replacing previous import 'S4Vectors::sd' by 'stats::sd' when loading 'fastseg'

These warnings are almost resolved if I use `importFrom` for the IRanges and S4Vectors functions as required, but that leaves me with a new warning:

W  checking whether package 'fastseg' can be installed (18s)
   Found the following significant warnings:
     Warning: replacing previous import 'BiocGenerics::sd' by 'stats::sd' when loading 'fastseg'
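Another way to silence this kind of clash: since R 3.3.0 the `import()` directive in NAMESPACE accepts an `except` argument, so a core package can still be imported as a whole while leaving out the one conflicting name. A sketch, assuming stats::sd is the version that should win:

```r
## Keep the blanket import recommended for core packages, minus sd()
import(BiocGenerics, except = c("sd"))

## sd() now comes only from stats, so no previous import gets replaced
## (stats function names assumed, as in the warnings above)
importFrom(stats, median, quantile, sd)
```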

Now I wonder whether the sd() function defined by BiocGenerics falls back to stats::sd() when given a numeric vector,
so that I could drop the import of sd() from stats completely.
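Whether BiocGenerics::sd() falls back to stats::sd() depends on how the generic is defined. The mechanism can be sketched in base R alone: an S4 generic created from an existing function with `setGeneric()` keeps that function as its default method, so a plain numeric vector is dispatched to it. That BiocGenerics defines sd() this way is an assumption here; the final comment shows the direct check to run with BiocGenerics installed.

```r
library(methods)

## Turn stats::sd into an S4 generic in the global environment;
## stats::sd becomes its default method (creation message silenced)
invisible(suppressMessages(setGeneric("sd")))

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
is_generic <- isGeneric("sd")
same_value <- identical(sd(x), stats::sd(x))  # numeric input -> default method

print(c(is_generic = is_generic, same_value = same_value))

## Direct check against the real package (requires BiocGenerics):
## identical(BiocGenerics::sd(x), stats::sd(x))
```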


I saw some mentions of codetoolsBioC on StackOverflow, but was not able to fetch it using svn.
Is there some magic command to download the repository?

Best
Alex