[Bioc-devel] Help with creating first Bioconductor package

11 messages · Julian Gehring, Tim Triche, Jr., Laurent Gatto +4 more

#
Dear all,

I am building my first Bioconductor package and, before wasting
everyone's time with a faulty submission, I would like to clarify
certain things.

1) The package seems to fulfill the requirements of the Bioconductor
Package Guidelines and passes all checks except one "consideration":

CONSIDER: Indenting lines with a multiple of 4 spaces;

I love to indent my code with 2 spaces, is it a problem? Or do I have
to reformat all code before release? This is doable, but it would
complicate my workflow and if I am allowed to avoid it, I will. I see
that not all Bioconductor packages stick to this formatting (many even
use tabs instead of spaces).

2) Another problem I have is testing the package on other platforms. I
do not have a Windows machine to test my package. Could someone help
me and test my package (build, check and BiocCheck) on Windows and
MacOS? Otherwise -- how do you check your packages? Do you keep an
up-to-date R development environment on three platforms?

3) Finally, I would love to have someone to go through my package and
tell me what they think. I'm not sure to what extent the reviewing
process is technical (like on CRAN), or whether the actual content of
the package is evaluated as well. The package is small, it is
basically one or two functions that I need in my work and could not
find anywhere else -- it boils down to using Mann-Whitney for GO
enrichment tests on sorted lists of genes, coupled to an algorithm
that works like the "conditional=TRUE" option in hyperGtest. I have
been using this for a while in my work and it does exactly what I
wanted it to do.

Thank you so much,

j.
#
Hi January,
On Fri, Nov 14, 2014 at 11:52, January Weiner wrote:
> 2) Another problem I have is testing the package on other platforms. I
> do not have a Windows machine to test my package. Could someone help
> me and test my package (build, check and BiocCheck) on Windows and
> MacOS? Otherwise -- how do you check your packages? Do you keep an
> up-to-date R development environment on three platforms?

In most cases, an R package will be compatible with all platforms.  It
may get tricky if your package contains compiled code or relies on
external libraries.
Once you upload your package for review, it will automatically get built
and checked on all three platforms.  This may be sufficient for a small
package.
If you want to check it beforehand, have a look at
e.g. http://win-builder.r-project.org/.

> 3) Finally, I would love to have someone to go through my package and
> tell me what they think. I'm not sure to what extent the reviewing
> process is technical (like on CRAN), or whether the actual content of
> the package is evaluated as well.

Whoever reviews your package for Bioc will give you comments on it.
In my experience, the comments are very helpful, and typically cover
aspects from both a technical and a user point of view.
Best
Julian
#
feel free to put the code up on GitHub if you want comments prior to review
(e.g. "vignette won't compile", "dependencies are screwed up", "S3 objects
as far as the eye can see", etc.)



Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>

On Fri, Nov 14, 2014 at 2:51 AM, January Weiner <january.weiner at gmail.com>
wrote:

  
  
#
Dear January,
On 14 November 2014 10:51, January Weiner wrote:

            
I don't think indenting with 2 spaces is a reason for rejection, as long
as all other aspects of the formatting are fine (no miles-long lines, for
instance).
As mentioned by Julian, you could use http://win-builder.r-project.org/
for Windows. But I don't think you are expected to check it on all
platforms before submission. If your package contains straightforward R
code, there is no reason to anticipate issues on other platforms.
The review will not centre on any scientific aspects of the code, but
will focus on technical and documentation aspects and inter-operability
within the Bioconductor environment, as per package preparation and
submission guidelines:

  http://bioconductor.org/developers/package-submission/
  http://bioconductor.org/developers/package-guidelines/

Hope this helps.

Best wishes,

Laurent
#
On 11/14/14, 9:20 AM, Laurent Gatto wrote:
Emacs and RStudio both have 2 spaces as the default, so you will be in 
good company.  My own package is not even internally consistent, as it 
contains code written by multiple people, all using text editors with 
different indentation defaults.  Sometimes I get the urge to go through 
the (many) .R files and reformat it all, but it never makes it to the 
top of the to-do list.
I don't have a Windows machine either, so I rely on the Bioc build 
system to alert me if something goes wrong on Windows but not the other 
platforms.

Stephanie
#
On 11/14/2014 02:51 AM, January Weiner wrote:
As Julian mentioned, the package gets automatically built across platforms when 
you submit it to Bioconductor; you're then free to iterate on the submission 
process until the tractable issues are addressed. Many developers submit their 
package in this way multiple times, and both sides (the developer, and the bioc 
core team) benefit. Eventually you'll either have green across all platforms 
(like on the build report http://bioconductor.org/checkResults/3.1/bioc-LATEST/) 
or you'll be stymied, someone from the Bioc team will be assigned to review your 
package, and they'll get you through the rest.

You'll want to develop your package on Bioc- and R-devel, as this is the 
environment in which your package will be introduced to the Bioc community.

I think the r-project win-builder is really intended to support CRAN-bound 
packages, so I'm not sure that it's appropriate to use that service.

It's our intention to pilot a new approach to training -- an hour-long google 
'hangout' with presentation then Q & A -- using new package submission as our 
test case; we're likely to offer the hangout in early December.

Martin

  
    
#
Dear all,

thanks for your input, it was very helpful. I have some other specific
questions, though:

Martin:
Specifically, I won't want to; I will have to. This is the last
obstacle and I am aware of it. There is no way I can do my research
on a development version of R (not only for scientific reasons,
unfortunately), so I need two versions running concurrently. There are
means and ways to do it (I guess from the fact that it all runs on svn
and that one can set up scripts setting environmental variables; there
is no real guide on that, am I right?), but from my experience, for
someone who is not a full time developer it will be horrible, and
keeping it up to date -- without automata and apt-get -- will sooner
or later lead to a disaster.

Julian:
I use it regularly to check my CRAN packages (pca3d, riverplot and
tagcloud), but I assumed that it does not have org.Hs.eg.db and GO.db
which I need for my vignette.

True, most likely there will not be any problems -- but I have had
trouble at least once with a package that failed to build only on
Windows (well, it did include C code).

Tim Triche, Jr:
Is using S3 a problem? Even for simple things, like overloading a few
standard functions such as plot and print? (Also, as a user, I much
prefer limma's EList to anything that was even lying next to an
ExpressionSet; but then, I like Perl much more than Python).

Nathaniel Hayden:
That is precisely the package I had in mind when I said that it would
be annoying -- I'd still need to switch (and worse, *remember to
switch*) between the two formatting styles whenever I was to submit a
package :-)

Kind regards,

j.
#
No.

Keep R-release up to date with your OS's package manager. Build R-devel from source
and either don't put it in your PATH or call its executable something else (Rdev?) instead of R.
You won't have to update it unless there are newer features of R-devel that break your package.
As Martin mentioned, BioC has its own package builder available to you once you submit a package. It's absolutely fine if your package fails on one or more platforms when you submit it. Just look at the reports produced, fix the issue, and upload a new tarball. We don't find this annoying at all; on the contrary.
Don't change your formatting if you don't want to. As the BiocCheck vignette says (and as the word CONSIDER should tip you off), doing so is optional. If you like to indent two spaces, that's fine. We find really long lines (> 80 characters) a bit annoying, but the number of spaces you use is a matter of personal taste. It's good to be consistent with yourself though, so if you add code from a developer who likes to use 4 spaces, that might be a thing to fix.

Dan
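
For the long-line point above, a quick check is easy to script. What follows is only a sketch and not part of BiocCheck: long_lines is a hypothetical helper name, and 80 is the limit mentioned in the message. It relies on awk's length(), which counts a tab as one character, so tab-indented lines may look shorter than they display.

```shell
# Hypothetical helper: report lines longer than 80 characters.
# Prints file:line:length for every offending line in the given files.
long_lines() {
    awk -v max=80 '
        length($0) > max { printf "%s:%d:%d\n", FILENAME, FNR, length($0) }
    ' "$@"
}
```

From the top of a package source tree it could be called as `long_lines R/*.R`; no output means no line exceeds the limit.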
#
On 11/14/2014 01:11 PM, January Weiner wrote:
It's not horrible, no. On Linux I do

First time, R-devel:

mkdir -p ~/src/R-devel
cd ~/src/R-devel
svn co https://svn.r-project.org/R/trunk
tools/rsync-recommended
cd ~/

mkdir -p ~/bin/R-devel
cd ~/bin/R-devel
~/src/R-devel/configure && make -j

mkdir -p ~/R/x86_64-unknown-linux-gnu-library/devel/

To update:

cd ~/src/R-devel && svn up && tools/rsync-recommended
cd ~/bin/R-devel && make -j

To use

R_LIBS_USER=~/R/x86_64-unknown-linux-gnu-library/devel \
    /home/mtmorgan/bin/R-devel/bin/R

The latter is usually abbreviated by editing ~/.bash_alias

alias Rdev='R_LIBS_USER=/home/jweiner/R/x86_64-unknown-linux-gnu-library/devel /home/jweiner/bin/R-devel/bin/R'

so that you can say Rdev when you want to use R-devel. Always use biocLite() to 
install packages, from CRAN or Bioc, and you're fine. Do the same for the 
current release branch with the svn url 
https://svn.r-project.org/R/branches/R-3-1-branch.
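
The same shortcut can be written as a shell function that avoids hard-coded home directories. This is a sketch only: R_DEVEL_HOME and R_DEVEL_LIBS are hypothetical variable names, defaulting to the locations used in the recipe above.

```shell
# Hypothetical variables; adjust to wherever R-devel was built and to the
# library directory created for it.
R_DEVEL_HOME="${R_DEVEL_HOME:-$HOME/bin/R-devel}"
R_DEVEL_LIBS="${R_DEVEL_LIBS:-$HOME/R/x86_64-unknown-linux-gnu-library/devel}"

# Run R-devel with its own user library; the system-wide R is not affected.
# Any arguments are passed straight through to R.
Rdev() {
    R_LIBS_USER="$R_DEVEL_LIBS" "$R_DEVEL_HOME/bin/R" "$@"
}
```

Unlike the alias, the function forwards arguments, so something like `Rdev CMD check mypackage_0.1.tar.gz` (a made-up tarball name) would run the check under R-devel.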

There are many other ways to skin this cat (perhaps why there is no definitive 
guide), likely we'll hear some of them...
EList is an S4 class (that's a technically true statement, but it extends 'list' 
so doesn't benefit from, e.g., type checking).

Many packages should NOT implement classes of their own, but rather re-use 
existing classes to make it easier for the user to integrate the package into 
their work flow. Many useful existing classes in Bioconductor are S4 classes, so...

If you do implement a new class, then most likely it should _extend_ an existing 
class, so see the previous point.

Likely you want to contribute your package to Bioconductor because you'd like to 
interoperate with other Bioconductor packages (else why not contribute to CRAN, 
or point prospective users to github, or...). Here it pays to play well with the 
other packages you want to work with, so using the same objects they work with. 
In the sequencing realm, this is almost always a GRanges-related class, e.g, 
GRanges, GRangesList, SummarizedExperiment, GAlignments, VCF. In microarrays, 
and whatever your own prejudices are, you'll likely want to support working with 
an ExpressionSet.

If you implement something completely novel, then the difference between 
implementing it in S3 and in S4 is not that large, the 'user experience' should 
be almost identical, and the advantages of using a formal class system become 
apparent as the complexity of the software grows. It's possible to take short 
cuts in creating  both types of classes, e.g., expecting the user to access list 
elements directly in S3 or using slot access in S4 rather than providing an 
accessor (often a plain-old-function, for both S3 and S4), but that undermines 
the S4 benefits of reliable data representation for robust software (which you 
value, based on your reluctance to use development versions of software for 
scientific work).

Martin

  
    
#
Dear Martin,

thanks for your description, just a question
I think this should be

trunk/tools/rsync-recommended
and

~/src/R-devel/trunk/configure  && make -j

(etc.)
am I right?

Why should I do the same for the release branch? I thought R-3-1-branch
should be generally the stable release -- the one that is conveniently
installed and distributed using my regular package manager, without me
having to worry about it. You have confused me.

Kind regards,

j.
#
On 11/15/2014 12:04 PM, January Weiner wrote:
yes; for the record I guess my recipe is more like

mkdir -p ~/src
cd ~/src
svn co https://svn.r-project.org/R/trunk R-devel

which creates ~/src/R-devel without the intervening /trunk. Sorry for the 
imprecision, but your understanding seems to be correct.
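
Putting the corrected steps together, the first-time setup might look like the sketch below. The names setup_r_devel, run, run_in, SRC, BUILD, LIB and DRY_RUN are made up for illustration; the commands and default paths are the ones from the recipe in this thread.

```shell
# Hypothetical variables, defaulting to the locations used in the recipe.
SRC="${SRC:-$HOME/src/R-devel}"      # svn working copy of R trunk
BUILD="${BUILD:-$HOME/bin/R-devel}"  # separate build directory
LIB="${LIB:-$HOME/R/x86_64-unknown-linux-gnu-library/devel}"

# Execute a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

# Same as run, but inside directory $1.
run_in() {
    dir=$1; shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ (cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

# Steps: create directories, check out trunk directly into $SRC (so there
# is no intervening trunk/), fetch the recommended packages from the top
# of the source tree, then configure and build in $BUILD.
# "make -j" is as in the recipe; a job count such as -j4 limits parallelism.
setup_r_devel() {
    run mkdir -p "$(dirname "$SRC")" "$BUILD" "$LIB" &&
    run svn co https://svn.r-project.org/R/trunk "$SRC" &&
    run_in "$SRC" tools/rsync-recommended &&
    run_in "$BUILD" "$SRC/configure" &&
    run_in "$BUILD" make -j
}
```

Setting `DRY_RUN=1` before calling `setup_r_devel` prints the commands without running them, which is a cheap way to check the paths before starting a long build.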
R-3-1-branch gets updated with changes that will make it to the 'next' release 
in the R-3.1 series, so for instance changes made now will appear in 3.1.2 when 
that is released.

Whether it's important as a Bioc developer to track the R-3-1-branch or not is a 
little involved.

Based on past experience, my guess is that R 3.1.2 will be a final 'bug fix' 
release shortly before the release of R 3.2.0. This means that users of the 
current Bioc 3.0 release will expect Bioc 3.0 packages to work with R 3.1.2, and 
that the responsible developer will check that that is the case. On the other 
hand, since the changes are in the 3.1 series one would expect the 3.1.2 changes 
to consist of bug fixes, rather than new features or changed functionality that 
breaks code that works with R-3.1.1 or R-3.1.0 (I don't believe there is any 
formal statement to this effect from R-core). Also, relatively few Bioc users 
will switch to 3.1.2, but will instead move more or less immediately to R 3.2.0 
(and to what will become Bioc 3.1), which (again based on past experience) will 
be released more-or-less at the same time as R 3.1.2.

Looking forward to the next development cycle and assuming R and Bioc releases 
follow the pattern that they have in the recent past, Bioc 3.2 and 3.3 will both 
be based on the R-3.2 series. Bioc 3.2 and Bioc 'devel' at the time of the Bioc 
3.2 release will be built against the R-3-2-branch, with R-devel more-or-less 
irrelevant from a Bioc perspective. R-devel will again become relevant after 
Bioc 3.3 is released.

Presumably that's enough confusion for one email; sorry about that.

Martin