Dear all,
I am building my first Bioconductor package and, before wasting
everyone's time with a faulty submission, I would like to clarify
certain things.
1) The package seems to fulfill the requirements of the Bioconductor
Package Guidelines and passes all checks except one "consideration":
CONSIDER: Indenting lines with a multiple of 4 spaces;
I love to indent my code with 2 spaces, is it a problem? Or do I have
to reformat all code before release? This is doable, but it would
complicate my workflow and if I am allowed to avoid it, I will. I see
that not all Bioconductor packages stick to this formatting (many even
use tabs instead of spaces).
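BiocCheck's own implementation of this check isn't shown in this thread, so what follows is only a rough local approximation. It assumes the note is about leading spaces whose count is not a multiple of 4. The shell function `count_bad_indent` is a hypothetical helper and not part of BiocCheck. It counts such lines across a directory of `.R` files, defaulting to `R/`.

```shell
# count_bad_indent DIR: print how many lines in DIR/*.R begin with a run of
# spaces whose length is not a multiple of 4. This is a rough approximation
# of the BiocCheck "CONSIDER" note, not BiocCheck's actual implementation.
# Tab-indented lines are not counted.
count_bad_indent() {
  local dir="${1:-R}"
  # If DIR has no .R files, cat fails quietly and awk prints 0.
  cat "$dir"/*.R 2>/dev/null | awk '
    match($0, /^ +/) && RLENGTH % 4 != 0 { n++ }
    END { print n + 0 }
  '
}
```

For example, `count_bad_indent mypkg/R` prints 0 when the code is indented purely in multiples of 4 spaces. Here `mypkg` is a placeholder.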
2) Another problem I have is testing the package on other platforms. I
do not have a Windows machine to test my package. Could someone help
me and test my package (build, check and BiocCheck) on Windows and
MacOS? Otherwise -- how do you check your packages? Do you keep an
up-to-date R development environment on all three platforms?
3) Finally, I would love to have someone to go through my package and
tell me what they think. I'm not sure to what extent the reviewing
process is technical (like on CRAN), or whether the actual content of
the package is evaluated as well. The package is small; it is
basically one or two functions that I need in my work and could not
find anywhere else -- it boils down to using Mann-Whitney for GO
enrichment tests on sorted lists of genes, coupled to an algorithm
that works like the "conditional=TRUE" option in hyperGTest. I have
been using this for a while in my work and it does exactly what I
wanted it to do.
Thank you so much,
j.
On Fri, Nov 14, 2014 at 11:52, January Weiner wrote:
2) Another problem I have is testing the package on other platforms. I
do not have a Windows machine to test my package. Could someone help
me and test my package (build, check and BiocCheck) on Windows and
MacOS? Otherwise -- how do you check your packages? Do you keep an
up-to-date R development environment on all three platforms?
In most cases, an R package will be compatible with all platforms. It
may get tricky if your package contains compiled code or relies on
external libraries.
Once you upload your package for review, it will automatically get built
and checked on all three platforms. This may be sufficient for a small
package.
If you want to check it beforehand, have a look at
e.g. http://win-builder.r-project.org/.
3) Finally, I would love to have someone to go through my package and
tell me what they think. I'm not sure to what extent the reviewing
process is technical (like on CRAN), or whether the actual content of
the package is evaluated as well.
Whoever reviews your package for Bioc will give you comments on it.
In my experience, the comments are very helpful, and
typically cover aspects from both a technical and a user point of view.
Best
Julian
feel free to put the code up on GitHub if you want comments prior to review
(e.g. "vignette won't compile", "dependencies are screwed up", "S3 objects
as far as the eye can see", etc.)
Statistics is the grammar of science.
Karl Pearson <http://en.wikipedia.org/wiki/The_Grammar_of_Science>
Dear all,
I am building my first Bioconductor package and, before wasting
everyone's time with a faulty submission, I would like to clarify
certain things.
1) The package seems to fulfill the requirements of the Bioconductor
Package Guidelines and passes all checks except one "consideration":
CONSIDER: Indenting lines with a multiple of 4 spaces;
I love to indent my code with 2 spaces, is it a problem? Or do I have
to reformat all code before release? This is doable, but it would
complicate my workflow and if I am allowed to avoid it, I will. I see
that not all Bioconductor packages stick to this formatting (many even
use tabs instead of spaces).
I don't think this is a reason for rejection, as long as all other
aspects of the formatting are fine (for instance, no miles-long lines).
2) Another problem I have is testing the package on other platforms. I
do not have a Windows machine to test my package. Could someone help
me and test my package (build, check and BiocCheck) on Windows and
MacOS? Otherwise -- how do you check your packages? Do you keep an
up-to-date R development environment on all three platforms?
As mentioned by Julian, you could use http://win-builder.r-project.org/
for Windows. But I don't think you are expected to check it on all
platforms before submission. If your package contains straightforward R
code, there is no reason to anticipate issues on other platforms.
Dear January,
On 14 November 2014 10:51, January Weiner wrote:
Dear all,
I am building my first Bioconductor package and, before wasting
everyone's time with a faulty submission, I would like to clarify
certain things.
1) The package seems to fulfill the requirements of the Bioconductor
Package Guidelines and passes all checks except one "consideration":
CONSIDER: Indenting lines with a multiple of 4 spaces;
I love to indent my code with 2 spaces, is it a problem? Or do I have
to reformat all code before release? This is doable, but it would
complicate my workflow and if I am allowed to avoid it, I will. I see
that not all Bioconductor packages stick to this formatting (many even
use tabs instead of spaces).
I don't think this is a reason for rejection, as long as all other
aspects of the formatting are fine (for instance, no miles-long lines).
Emacs and RStudio both have 2 spaces as the default, so you will be in
good company. My own package is not even internally consistent, as it
contains code written by multiple people, all using text editors with
different indentation defaults. Sometimes I get the urge to go through
the (many) .R files and reformat it all, but it never makes it to the
top of the to-do list.
2) Another problem I have is testing the package on other platforms. I
do not have a Windows machine to test my package. Could someone help
me and test my package (build, check and BiocCheck) on Windows and
MacOS? Otherwise -- how do you check your packages? Do you keep an
up-to-date R development environment on all three platforms?
As mentioned by Julian, you could use http://win-builder.r-project.org/
for Windows. But I don't think you are expected to check it on all
platforms before submission. If your package contains straightforward R
code, there is no reason to anticipate issues on other platforms.
I don't have a Windows machine either, so I rely on the Bioc build
system to alert me if something goes wrong on Windows but not the other
platforms.
Stephanie
Dear all,
I am building my first Bioconductor package and, before wasting
everyone's time with a faulty submission, I would like to clarify
certain things.
1) The package seems to fulfill the requirements of the Bioconductor
Package Guidelines and passes all checks except one "consideration":
CONSIDER: Indenting lines with a multiple of 4 spaces;
I love to indent my code with 2 spaces, is it a problem? Or do I have
to reformat all code before release? This is doable, but it would
complicate my workflow and if I am allowed to avoid it, I will. I see
that not all Bioconductor packages stick to this formatting (many even
use tabs instead of spaces).
2) Another problem I have is testing the package on other platforms. I
do not have a Windows machine to test my package. Could someone help
me and test my package (build, check and BiocCheck) on Windows and
MacOS? Otherwise -- how do you check your packages? Do you keep an
up-to-date R development environment on all three platforms?
As Julian mentioned, the package gets automatically built across platforms when
you submit it to Bioconductor; you're then free to iterate on the submission
process until the tractable issues are addressed. Many developers submit their
package in this way multiple times, and both sides (the developer and the Bioc
core team) benefit. Eventually you'll either have green across all platforms
(like on the build report http://bioconductor.org/checkResults/3.1/bioc-LATEST/)
or you'll be stymied; someone from the Bioc team will be assigned to review your
package, and they'll get you through the rest.
You'll want to develop your package on Bioc- and R-devel, as this is the
environment in which your package will be introduced to the Bioc community.
I think the r-project win-builder is really intended to support CRAN-bound
packages, so I'm not sure that it's appropriate to use that service.
It's our intention to pilot a new approach to training -- an hour-long google
'hangout' with presentation then Q & A -- using new package submission as our
test case; we're likely to offer the hangout in early December.
Martin
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
Dear all,
thanks for your input, it was very helpful. I have some other specific
questions, though:
Martin
You'll want to develop your package on Bioc- and R-devel, as this
is the environment in which your package will be introduced to
the Bioc community.
Strictly speaking, I won't want to; I will have to. This is the last
obstacle and I am aware of it. There is no way that I do my research
on the development version of R (not only for scientific reasons,
unfortunately), so I need two versions running concurrently. There are
means and ways to do it (I guess from the fact that it all runs on svn
and that one can set up scripts setting environment variables; there
is no real guide on that, am I right?), but from my experience, for
someone who is not a full-time developer it will be horrible, and
keeping it up to date -- without automation and apt-get -- will sooner
or later lead to a disaster.
Julian:
I use it regularly to check my CRAN packages (pca3d, riverplot and
tagcloud), but I assumed that it does not have org.Hs.eg.db and GO.db
which I need for my vignette.
True, most likely there will not be any problems -- but I have had
trouble at least once with a package that did not build correctly on
Windows only (well, it did include C code).
Tim Triche, Jr:
"S3 objects as far as the eye can see"
Is using S3 a problem? For simple things such as overloading a few
standard functions like plot and print? (Also, as a user, I much
prefer limma's EList to anything that was even lying next to an
ExpressionSet; but then, I like Perl much more than Python.)
Nathaniel Hayden:
That is precisely the package I had in mind when I said that it would
be annoying -- I'd still need to switch (and worse, *remember to
switch*) between the two formatting styles whenever I was to submit a
package :-)
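If the conversion were scripted, there would be nothing to remember at submission time. Below is a minimal sketch. It assumes all indentation is in 2-space steps and made of spaces only. The shell function `double_indent` is hypothetical. It doubles each line's leading spaces. It does not parse R, so leading spaces inside multi-line string literals would be doubled too.

```shell
# double_indent: read R code on stdin and write it to stdout with every run
# of leading spaces doubled (2-space style -> 4-space style).
# Caveat: lines inside multi-line string literals are changed as well.
double_indent() {
  awk '{
    if (match($0, /^ +/)) {
      lead = substr($0, 1, RLENGTH)
      $0 = lead lead substr($0, RLENGTH + 1)
    }
    print
  }'
}
```

One way to use it is on a throwaway copy of the package made only for submission, which keeps the 2-space working copy untouched. Inside that copy, for example: `for f in R/*.R; do double_indent < "$f" > "$f.tmp" && mv "$f.tmp" "$f"; done`.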
Kind regards,
j.
From: "January Weiner" <january.weiner at gmail.com>
Cc: "bioc-devel" <bioc-devel at r-project.org>
Sent: Friday, November 14, 2014 1:11:32 PM
Subject: Re: [Bioc-devel] Help with creating first Bioconductor package
Dear all,
thanks for your input, it was very helpful. I have some other specific
questions, though:
Martin
You'll want to develop your package on Bioc- and R-devel, as this
is the environment in which your package will be introduced to
the Bioc community.
Strictly speaking, I won't want to; I will have to. This is the last
obstacle and I am aware of it. There is no way that I do my research
on the development version of R (not only for scientific reasons,
unfortunately), so I need two versions running concurrently. There are
means and ways to do it (I guess from the fact that it all runs on svn
and that one can set up scripts setting environment variables; there
is no real guide on that, am I right?), but from my experience, for
someone who is not a full-time developer it will be horrible, and
keeping it up to date -- without automation and apt-get -- will sooner
or later lead to a disaster.
No.
Keep R-release up to date with your OS's package manager. Build R-devel from source,
and either keep it out of your PATH or call its executable something other than R (Rdev?).
You won't have to update it unless there are newer features of R-devel that break your package.
I use it regularly to check my CRAN packages (pca3d, riverplot and
tagcloud), but I assumed that it does not have org.Hs.eg.db and GO.db
which I need for my vignette.
True, most likely there will not be any problems -- but I have had
trouble at least once with a package that did not build correctly on
Windows only (well, it did include C code).
As Martin mentioned, BioC has its own package builder available to you once you submit a package. It's absolutely fine if your package fails on one or more platforms when you submit it. Just look at the reports produced, fix the issue, and upload a new tarball. We don't find this annoying at all; on the contrary.
Tim Triche, Jr:
"S3 objects as far as the eye can see"
Is using S3 a problem? For simple things such as overloading a few
standard functions like plot and print? (Also, as a user, I much
prefer limma's EList to anything that was even lying next to an
ExpressionSet; but then, I like Perl much more than Python.)
Nathaniel Hayden:
That is precisely the package I had in mind when I said that it would
be annoying -- I'd still need to switch (and worse, *remember to
switch*) between the two formatting styles whenever I was to submit a
package :-)
Don't change your formatting if you don't want to. As the BiocCheck vignette says (and as the word CONSIDER should tip you off), doing so is optional. These things are a matter of taste. If you like to indent two spaces, that's fine. We find really long lines (> 80 characters) a bit annoying, but the number of spaces you use is a matter of personal taste. It's good to be consistent with yourself, though, so if you add code from a developer who likes to use 4 spaces, that might be something to fix.
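Here is a quick sketch for finding such lines before submitting. The function name `long_lines` is hypothetical. The 80-character threshold is taken from this message and can be overridden.

```shell
# long_lines DIR [MAX]: print FILE:LINE for every line in DIR/*.R that is
# longer than MAX characters (default 80).
long_lines() {
  local dir="${1:-R}" max="${2:-80}" f
  for f in "$dir"/*.R; do
    [ -e "$f" ] || continue   # skip the literal pattern when there are no .R files
    awk -v max="$max" 'length($0) > max + 0 { print FILENAME ":" FNR }' "$f"
  done
}
```

For example, `long_lines R` run from the package root lists the offending lines, and `long_lines R 100` relaxes the limit.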
Dan
Kind regards,
j.
--
-------- January Weiner --------------------------------------
http://logfc.wordpress.com
Dear all,
thanks for your input, it was very helpful. I have some other specific
questions, though:
Martin
You'll want to develop your package on Bioc- and R-devel, as this
is the environment in which your package will be introduced to
the Bioc community.
Strictly speaking, I won't want to; I will have to. This is the last
obstacle and I am aware of it. There is no way that I do my research
on the development version of R (not only for scientific reasons,
unfortunately), so I need two versions running concurrently. There are
means and ways to do it (I guess from the fact that it all runs on svn
and that one can set up scripts setting environment variables; there
is no real guide on that, am I right?), but from my experience, for
someone who is not a full-time developer it will be horrible, and
keeping it up to date -- without automation and apt-get -- will sooner
or later lead to a disaster.
It's not horrible, no. On Linux I do:
First time, R-devel:
mkdir -p ~/src/R-devel
cd ~/src/R-devel
svn co https://svn.r-project.org/R/trunk
tools/rsync-recommended
cd ~/
mkdir -p ~/bin/R-devel
cd ~/bin/R-devel
~/src/R-devel/configure && make -j
mkdir -p ~/R/x86_64-unknown-linux-gnu-library/devel/
To update:
cd ~/src/R-devel && svn up && tools/rsync-recommended
cd ~/bin/R-devel && make -j
To use:
R_LIBS_USER=~/R/x86_64-unknown-linux-gnu-library/devel /home/mtmorgan/bin/R-devel/bin/R
The latter is usually abbreviated by editing ~/.bash_aliases
alias Rdev='R_LIBS_USER=/home/jweiner/R/x86_64-unknown-linux-gnu-library/devel /home/jweiner/bin/R-devel/bin/R'
so that you can say Rdev when you want to use R-devel. Always use biocLite() to
install packages, from CRAN or Bioc, and you're fine. Do the same for the
current release branch with the svn url
https://svn.r-project.org/R/branches/R-3-1-branch.
There are many other ways to skin this cat (perhaps why there is no definitive
guide), likely we'll hear some of them...
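One of those other ways is to replace the alias with a shell function, defined in for example `~/.bashrc`. Unlike an alias, a function also works inside a non-interactive bash script that defines or sources it, because bash does not expand aliases in non-interactive shells by default. It also keeps the whole command on one logical line. This is a sketch only. The names `Rdev`, `RDEV_HOME` and `RDEV_LIBS` are hypothetical, and the default paths mirror the recipe in this message, so adjust them to your own layout.

```shell
# Defaults mirror the recipe in this message; override before use if needed.
RDEV_HOME="${RDEV_HOME:-$HOME/bin/R-devel}"
RDEV_LIBS="${RDEV_LIBS:-$HOME/R/x86_64-unknown-linux-gnu-library/devel}"

# Rdev: run the R-devel build with its own user library, passing every
# argument through unchanged. R_LIBS_USER is set only for this one command.
Rdev() {
  R_LIBS_USER="$RDEV_LIBS" "$RDEV_HOME/bin/R" "$@"
}
```

Usage then looks like `Rdev CMD check mypkg_0.1.0.tar.gz`, where `mypkg` is a placeholder. Plain `R` continues to start the release version from the package manager.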
Tim Triche, Jr:
"S3 objects as far as the eye can see"
Is using S3 a problem? For simple things such as overloading a few
standard functions like plot and print? (Also, as a user, I much
prefer limma's EList to anything that was even lying next to an
ExpressionSet; but then, I like Perl much more than Python.)
EList is an S4 class (that's a technically true statement, but it extends 'list'
so it doesn't benefit from, e.g., type checking).
Many packages should NOT implement classes of their own, but rather re-use
existing classes to make it easier for the user to integrate the package into
their workflow. Many useful existing classes in Bioconductor are S4 classes, so...
If you do implement a new class, then most likely it should _extend_ an existing
class, so see the previous point.
Likely you want to contribute your package to Bioconductor because you'd like to
interoperate with other Bioconductor packages (else why not contribute to CRAN,
or point prospective users to github, or...). Here it pays to play well with the
other packages you want to work with, by using the same objects they work with.
In the sequencing realm, this is almost always a GRanges-related class, e.g.,
GRanges, GRangesList, SummarizedExperiment, GAlignments, VCF. For microarrays,
whatever your own prejudices are, you'll likely want to support working with
an ExpressionSet.
If you implement something completely novel, then the difference between
implementing it in S3 and in S4 is not that large, the 'user experience' should
be almost identical, and the advantages of using a formal class system become
apparent as the complexity of the software grows. It's possible to take shortcuts
in creating both types of classes, e.g., expecting the user to access list
elements directly in S3 or using slot access in S4 rather than providing an
accessor (often a plain-old-function, for both S3 and S4), but that undermines
the S4 benefits of reliable data representation for robust software (which you
value, based on your reluctance to use development versions of software for
scientific work).
Martin
I think this should be
trunk/tools/rsync-recommended
mkdir -p ~/bin/R-devel
cd ~/bin/R-devel
~/src/R-devel/configure && make -j
and
~/src/R-devel/trunk/configure && make -j
(etc.)
am I right?
so that you can say Rdev when you want to use R-devel. Always use biocLite()
to install packages, from CRAN or Bioc, and you're fine. Do the same for the
current release branch with the svn url
https://svn.r-project.org/R/branches/R-3-1-branch.
Why should I do that? I thought R-3-1-branch should be generally the
stable release -- the one that is conveniently installed and
distributed using my regular package manager, without me having to
worry about it. You have confused me.
Kind regards,
j.
-------- January Weiner --------------------------------------
I think this should be
trunk/tools/rsync-recommended
mkdir -p ~/bin/R-devel
cd ~/bin/R-devel
~/src/R-devel/configure && make -j
and
~/src/R-devel/trunk/configure && make -j
(etc.)
am I right?
yes; for the record I guess my recipe is more like
mkdir -p ~/src
cd ~/src
svn co https://svn.r-project.org/R/trunk R-devel
which creates ~/src/R-devel without the intervening /trunk. Sorry for the
imprecision, but your understanding seems to be correct.
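Putting the corrected layout together, the first-time setup, build and update steps could be collected into shell functions. This is a sketch under stated assumptions. The function names and the `DRY_RUN` switch are hypothetical additions; `DRY_RUN` prints the commands instead of running them. The paths follow the ones used in this thread: trunk is checked out directly into `~/src/R-devel`, and R is built out of tree in `~/bin/R-devel`.

```shell
# Defaults follow the layout discussed in this thread; override as needed.
RDEV_SVN="${RDEV_SVN:-https://svn.r-project.org/R/trunk}"
RDEV_SRC="${RDEV_SRC:-$HOME/src/R-devel}"
RDEV_BUILD="${RDEV_BUILD:-$HOME/bin/R-devel}"
RDEV_LIBS="${RDEV_LIBS:-$HOME/R/x86_64-unknown-linux-gnu-library/devel}"

# run: execute a command, or only echo it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# First time: create directories, check out the sources (no extra /trunk
# level) and fetch the recommended packages from the source directory.
rdev_checkout() {
  run mkdir -p "$(dirname "$RDEV_SRC")" "$RDEV_BUILD" "$RDEV_LIBS" &&
    run svn co "$RDEV_SVN" "$RDEV_SRC" &&
    (run cd "$RDEV_SRC" && run tools/rsync-recommended)
}

# Configure and compile inside the separate build directory (in a subshell,
# so the caller's working directory is unchanged).
rdev_build() {
  (run cd "$RDEV_BUILD" && run "$RDEV_SRC/configure" && run make -j)
}

# Later: update the sources and recommended packages, then rebuild.
rdev_update() {
  (run cd "$RDEV_SRC" && run svn up && run tools/rsync-recommended) &&
    rdev_build
}
```

For example, run `rdev_checkout && rdev_build` the first time and `rdev_update` afterwards. Setting `RDEV_SVN` to https://svn.r-project.org/R/branches/R-3-1-branch, with different `RDEV_SRC`, `RDEV_BUILD` and `RDEV_LIBS` paths, would give a separate release-branch build in the same way.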
so that you can say Rdev when you want to use R-devel. Always use biocLite()
to install packages, from CRAN or Bioc, and you're fine. Do the same for the
current release branch with the svn url
https://svn.r-project.org/R/branches/R-3-1-branch.
Why should I do that? I thought R-3-1-branch should be generally the
stable release -- the one that is conveniently installed and
distributed using my regular package manager, without me having to
worry about it. You have confused me.
R-3-1-branch gets updated with changes that will make it to the 'next' release
in the R-3.1 series, so for instance changes made now will appear in 3.1.2 when
that is released.
Whether it's important as a Bioc developer to track the R-3-1-branch or not is a
little involved.
Based on past experience, my guess is that R 3.1.2 will be a final 'bug fix'
release shortly before the release of R 3.2.0. This means that users of the
current Bioc 3.0 release will expect Bioc 3.0 packages to work with R 3.1.2, and
that the responsible developer will check that that is the case. On the other
hand, since the changes are in the 3.1 series one would expect the 3.1.2 changes
to consist of bug fixes, rather than new features or changed functionality that
breaks code that works with R-3.1.1 or R-3.1.0 (I don't believe there is any
formal statement to this effect from R-core). Also, relatively few Bioc users
will switch to 3.1.2, but will instead move more or less immediately to R 3.2.0
(and to what will become Bioc 3.1), which (again based on past experience) will
be released more-or-less at the same time as R 3.1.2.
Looking forward to the next development cycle and assuming R and Bioc releases
follow the pattern that they have in the recent past, Bioc 3.2 and 3.3 will both
be based on the R-3.2 series. Bioc 3.2 and Bioc 'devel' at the time of the Bioc
3.2 release will be built against the R-3-2-branch, with R-devel more-or-less
irrelevant from a Bioc perspective. R-devel will again become relevant after
Bioc 3.3 is released.
Presumably that's enough confusion for one email; sorry about that.
Martin