[Bioc-devel] library() calls removed in simpleSingleCell workflow
I haven't tried it (= haven't had to do it) myself, so I don't know exactly what it takes, but you can configure this "ulimit" on the number of open files, e.g. see the instructions in https://stackoverflow.com/a/34645/1072091. I suspect it requires admin rights, but I'm not sure - maybe this is what goes on when you run it in different types of terminals.

About this open file/DLL limit: in src/main/Rdynload.c (https://github.com/wch/r-source/blob/tags/R-3-4-2/src/main/Rdynload.c#L173-L180) there's the following comment/clarification:

/* Note that it is likely that dlopen will use up at least one file
   descriptor for each DLL loaded (it may load further dynamically
   linked libraries), so we do not want to get close to the fd limit
   (which may be as low as 256). By default, the maximum number of DLLs
   that can be loaded is 100. When the fd limit is known, we allow
   increasing the maximum number of DLLs via environment variable up to
   60% of the limit on open files, but to no more than 1000.
*/

I always thought that "as low as 256" was for some archaic system, but, as Wolfgang points out, it's a relevant limit. Since 0.6*256 = 153.6, i.e. at most 153 DLLs, this explains why the current default maximum of 100 DLLs is a reasonable choice and why requests to bump it up much higher may not be feasible (not cross-platform).

Related to this - "Garbage collection of DLLs": I've implemented R.utils::gcDLLs() that "Identifies and removes ["stray"] DLLs of packages already unloaded". This function will free up DLL slots otherwise occupied by unloaded packages. I've used it successfully in many places, e.g. trying to load and unload all my installed packages in a single R session (don't ask why ;)). However, as argued by Karl Millar (https://stat.ethz.ch/pipermail/r-devel/2016-December/073528.html), there is a risk that unregistering such DLLs may render the state of R unstable, because we cannot know for sure whether there are some registered finalizers that rely on such DLLs and that haven't been called yet. R.utils::gcDLLs() forces the garbage collector to run prior to unregistering DLLs, which should eliminate the risk of this problem. As far as I understand the current R implementation, this should be enough. On the other hand, I've been wrong before, I don't know about future versions of R, and it has only been tested so much. Guaranteeing reentrancy of finalizers is really tricky.

/Henrik
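
On the admin-rights question above: under POSIX resource-limit semantics, a process may raise its own soft limit up to the hard limit without special privileges; it is raising the hard limit that needs admin rights. The following is only a sketch of how one might pick a soft limit to request before starting R. The helper name pick_soft_limit and the values 4096 and 500 are made up for illustration, and the system may still refuse a value even when the hard limit is reported as "unlimited".

```shell
#!/bin/sh
# Sketch only: pick_soft_limit is a hypothetical helper, not an existing tool.
# It prints the soft limit to request: the wanted value, but never more than
# the hard limit reported by "ulimit -Hn".
pick_soft_limit() {
  wanted="$1"
  hard="$2"
  if [ "$hard" = "unlimited" ] || [ "$wanted" -le "$hard" ]; then
    echo "$wanted"
  else
    echo "$hard"
  fi
}

# Usage (affects only this shell and the processes started from it):
#   ulimit -Sn "$(pick_soft_limit 4096 "$(ulimit -Hn)")"
#   R_MAX_NUM_DLLS=500 R
```

Because the change is per shell, this would also be consistent with different terminal programs reporting different values of "ulimit -Sn".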
On Fri, Oct 6, 2017 at 10:16 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
Interesting! In iTerm2, I get

$ ulimit -Sn
4864

and

env R_MAX_NUM_DLLS=1000 R

works, which means that on Mac it IS possible to have many more DLLs open than 100 if R is started in the right way.

Wolfgang

PS I meant OS X 10.12.6, too. Sorry for the typo.

6.10.17 14:50, Kasper Daniel Hansen scripsit:
On OS X 10.12.6 (I don't think 10.12.16 exists), I get
$ ulimit -Sn
7168
Interestingly, this is because I use iTerm2 for my command line prompt.
If I do the same command in Terminal I get 256. If I start R inside of
Emacs I get 256 as well. I don't know anything about ulimit and how it is
set, but that is a pretty stark difference.
Best,
Kasper
On Fri, Oct 6, 2017 at 3:12 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
On Mac OSX 10.12.16:
$ ulimit -Sn
256
so the maximum value of R_MAX_NUM_DLLS is 153 ...
Wolfgang
5.10.17 23:02, Henrik Bengtsson scripsit:
About the DLL limit:

Just wanna make sure you're aware of the "new" environment variable R_MAX_NUM_DLLS available in R (>= 3.4.0). It allows you to push the current default limit of 100 open DLLs a bit higher. It can be set in .Renviron or before starting R, e.g.

$ R_MAX_NUM_DLLS=500 R

This, of course, assumes that you can set it, which you might not be able to do on build servers. Also, there is an upper limit, min(0.6*fd_limit, 1000), that depends on the number of files you can have open at the same time (fd_limit), e.g. on my Ubuntu 16.04 I've got:

$ ulimit -Sn
1024

so R_MAX_NUM_DLLS=614 is the maximum for me.
/Henrik
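
The upper limit min(0.6*fd_limit, 1000) described above can be worked out with a few lines of shell. This is only a sketch: max_r_dlls is a made-up helper name, and it assumes the 60% figure is rounded down, which is what the numbers in this thread suggest (1024 gives 614) rather than something checked against the R sources.

```shell
#!/bin/sh
# Sketch only: max_r_dlls is a hypothetical helper, not part of R.
# Assumption: the cap is floor(0.6 * fd_limit), limited to 1000.
max_r_dlls() {
  fd_limit="$1"
  # Refuse anything that is not a plain non-negative integer
  # (for example the string "unlimited").
  case "$fd_limit" in
    ''|*[!0-9]*)
      echo "fd limit is not a number: $fd_limit" >&2
      return 1
      ;;
  esac
  cap=$(( fd_limit * 6 / 10 ))   # integer division: floor(0.6 * fd_limit)
  if [ "$cap" -gt 1000 ]; then
    cap=1000
  fi
  echo "$cap"
}

# Usage: report the cap for the current shell.
#   max_r_dlls "$(ulimit -Sn)"
```

For a limit of 256 open files this gives 153, and for anything from 1667 upwards the fixed ceiling of 1000 applies.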
On Thu, Oct 5, 2017 at 11:22 AM, Wolfgang Huber <wolfgang.huber at embl.de> wrote:
Breaking up long workflows into several smaller "modules", each with a clearly defined input and output, is a good idea, certainly for didactic & maintenance reasons.

It doesn't "solve" the DLL issue though, it only avoids it (for now)...

I believe you can use a Makefile for your vignettes (https://cran.r-project.org/doc/manuals/R-exts.html#Writing-package-vignettes), and this might be a good way of managing which depends on which. For passing along output/input, perhaps local .RData files are good enough, perhaps some wheel-reinventing can also be avoided by using https://bioconductor.org/packages/release/bioc/html/BiocFileCache.html (haven't actually used it yet, though).
Wolfgang
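
As a rough illustration of the "modules" idea, the sketch below runs each module in its own process, in a fixed order, and stops at the first failure. Since every call to Rscript starts a fresh R session, the DLLs loaded by one module do not count against the next. The function names run_modules and render_module and the .Rmd file names are hypothetical, and render_module assumes the rmarkdown package is installed; a Makefile could express the same ordering.

```shell
#!/bin/sh
# Sketch only: run workflow "modules" one after another, each in its own
# process. All names here are hypothetical.

# run_modules RUNNER FILE...
# Calls RUNNER once per FILE, in the order given, and stops at the first
# failure so that later modules are not run on missing input.
run_modules() {
  runner="$1"
  shift
  for module in "$@"; do
    "$runner" "$module" || {
      echo "module failed: $module" >&2
      return 1
    }
  done
}

# One possible runner: render a single R Markdown file in a fresh R session.
render_module() {
  Rscript -e 'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1])' "$1"
}

# Usage (file names are made up):
#   run_modules render_module 01-quality-control.Rmd 02-normalization.Rmd
```

Output files written by an earlier module (e.g. local .RData files) would stay on disk for the later ones, as long as all modules agree on where to find them.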
5.10.17 20:02, Aaron Lun scripsit:
This may relate to what I was thinking with respect to solving the DLL problem, by breaking up large workflows into modules that can be executed in separate R sessions. The same approach would also make it easier to associate package dependencies with specific parts of the workflow.

In my particular situation, it is easy to break up the workflow into sections that can be executed completely independently. However, I can also imagine situations where dependencies on previous objects, etc. make it difficult to break up the workflow. If multiple files are present in vignettes/, can they be directed to execute in a specific order, and would output files from one vignette persist during the execution of another?
-Aaron
------------------------------------------------------------------------
From: Wolfgang Huber <wolfgang.huber at embl.de>
Sent: Thursday, 5 October 2017 6:23:47 PM
To: Laurent Gatto; Aaron Lun
Cc: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow
I agree it is nice to be able to only load the packages needed for a certain section of a vignette and not the whole thing. And that too many `::` can make code look unwieldy (though some may actually increase readability).

But relying on manually sprinkled in `library` calls seems like a hack prone to error. And there are always bound to be dependencies that are non-local, e.g. on general infrastructure like SummarizedExperiment, ggplot2, dplyr.

So: do we need a way to computationally determine the dependencies of a vignette section, including highlighting/eliminating potential name clashes (b/c the warnings about masking emitted at package loading are easily ignored)? This seems like a straightforward engineering task.

Eventually with such code analysis we could get rid of explicit `library` calls altogether :)
Wolfgang
5.10.17 08:53, Laurent Gatto scripsit:
On 5 October 2017 00:11, Aaron Lun wrote:
Here's another two cents from me:

The explicit library() calls allow for easy copy-pasting if people only want to use/adapt a section of the workflow. In such cases, calling "library(simpleSingleCell)" could drag in a lot of unnecessary packages (e.g., which could hit the DLL limit). Reading through the text to figure out the requirements for each code chunk seems like a pain, and lots of "::" are unwieldy.

More generally, the removal of individual library() calls seems to encourage the use of a single "library(simpleSingleCell)" call at the top of any user-developed custom analysis scripts based on the workflow. This seems conceptually odd to me - the simpleSingleCell package is simply a vehicle for the compiled workflow, it shouldn't be involved in analyses of other data.

I can confirm that this is a possibility.

Before workflows became available, I created the RforProteomics package that essentially provided one relatively large vignette to demonstrate a variety of applications of R/Bioconductor for mass spectrometry and proteomics. I think this has been a useful way to disseminate R and Bioconductor in these respective communities, but it also led to the confusion that it was that package that "did all the stuff", i.e. people saying that they were using RforProteomics to do a task that was described in the vignette. The RforProteomics vignette does explicitly call library at the beginning of each section and explains that the package is only a collection of analyses stemming from other packages, but that wasn't enough apparently.
Laurent
-Aaron
________________________________
From: Bioc-devel <bioc-devel-bounces at r-project.org> on behalf of Wolfgang Huber <wolfgang.huber at embl.de>
Sent: Thursday, 5 October 2017 8:26 AM
To: bioc-devel at r-project.org
Subject: Re: [Bioc-devel] library() calls removed in simpleSingleCell workflow
I find `eval=FALSE` chunks not a good idea, since
- they confuse users who only see the rendered HTML/PDF (where this flag is not shown)
- they are not tested, so more prone to code rot.

I'd also like to object to the idea that proximity of a `library` call to code that uses a package is somehow didactic. It's actually a bad habit: the R interpreter does not care. The relevant package
- can be mentioned in the narrative,
- stated in the code with the pkgname:: prefix.

The latter is good didactics to get people used to the idea of namespaces, especially since there is an increasing frequency of name clashes in CRAN, tidyverse, BioC (e.g. consider the various functions named 'filter' and the obscure malbehaviors that can result from these).
Best wishes
Wolfgang
On 04/10/2017 22:20, Turaga, Nitesh wrote:
Hi Aaron,
A workaround may be to put all library() calls in an "eval=FALSE" block in the R code chunk:
```{r, eval=FALSE}
library(scran)
library(scater)
```
etc.
This way the users can see the library() calls in the vignette.
Best,
Nitesh
On Oct 4, 2017, at 4:14 PM, Obenchain, Valerie <Valerie.Obenchain at RoswellPark.org> wrote:
Hi guys,
A little background on this vignette -> package conversion. The workflows were converted to package form because we want to integrate them into the nightly build system instead of supporting separate machines as we're now doing.

As part of this conversion, packages loaded in workflow vignettes were moved to Depends in DESCRIPTION. This enables the user to load a single package instead of many. Packages were moved to Depends instead of Suggests (as is usually done with software packages) because the vignette is the only thing these workflow packages have going - no defined classes or methods. This seemed a more tidy approach and the dependencies are listed in Depends for the user to see. This was my (maybe bad?) idea and Nitesh was the messenger. If you feel the individual loading of packages in the vignette is a key part of the instruction/learning we can leave them as is and list the packages in Suggests.

I should also mention that incorporating the workflows into the build system won't happen until after the release. At that time we'll move the repositories from svn to git and it's likely we'll have to ask maintainers to abide by some time/space guidelines. At that point the build machines will be building software, experimental data and workflows and resources aren't unlimited. When that time comes we'll update the workflow guidelines and contact maintainers.
Thanks.
Valerie
On 10/04/2017 12:27 PM, Kasper Daniel Hansen wrote:
yeah, that is super super useful to people. In my vignettes (granted, not workflows) I have a separate "Dependencies" section which is basically a series of library() calls.
On Wed, Oct 4, 2017 at 3:18 PM, Aaron Lun <alun at wehi.edu.au> wrote:
Dear Nitesh, list;

The library() calls in the simpleSingleCell workflow have been removed. Why is this? I find explicit library() calls to be quite useful for readers of the compiled vignette, because it makes it easier for them to determine the packages that are required to adapt parts of the workflow for their own analyses. If it doesn't hurt the build system, I would prefer to have these library() calls in the vignette.
Cheers,
Aaron
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
This email message may contain legally privileged and/or confidential information. If you are not the intended recipient(s), or the employee or agent responsible for the delivery of this message to the intended recipient(s), you are hereby notified that any disclosure, copying, distribution, or use of this email message is prohibited. If you have received this message in error, please notify the sender immediately by e-mail and delete this email message from your computer. Thank you.
--
With thanks in advance-
Wolfgang
-------
Wolfgang Huber
Principal Investigator, EMBL Senior Scientist
European Molecular Biology Laboratory (EMBL)
Heidelberg, Germany
wolfgang.huber at embl.de
http://www.huber.embl.de