[Bioc-devel] depends on packages providing classes
Hi,
On 10/28/2014 08:51 PM, Vincent Carey wrote:
On Tue, Oct 28, 2014 at 5:48 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
On 10/28/2014 12:42 PM, Vincent Carey wrote:
On Tue, Oct 28, 2014 at 2:29 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
Hi,
On 10/28/2014 08:48 AM, Vincent Carey wrote:
On Tue, Oct 28, 2014 at 11:23 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
Well, first I want to make sure that there is not something special
regarding S4 methods and classes. I have a feeling that they are a
special case.
Second, while I agree with Jim's general opinion, it is a little bit
different when I have return objects which are defined in other packages.
If I don't depend on this other package, the user is hosed wrt. the return
object, unless I manually export all classes from this other
In what sense? If you return an instance of GRanges, certain things can be
done even if GenomicRanges is not attached.
Yes certain things maybe, but it's hard to predict which ones.
You can get values of slots, for example.
With the following little package
%vjcair> cat foo/NAMESPACE
importFrom(IRanges, IRanges)
importClassesFrom(GenomicRanges, GRanges)
importFrom(GenomicRanges, GRanges)
export(myfun)
%vjcair> cat foo/DESCRIPTION
Package: foo
Title: foo
Version: 0.0.0
Author: VJ Carey <stvjc at channing.harvard.edu>
Description:
Suggests:
Depends:
Imports: GenomicRanges
Maintainer: VJ Carey <stvjc at channing.harvard.edu>
License: Private
LazyLoad: yes
%vjcair> cat foo/R/*
myfun = function(seqnames="1", ranges=IRanges(1,2), ...)
GRanges(seqnames=seqnames, ranges=ranges, ...)
The following works:
library(foo)
x = myfun()
x
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]        1    [1, 2]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
So the show method works, even though I have not touched it. (I did not
expect it to work, in fact.)
Exactly. Let's call it luck ;-)
Additionally, I can get access to slots.
The end user should never try to access slots directly but use getters
and setters instead. And most getters and setters for GRanges objects
are defined and documented in the GenomicRanges package. Those that are
not are defined in packages that GenomicRanges depends on.
But ranges() fails. If I, the user, want to use it, I need to arrange for
that.
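One way for the user to "arrange for that" without attaching anything is to call the accessor through its namespace with `::`. The sketch below is only an analogy: it uses stats4 (which ships with R and is not attached in a default session) as a stand-in for GenomicRanges, with the S4 class "mle" playing the role of GRanges and coef() playing the role of ranges(). The data and the nLL function are made up for illustration.

```r
# Stand-in for "a function that returns an object whose class lives in
# a package that is loaded but not attached".
y <- c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8)
nLL <- function(lambda) -sum(stats::dpois(y, lambda, log = TRUE))

# stats4::mle() loads the stats4 namespace but does not attach it.
fit <- stats4::mle(nLL, start = list(lambda = 5))

# Call the accessor through the namespace, so it works whether or not
# stats4 is on the search path.
est <- stats4::coef(fit)
print(est)
```

With GenomicRanges the same pattern would be a namespace-qualified call to whichever package exports the accessor the user needs.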
IMO if your package returns a GRanges object to the user, then the user
should be able to access the man page for GRanges objects with ?GRanges.
Oddly enough, that seems to be incorrect. I added a man page to foo that
has a \link[GenomicRanges]{GRanges-class}. I ran help.start() and the
cross reference from my man page succeeds. Furthermore with the
sessionInfo below, ?GRanges succeeds at the CLI.
Did you try to run example(GRanges)? I'm not sure that will work.
Correct. Cursory look at source shows that help() uses loadedNamespaces()
to find the help file. example() could probably do likewise.
Sounds reasonable. So it seems that some recent changes in R make
it possible to access the man page and examples for stuff that
is imported but not attached. This is an important shift in paradigm
to me. In the past I would just rely on the simple notion that
what I can access with ? or example() reflects what's in my
search path. Now if I do ?DNAStringSet and it succeeds, I can't
assume DNAStringSet() is in my search path anymore. And if I
want to copy/paste a few commands from the examples in order to
try them in my session, they might fail because the package where
these examples belong is not necessarily attached.
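A way to sidestep the search-path question is to name the package explicitly when asking for help or examples. The sketch below assumes a standard R installation with help files; it uses stats4 and its mle topic as a stand-in for Biostrings and DNAStringSet, since stats4 ships with R and is not attached by default.

```r
# Look up the help page in a named package instead of relying on what
# happens to be attached or loaded.
h <- help("mle", package = "stats4")

# Fetch the example code as text without running it; the lines can then
# be pasted into a session after an explicit library(stats4).
ex_lines <- example("mle", package = "stats4", give.lines = TRUE)
cat(head(ex_lines, 10), sep = "\n")
```

For the case in this thread the equivalent calls would name "DNAStringSet" and "Biostrings".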
I wonder whether that means we should now start every example
section with library(foo)? The rationale for not doing it so far
was that if you can access the man page with ? then that means
the package is already attached.
As a side note the decision to extend the scope of ? to loaded
packages and not to all installed packages feels arbitrary to me.
Going all the way would make ? even more useful and would be
consistent with what I see when navigating the documentation in
a browser. So when the user wants to call DNAStringSet() but
doesn't remember where it lives, ?DNAStringSet would be a quick
and easy way to know, and this whether the package is loaded via
a namespace or not.
Anyway, to get back to the original topic, IMO this change in R
still doesn't justify changing the Depends vs Imports game. I see
at least 3 strong cases for using 'Depends: A' instead of 'Imports: A'
in package B:
(1) B defines (and exports) a class that extends a class defined in A.
(2) B defines (and exports) methods for a generic defined in A.
(3) B defines (and exports) functions or methods that return
objects of a class defined in package A.
'Imports: A' should be reserved to situations where A is used
internally by B and in a way that is B's internal business only
and none of the end-user's business. A typical example is the
internal use of RSQLite and biomaRt in GenomicFeatures.
I can see the attractiveness of trying to minimize what gets attached
to the user's session but I'm also concerned that trying to go too far
in that direction ultimately has no real benefit and can hurt the
user-friendliness of the software.
H.
For example after I do library(rtracklayer), I can indeed do
?DNAStringSet at the command line (I'm surprised this works), but
then example(DNAStringSet) fails:
> example(DNAStringSet)
Warning message:
In example(DNAStringSet) : no help found for ‘DNAStringSet’
I'm also surprised this is just a warning but that's another story...
H.
I am not trying to defend the NOTE but the principle of minimizing
Depends declarations needs to be considered critically, and I am just
exploring the space.
> ?GRanges # it worked as usual in the tty
> sessionInfo()
R version 3.1.1 (2014-07-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats     graphics  grDevices datasets  utils     tools     methods
[8] base
other attached packages:
[1] foo_0.0.0 rmarkdown_0.3.8 knitr_1.6
[4] weaver_1.31.0 codetools_0.2-9 digest_0.6.4
[7] BiocInstaller_1.16.0
loaded via a namespace (and not attached):
[1] BiocGenerics_0.11.5 evaluate_0.5.5 formatR_1.0
[4] GenomeInfoDb_1.1.26 GenomicRanges_1.17.48 htmltools_0.2.6
[7] IRanges_1.99.32 parallel_3.1.1 S4Vectors_0.2.8
[10] stats4_3.1.1 stringr_0.6.2 XVector_0.5.8
And that works only if the GenomicRanges package is attached. Attaching
GenomicRanges will also attach other packages that GenomicRanges depends
on where some GRanges accessors might be defined and documented (e.g.
metadata()).
In some cases you'll decide you want the user to have a full complement of
methods for your package to function meaningfully. For example, I am
considering using dplyr idioms to work with data structures in a package,
and it seems I should just depend on dplyr rather than pick out and
document which things I want to expose. But that may still be an
undesirable design.
package, like
importClassesFrom("GenomicRanges", "GRanges")
exportClasses("GRanges")
Surely that is not intended.
It is important that my package works without being attached to the search
path and I do this by carefully importing what I need, i.e. my code does
not require that my dependencies are attached to the search path. But the
end user will be hosed without it.
Yes s/he will. Fortunately when your package namespace gets loaded by
another package, then nothing gets attached to the search path, even if
your package depends (instead of imports) on other packages. So using
Depends instead of Imports for your own dependencies won't make any
difference in that respect, which is good.
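The loaded-versus-attached distinction can be checked directly in a session. A minimal sketch, again using stats4 only because it ships with R and is normally not attached; the variable names are just for illustration.

```r
pkg <- "stats4"
search_entry <- paste0("package:", pkg)

# Record the state before loading, in case the package is already attached.
attached_before <- search_entry %in% search()

# Load the namespace the way an importing package would.
loadNamespace(pkg)

is_loaded <- pkg %in% loadedNamespaces()   # namespace is now available
is_attached <- search_entry %in% search()  # search path is unchanged

cat("loaded:", is_loaded, " attached:", is_attached, "\n")
```

In a fresh default session this should report the namespace as loaded while the search path stays as it was, which is the situation sessionInfo() labels "loaded via a namespace (and not attached)".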
My impression is that the NOTE in R CMD check was written by someone who
did not anticipate large-scale use and re-use of classes and methods
across many packages.
That's my impression too.
Cheers,
H.
Best,
Kasper
On Tue, Oct 28, 2014 at 11:14 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
I agree with Vince. It's your job as a package developer to make
available to your package all the functions necessary for the package to
work. But I am not sure it is your job to load all the packages that your
end user might need.
Best,
Jim
On Tue, Oct 28, 2014 at 11:04 AM, Vincent Carey <stvjc at channing.harvard.edu> wrote:
On Tue, Oct 28, 2014 at 10:19 AM, Kasper Daniel Hansen <kasperdanielhansen at gmail.com> wrote:
What is the current best paradigm for using all the classes in
S4Vectors/GenomeInfoDb/GenomicRanges/IRanges?
I obviously import methods and classes from the relevant packages.
But shouldn't I depend on these packages as well? Since I basically want
the user to have this functionality at the command line? That is what I do
now.
I've wondered about this as well. It seems the principle is that the user
should take care of attaching additional packages when needed. It might be
appropriate to give a hint in the package startup message, if having some
other package attached would typically be of great utility.
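The startup hint could look something like the sketch below. It assumes a hypothetical package foo that imports GenomicRanges; the wording of the message is made up, and in a real package the .onAttach function would live in a file under R/.

```r
# Hypothetical attach hook for package "foo": emit a hint only when
# GenomicRanges is not already on the search path.
.onAttach <- function(libname, pkgname) {
  if (!"package:GenomicRanges" %in% search()) {
    packageStartupMessage(
      "Note: ", pkgname, " returns GRanges objects; ",
      "run library(GenomicRanges) for the accessors and their man pages."
    )
  }
}

# Outside a package, the hook can be exercised by hand and the message
# captured for inspection (NA if GenomicRanges happens to be attached).
msg <- tryCatch(
  {
    .onAttach(libname = NULL, pkgname = "foo")
    NA_character_
  },
  packageStartupMessage = function(m) conditionMessage(m)
)
cat(msg)
```

Because the hint goes through packageStartupMessage(), a user who does not want it can wrap the library() call in suppressPackageStartupMessages().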
Given your list above, I would think that depending on GenomicRanges would
often be sufficient, and IRanges/S4Vectors would not require dependency
assertion. I would think that GenomeInfoDb should be a voluntary
attachment for a specific session.
These are just my guesses -- I doubt there will be complete consensus, but
I have started to think very critically about using Depends, and I think
it is better when its use is minimized.
That of course leads to the R CMD check NOTE on depending on too many
packages.... I guess I should ignore that one.
Best,
Kasper
_______________________________________________
Bioc-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpages at fredhutch.org
Phone: (206) 667-5791
Fax: (206) 667-1319