Hi Martin, John, Thanks for the responses! I've tidied up some of the notes from this mailing list thread and posted them on the bug tracker. John, in this case, I think namespaces are relevant because for non-exported S4 classes, the class information is made available through the '.__C__<package>' symbol in the package's namespace, but not the package environment that gets attached to the search path. In this (rare, yet not impossible) sequence of events, it looks like R attempts to resolve the '.__C__<package>' symbol in the wrong environment, and so class information lookup fails, and we end up caching the wrong inheritance information. Thanks, Kevin
On Sun, Jul 31, 2016 at 5:12 AM, John Chambers <jmc at r-project.org> wrote:
(Just returning from the "wilds" of Canada, so not able to comment on the specifics, but ...) There is a basic point about generic functions that may be related to the "private" class question and my earlier remarks that Martin alluded to. R (and S4 before it) allows packages to define methods for a generic function in another package. Say, for plot() in graphics. The current model is that the generic plot() remains one function, specifically a generic associated with the graphics package. Package A might define a method corresponding to one or two classes defined in that package. When A is loaded, those methods are added to the table for plot() in the current session. Now suppose a user calls a function, foo(), in package B, and that foo() in turn calls plot(). This is the same plot() function, and in particular will include the methods supplied from package A. This is regardless of the two packages having any overt connection. Also, the methods are accepted by the generic function regardless of whether the class is explicitly exported or not. In this sense, classes cannot be entirely private if methods are defined for a non-local function. Namespaces are not directly relevant. Whether this can lead to strange behavior isn't clear, and if so, is it a sign that something undesirable was done in the particular example? (In Extending R, I suggested a "right to write methods" that would discourage a package from having methods unless it owned the function or some of the classes.) R could adopt a different model for generic functions, where a package that defined a method for a non-exported class would create a "local" version of the generic, but that would likely raise some other issues. But seems like a useful topic for discussion. John On Jul 30, 2016, at 11:07 AM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Kevin Ushey <kevinushey at gmail.com> on Fri, 29 Jul 2016 11:46:19 -0700 writes:
I should add one more item that may be related here -- calling 'methods:::.requirePackage' returns a different result based on whether the package namespace is already loaded or not.
If the package namespace is not loaded, the package is loaded and attached, and the package environment is returned:
methods:::.requirePackage("digest")
Loading required package: digest <environment: package:digest> attr(,"name") [1] "package:digest" attr(,"path") [1] "/Users/kevin/Library/R/3.3/library/digest"
"digest" %in% loadedNamespaces()
[1] TRUE
"package:digest" %in% search()
[1] TRUE
On the other hand, if the package namespace has already been loaded, the package namespace is returned without attaching the package:
requireNamespace("digest")
Loading required namespace: digest
methods:::.requirePackage("digest")
<environment: namespace:digest>
"digest" %in% loadedNamespaces()
[1] TRUE
"package:digest" %in% search()
[1] FALSE
This may be intentional, but the behavior seems surprising and could be responsible for the behavior outlined earlier.
Yes, the behavior you outlined earlier is buggy, and I also have
seen similar bugous behavior for the case of non-exported
classes.
Part of it is historical: The S4 code was mostly written before
namespaces were introduced into R; I vaguely remember John
Chambers (the principal creator of S4) saying that he did not
intend the formal classes to be not visible... which in some
sense only contains the fact that he (or anybody) would not
think much about hidden objects before they were introduced.
Still, in the mean time, most of us have seen many cases where
we wanted to have "private" classes, and many packages do have
them, too.... and they "mostly work" ;-)
In other words, I agree that it would be very desirable to get
to the bottom of this and fix such problems.
.requirePackage() is among the parts of the methods package code
which are quite delicate (and not much documented AFAIK, the hidden
.requirePackage() function is a good example!).
Delicate for at least two reasons:
1) They are not only used in crucial steps when "bootstrapping"
the methods package ('methods' has to define its own S4
generics, methods, and classes before the package "exists"),
1b) they are also used both when building and installing another
'methods'-dependent package. This could be called
"bootstrapping another package".
2) they are also much used whenever S4 code (aka
'methods'-dependent) packages are loaded and attached,
so should be as fast as possible.
I think you know that in recent years there have been
considerable (and successful!) efforts to speedup package
loading time, e.g. speeding up 'library(Matrix)' by about
2 factors of 2, and changing such functions should not
deteriorate package-load speed noticably.
Still, it is very desirable to improve / fix the issue:
After all this musings, I'd currently guess that it would be a
good idea if .requirePackage() would always return the
*namespace* of the corresponding package unless that does not
yet exist, as in the the 'bootstrap' situations above.
... or we'd add a new argument to .requirePackage() dealing with
that, or we use two functions: .requirePackage() needed for
boostrapping, returning a package envionment, and
.requireNamespace() used to access class (and generic function!)
environments.
Well tested patches are very welcome; as is filing a formal
bug report with bugzilla (https://bugs.r-project.org/ ;
(if you have no "account" there, we will have to
manually create one, for now, see the 'Note' on
https://www.r-project.org/bugs.html because of the spammer
attacks earlier this month ... but I see you, Kevin are
registered there),
or
just continuing your findings here.
Martin
Best, Kevin
On Fri, Jul 29, 2016 at 11:37 AM, Kevin Ushey <kevinushey at gmail.com> wrote:
I have a small idea as to what's going on now; at least, why exporting the class resolves this particular issue. Firstly, when an S4 class is not exported, the associated '.__C__<class>' object is not made part of the package environment. For example, I see:
getAnywhere(".__C__SubMatrix") A single object matching
'.__C__SubMatrix' was found It was found in the following places namespace:s4inherits with value < ... > Note that the symbol is only discovered in the package namespace. When the class is exported (e.g. with 'exportClasses(SubMatrix)' in the NAMESPACE file), it's found both in 'package:s4inherits' and 'namespace:s4inherits'. Secondly, when R attempts to resolve the superclasses for an S3 class, the function 'methods:::.extendsForS3' is called. Tracing that code eventually gets us here: https://github.com/wch/r-source/blob/trunk/src/library/methods/R/SClasses.R#L255 Note that we reach this code path as the S3 class cache has not been populated yet; ie, this code returns NULL: https://github.com/wch/r-source/blob/trunk/src/library/methods/R/SClasses.R#L238-L240 So, the class hierarchy is looked up using this code: if(isTRUE(nzchar(package))) { whereP <- .requirePackage(package) value <- get0(cname, whereP, inherits = inherits) } However, because the '.__C__SubMatrix' object is only made available in the package's namespace, not within the package environment, it is not resolved, and so lookup fails. (Presumedly, that lookup is done when initially building a cache for S3 dispatch?) So, I wonder if that class lookup should occur within the package's namespace instead? Thanks for your time, Kevin On Sat, Jun 25, 2016 at 12:46 PM, Kevin Ushey <kevinushey at gmail.com> wrote:
Hi, (sorry for the wall of text; the issue here appears to be rather complicated) I'm seeing a somewhat strange case where checking whether an S4 object inherits from a parent class defined from another package with 'inherits' fails if that object is materialized through a call to 'load'. That's a mouthful, so I've put together a relatively small reproducible example online here: https://github.com/kevinushey/s4inherits This package, 's4inherits', defines an S4 class, 'SubMatrix', that inherits from the 'dsyMatrix' class defined in the Matrix package. After installing the package, I run some simple tests: $ R -f test-save.R
library(s4inherits) data <- SubMatrix(1) is(data, "SubMatrix")
[1] TRUE
inherits(data, "SubMatrix")
[1] TRUE
is(data, "dsyMatrix")
[1] TRUE
inherits(data, "dsyMatrix")
[1] TRUE
save(data, file = "test.RData")
All the inheritance checks report as we would expect. I check that the inheritance reports are as expected, then save that object to 'test.RData'. I then load that data file in a new R session and run the same checks: $ R -f test-load.R
library(methods) load("test.RData")
inherits(data, "SubMatrix")
Loading required package: s4inherits [1] TRUE
is(data, "SubMatrix")
[1] TRUE
inherits(data, "dsyMatrix")
[1] FALSE # (??)
is(data, "dsyMatrix")
[1] TRUE
Note that R now reports that my loaded object does _not_ inherit from "dsyMatrix", yet this occurs only when checked with 'inherits()' -- 'is' produces the expected result. I do not see the behavior if I explicitly load / attach the 's4inherits' package before loading the associated data file; it only occurs if the package namespace is loaded in response to loading the data object hosting a 'SubMatrix' object. More precisely, the package namespace is loaded when the promise hosting the data object is evaluated; that promise being generated by 'load', if I understand correctly. Somehow, evaluation of that promise within the call to 'inherits' in this very special case causes the unexpected behavior -- ie, if the first thing that you do with the loaded object is check its class with 'inherits', then you get this unexpected result. Even more, this behavior seems to go away if the 's4inherits' package explicitly exports the class -- however, you could imagine this class normally being internal to the package, and so it may be undesirable to export it. I checked a bit into the C code, and IIUC the check here looks up the class hierarchy in the R_S4_extends_table object defined in 'attrib.c', so it seems like that mapping is potentially not getting constructed with the full hierarchy (although hopefully someone with more knowledge of the S4 internals + interaction with S3 can elaborate). (FWIW, this was distilled from a case where S3 dispatch on a similar loaded S4 object failed, due to failure to resolve the class hierarchy for the S3 dispatch context.) Thanks, Kevin --- $ R --slave -e "utils::sessionInfo()" R Under development (unstable) (2016-06-13 r70769) Platform: x86_64-apple-darwin15.5.0 (64-bit) Running under: OS X 10.11.5 (El Capitan)
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel