Dear core group,
Which operation in R is guaranteed to produce a true copy of an atomic vector,
not just a second symbol pointing to the same shared memory?
y <- x[]
#?
y <- x
y[1] <- y[1]
#?
Is there any function that returns its argument as a non-shared atomic
but only copies if the argument was shared?
Given an atomic vector x, what is the best official way to find out
whether other symbols share the vector RAM? Querying NAMED() < 2 doesn't
work because .Call sets sxpinfo_struct.named to 2. It even sets it to 2
if the argument to .Call was a never-named expression!?
> named(1:3)
[1] 2
And it seems to set it permanently, pure read-access can trigger
copy-on-modify:
> x <- integer(1e8)
> system.time(x[1]<-1L)
user system elapsed
0 0 0
> system.time(x[1]<-2L)
user system elapsed
0 0 0
having called .Call now leads to an unnecessary copy on the next assignment
> named(x)
[1] 2
> system.time(x[1]<-3L)
user system elapsed
0.14 0.07 0.20
> system.time(x[1]<-4L)
user system elapsed
0 0 0
This happens not only with user-written functions doing read access:
> is.unsorted(x)
[1] TRUE
> system.time(x[1]<-5L)
user system elapsed
0.11 0.09 0.21
Why don't you simply give package authors read-access to
sxpinfo_struct.named in .Call (without setting it to 2)? That would give
us more control and also save some unnecessary copying. I guess once R
switches to reference-counting preventive increasing in .Call could not
be continued anyhow.
Kind regards
Jens Oehlschlägel
P.S. please cc me in answers as I am not a member of r-devel
P.P.S. function named() was tentatively defined as follows:
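named <- function(x)
  .Call("R_bit_named", x, PACKAGE="bit")

#include <R.h>
#include <Rinternals.h>

/* Return the NAMED count of x as an integer vector of length 1. */
SEXP R_bit_named(SEXP x){
  SEXP ret_;
  PROTECT( ret_ = allocVector(INTSXP,1) );
  INTEGER(ret_)[0] = NAMED(x);
  UNPROTECT(1);
  return ret_;
}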
> version
_
platform x86_64-w64-mingw32
arch x86_64
os mingw32
system x86_64, mingw32
status Under development (unstable)
major 3
minor 1.0
year 2014
month 02
day 28
svn rev 65091
language R
version.string R Under development (unstable) (2014-02-28 r65091)
nickname Unsuffered Consequences
internal copying in R (soon to be released R-3.1.0
5 messages · Jens Oehlschlägel, Simon Urbanek, Thomas Lumley
1 day later
On Mar 2, 2014, at 12:37 PM, Jens Oehlschlägel <jens.oehlschlaegel at truecluster.com> wrote:
Dear core group,
Which operation in R is guaranteed to produce a true copy of an atomic vector, not just a second symbol pointing to the same shared memory?
None, there is no concept of "shared" memory at R level. You seem to be mixing C level API specifics and the R language. In the former duplicate() creates a new copy.
y <- x[]
#?
y <- x
y[1] <- y[1]
#?
Is there any function that returns its argument as a non-shared atomic but only copies if the argument was shared?
Given an atomic vector x, what is the best official way to find out whether other symbols share the vector RAM? Querying NAMED() < 2 doesn't work because .Call sets sxpinfo_struct.named to 2. It even sets it to 2 if the argument to .Call was a never-named expression!?
named(1:3)
[1] 2
Assuming that you are talking about the C API, please consider reading about the concepts involved. .Call() doesn't set named to 2 at all - it passes whatever object is passed so it is the C code's responsibility to handle incoming objects according to the desired semantics (see the previous post here).
And it seems to set it permanently, pure read-access can trigger copy-on-modify:
x <- integer(1e8)
system.time(x[1]<-1L)
user system elapsed
0 0 0
system.time(x[1]<-2L)
user system elapsed
0 0 0
having called .Call now leads to an unnecessary copy on the next assignment
named(x)
[1] 2
system.time(x[1]<-3L)
user system elapsed
0.14 0.07 0.20
system.time(x[1]<-4L)
user system elapsed
0 0 0
This happens not only with user-written functions doing read access:
is.unsorted(x)
[1] TRUE
system.time(x[1]<-5L)
user system elapsed
0.11 0.09 0.21
Why don't you simply give package authors read-access to sxpinfo_struct.named in .Call (without setting it to 2)? That would give us more control and also save some unnecessary copying.
Again, you're barking up the wrong tree - .Call() doesn't bump NAMED at all - it simply passes the object:
#include <Rinternals.h>
SEXP nam(SEXP x) { return ScalarInteger(NAMED(x)); }
.Call("nam", 1+1)
[1] 0
x=1+1
.Call("nam", x)
[1] 1
y=x
.Call("nam", x)
[1] 2
Cheers,
Simon
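The 0, 1, 2 sequence reported above can be mimicked with a small stand-alone C model. This is only a toy sketch under stated assumptions: `ToyVec`, `toy_new`, `toy_bind`, `toy_nam` and `toy_free` are hypothetical names invented for illustration, the code does not use the R API, and the counter merely imitates the behaviour of NAMED as described in this thread rather than R's actual implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy stand-in for a vector header; not R's real SEXPREC. */
typedef struct {
    int named;      /* 0 = temporary, 1 = one binding, 2 = two or more */
    size_t length;
    int *data;
} ToyVec;

/* Allocate a zero-filled vector that no symbol refers to yet (named == 0). */
ToyVec *toy_new(size_t length) {
    ToyVec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->named = 0;
    v->length = length;
    v->data = calloc(length ? length : 1, sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    return v;
}

/* Model of binding the value to one more symbol; the count saturates at 2. */
void toy_bind(ToyVec *v) {
    assert(v != NULL);
    if (v->named < 2) v->named++;
}

/* Model of a read-only foreign call: it reports the count, changes nothing. */
int toy_nam(const ToyVec *v) {
    assert(v != NULL);
    return v->named;
}

/* Release a toy vector and its data. */
void toy_free(ToyVec *v) {
    if (v != NULL) { free(v->data); free(v); }
}
```

In this model, `toy_nam` on a fresh `toy_new(1)` value gives 0 (like the temporary `1+1`), 1 after one `toy_bind` (like `x=1+1`), and 2 after a second `toy_bind` (like `y=x`); the reading function itself never changes the count.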
______________________________________________ R-devel at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Thanks for answering Simon,
> None, there is no concept of "shared" memory at R level. You seem to be mixing C level API specifics and the R language. In the former duplicate() creates a new copy.
I take this as evidence that calling duplicate() is the only way to make sure I have a non-shared object.
> Assuming that you are talking about the C API, please consider reading about the concepts involved. .Call() doesn't set named to 2 at all - it passes whatever object is passed so it is the C code's responsibility to handle incoming objects according to the desired semantics (see the previous post here).
Well, I did read, for example "Writing R Extensions" (Version 3.1.0 Under development (2014-02-28)) chapter "5.9.10 Named objects and copying" which says "Currently all arguments to a .Call call will have NAMED set to 2, and so users must assume that they need to be duplicated before alteration."
This is consistent with the observation of my test code: that NAMED() in .Call always returns 2. And that a .Call doing pure read access will trigger some delay most likely due to a full vector copy is a sign of .Call not only setting NAMED to 2 but also not resetting it once .Call terminates.
So what is needed to find NAMED(SEXP argument) < 2 during .Call?
Kind regards
Jens
Jens,
On Mar 3, 2014, at 3:35 PM, Jens Oehlschlägel <jens.oehlschlaegel at truecluster.com> wrote:
Thanks for answering Simon,
None, there is no concept of "shared" memory at R level. You seem to be mixing C level API specifics and the R language. In the former duplicate() creates a new copy.
I take this as evidence that calling duplicate() is the only way to make sure I have a non-shared object.
If NAMED > 0 then calling duplicate() is necessary to make sure you have a non-shared copy.
Assuming that you are talking about the C API, please consider reading about the concepts involved. .Call() doesn't set named to 2 at all - it passes whatever object is passed so it is the C code's responsibility to handle incoming objects according to the desired semantics (see the previous post here).
Well, I did read, for example "Writing R Extensions" (Version 3.1.0 Under development (2014-02-28)) chapter "5.9.10 Named objects and copying" which says "Currently all arguments to a .Call call will have NAMED set to 2, and so users must assume that they need to be duplicated before alteration."
Matthew pointed out that line and I cannot shed more light on it, since it's not true - at least not currently.
This is consistent with the observation of my test code: that NAMED() in .Call always returns 2.
It is not - you're not testing .Call() - you're testing the assignments in frames which cause additional bumps of NAMED. If you actually test .Call() you'll see what I have reported - .Call() itself does NOT affect NAMED.
And that a .Call doing pure read access will trigger some delay most likely due to a full vector copy is a sign of .Call not only setting NAMED to 2 but also not resetting it once .Call terminates.
Again, as I said earlier, you're on the wrong track here - .Call() doesn't touch it - it is left to the C code. Note that NAMED cannot be decremented (unless you use a ref counting version of R) once it reaches 2, since that means "two or more". The only time you can decrement it is if you are the owner that set it from 0 to 1.
Cheers,
Simon
So what is needed to find NAMED(SEXP argument) < 2 during .Call?
Kind regards
Jens
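The two rules stated in this reply (duplicate whenever NAMED > 0, and never decrement once the count has reached 2) can be sketched as a stand-alone C model. This is a toy illustration only: `ToyVec`, `toy_new`, `toy_duplicate`, `toy_bind`, `toy_unbind`, `toy_unshared` and `toy_free` are hypothetical names, nothing here uses the R API, and the counter is an assumption-laden imitation of NAMED rather than R's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy stand-in for a vector header; not R's real SEXPREC. */
typedef struct {
    int named;      /* 0 = temporary, 1 = one binding, 2 = two or more */
    size_t length;
    int *data;
} ToyVec;

/* Allocate a zero-filled vector with named == 0. */
ToyVec *toy_new(size_t length) {
    ToyVec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->named = 0;
    v->length = length;
    v->data = calloc(length ? length : 1, sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    return v;
}

/* Toy analogue of duplicate(): a fresh copy that nothing refers to yet. */
ToyVec *toy_duplicate(const ToyVec *src) {
    assert(src != NULL);
    ToyVec *copy = toy_new(src->length);
    if (copy == NULL) return NULL;
    if (src->length > 0)
        memcpy(copy->data, src->data, src->length * sizeof *src->data);
    return copy;
}

/* Add a binding; the count saturates at 2 ("two or more"). */
void toy_bind(ToyVec *v) {
    assert(v != NULL);
    if (v->named < 2) v->named++;
}

/* Drop a binding; only the 1 -> 0 step is allowed, because 2 no longer
   records how many bindings actually exist. */
void toy_unbind(ToyVec *v) {
    assert(v != NULL);
    if (v->named == 1) v->named = 0;
}

/* Return a vector that is safe to modify in place: the argument itself
   when named == 0, otherwise a fresh copy (the NAMED > 0 rule). */
ToyVec *toy_unshared(ToyVec *v) {
    assert(v != NULL);
    return v->named > 0 ? toy_duplicate(v) : v;
}

/* Release a toy vector and its data. */
void toy_free(ToyVec *v) {
    if (v != NULL) { free(v->data); free(v); }
}
```

In this model, `toy_unshared` copies only when some binding might still refer to the vector, which is the "copy only if shared" behaviour asked about at the start of the thread; the price of the saturating counter is that a vector that once reached 2 is always copied, even if only one binding remains.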
14 days later
An embedded and charset-unspecified text was scrubbed... Name: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20140318/9c2eb559/attachment.pl>