Confused about NAMED

13 messages · Peter Dalgaard, Duncan Murdoch, Luke Tierney +3 more

#
Hi,

I expected NAMED to be 1 in all these three cases. It is for one of them,
but not the other two?
R version 2.14.0 (2011-10-31)
Platform: i386-pc-mingw32/i386 (32-bit)
@2514aa0 13 INTSXP g0c1 [NAM(2)] (len=1, tl=0) 1
@272f788 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...
@24fc28c 19 VECSXP g0c0 [OBJ,NAM(2),ATT] (len=0, tl=0)
ATTRIB:
  @24fc270 02 LISTSXP g0c0 []
    TAG: @3f2120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @24fc334 16 STRSXP g0c0 [] (len=0, tl=0)
    TAG: @3f2040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @24fc318 13 INTSXP g0c0 [] (len=0, tl=0)
    TAG: @3f2388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
    @25be500 16 STRSXP g0c1 [] (len=1, tl=0)
      @1d38af0 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"

It's a little difficult to search for the word "named" but I tried and
found this in R-ints :

    "Note that optimizing NAMED = 1 is only effective within a primitive
(as the closure wrapper of a .Internal will set NAMED = 2 when the
promise to the argument is evaluated)"

So might it be that just looking at NAMED using .Internal(inspect()) is
setting NAMED=2?  But if so, why does y have NAMED==1?

Thanks!
Matthew
#
On Nov 24, 2011, at 11:13 , Matthew Dowle wrote:

            
This is tricky business... I'm not quite sure I'll get it right, but let's try

When you are assigning a constant, the value you assign is already part of the assignment expression, so if you want to modify it, you must duplicate. So NAMED==2 on z <- 1 is basically to prevent you from accidentally "changing the value of 1". If it weren't, then you could get bitten by code like for(i in 1:2) {z <- 1; if(i==1) z[1] <- 2}.

If you're assigning the result of a computation, then the object only exists once, so 
z <- 0+1  gets NAMED==1.
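
A minimal sketch of the loop mentioned above, written out so it can be run. The `results` vector is an addition for illustration only. The point is that the literal 1 stored in the loop body must not be altered by the modification in the first iteration, which is what the duplicate-on-modify rule guarantees.

```r
# Collect the value of z at the end of each iteration.
results <- numeric(0)
for (i in 1:2) {
  z <- 1                    # assign the constant stored in the expression
  if (i == 1) z[1] <- 2     # modify z only in the first iteration
  results <- c(results, z)
}
print(results)              # 2 then 1: the constant itself was not changed
```

If the constant were modified in place, the second iteration would see 2 instead of 1.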

However, if the computation is done by returning a named value from within a function, as in

f <- function(){v <- 1+0; v}
z <- f()

then again NAMED==2. This is because the side effects of the function _might_ result in something having a hold on the function environment, e.g. if we had 

e <- NULL
f <- function(){e <<-environment(); v <- 1+0; v}
z <- f()

then z[1] <- 5 would change e$v too. As it happens, there aren't any side effects in the former case, but R loses track and assumes the worst.
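
The scenario above can be run as a small sketch to see the protection at work. Because the returned value is treated as shared, modifying z forces a copy, and the v held in the captured environment keeps its value.

```r
e <- NULL
f <- function() { e <<- environment(); v <- 1 + 0; v }
z <- f()        # z and e$v may refer to the same underlying object here
z[1] <- 5       # R duplicates before modifying, so e$v is left alone
print(z)        # 5
print(e$v)      # 1
```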

  
    
#
Thanks a lot, think I follow. That explains x vs y, but why is z NAMED==2?
The result of data.frame() is an object that exists once (similar to 1:10)
so shouldn't it be NAMED==1 too?  Or, R loses track and assumes the worst
even on its own functions such as data.frame()?
#
On 11-11-24 6:34 AM, Matthew Dowle wrote:
R has several types of functions -- see the R Internals manual for 
details.  data.frame() is a plain R function, so it is treated no 
differently than any user-written function.  On the other hand, the 
internal function that implements the : operator is a "primitive", so it 
has complete control over its return value, and it can set NAMED in the 
most efficient way.

So you might think that returning a value as an evaluation of a 
primitive adds efficiency, e.g. in Peter's example

f<- function(){v<- 1+0; v + 0}

will return NAMED == 1.  But that's because internally it had to make a 
copy of v before adding 0 to it, so you've probably really made it less 
efficient:  the original version might never modify the result, so it 
might never make a copy.
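
A hedged sketch for comparing the two variants side by side. The names f1 and f2 are chosen here for illustration and are not from the thread. No output is shown because it depends on the R version: as far as I know, releases from R 4.0.0 onward use reference counting and print REF(n) rather than NAM(n).

```r
f1 <- function() { v <- 1 + 0; v }       # returns the named local value
f2 <- function() { v <- 1 + 0; v + 0 }   # returns the fresh result of a primitive

.Internal(inspect(f1()))   # look at the NAM (or REF) field in the header line
.Internal(inspect(f2()))

# Both variants return the same value; only the bookkeeping differs.
print(identical(f1(), f2()))
```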

Duncan Murdoch
#
On Nov 24, 2011, at 12:34 , Matthew Dowle wrote:

            
R loses track. I suspect that is really all it can do without actual reference counting. The function data.frame is more than 150 lines of code, and if any of those end up invoking user code, possibly via a class method, you can't tell definitively whether or not the evaluation environment dies at the return.
#
Ohhh, think I see now. After Duncan's reply I was going to ask if it was
possible to change data.frame() to be primitive so it could set NAMED=1.
But it seems primitive functions can't use R code so data.frame() would
need to be ported to C. Ok! - not quick or easy, and not without
considerable risk. And data.frame() can invoke user code inside it anyway.

Since list() is primitive I tried to construct a data.frame starting with
list() [since structure() isn't primitive], but then merely adding an
attribute seems to set NAMED==2 too ?
@25149e0 19 VECSXP g0c1 [NAM(1),ATT] (len=2, tl=0)
  @263ea50 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
  @263eaa0 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
ATTRIB:
  @2457984 02 LISTSXP g0c0 []
    TAG: @3f2120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @25149c0 16 STRSXP g0c1 [] (len=2, tl=0)
      @1e987d8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1e56948 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
@25149e0 19 VECSXP g0c1 [NAM(2),ATT] (len=2, tl=0)
  @263ea50 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
  @263eaa0 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
ATTRIB:
  @2457984 02 LISTSXP g0c0 []
    TAG: @3f2120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @25149c0 16 STRSXP g0c1 [] (len=2, tl=0)
      @1e987d8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1e56948 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @245732c 01 SYMSXP g0c0 [] "foo"
    @25148a0 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
      @2514920 09 CHARSXP g0c1 [gp=0x20] "bar"


Matthew
#
On Nov 24, 2011, at 14:05 , Matthew Dowle wrote:

            
Yes. As soon as there is the slightest risk of having (had) two references to the same object, NAMED is set to 2, and it is never reduced. While your mind is boggling, I might boggle it a bit more:
@116e11788 13 INTSXP g0c4 [NAM(1)] (len=10, tl=0) 1,2,3,4,5,...
@116e11788 13 INTSXP g0c4 [NAM(2)] (len=10, tl=0) 1,2,3,4,5,...

This happens because while mean() is running, there is a second reference to z, namely mean's argument x. (With languages like R, you have no insurance that there will be no changes to the global environment while a function call is being evaluated, so bugs can bite in both places -- z or x.)
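
A small sketch of why the second reference matters. The function g is hypothetical and stands in for any function that changes the global environment while it holds its argument. The argument is forced first so that x refers to the same value as z before z is modified.

```r
z <- c(1, 2, 3)
g <- function(x) {
  force(x)          # evaluate the promise: x now refers to z's value
  z[1] <<- 100      # change the global z while the call is running
  x                 # x must still be the value that was passed in
}
res <- g(z)
print(res)          # 1 2 3
print(z)            # 100 2 3
```

Without the copy made on modification, the change to z would also show up in x.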

There are many of these cases where you might pragmatically want to override the default NAMED logic, but you'd be stepping into treacherous waters. Luke has probably been giving these matters quite some thought in connection with his compiler project.
#
Ok, very interesting. Think I'm there.
Thanks for all the info.

Matthew
#
On Nov 24, 2011, at 8:05 AM, Matthew Dowle wrote:

            
Yes, because attr(x,y) <- z is the same as

`*tmp*` <- x
x <- `attr<-`(`*tmp*`, y, z)
rm(`*tmp*`)

so there are two references to the data frame: one in DF and one in `*tmp*`. It is the first line that causes the NAMED bump. And, yes, it's real:
[1] "*tmp*" "f<-"   "x"    
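
One way to observe the temporary variable is sketched below. This is an illustration, not necessarily the code that produced the output above. The variable seen is introduced here to record what the calling frame contains while the replacement function runs. When run at top level on the R versions discussed in this thread, the listing is expected to include `*tmp*`.

```r
# A do-nothing replacement function that records the caller's variables.
`f<-` <- function(x, value) {
  seen <<- ls(parent.frame())   # names visible in the calling frame right now
  x
}
x <- 1
f(x) <- 2      # evaluated roughly as x <- `f<-`(`*tmp*`, value = 2)
print(seen)
```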

You could skip that by using the function directly (I don't think it's recommended, though):
@1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
  @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
ATTRIB:
  @100b6e748 02 LISTSXP g0c0 [] 
    TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @1028c82c8 16 STRSXP g0c1 [] (len=1, tl=0)
      @1009cd388 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
@1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
  @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
ATTRIB:
  @100b6e748 02 LISTSXP g0c0 [] 
    TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @1028c8178 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
      @100967af8 09 CHARSXP g0c1 [MARK,gp=0x20] "b"
@1028c82f8 19 VECSXP g0c1 [NAM(1),ATT] (len=1, tl=0)
  @1028c8268 14 REALSXP g0c1 [] (len=1, tl=0) 1
ATTRIB:
  @100b6e748 02 LISTSXP g0c0 [] 
    TAG: @100843878 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @1028c8178 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
      @100967af8 09 CHARSXP g0c1 [MARK,gp=0x20] "b"
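
For reference, a minimal sketch of the two call styles, using `names<-` as the example replacement function. The variables y and w are chosen for illustration. It only checks that both styles give the same result; whether the direct call avoids a copy depends on the R version and on the particular replacement function.

```r
x <- list(a = 1)

y <- x
names(y) <- "b"             # ordinary replacement syntax, goes through `*tmp*`

w <- `names<-`(x, "b")      # direct call to the replacement function

print(identical(y, w))      # TRUE
```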

Cheers,
Simon
#
The details of complex assignment expressions are fairly intricate.  I
wrote up some notes on this a couple of months back and have meant to
get them into the internals manual but have not gotten around to it
yet.  I'll see if I can get to it in the next week or two and will
send a note to this thread when I do. In terms of the issues
discussed so far

   Calling a foo<- function directly is not a good idea unless you
   really understand what is going on in the assignment mechanism in
   general and in the particular foo<- function. It is definitely not
   something to be done in routine programming unless you like
   unpleasant surprises.

   attr<- could probably be modified to avoid the NAMED increment in
   this example, but I'd want to think that through fairly carefully
   before making such a change.  (Most foo<- functions that are
   primitives are written so that they avoid a NAMED increment when
   used in an assignment expression, but a few are not -- I believe
   these are almost all, maybe even all, oversights, but again I
   wouldn't want to make any changes without careful review.)

Best,

luke
On Thu, 24 Nov 2011, Simon Urbanek wrote:

            

  
    
#
Interesting, I tried it. I found that setting the "row.names" attribute
that way keeps NAMED==1 ok, and that setting "class" attribute keeps
NAMED==1 ok too. Fantastic! But, it seems that merely printing it on the
console (when the class is set) bumps NAMED to 2. Here is the output :
$a
[1] 1 2 3

$b
[1] 4 5 6

attr(,"row.names")
[1] 1 2 3
@261e730 19 VECSXP g0c1 [NAM(1),ATT] (len=2, tl=0)
  @2abd088 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
  @2abd060 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
ATTRIB:
  @258d4f4 02 LISTSXP g0c0 []
    TAG: @1612120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @261e710 16 STRSXP g0c1 [NAM(2)] (len=2, tl=0)
      @17a86f8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1766868 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @1612040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @261e5d0 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) -2147483648,-3
@261e730 19 VECSXP g0c1 [OBJ,NAM(1),ATT] (len=2, tl=0)  # great, NAM(1)
  @2abd088 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
  @2abd060 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
ATTRIB:
  @258d4f4 02 LISTSXP g0c0 []
    TAG: @1612120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @261e710 16 STRSXP g0c1 [NAM(2)] (len=2, tl=0)
      @17a86f8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1766868 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @1612040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @261e5d0 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) -2147483648,-3
    TAG: @1612388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
    @2a758e8 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
      @1647f38 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"
@261e730 19 VECSXP g0c1 [OBJ,NAM(1),ATT] (len=2, tl=0)
  @2abd088 13 INTSXP g0c2 [] (len=3, tl=0) 1,2,3
  @2abd060 13 INTSXP g0c2 [] (len=3, tl=0) 4,5,6
ATTRIB:
  @258d4f4 02 LISTSXP g0c0 []
    TAG: @1612120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @261e710 16 STRSXP g0c1 [NAM(2)] (len=2, tl=0)
      @17a86f8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1766868 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @1612040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @261e5d0 13 INTSXP g0c1 [NAM(2)] (len=2, tl=0) -2147483648,-3
    TAG: @1612388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
    @2a758e8 16 STRSXP g0c1 [NAM(1)] (len=1, tl=0)
      @1647f38 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"
a b
1 1 4
2 2 5
3 3 6
@261e730 19 VECSXP g0c1 [OBJ,MARK,NAM(2),ATT] (len=2, tl=0)
  @2abd088 13 INTSXP g0c2 [MARK,NAM(2)] (len=3, tl=0) 1,2,3
  @2abd060 13 INTSXP g0c2 [MARK,NAM(2)] (len=3, tl=0) 4,5,6
ATTRIB:
  @258d4f4 02 LISTSXP g0c0 [MARK]
    TAG: @1612120 01 SYMSXP g0c0 [MARK,gp=0x4000] "names"
    @261e710 16 STRSXP g0c1 [MARK,NAM(2)] (len=2, tl=0)
      @17a86f8 09 CHARSXP g0c1 [MARK,gp=0x21] "a"
      @1766868 09 CHARSXP g0c1 [MARK,gp=0x21] "b"
    TAG: @1612040 01 SYMSXP g0c0 [MARK,gp=0x4000] "row.names"
    @261e5d0 13 INTSXP g0c1 [MARK,NAM(2)] (len=2, tl=0) -2147483648,-3
    TAG: @1612388 01 SYMSXP g0c0 [MARK,gp=0x4000] "class"
    @2a758e8 16 STRSXP g0c1 [MARK,NAM(2)] (len=1, tl=0)
      @1647f38 09 CHARSXP g0c2 [MARK,gp=0x21,ATT] "data.frame"
[1] TRUE

Matthew
#
On Thu, 24 Nov 2011, Simon Urbanek wrote:

            
Maybe some review of the 'R Internals' manual about what a primitive 
function is would be desirable.  Converting such a function to C would 
ossify it, which is the major reason it has not been done (it has been 
contemplated).
Only if it were an interpreted function.
You have just explained why interpreted replacement functions set 
NAMED=2, but this does not apply to primitives.

To help convince you, consider
[1] 1 2
attr(,"x")
[1] 13
@11be748 13 INTSXP g0c1 [NAM(1),ATT] (len=2, tl=0) 1,2
ATTRIB:
   @1552054 02 LISTSXP g0c0 []
     TAG: @102b1c0 01 SYMSXP g0c0 [MARK,NAM(2)] "x"
     @11be768 14 REALSXP g0c1 [] (len=1, tl=0) 13

Now, as to why attr<- (which is primitive) does what it does you will 
need to read (and understand) the code.

  
    
#
On Nov 24, 2011, at 1:48 PM, Prof Brian Ripley wrote:

            
It does - see eval.c l1680-2 which causes it to go through do_set which in turn bumps NAMED. I have responded only to Luke but I guess I should have included everyone.
Because do_attributesgets duplicates (attrib.c l1178), which you can easily see:
@155aba8 13 INTSXP g0c1 [NAM(1)] (len=2, tl=0) 1,2
@15dbe28 13 INTSXP g0c1 [NAM(1),ATT] (len=2, tl=0) 1,2
ATTRIB:
  @16da5a8 02 LISTSXP g0c0 [] 
    TAG: @660008 01 SYMSXP g0c0 [MARK,NAM(2)] "x"
    @15dbe58 14 REALSXP g0c1 [] (len=1, tl=0) 13

Note the different pointer of the value of d now -- do_attributesgets returns a duplicate with NAMED=0 so do_set assignment bumps it to 1.

Cheers,
Simon