Definition of [[ - R-devel | R Mailing Lists

Sun, Mar 15, 2009 11:31 AM

The semantics of [ and [[ don't seem to be fully specified in the
Reference manual.  In particular, I can't find where the following
cases are covered:

[1] NA
OK, RefMan says: If i is positive and exceeds length(x) then the
corresponding selection is NA.

> dput(ll[3])
list(NULL)
? i is positive and exceeds length(x); why isn't this list(NA)?

> ll[[3]]
Error in list(1)[[3]] : subscript out of bounds
? Why does this return NA for an atomic vector, but give an error for
a generic vector?

> cc[[3]] <- 34; dput(cc)
c(1, NA, 34)
OK

> ll[[3]] <- 34; dput(ll)
list(1, NULL, 34)
Why is second element NULL, not NA?
And why is it OK to set an undefined ll[[3]], but not to get it?

I assume that these are features, not bugs, but I can't find
documentation for them.
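
For readers following along, the cases above can be reproduced with a short script. This is only a sketch: it assumes the starting values `cc <- c(1)` and `ll <- list(1)`, which are suggested by the error message and by later replies in the thread, and `try_get` is a hypothetical helper, not part of R.

```r
# Assumed starting values (suggested by later messages in the thread).
cc <- c(1)
ll <- list(1)

# Hypothetical helper: return the value, or the error message as a string.
try_get <- function(expr) tryCatch(expr, error = function(e) conditionMessage(e))

r1 <- cc[3]              # out-of-range `[` on an atomic vector
r2 <- ll[3]              # out-of-range `[` on a list
r3 <- try_get(ll[[3]])   # out-of-range `[[` on a list signals an error

cc[[3]] <- 34            # assigning past the end pads the atomic vector
ll[[3]] <- 34            # assigning past the end pads the list

dput(r1); dput(r2); print(r3); dput(cc); dput(ll)
```

Wrapping the failing extraction in `tryCatch` lets the whole script run to the end, so the five results can be compared side by side.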

            -s

Duncan Murdoch

Sun, Mar 15, 2009 1:43 PM

On 15/03/2009 2:31 PM, Stavros Macrakis wrote:

Because the sentence you read was talking about "simple vectors", and ll 
is presumably not a simple vector.  So what is a simple vector?  That is 
not explicitly defined, and it probably should be.  I think it is 
"atomic vectors, except those with a class that has a method for [".

NA is a length 1 atomic vector with a specific type matching the type of 
c.  It makes more sense in this context to put in a NULL, and return a 
list(NULL) for ll[3].

Lots of code grows vectors by setting elements beyond the end of them, 
so whether or not that's a good idea, it's not likely to change.
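
The idiom in question looks roughly like the following; this is an illustrative sketch with a made-up variable name (`squares`), not code taken from any particular package.

```r
# Grow a vector by assigning one position past its current end each time.
squares <- numeric(0)
for (i in 1:5) {
  squares[length(squares) + 1] <- i^2
}
print(squares)
```

Each assignment targets an index that does not exist yet, which is exactly the "set an undefined element" case asked about.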

I think an argument could be made that ll[[toobig]] should return NULL 
rather than trigger an error, but on the other hand, the current 
behaviour allows the programmer to choose:  if you are assuming that a 
particular element exists, use ll[[element]], and R will tell you when 
your assumption is wrong.  If you aren't sure, use ll[element] and 
you'll get NA or list(NULL) if the element isn't there.
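
A small illustration of that choice, assuming a hypothetical two-element list called `settings`:

```r
# Hypothetical data: a list with two elements.
settings <- list("fast", 10)

# `[[` treats a missing element as an error; capture the message here.
strict <- tryCatch(settings[[3]], error = function(e) conditionMessage(e))

# `[` returns a list of the requested length, padded with NULL.
lenient <- settings[3]

print(strict)
dput(lenient)
```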

There is more documentation in the man page for Extract, but I think it 
is incomplete.  The most complete documentation is of course the source 
code, but it may not answer the question of what's intentional and 
what's accidental.

Duncan Murdoch

Stavros Macrakis

Sun, Mar 15, 2009 2:30 PM

Duncan,

Thanks for the reply.

On Sun, Mar 15, 2009 at 4:43 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:

The four subsections of 3.4 Indexing are 3.4.1 Indexing by vectors,
3.4.2 Indexing matrices and arrays, 3.4.3 Indexing other structures,
and 3.4.4 Subset assignment, so the context seems to be saying that
"simple vectors" are those which are not matrices or arrays, and those
("other structures") which do not overload [.

Even if the definition of 'simple vector' were clarified to cover only
atomic vectors, I still can't find any text specifying that list(3)[5]
=> list(NULL).

For that matter, it would leave the subscripting of important
built-ins such as factors and dates, etc. undefined. Obviously the
intuition is that vectors of factors or vectors of dates would do the
'same thing' as vectors of integers or of strings, but 3.4.3 doesn't
say what that thing is....

Understood that that's the rationale, but where is it documented?

Also, if that's the rationale, it seems to say that NULL is the
equivalent of NA for list elements, but in fact NULL does not function
like NA:

> is.na(NULL)
logical(0)
Warning message:
In is.na(NULL) : is.na() applied to non-(list or vector) of type 'NULL'

> is.na(list(NULL))
[1] FALSE

Indeed, NA seems to both up-convert and down-convert nicely to other
forms of NA:

> dput(as.integer(as.logical(c(TRUE,NA,TRUE))))
c(1L, NA, 1L)

> dput(as.logical(as.integer(c(TRUE,NA,TRUE))))
c(TRUE, NA, TRUE)

and NAs are not converted to NULL when converted to a generic vector:

> dput(as.list(c(TRUE,NA,TRUE)))
list(TRUE, NA, TRUE)

and NA is preserved when downconverting:

> dput(as.logical(as.list(c(TRUE,NA,23))))
c(TRUE, NA, TRUE)

But if you try to downconvert NULL, you get an error:

> dput(as.integer(list(NULL)))
Error in isS4(x) : (list) object cannot be coerced to type 'integer'

So I don't see why NULL is the right way to represent NA, especially
since NULL is a perfectly good list element, distinct from NA.

I wasn't suggesting changing this.

Yes, that could make sense, but why would it be true for ll[[toobig]]
but not cc[[toobig]]?

Yes, I was looking at that man page, and I don't think it resolves any
of the above questions.

Well, that's one issue.  But another is that there should be a
specification addressed to users, who should not have to understand
internals.

             -s

Wacek Kusnierczyk

Sun, Mar 15, 2009 2:44 PM

Stavros Macrakis wrote:

this should really be taken seriously.

vQ

Duncan Murdoch

Sun, Mar 15, 2009 4:46 PM

Just a couple of inline comments down below:

On 15/03/2009 5:30 PM, Stavros Macrakis wrote:

Duncan,

Thanks for the reply.

On Sun, Mar 15, 2009 at 4:43 PM, Duncan Murdoch <murdoch at stats.uwo.ca> wrote:

On 15/03/2009 2:31 PM, Stavros Macrakis wrote:

dput(ll[3])
list(NULL)
? i is positive and exceeds length(x); why isn't this list(NA)?

Because the sentence you read was talking about "simple vectors", and ll is
presumably not a simple vector.  So what is a simple vector?  That is not
explicitly defined, and it probably should be.  I think it is "atomic
vectors, except those with a class that has a method for [".

The four subsections of 3.4 Indexing are 3.4.1 Indexing by vectors,
3.4.2 Indexing matrices and arrays, 3.4.3 Indexing other structures,
and 3.4.4 Subset assignment, so the context seems to be saying that
"simple vectors" are those which are not matrices or arrays, and those
("other structures") which do not overload [.

Even if the definition of 'simple vector' were clarified to cover only
atomic vectors, I still can't find any text specifying that list(3)[5]
=> list(NULL).

For that matter, it would leave the subscripting of important
built-ins such as factors and dates, etc. undefined. Obviously the
intuition is that vectors of factors or vectors of dates would do the
'same thing' as vectors of integers or of strings, but 3.4.3 doesn't
say what that thing is....

ll[[3]]

Error in list(1)[[3]] : subscript out of bounds
? Why does this return NA for an atomic vector, but give an error for
a generic vector?

cc[[3]] <- 34; dput(cc)

c(1, NA, 34)
OK

ll[[3]] <- 34; dput(ll)
list(1, NULL, 34)
Why is second element NULL, not NA?

NA is a length 1 atomic vector with a specific type matching the type of c.
 It makes more sense in this context to put in a NULL, and return a
list(NULL) for ll[3].

Understood that that's the rationale, but where is it documented?

Also, if that's the rationale, it seems to say that NULL is the
equivalent of NA for list elements, but in fact NULL does not function
like NA:

is.na(NULL)

logical(0)
Warning message:
In is.na(NULL) : is.na() applied to non-(list or vector) of type 'NULL'

is.na(list(NULL))

[1] FALSE

Indeed, NA seems to both up-convert and down-convert nicely to other
forms of NA:

dput(as.integer(as.logical(c(TRUE,NA,TRUE))))

c(1L, NA, 1L)

dput(as.logical(as.integer(c(TRUE,NA,TRUE))))

c(TRUE, NA, TRUE)

and are not converted to NULL when converted to generic vector:

dput(as.list(c(TRUE,NA,TRUE)))

list(TRUE, NA, TRUE)

and NA is preserved when downconverting:

dput(as.logical(as.list(c(TRUE,NA,23))))

c(TRUE, NA, TRUE)

But if you try to downconvert NULL, you get an error

dput(as.integer(list(NULL)))

Error in isS4(x) : (list) object cannot be coerced to type 'integer'

So I don't see why NULL is the right way to represent NA, especially
since NULL is a perfectly good list element, distinct from NA.

And why is it OK to set an undefined ll[[3]], but not to get it?

Lots of code grows vectors by setting elements beyond the end of them, so
whether or not that's a good idea, it's not likely to change.

I wasn't suggesting changing this.

I think an argument could be made that ll[[toobig]] should return NULL
rather than trigger an error, but on the other hand, the current behaviour
allows the programmer to choose:  if you are assuming that a particular
element exists, use ll[[element]], and R will tell you when your assumption
is wrong.  If you aren't sure, use ll[element] and you'll get NA or
list(NULL) if the element isn't there.

Yes, that could make sense, but why would it be true for ll[[toobig]]
but not cc[[toobig]]?

But it is:

 > cc <- c(1)
 > cc[[3]]
Error in cc[[3]] : subscript out of bounds

I agree, but not so strongly that I will drop everything and write one.

Duncan Murdoch

Thomas Lumley

Mon, Mar 16, 2009 1:06 AM

On Sun, 15 Mar 2009, Stavros Macrakis wrote:

I think some of these are because there are only NAs for character, logical, and the numeric types. There isn't an NA of list type.

This one shouldn't be list(NA) - which NA would it use?  It should be some sort of list(_NA_list_) type, and list(NULL) is playing that role.

Again, because there isn't an NA of generic vector type.
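
The point about typed NAs can be checked directly. The sketch below uses only base R constants; the variable names (`na_types`, `x`, `y`) are illustrative.

```r
# Each of these atomic types has its own NA constant.
na_types <- c(
  logical   = typeof(NA),
  integer   = typeof(NA_integer_),
  double    = typeof(NA_real_),
  character = typeof(NA_character_)
)
print(na_types)

# An NA stored in a list is a length-1 logical vector, not a "list NA".
x <- list(NA)
print(typeof(x[[1]]))

# When a list is padded by assignment past its end, the gap is filled with NULL.
y <- list(1)
y[[3]] <- "c"
print(is.null(y[[2]]))
```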

Same reason for NULL vs NA.  The fact that setting works may just be an inconsistency -- as you can see from previous discussions, R often does not effectively forbid code that shouldn't work -- or it may be bug-compatibility with some version of S or S-PLUS.


      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

Thomas Lumley

Mon, Mar 16, 2009 1:26 AM

On Sun, 15 Mar 2009, Wacek Kusnierczyk wrote:

Well, the lack of such a specification is a documented bug (see the FAQ on bug reporting), and I think everyone agrees it would be useful, just not as useful as what they would have to stop doing to write it.  In fact, such a document may well have a higher priority than it deserves: people who would want that sort of documentation are overrepresented in R-core compared to the general R user community.

There was a panel talk at DSC2005 (yes, four years ago) on the possibilities for a joint R/S language standard. That would have provided an external stimulus and a framework for finding all the inconsistencies. It didn't really eventuate.

      -thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle

Wacek Kusnierczyk

Mon, Mar 16, 2009 1:28 AM

somewhat on the side,

    l = list(1)
   
    l[[2]]
    # error, index out of bounds

    l[2][[1]]
    # NULL

that is, we can't extract from l any element at an index exceeding the
list's length (if we could, it would have been NULL or some sort of
_NA_list), but we can extract a sublist at an index out of bounds, and
from that sublist extract the element (which is NULL, 'the _NA_list').

that's not necessarily wrong, but "the item at index i" (l[[i]]) is not
equivalent to "the item in the sublist at index i".

vQ

Thomas Lumley wrote: