Is a list an atomic object? (or is there an issue with the help page of ?tapply ?)

12 messages · Tal Galili, Hervé Pagès, Richard M. Heiberger +3 more

#
In the help page of ?tapply it says that the first argument (X) is "an
atomic object, typically a vector."

However, tapply seems to be able to handle list objects. For example:

###################

l <- as.list(1:10)
is.atomic(l) # FALSE
index <- c(rep(1,5),rep(2,5))
tapply(l,index,unlist)
$`1`
[1] 1 2 3 4 5

$`2`
[1]  6  7  8  9 10


###################

Hence, does this mean a list is an atomic object (which I thought it wasn't), or
does the help for tapply need updating?
(Or is there some third option I'm missing?)

Thanks.





----------------Contact Details:-------------------------------------------------------
Contact me: Tal.Galili at gmail.com |
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------
10 days later
#
Did you ever receive a reply to this?

Note that for your example:

> tapply(l, index, sum)
Error in FUN(X[[i]], ...) : invalid 'type' (list) of argument

A list is definitely not atomic (is.recursive(l) is TRUE).

So it looks like a "quirk" that FUN = unlist doesn't raise an error.
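
For reference, the predicates involved can be checked directly. This is a minimal sketch in base R, with `l` defined as in the original post:

```r
l <- as.list(1:10)

is.atomic(l)     # FALSE: a list is not an atomic vector
is.recursive(l)  # TRUE: a list can hold other objects
is.list(l)       # TRUE
is.vector(l)     # TRUE: a list still counts as a (generic) vector
length(l)        # 10
```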

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Sat, Feb 4, 2017 at 4:17 AM, Tal Galili <tal.galili at gmail.com> wrote:
#
Hi,

tapply() will work on any object 'X' that has a length and supports
single-bracket subsetting. These objects are sometimes called
"vector-like" objects. Atomic vectors, lists, S4 objects with a "length"
and "[" method, etc... are examples of "vector-like" objects.

So instead of saying

   X: an atomic object, typically a vector.

I think it would be more accurate if the man page was saying something
like

   X: a vector-like object that supports subsetting with `[`, typically
      an atomic vector.
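
As a rough illustration of the "vector-like" idea, the sketch below uses the list from the original post. The anonymous function is only an example of a FUN that accepts a list subset; it is not taken from the tapply() documentation.

```r
l <- as.list(1:10)
index <- c(rep(1, 5), rep(2, 5))

# A list has a length and supports `[`; the subset is again a list.
length(l)
class(l[1:5])

# FUN receives each list subset, so it must accept a list.
counts <- tapply(l, index, length)
totals <- tapply(l, index, function(s) sum(unlist(s)))
```

Here `counts` holds the group sizes and `totals` the per-group sums, computed after flattening each subset with unlist().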

H.
On 02/04/2017 04:17 AM, Tal Galili wrote:

  
    
#
Hervé:

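Kindly explain this, then:

> l <- as.list(1:10)
> is.atomic(l)
[1] FALSE
> index <- c(rep(1,5),rep(2,5))
> tapply(l,index,unlist)
$`1`
[1] 1 2 3 4 5

$`2`
[1]  6  7  8  9 10

> tapply(l,index,sum)
Error in FUN(X[[i]], ...) : invalid 'type' (list) of argument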

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Feb 14, 2017 at 5:10 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
#
The problem with Bert's second example is that sum doesn't work on a list.
The tapply worked correctly.
> unlist(l[1:5])
[1] 1 2 3 4 5
> sum(l[1:5])
Error in sum(l[1:5]) : invalid 'type' (list) of argument
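
The same point can be checked without tapply() at all. This sketch captures the error message rather than stopping; the variable names `msg` and `total` are only illustrative.

```r
l <- as.list(1:10)

# sum() rejects a list; capture the message instead of halting.
msg <- tryCatch(sum(l[1:5]), error = function(e) conditionMessage(e))

# Flattening the list subset first gives sum() an atomic vector.
total <- sum(unlist(l[1:5]))
```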
On Tue, Feb 14, 2017 at 8:28 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
#
Right. More precisely the function passed thru the FUN argument must
work on the subsets of X generated internally by tapply(). You can
actually see these subsets by passing the identity function:

   X <- letters[1:10]
   INDEX <- c(rep(1,5),rep(2,5))
   tapply(X, INDEX, FUN=identity)
   # $`1`
   # [1] "a" "b" "c" "d" "e"
   #
   # $`2`
   # [1] "f" "g" "h" "i" "j"

Doing this shows you how tapply() splits the vector-like object X into
a list of subsets. If you replace the identity function with a function
that cannot be applied to these subsets, then you get an error:

   tapply(X, INDEX, FUN=sum)
   # Error in FUN(X[[i]], ...) : invalid 'type' (character) of argument

As you can see, here we get an error even though X is an atomic vector.

H.
On 02/14/2017 05:41 PM, Richard M. Heiberger wrote:

  
    
#
Yes, exactly.

So my point is that this:

  "X: a vector-like object that supports subsetting with `[`, typically
     an atomic vector."

is incorrect, or at least a bit opaque, without further emphasizing
that FUN must accept the result of "[". With atomic vectors, the error
that you produced was obvious, but with lists, I believe not so. I
appreciate the desire for brevity, but I think clarity should be the
primary goal. Maybe it *is* just me, but I think a few extra words of
explanation here would not go amiss.

But, anyway, thanks for the clarification.

Cheers,
Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Feb 14, 2017 at 6:04 PM, Hervé Pagès <hpages at fredhutch.org> wrote:
#
On 02/14/2017 06:39 PM, Bert Gunter wrote:
Maybe this kind of detail belongs in the description of the FUN
argument. However, please note that the man pages for lapply() and the
other *apply() functions don't emphasize that the supplied
FUN must be a function that accepts the things it is applied to either,
and nobody seems to make a big deal of it. Maybe because it's obvious?

Well, it's the same error. Maybe what's not obvious is that in both
cases the error is coming from sum(), not from tapply() itself.
sum() is complaining that it receives something that it doesn't
know how to handle. The clue is in how the error message starts:

   Error in FUN(X[[i]], ...):

Maybe one could argue this is a little bit cryptic. Note the difference
when the error is coming from tapply() itself:

   > X <- letters[1:9]
   > INDEX <- c(rep(1,5),rep(2,5))
   > tapply(X, INDEX, FUN=identity)
   Error in tapply(X, INDEX, FUN = identity) :
     arguments must have same length
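
One way to see where each error originates is to capture the condition and inspect its message and call. This is only a sketch: `describe_error` is a hypothetical helper written for this illustration, not part of base R.

```r
X <- letters[1:10]
INDEX <- c(rep(1, 5), rep(2, 5))

# Hypothetical helper: return the error's message and the call that raised it.
describe_error <- function(expr) {
  tryCatch(expr, error = function(e) {
    list(message = conditionMessage(e),
         call = paste(deparse(conditionCall(e)), collapse = " "))
  })
}

# Raised by sum() when it is handed a character subset.
from_fun <- describe_error(tapply(X, INDEX, FUN = sum))

# Raised by tapply() itself: 9 elements in X against 10 in INDEX.
from_tapply <- describe_error(tapply(letters[1:9], INDEX, FUN = identity))
```

Comparing `from_fun$call` with `from_tapply$call` shows which function signalled each error.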

H.

  
    
#
It seems like this should be consistent with split(), since that's
what actually powers the behaviour.

Reading the description for split leads to this rather interesting example:

tapply(mtcars, 1:11, I)

Hadley
On Tue, Feb 14, 2017 at 7:10 PM, Hervé Pagès <hpages at fredhutch.org> wrote:

  
    
#
You could also call this "interesting example" a bug.

Clearly not enough code reuse in the implementation of tapply().
Instead of the current 25 lines of code, it could be a simple
wrapper around split() and sapply(), e.g. something like:

   tapply2 <- function(X, INDEX, FUN=NULL, ..., simplify=TRUE)
   {
     # same as tapply(INDEX=INDEX, FUN=NULL)
     f <- make_factor_from_INDEX(INDEX)
     sapply(split(X, f), FUN, ..., simplify=simplify, USE.NAMES=FALSE)
   }

and then be guaranteed to behave consistently with split() and
sapply(). Also the make_factor_from_INDEX() step maybe could be
shared with what aggregate.data.frame() does internally with its
'by' argument.
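
To make the idea concrete, here is a runnable sketch under strong simplifying assumptions. The make_factor_from_INDEX() below is a hypothetical stand-in that handles only a single grouping vector (not a list of factors), and the FUN=NULL case is left out because sapply() needs an actual function.

```r
# Hypothetical stand-in: supports one grouping vector only.
make_factor_from_INDEX <- function(INDEX) as.factor(INDEX)

tapply2 <- function(X, INDEX, FUN, ..., simplify = TRUE) {
  f <- make_factor_from_INDEX(INDEX)
  sapply(split(X, f), FUN, ..., simplify = simplify, USE.NAMES = FALSE)
}

index <- c(rep(1, 5), rep(2, 5))

# Atomic input: one sum per group, named by group.
sums <- tapply2(1:10, index, sum)

# List input: works because split() accepts a list.
flat <- tapply2(as.list(1:10), index, unlist, simplify = FALSE)
```

Unlike tapply(), this sketch returns a plain named vector or list rather than an array, so it shows the split-then-apply structure and is not a drop-in replacement.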

Still a mystery to me why the power of code sharing/reuse is so
often underestimated :-/

H.
On 02/15/2017 11:32 AM, Hadley Wickham wrote:

  
    
4 days later
#
> Hi, tapply() will work on any object 'X' that has a length
    > and supports single-bracket subsetting. These objects are
    > sometimes called "vector-like" objects. Atomic vectors,
    > lists, S4 objects with a "length" and "[" method,
    > etc... are examples of "vector-like" objects.

    > So instead of saying

    >    X: an atomic object, typically a vector.

    > I think it would be more accurate if the man page was
    > saying something like

    >    X: a vector-like object that supports subsetting with
    > `[`, typically an atomic vector.

Thank you, Hervé!

Actually (someone else mentioned ?)
only   length(X) and  split(X, <group>)   need to work,
and as split() itself is an S3 generic function,  X can be even
more general... well depending on how exactly you understand
"vector-like".

So I would go with

       X: an R object for which a 'split' method exists.  Typically
          vector-like, allowing subsetting with '['.
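
A small sketch of what that wording implies, using the list from the original post: split() does the grouping and FUN is then applied to each piece. The names `groups` and `by_hand` are only illustrative.

```r
l <- as.list(1:10)
index <- c(rep(1, 5), rep(2, 5))

# split() is an S3 generic; its default method handles lists.
groups <- split(l, index)
names(groups)

# Applying FUN to each group by hand gives the same per-group values
# as tapply(l, index, unlist).
by_hand <- lapply(groups, unlist)
```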


Martin


    > H.
> On 02/04/2017 04:17 AM, Tal Galili wrote:
>> In the help page of ?tapply it says that the first
    >> argument (X) is "an atomic object, typically a vector."
    >> 
    >> However, tapply seems to be able to handle list
    >> objects. For example:
    >> 
    >> ###################
    >> 
    >> l <- as.list(1:10)
    >> is.atomic(l) # FALSE
    >> index <- c(rep(1,5),rep(2,5))
    >> tapply(l,index,unlist)
    >>
    >> > tapply(l,index,unlist)
    >> $`1`
    >> [1] 1 2 3 4 5
    >>
    >> $`2`
    >> [1]  6  7  8  9 10
    >> 
    >> 
    >> ###################
    >> 
    >> Hence, does it mean a list an atomic object? (which I
    >> thought it wasn't) or is the help for tapply needs
    >> updating?  (or some third option I'm missing?)
    >> 
    >> Thanks.
#
On Mon, Feb 20, 2017 at 7:31 AM, Martin Maechler
<maechler at stat.math.ethz.ch> wrote:
I think technically tapply() should be using NROW() to check that X and
INDEX are compatible. That would make it more compatible with split()
semantics.
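
The difference is easy to see with a data frame. The sketch below uses the built-in mtcars data set and only illustrates length() versus NROW() and how split() groups rows; it makes no claim about how any particular version of tapply() treats data frames.

```r
# For a data frame, length() counts columns while NROW() counts rows.
n_cols <- length(mtcars)
n_rows <- NROW(mtcars)

# split() groups the rows of a data frame, so the grouping vector
# needs one element per row.
by_cyl <- split(mtcars, mtcars$cyl)
rows_per_group <- sapply(by_cyl, nrow)
```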

Hadley