head.matrix can return 1000s of columns -- limit to n or add new argument?

Thu, Oct 31, 2019 12:46 PM

Hi Martin,


On Wed, Oct 30, 2019 at 4:30 AM Martin Maechler <maechler at stat.math.ethz.ch>
wrote:

You're correct, of course. Apologies for that.

I hope you didn't construe my describing surprise (which was honest) as a
criticism. It was just quite literally not what I thought head(array(100,
c(25, 2, 2))) would do, based on what head.matrix does, is all.

That is pretty odd. IMHO it would be quite nice from a design perspective
to fix that, but I do wonder, as I infer you do as well, how much code it
would break.

Changing this would cause problems in any case where a generic has an array
method but no matrix method, as well as in any code that explicitly checks
inherits(x, "array") on the assumption that matrices won't return TRUE,
correct? My intuition is that the former would be pretty rare, though it
might be a fun little problem to figure out. The latter is ...probably also
fairly rare? My intuition on that one is less strong, though.
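
To make the second case concrete, here is a minimal sketch of the kind of
check I have in mind. The function name describe_dims is made up for
illustration, and what it returns for a matrix depends on whether class(m)
includes "array" in the R version being run, so I show no output for that
call.

```r
# Hypothetical helper: code written like this assumes a matrix does NOT
# inherit from "array", and would change behavior if that stopped holding.
describe_dims <- function(x) {
  if (inherits(x, "array")) "array branch" else "non-array branch"
}

m <- matrix(1:6, nrow = 2)
a <- array(1:24, c(2, 3, 4))

describe_dims(a)  # a 3-d array takes the "array branch"
describe_dims(m)  # depends on whether class(m) includes "array"

# A check that does not depend on the class vector at all:
length(dim(m)) == 2L  # TRUE for any matrix
length(dim(a)) == 2L  # FALSE for a 3-d array
```

Code that tests length(dim(x)) rather than the class would be unaffected
either way.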

Absolutely, will do. I'm gratified we're going after the more general
approach. Thanks for working with us on this.

Best,
~G

    >> x = array(100, c(4, 5, 5))
    >> dim(x)
    > [1] 4 5 5
    >> head(x, 1)
    > [1] 100
    >> class(head(x))
    > [1] "numeric"

    > (For a 1d array, it does return another 1d array).

    > When extending head/tail to understand multiple dimensions as discussed in
    > this thread, then, should the behavior for 2+d arrays be explicitly
    > retained, or should head and tail do the analogous thing (with a head(<2d
    > array>) behaving the same as head(<matrix>), which honestly is what I
    > expected to already be happening)?

    > Are people using/relying on this behavior in their code, and if so, why/for
    > what?

    > Even more generally, one way forward is to have the default methods check
    > for dimensions, and use length if it is null:

    > tail.default <- tail.data.frame <- function(x, n = 6L, ...)
    > {
    >     if(any(n == 0))
    >         stop("n must be non-zero or unspecified for all dimensions")
    >     if(!is.null(dim(x)))
    >         dimsx <- dim(x)
    >     else
    >         dimsx <- length(x)
    >
    >     ## this returns a list of vectors of indices in each
    >     ## dimension, regardless of the length of the n
    >     ## argument
    >     sel <- lapply(seq_along(dimsx), function(i) {
    >         dxi <- dimsx[i]
    >         ## select all indices (full dim) if not specified
    >         ni <- if(length(n) >= i) n[i] else dxi
    >         ## handle negative ns
    >         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
    >         seq.int(to = dxi, length.out = ni)
    >     })
    >     args <- c(list(x), sel, drop = FALSE)
    >     do.call("[", args)
    > }

    > I think this precludes the need for a separate data.frame method at all,
    > actually, though (I would think) tail.data.frame would still be defined and
    > exported for backwards compatibility. (the matrix method has some extra
    > bits so my current conception of it is still separate, though it might not
    > NEED to be).

    > The question then becomes, should head/tail always return something with
    > the same dimensionality (number of dims) it got, or should data.frame and
    > matrix be special cased in this regard, as they are now?

    > What are people's thoughts?
    > ~G
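
For anyone who wants to try the quoted proposal without masking the methods
in utils, here is a minimal sketch of the same logic under a made-up name,
tail_nd. It is only an illustration of the idea under discussion, not what
R ships.

```r
# Hypothetical stand-alone copy of the quoted sketch, renamed so that it
# does not mask utils::tail or its methods.
tail_nd <- function(x, n = 6L, ...) {
  if (any(n == 0))
    stop("n must be non-zero or unspecified for all dimensions")
  # Treat an object without dim() as one-dimensional
  dimsx <- if (!is.null(dim(x))) dim(x) else length(x)

  # One index vector per dimension; dimensions not covered by n are kept whole
  sel <- lapply(seq_along(dimsx), function(i) {
    dxi <- dimsx[i]
    ni <- if (length(n) >= i) n[i] else dxi
    # Negative n means "all but the first |n|"
    ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
    seq.int(to = dxi, length.out = ni)
  })
  do.call("[", c(list(x), sel, drop = FALSE))
}

x <- array(100, c(4, 5, 5))
dim(tail_nd(x, 2))        # 2 5 5
dim(tail_nd(x, c(2, 3)))  # 2 3 5
dim(tail_nd(x, -1))       # 3 5 5
```

Because drop = FALSE is always passed, the result keeps the number of
dimensions of its input, which is the first of the two options raised in
the question above.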

