head.matrix can return 1000s of columns -- limit to n or add new argument?
Hi Martin, On Wed, Oct 30, 2019 at 4:30 AM Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Gabriel Becker
on Tue, 29 Oct 2019 12:43:15 -0700 writes:
> Hi all,
> So I've started working on this and I ran into something that I didn't
> know, namely that for x a multi-dimensional (2+) array, head(x) and tail(x)
> ignore dimension completely, treat x as an atomic vector, and return an
> (unclassed) atomic vector:
Well, that's (3+), not "2+" .
You're correct, of course. Apologies for that.
But I did write (on Sep 17 in this thread!)
> The current source for head() and tail() and all their methods
> in utils is just 83 lines of code {file utils/R/head.R minus
> the initial mostly copyright comments}.
and if you've ever looked at these few dozen lines of R code, you'll have seen that we just added two simple utilities with a few reasonably simple methods. Treating non-matrix (i.e. non-2d) arrays as vectors is typically not unreasonable in R, but indeed with your proposals (in this thread), such non-2d arrays should be treated differently, either via new head.array() / tail.array() methods ((or -- only if it can be done more nicely -- by the default method)).
I hope you didn't construe my describing surprise (which was honest) as a criticism. It was just quite literally not what I thought head(array(100, c(25, 2, 2))) would have done, based on what head.matrix does, is all.
Note however the following historical quirk :
sapply(setNames(,1:5), function(K) inherits(array(pi, dim=1:K), "array"))
    1     2     3     4     5
 TRUE FALSE  TRUE  TRUE  TRUE
(Is this something we should consider changing for R 4.0.0 -- to have it TRUE also for 2d-arrays aka matrix objects ??)
That is pretty odd. IMHO it would be quite nice from a design perspective to fix that, but I do wonder, as I infer you do as well, how much code it would break. Changing this would cause problems in any case where a generic has an array method but no matrix method, as well as in any code that explicitly checks whether an object inherits from "array" and assumes matrices won't return TRUE, correct? My intuition is that the former would be pretty rare, though it might be a fun little problem to figure out. The latter is ...probably also fairly rare? My intuition on that one is less strong though.
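For code that only needs to know whether an object carries dimensions, a check that does not depend on the class vector sidesteps the quirk. The following is a minimal sketch using only base R predicates; the object names x1, x2, x3 are made up for illustration.

```r
# Illustrative objects (names are hypothetical, not from the thread)
x1 <- array(pi, dim = 3L)             # 1-d array
x2 <- array(pi, dim = c(2L, 3L))      # 2-d array, i.e. a matrix
x3 <- array(pi, dim = c(2L, 3L, 4L))  # 3-d array

# is.array() tests for a dim attribute, so it does not care
# whether the class vector happens to contain "array"
is.array(x1)
is.array(x2)
is.array(x3)

# is.matrix() is TRUE only when there are exactly two dimensions
is.matrix(x2)
is.matrix(x3)

# The number of dimensions is available directly
length(dim(x3))
```

Code written this way should behave the same whether or not matrices end up inheriting from "array".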
The consequence of that is that
currently, "often" foo.matrix is just a copy of foo.array in
the case the latter exists:
"base" examples: foo in {unique, duplicated, anyDuplicated}.
So I propose you change current head.matrix and tail.matrix to
head.array and tail.array
(and then have head.matrix <- head.array etc, at least if the
above quirk must remain, or remains (which I currently guess to
be the case)).
Absolutely, will do. I'm gratified we're going after the more general approach. Thanks for working with us on this. Best, ~G
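The aliasing pattern proposed above (write the array method once, then point the matrix method at it) can be sketched with a toy generic. Everything here is hypothetical: peek() and its methods are illustration only and are not the utils implementation of head().

```r
# Hypothetical generic, used only to illustrate the dispatch pattern
peek <- function(x, n = 2L, ...) UseMethod("peek")

# Fallback for dimensionless vectors
peek.default <- function(x, n = 2L, ...) {
    x[seq_len(min(n, length(x)))]
}

# Array method: keep the first n slices along the first dimension,
# keep every other dimension in full, and never drop dimensions
peek.array <- function(x, n = 2L, ...) {
    d <- dim(x)
    idx <- lapply(d, seq_len)
    idx[[1L]] <- seq_len(min(n, d[1L]))
    do.call(`[`, c(list(x), idx, list(drop = FALSE)))
}

# Alias, so matrices reach the same code even if they do not
# inherit from "array"
peek.matrix <- peek.array

dim(peek(array(1:100, c(4, 5, 5))))
dim(peek(matrix(1:12, nrow = 4)))
peek(1:10)
```

With the alias in place, dispatch on a matrix reaches the array code either way, which is the point of keeping both method names.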
>> x = array(100, c(4, 5, 5))
>> dim(x)
> [1] 4 5 5
>> head(x, 1)
> [1] 100
>> class(head(x))
> [1] "numeric"
> (For a 1d array, it does return another 1d array).
> When extending head/tail to understand multiple dimensions as discussed in
> this thread, then, should the behavior for 2+d arrays be explicitly
> retained, or should head and tail do the analogous thing (with a head(<2d
> array>) behaving the same as head(<matrix>), which honestly is what I
> expected to already be happening)?
> Are people using/relying on this behavior in their code, and if so, why/for
> what?
> Even more generally, one way forward is to have the default methods check
> for dimensions, and use length if it is null:
> tail.default <- tail.data.frame <- function(x, n = 6L, ...)
> {
>     if(any(n == 0))
>         stop("n must be non-zero or unspecified for all dimensions")
>     if(!is.null(dim(x)))
>         dimsx <- dim(x)
>     else
>         dimsx <- length(x)
>     ## this returns a list of vectors of indices in each
>     ## dimension, regardless of the length of the n
>     ## argument
>     sel <- lapply(seq_along(dimsx), function(i) {
>         dxi <- dimsx[i]
>         ## select all indices (full dim) if not specified
>         ni <- if(length(n) >= i) n[i] else dxi
>         ## handle negative ns
>         ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
>         seq.int(to = dxi, length.out = ni)
>     })
>     args <- c(list(x), sel, drop = FALSE)
>     do.call("[", args)
> }
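For comparison, the head counterpart of the same idea might look like the sketch below. The name head_nd is hypothetical, it is written as a plain function rather than an S3 method so it does not mask anything in utils, and the check for n == 0 is left out for brevity. The only substantive difference from the tail version is that indices are counted from the start of each dimension.

```r
# Hypothetical standalone sketch; not the utils implementation
head_nd <- function(x, n = 6L) {
    # Treat a dimensionless vector as having one dimension
    dimsx <- if (is.null(dim(x))) length(x) else dim(x)
    sel <- lapply(seq_along(dimsx), function(i) {
        dxi <- dimsx[i]
        # Dimensions beyond length(n) are kept in full
        ni <- if (length(n) >= i) n[i] else dxi
        # Negative n means "all but the last |n|"
        ni <- if (ni < 0L) max(dxi + ni, 0L) else min(ni, dxi)
        seq_len(ni)
    })
    do.call(`[`, c(list(x), sel, list(drop = FALSE)))
}

x <- array(1:100, c(4, 5, 5))
dim(head_nd(x, c(2, 1)))  # first 2 of dim 1, first 1 of dim 2, all of dim 3
dim(head_nd(x, -1))       # all but the last slice along dim 1
head_nd(1:10, 3)          # plain vectors fall back to length()
```

Because drop = FALSE is always passed, the result keeps the same number of dimensions as the input, which bears on the question of whether matrices and data frames should be special-cased.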
> I think this precludes the need for a separate data.frame method at all,
> actually, though (I would think) tail.data.frame would still be defined and
> exported for backwards compatibility. (The matrix method has some extra
> bits so my current conception of it is still separate, though it might not
> NEED to be.)
> The question then becomes, should head/tail always return something with
> the same dimensionality (number of dims) it got, or should data.frame and
> matrix be special cased in this regard, as they are now?
> What are people's thoughts?
> ~G