head.matrix can return 1000s of columns -- limit to n or add new argument?

John Fox

Tue, Sep 17, 2019 5:32 AM

Dear Herve,

Sorry, I should have said "matrices" rather than "data frames" -- brief() has methods for both.

Best,
 John

  -----------------------------
  John Fox, Professor Emeritus
  McMaster University
  Hamilton, Ontario, Canada
  Web: http://socserv.mcmaster.ca/jfox

On Sep 17, 2019, at 8:29 AM, Fox, John <jfox at mcmaster.ca> wrote:

Dear Herve,

The brief() generic function in the car package does something very similar to that for data frames (and has methods for other classes of objects as well).

Best,
John

 -----------------------------
 John Fox, Professor Emeritus
 McMaster University
 Hamilton, Ontario, Canada
 Web: http://socserv.mcmaster.ca/jfox

On Sep 17, 2019, at 2:52 AM, Pages, Herve <hpages at fredhutch.org> wrote:

Hi,

Alternatively, how about a new glance() generic that would do something 
like this:

library(DelayedArray)
glance <- DelayedArray:::show_compact_array

M <- matrix(rnorm(1e6), nrow = 1000L, ncol = 2000L)
glance(M)

<1000 x 2000> matrix object of type "double":
              [,1]        [,2]        [,3] ...    [,1999]    [,2000]
  [1,]  -0.8854896   1.8010288   1.3051341   . -0.4473593  0.4684985
  [2,]  -0.8563415  -0.7102768  -0.9309155   . -1.8743504  0.4300557
  [3,]   1.0558159  -0.5956583   1.2689806   .  2.7292249  0.2608300
  [4,]   0.7547356   0.1465714   0.1798959   . -0.1778017  1.3417423
  [5,]   0.8037360  -2.7081809   0.9766657   . -0.9902788  0.1741957
   ...           .           .           .   .          .          .
[996,]  0.67220752  0.07804320 -0.38743454   .  0.4438639 -0.8130713
[997,] -0.67349962 -1.15292067 -0.54505567   .  0.4630923 -1.6287694
[998,]  0.03374595 -1.68061325 -0.88458368   . -0.2890962  0.2552267
[999,]  0.47861492  1.25530912  0.19436708   . -0.5193121 -1.1695501
[1000,]  1.52819218  2.23253275 -1.22051720   . -1.0342430 -0.1703396

A <- array(rnorm(1e6), c(50, 20, 10, 100))
glance(A)

<50 x 20 x 10 x 100> array object of type "double":
,,1,1
           [,1]       [,2]       [,3] ...      [,19]      [,20]
[1,] 0.78319619 0.82258390 0.09122269   .  1.7288189  0.7968574
[2,] 2.80687459 0.63709640 0.80844430   . -0.3963161 -1.2768284
 ...          .          .          .   .          .          .
[49,] -1.0696320 -0.1698111  2.0082890   .  0.4488292  0.5215745
[50,] -0.7012526 -2.0818229  0.7750518   .  0.3189076  0.1437394

...

,,10,100
           [,1]       [,2]       [,3] ...      [,19]      [,20]
[1,]  0.5360649  0.5491561 -0.4098350   .  0.7647435  0.5640699
[2,]  0.7924093 -0.7395815 -1.3792913   .  0.1980287 -0.2897026
 ...          .          .          .   .          .          .
[49,]  0.6266209  0.3778512  1.4995778   . -0.3820651 -1.4241691
[50,]  1.9218715  3.5475949  0.5963763   .  0.4005210  0.4385623
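
For readers without DelayedArray installed, the idea of a compact "corners only" display can be sketched in base R. This is only an illustration: glance_sketch and its nhead/ntail arguments are hypothetical names, and the layout does not reproduce the formatting of DelayedArray's internal show_compact_array.

```r
## Hypothetical stand-in for a glance()-style display of a base matrix.
## It keeps the first 'nhead' and last 'ntail' rows and columns only.
glance_sketch <- function(x, nhead = 3L, ntail = 2L) {
    stopifnot(is.matrix(x))
    ## indices to keep along a dimension of length 'len'
    pick <- function(len) {
        if (len <= nhead + ntail) seq_len(len)
        else c(seq_len(nhead), seq.int(to = len, length.out = ntail))
    }
    r <- pick(nrow(x))
    cc <- pick(ncol(x))
    out <- x[r, cc, drop = FALSE]
    ## label the kept rows/columns with their original positions
    dimnames(out) <- list(paste0("[", r, ",]"), paste0("[,", cc, "]"))
    cat(sprintf("<%d x %d> matrix of type \"%s\"; showing corners only\n",
                nrow(x), ncol(x), typeof(x)))
    print(out)
    invisible(out)
}

M <- matrix(rnorm(2e6), nrow = 1000L, ncol = 2000L)
corners <- glance_sketch(M)
```

The sketch never prints more than (nhead + ntail)^2 cells, whatever the size of the input.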

H.


On 9/16/19 00:54, Michael Chirico wrote:

Awesome. Gabe, since you already have a workshopped version, would you like
to proceed? Feel free to ping me to review the patch once it's posted.

On Mon, Sep 16, 2019 at 3:26 PM Martin Maechler <maechler at stat.math.ethz.ch>
wrote:

Michael Chirico
   on Sun, 15 Sep 2019 20:52:34 +0800 writes:

Finally read in detail your response Gabe. Looks great,
and I agree it's quite intuitive, as well as agree against
non-recycling.

Once the length(n) == length(dim(x)) behavior is enabled,
I don't think there's any need/desire to have head() do
x[1:6,1:6] anymore. head(x, c(6, 6)) is quite clear for
those familiar with head(x, 6), it would seem to me.

Mike C

Thank you, Gabe, and Michael.
I did like Gabe's proposal already back in July but was
busy and/or vacationing then ...

If you submit this with a patch (that includes changes to both
*.R and *.Rd , including some example) as "wishlist" item to R's
bugzilla, I'm willing/happy to check and commit this to R-devel.

Martin

On Sat, Jul 13, 2019 at 8:35 AM Gabriel Becker
<gabembecker at gmail.com> wrote:

Hi Michael and Abby,

So one thing that could happen that would be backwards
compatible (with the exception of something that was an
error no longer being an error) is that head and tail
could take vectors of length up to length(dim(x)), rather
than only integers of length 1, for n, with the default
n = 6 being equivalent to n = c(6, dim(x)[2], <...>,
dim(x)[k]), at least for the deprecation cycle, if not
permanently. Not recycling n would be unexpected given
the behavior of many R functions, but it would preserve
the current behavior while granting more fine-grained
control to users who feel they need it.

A rapidly thrown-together prototype of such a method for
the head of a matrix case is as follows:

head2 = function(x, n = 6L, ...) {
    indvecs = lapply(seq_along(dim(x)), function(i) {
        ## dimensions not covered by n are kept whole (no recycling)
        ni = if(length(n) >= i) n[i] else dim(x)[i]
        ## negative ni drops that many entries from the end of dimension i
        if(ni < 0L) ni = max(dim(x)[i] + ni, 0L)
        else ni = min(ni, dim(x)[i])
        seq_len(ni)
    })
    lstargs = c(list(x), indvecs, drop = FALSE)
    do.call("[", lstargs)
}

mat = matrix(1:100, 10, 10)

head(mat)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1   11   21   31   41   51   61   71   81    91
[2,]    2   12   22   32   42   52   62   72   82    92
[3,]    3   13   23   33   43   53   63   73   83    93
[4,]    4   14   24   34   44   54   64   74   84    94
[5,]    5   15   25   35   45   55   65   75   85    95
[6,]    6   16   26   36   46   56   66   76   86    96

head2(mat)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1   11   21   31   41   51   61   71   81    91
[2,]    2   12   22   32   42   52   62   72   82    92
[3,]    3   13   23   33   43   53   63   73   83    93
[4,]    4   14   24   34   44   54   64   74   84    94
[5,]    5   15   25   35   45   55   65   75   85    95
[6,]    6   16   26   36   46   56   66   76   86    96

head2(mat, c(2, 3))

     [,1] [,2] [,3]
[1,]    1   11   21
[2,]    2   12   22

head2(mat, c(2, -9))

     [,1]
[1,]    1
[2,]    2
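
A tail counterpart following the same conventions as the head2 prototype above might look like the sketch below. The name tail2 is hypothetical and this is not the patch discussed in the thread; it only assumes the same rules (no recycling of n, negative entries meaning "all but").

```r
## Hypothetical tail analogue of head2(); not part of base R.
tail2 <- function(x, n = 6L, ...) {
    d <- dim(x)
    indvecs <- lapply(seq_along(d), function(i) {
        ## dimensions beyond length(n) are kept whole (no recycling)
        ni <- if (length(n) >= i) n[i] else d[i]
        ## negative ni: drop the first |ni| entries; otherwise keep the last ni
        ni <- if (ni < 0L) max(d[i] + ni, 0L) else min(ni, d[i])
        seq.int(to = d[i], length.out = ni)
    })
    do.call("[", c(list(x), indvecs, drop = FALSE))
}

mat <- matrix(1:100, 10, 10)
tail2(mat, c(2, 3))   # last 2 rows of the last 3 columns
```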


Now one thing to keep in mind here is that I think we'd
either a) have to make the non-recycling behavior
permanent, or b) have head treat data.frames and matrices
differently with respect to the subsets they grab (which
strikes me as a *Bad Plan* (tm)).

So I don't think the default behavior would ever be
mat[1:6, 1:6], not because of backwards compatibility,
but because at least in my intuition that is just not
what head on a data.frame should do by default, and I
think the behaviors for the basic rectangular datatypes
should "stick together". I mean, also because of
backwards compatibility, but that could *in theory*
change across a long enough deprecation cycle, but the
conceptually right thing to do with a data.frame probably
won't.

All of that said, is head(mat, c(6, 6)) really that much
easier to type/better than just mat[1:6, 1:6, drop=FALSE]
(I know this will behave differently if any of the dims
of mat are less than 6, but if so why are you heading it
in the first place ;) )? I don't really have a strong
feeling on the answer to that.
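
The difference mentioned in the parenthesis can be seen on a matrix with fewer than 6 rows and columns. The snippet below is a small illustration using only base R; the clamping step is an assumption about how a head()-style function would limit each extent, written out by hand.

```r
small <- matrix(1:9, 3, 3)

## Plain indexing fails when a dimension is shorter than 6.
res <- tryCatch(small[1:6, 1:6, drop = FALSE],
                error = function(e) conditionMessage(e))
res   # the error message from the out-of-bounds subscript

## Clamping each extent to the dimension, as a head()-style function would.
idx <- lapply(dim(small), function(d) seq_len(min(6L, d)))
clamped <- do.call("[", c(list(small), idx, drop = FALSE))
dim(clamped)
```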

I'm happy to put together a patch for head.matrix,
head.data.frame, tail.matrix and tail.data.frame, plus
documentation, if people on R-core are interested in
this.

Note, as most here probably know, and as alluded to
above, length(n) > 1 for head or tail currently gives an
error, so this would be an extension of the existing
functionality in the mathematical extension sense, where
all existing behavior would remain identical, but the
support/valid parameter space would grow.

Best, ~G


On Fri, Jul 12, 2019 at 4:03 PM Abby Spurdle
<spurdle.a at gmail.com> wrote:

> I assume there are lots of backwards-compatibility issues as well as valid
> use cases for this behavior, so I guess defaulting to M[1:6, 1:6] is out of
> the question.

Agree.

> Is there any scope for adding a new argument to head.matrix that would
> allow this flexibility?

I agree with what you're trying to achieve.  However,
I'm not sure this is as simple as you're suggesting.

What if the user wants "head" in rows but "tail" in
columns?  Or "head" in rows, and both "head" and "tail"
in columns?  With head and tail alone, there's a
combinatorial explosion.
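
One way to picture the mixed case is a single helper that takes a per-dimension choice of end. The sketch below is purely illustrative: corner and its end argument are hypothetical names, not an existing or proposed API, and it does not cover the "both head and tail" case.

```r
## Hypothetical helper: 'end' says, per dimension, whether to take
## entries from the start ("head") or from the finish ("tail").
corner <- function(x, n, end = rep("head", length(dim(x)))) {
    d <- dim(x)
    stopifnot(length(n) == length(d), length(end) == length(d),
              all(end %in% c("head", "tail")), all(n >= 0))
    idx <- lapply(seq_along(d), function(i) {
        k <- min(n[i], d[i])   # never ask for more than the dimension has
        if (end[i] == "head") seq_len(k)
        else seq.int(to = d[i], length.out = k)
    })
    do.call("[", c(list(x), idx, drop = FALSE))
}

mat <- matrix(1:100, 10, 10)
corner(mat, c(2, 3), end = c("head", "tail"))   # first 2 rows, last 3 columns
```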

Also, when using tail on an unnamed matrix, it may be
desirable to name rows and columns.
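
As a rough illustration of the naming point, the rows and columns of a tail subset can be labelled with their original positions when the matrix has no dimnames. The label format here is an assumption chosen to resemble how R prints unnamed indices.

```r
## Sketch: label a tail subset of an unnamed matrix with original indices.
mat <- matrix(1:100, 10, 10)
rows <- 9:10
cols <- 8:10
sub <- mat[rows, cols, drop = FALSE]
if (is.null(dimnames(sub)))
    dimnames(sub) <- list(paste0("[", rows, ",]"), paste0("[,", cols, "]"))
sub
```

Without the labels, the subset would print as rows [1,] and [2,], hiding where in the original matrix the values came from.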

And all of this assumes standard matrix objects.  Add in
matrix subclasses and related objects, and things get
more complex still.

As I suggested in another thread, a few days ago, I'm
planning to write an R package for matrices and
matrix-like objects (possibly extending the Matrix
package), with an initial emphasis on subsetting,
printing and formatting.  So, I'm interested to hear
more suggestions on this topic.


______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
