R-alpha: 'Matrix' & 'matrix' Class

5 messages · Martin Maechler, Thomas Lumley, Kurt Hornik

Tue, Aug 12, 1997 12:32 AM #

we are discussing, yes? -- 
	please nobody feel hurt just  because I disagree so heartily .. ;-)

TL> On Tue, 12 Aug 1997, Ross Ihaka wrote:

    >> There have been a few prods to create an S-like matrix class for R.
    >> A major reason for having a matrix class seems to be that people get
    >> bitten by the default behavior of the drop= parameter to "[".
I think there are quite a few other reasons. I've been wanting to have
a 'matrix' class for other reasons for quite a long time.
(One that comes to my mind:
	Kurt's (and my earlier) proposal for a generic  "write" function
	with a write.data.frame  method would be much
	cleaner to implement with a write.matrix method if matrix were a class)

    >> That being the case does it make sense to change the semantics of "["
    >> rather than introducing a new class?

I don't think so. (see above and below)

    TL> I think changing the semantics is preferable, in part because it
    TL> fixes the drop problem automatically, rather than requiring
    TL> everyone to use a new feature.  Creating a Matrix class would not
    TL> fix the array problem for arrays of dimension 3 or more, either.

I disagree (I think this is a rare thing with you Thomas..).
----------
1) Unless I misunderstood something,
  you mean that for matrices 'x',
	  x[,1]
  would be a 1-column matrix instead of a vector?
  IMHO, this is ugly, and is really only logical if you come from a strict
  matrix thinker's corner, which is the case for matlab users, and maybe even
  mathematicians and the like (I'm one myself),
  but not for an average person analyzing statistical data.

2) I'm quite convinced that this will break quite a bit of existing code,
   maybe code outside the R 'base' source, but still.

   It is more fundamentally incompatible with S semantics than many other
   things, I think.

3) What if  x  is a data.frame? Would  x[,1] still be a data.frame?
  If not (which I presume), wouldn't this be contrary to the idea which I
  teach to beginners that
	``data.frames are just matrices, but a bit more general since
	  they can contain both numbers and character codes''


All in all, the use of  ``drop = FALSE'' is one way to avoid the problem,
and I agree there should be others,
(namely a  'Matrix' class in "R base" INDEPENDENT of the Matrix library.)
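
For reference, a minimal sketch of the behaviour under discussion, as it
works in current R releases (the 1997 pre-release may have differed in
detail). The objects x and d are made up for illustration.

```r
# A small matrix and a data frame with the same contents.
x <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("a", "b", "c")))
d <- as.data.frame(x)

v  <- x[, 1]                # default drop = TRUE: a plain vector
m1 <- x[, 1, drop = FALSE]  # stays a 2 x 1 matrix
r1 <- x[1, , drop = FALSE]  # stays a 1 x 3 matrix

dv <- d[, 1]                # one column of a data frame is dropped to a vector
dr <- d[1, ]                # one row of a data frame stays a data frame

is.null(dim(v))    # TRUE
dim(m1)            # 2 1
dim(r1)            # 1 3
is.data.frame(dv)  # FALSE
is.data.frame(dr)  # TRUE
```

So the answer to point 3) depends on whether a row or a column is selected,
which is part of what makes the default hard to explain to beginners.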

I do like the  class approach in any case and would like to have
'matrix' be THE generic class for all matrices 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[[I've been advocating this even for S-plus,
  years ago, before 'Matrix' came into existence]] 

Then, 'Matrix' would be a class inheriting from 'matrix' with 
at least its own "[.Matrix" method.
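
A rough sketch of what such a subsetting method could look like with S3
classes. Everything here is an assumption for illustration: the helper
as_Matrix() is hypothetical, and this toy "Matrix" class is not the Matrix
library mentioned above. The only change from ordinary matrices is that drop
defaults to FALSE.

```r
# Hypothetical constructor: tag an ordinary matrix with an extra S3 class.
as_Matrix <- function(x) {
  stopifnot(is.matrix(x))
  class(x) <- c("Matrix", "matrix")
  x
}

# Subsetting method for the hypothetical class: defer to the usual "["
# but pass drop = FALSE unless the caller asks otherwise.
"[.Matrix" <- function(x, i, j, ..., drop = FALSE) {
  y <- NextMethod("[", drop = drop)
  # Make sure the result keeps the class when it is still a matrix.
  if (is.matrix(y)) class(y) <- oldClass(x)
  y
}

m <- as_Matrix(matrix(1:6, nrow = 2))
dim(m[, 1])               # 2 1
dim(m[1, ])               # 1 3
dim(m[1, , drop = TRUE])  # NULL: dropping is still available on request
```

Plain matrices are untouched by this, so existing code that relies on
x[,1] being a vector would keep working.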

    TL> However, a Matrix library might be a nice place to put matrix
    TL> features that not everyone needs. This helps keep the size of R
    TL> down.  Libraries that can be loaded and unloaded seem like the
    TL> ideal place to put a whole lot of things, rather than taking the
    TL> S-PLUS approach of bundling gam and trees and nlme and everything
    TL> else together.
Yes. {also see above: 'Matrix' would be a class ANYWAY, for the
	``[,1] ==> Matrix''   aficionados}

Especially now that we have require() and provide()
[[ provide had fallen out of the source recently;
   it is "back again" and should appear in the next (pre)release
]]


This is an important discussion - I think -
so please tell me why you disagree...

--- Martin


=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-devel-request@stat.math.ethz.ch
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Thomas Lumley

Tue, Aug 12, 1997 9:28 AM #

I'm not opposed to a matrix class, but I do think drop=FALSE should be the
default for arrays (I'm not as sure about data frames, but I do think it
is reasonable that one row of a data frame should be a data frame). The
reason is that the *current* behaviour of [ with drop=TRUE in S and R
breaks a lot of S code. 

I have found this particular problem in code written for other people to
use by people from a wide range of backgrounds. For example, Harald
Fekjaer's addreg library, Adrian Raftery's Bayesian model averaging code,
and even an early version of Brian Ripley's linear discrimination
function. These functions break because they assume that x[vec,] is a
matrix even when it has a single row.  It rarely seems to matter if x[,1]
is a matrix or a vector, but it is important if x[1,] is a vector. 

For data analysis I don't think it matters much, though I agree it is a
little ugly for x[,1] to be a matrix. For programming I think that drop=F
is the same sort of incompatibility as the R scoping rules: one that
fixes, rather than breaks, S code. Of course people could just use drop=F,
the way they could just pass variables into nested functions explicitly.
The problem is that most of us don't, and it's a remarkably difficult bug
to find.
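
A minimal sketch of this kind of failure. The two helper functions are
hypothetical and not taken from any of the libraries named above; they only
show how code that assumes x[rows, ] is a matrix breaks on a single row.

```r
# Hypothetical helper that assumes x[rows, ] is always a matrix.
col_means_fragile <- function(x, rows) {
  colMeans(x[rows, ])
}

# The same helper with the subsetting made safe.
col_means_safe <- function(x, rows) {
  colMeans(x[rows, , drop = FALSE])
}

x <- matrix(1:6, nrow = 3)

col_means_fragile(x, c(1, 3))  # works: two rows selected
col_means_safe(x, 2)           # works: one row is still a 1 x 2 matrix

# With a single row the fragile version gets a plain vector and fails.
res <- tryCatch(col_means_fragile(x, 2), error = function(e) e)
inherits(res, "error")         # TRUE
```

The bug only shows up when the selection happens to have length one, which
is why it tends to survive testing.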

I may well be wrong, of course. ;-)


Thomas Lumley
-----------------------------------------------------+------
Biostatistics		: "Never attribute to malice what  :
Uni of Washington	:  can be adequately explained by  :
Box 357232		:  incompetence" - Hanlon's Razor  :
Seattle WA 98195-7232	:				   :
------------------------------------------------------------


Kurt Hornik

Wed, Aug 13, 1997 1:09 AM #

I certainly don't.

TL> On Tue, 12 Aug 1997, Ross Ihaka wrote:

Yes, exactly.  It seems stupid to have to write code like

	xxx.default <- function(obj, ...) {
	    if (is.matrix(obj)) {
	        xxx.matrix(obj, ...)
	    } else {
	        ...
	    }
	}

(Not only with e.g. the generic write function ... there are many more
examples.)
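
For comparison, a sketch of what method dispatch on matrices looks like when
"matrix" is treated as a class. The generic describe() and its methods are
hypothetical stand-ins for the proposed write generic. This relies on
current R releases dispatching S3 methods on the implicit class of a matrix,
which was not the case when this was written.

```r
# Hypothetical generic standing in for the proposed "write" generic.
describe <- function(obj, ...) UseMethod("describe")

# Method picked for matrices, with no is.matrix() test needed.
describe.matrix <- function(obj, ...) {
  paste("matrix with", nrow(obj), "rows and", ncol(obj), "columns")
}

# Fallback for everything else.
describe.default <- function(obj, ...) {
  paste("object of length", length(obj))
}

describe(matrix(0, 2, 3))  # uses describe.matrix
describe(1:4)              # uses describe.default
```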

TL> I think changing the semantics is preferable, in part because it
TL> fixes the drop problem automatically, rather than requiring
TL> everyone to use a new feature.  Creating a Matrix class would not
TL> fix the array problem for arrays of dimension 3 or more, either.

Exactly.  Having done statistics with Octave prior to the existence of
R, I can only say that `strict matrix thinking' is terrible.  (In
particular, having to worry whether something is a row or a column
vector, when it DOES NOT MATTER AT ALL!)

Right!

Again, definitely right!

Exactly.

TL> However, a Matrix library might be a nice place to put matrix
TL> features that not everyone needs. This helps keep the size of R
TL> down.  Libraries that can be loaded and unloaded seem like the
TL> ideal place to put a whole lot of things, rather than taking the
TL> S-PLUS approach of bundling gam and trees and nlme and everything
TL> else together.

-k

Kurt Hornik

Wed, Aug 13, 1997 1:23 AM #

(Hmm.  Why do they break subsequently?)

I have two remarks on this one.

* Thomas, if you have ports of the above software, could you PLEASE
package them?  (Btw, I have a finished `port' of the V&R `classif'
library, and of the `clus' library which comes with the paper on JSS.)

* Personally, I feel that x[1,] and x[,1] should be a vector if x is a
matrix.  As Martin said, MATRIX `freaks' should be able to attach class
attribute Matrix which would give drop = F.  (Fortunately, the S lang
does not have scalars, otherwise we'd be in trouble with x[i] for x a
vector ...).

If that breaks existing code, I think we should try to get the authors
of that code to make compatibility changes if possible.  E.g. in the
above cases, if subscripting with drop = F would do the trick in either
case, then I don't think that e.g. V&R would object to making the
change.

Or, perhaps R&R should ask Chambers why S does things the way it does?

Best,
-k

Thomas Lumley

Wed, Aug 13, 1997 9:21 AM #

To make what I was saying a little clearer, the fact that, e.g.,
	y<-x[vec,]
	y[1,]
returns an error if vec is of length 1 (or has one T, &c) is compatible
with S, but not with what most people expect when writing programs.
Statements like y<-x[vec,] nearly always need drop=F as otherwise any
future references to y must have special cases to handle vectors. 
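
Spelled out as a runnable sketch, with a made-up x and a logical vec that
happens to contain a single TRUE:

```r
x <- matrix(1:6, nrow = 3)
vec <- c(FALSE, TRUE, FALSE)   # logical index with exactly one TRUE

y <- x[vec, ]                  # silently drops to a vector of length 2
# y[1, ] now fails because y has no dimensions; capture the message.
err <- tryCatch(y[1, ], error = function(e) conditionMessage(e))

y2 <- x[vec, , drop = FALSE]   # stays a 1 x 2 matrix
row1 <- y2[1, ]                # works as the programmer expected
```

Nothing in the first assignment signals a problem; the error only appears
later, at the first use of y as a matrix.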

The problem is that we should have drop=F when indexing by vectors that
happen to be of length 1 and drop=T when indexing by scalars. I suppose
it's reasonable to say that programmers can take care of themselves and we
should choose the best default for interactive use, which is certainly
drop=T. 

Perhaps we just need a mental health warning in the documentation and/or
the faq to remind programmers that array dimensions can vanish whenever
you subset. 

Also

Kurt wrote:

I have ported the addreg library, with additions (and sent it back to
Harald Fekjaer, who should be releasing it Real Soon Now).  The linear
discrimination code I referred to is long obsolete and has been fixed (I
saw it in 1992, I think). I'm intermittently working on the Bayesian
model averaging code, which was the major reason for creating the "leaps"
subset selection library.


Thomas Lumley
