man page for as.matrix for data frames outdated?

5 messages · Hervé Pagès, Bill Dunlap, Martin Maechler

#
Hi again,


The man page for 'as.matrix' says:

     'as.matrix' is a generic function. The method for data frames will
     convert any non-numeric/complex column into a character vector
     using 'format' and so return a character matrix, except that
     all-logical data frames will be coerced to a logical matrix.

It's true that "all-logical data frames will be coerced to a logical
matrix":

    > fourLogicals <- 2:5>3
    > df1 <- data.frame(a=fourLogicals)
    > storage.mode(as.matrix(df1))
    [1] "logical"

Otherwise it's not true that 'as.matrix' will return a character matrix:

    > fourInts <- 2:-1
    > df2 <- data.frame(a=fourLogicals, b=fourInts)
    > storage.mode(as.matrix(df2))
    [1] "integer"


    > fourDoubles <- rep(pi,4)
    > df3 <- data.frame(c=fourDoubles, a=fourLogicals, b=fourInts)
    > storage.mode(as.matrix(df3))
    [1] "double"


    > fourComplexes <- (-1:2)+3i
    > df4 <- data.frame(a=fourLogicals, d=fourComplexes, b=fourInts,
    +                   c=fourDoubles)
    > storage.mode(as.matrix(df4))
    [1] "complex"

If one column is of mode character, then 'as.matrix' will effectively
return a character matrix:

    > df5 <- data.frame(toto=c("a","bb"), titi=c(9,999))
    > storage.mode(as.matrix(df5))
    [1] "character"

Note that the doc says that "any non-numeric/complex column" will
be passed thru 'format' which seems to be exactly the other way
around:

    > as.matrix(df5)
      toto titi
    1 "a"  "  9"
    2 "bb" "999"

Anyway why one would like to have the numeric values passed
thru 'format' to start with?

This is in R-2.4.0 and recent R-devel.

Best,
H.
#
    Herve> Hi again,
    Herve> The man page for 'as.matrix' says:

   >>      'as.matrix' is a generic function. The method
   >> for data frames will convert any non-numeric/complex
   >> column into a character vector using 'format' and so
   >> return a character matrix, except that all-logical
   >> data frames will be coerced to a logical matrix.

In very old versions of R (e.g. 0.3 from March 1996),

    is.numeric(<logical>)  was TRUE
and there the help page was entirely correct.

I think if you replace [in the above paragraph]
    "non-numeric/complex"
by  "non-(logical/numeric/complex)"

the help page is correct again.


    Herve> If one column is of mode character, then 'as.matrix'
    Herve> will effectively return a character matrix:

(as it says in the man page, and always did in all
 implementations of the S language)

    Herve> Note that the doc says that "any non-numeric/complex
    Herve> column" will be passed thru 'format' which seems to
    Herve> be exactly the other way around:

No!  You left off the second part of the statement cited initially.
Slightly reformulated:

	 Iff there's any non-(logical/numeric/complex) column,
	 that will have to be passed through format and hence
	 the result must be a character matrix 
	 and hence every other column also needs to be "formatted"
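
The reformulated rule can be sketched in R. This is only an illustration for data frames of plain atomic columns, not the actual source of the data frame method; `predict_matrix_mode` is a hypothetical helper name introduced here.

```r
# Hypothetical helper (not part of base R): guess the storage mode that
# as.matrix() returns for a data frame of plain atomic columns.
predict_matrix_mode <- function(df) {
  # TRUE for columns that are logical, numeric or complex.
  # is.numeric() is FALSE for factors and for character vectors.
  keeps_mode <- vapply(
    df,
    function(col) is.logical(col) || is.numeric(col) || is.complex(col),
    logical(1)
  )
  # One column outside that set forces a character matrix.
  if (!all(keeps_mode)) return("character")
  # Otherwise the "highest" type among the columns wins.
  types <- vapply(df, typeof, character(1))
  for (mode in c("complex", "double", "integer")) {
    if (mode %in% types) return(mode)
  }
  "logical"
}

fourLogicals <- 2:5 > 3
fourInts <- 2:-1
df2 <- data.frame(a = fourLogicals, b = fourInts)
df5 <- data.frame(toto = c("a", "bb"), titi = c(9, 999))

predict_matrix_mode(df2) == storage.mode(as.matrix(df2))
predict_matrix_mode(df5) == storage.mode(as.matrix(df5))
```

Both comparisons should be TRUE if the sketch matches the behaviour described above.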


    >> as.matrix(df5)
    Herve>       toto titi 1 "a" " 9" 2 "bb" "999"

    Herve> Anyway why one would like to have the numeric values
    Herve> passed thru 'format' to start with?

Recall: The result must be a matrix !
	If it can't be a numeric matrix the decision was it must
	be character.

BTW: Exactly because of your problems,  
     The function   data.matrix()  had been devised (long ago in
     pre-R times),
     and data.matrix  *is*  the first entry in "See Also" on the
     help page for as.matrix
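
A minimal sketch of the data.matrix() alternative, using the df5 example from this thread. `stringsAsFactors = TRUE` is set explicitly here as an assumption, so that the 'toto' column is a factor regardless of the R version's default.

```r
# 'toto' is forced to be a factor so that data.matrix() can map it to
# its integer codes; 'titi' is already numeric.
df5 <- data.frame(toto = c("a", "bb"), titi = c(9, 999),
                  stringsAsFactors = TRUE)

m <- data.matrix(df5)   # numeric matrix: factor codes plus the numbers
storage.mode(m)
m[, "titi"]             # the numeric values are kept as numbers

cm <- as.matrix(df5)    # character matrix, numbers passed through format()
storage.mode(cm)
```

Note that the 'toto' column of `m` holds factor codes, not the original labels, so data.matrix() is only a fit when a numeric matrix is really what is wanted.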

Martin
#
Hi Martin,

Thanks for the answer!
OK I can use data.matrix to convert a data frame to a numeric
matrix but that's another story. Basically I'm reporting 2
problems with 'as.matrix' when applied to a data frame:

1) A documentation problem:

    "The method for data frames will convert any
     non-numeric/complex column into a character vector
     using 'format'"

    > df5 <- data.frame(toto=c("a","bb"), titi=c(9,999))
    > as.matrix(df5)
      toto titi
    1 "a"  "  9"
    2 "bb" "999"

     As I said, it seems to be the other way around: it's not the
     "non-numeric" column that is converted to a character vector,
     it's the "numeric" column.

2) the questionable decision to do this conversion using 'format'
   (leading to the addition of unnecessary white space) and not
   simply 'as.character'
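
To make the difference in point 2 concrete, here is a small comparison. The column-wise as.character() conversion at the end is only one possible workaround, assumed here for illustration; it is not something proposed in this thread.

```r
x <- c(9, 999)

format(x)         # pads to a common width: "  9" "999"
as.character(x)   # no padding:             "9"   "999"

# Possible workaround (an assumption, not as.matrix() behaviour):
# convert each column with as.character() and let sapply() bind the
# results into a character matrix.
df5 <- data.frame(toto = c("a", "bb"), titi = c(9, 999))
m <- sapply(df5, as.character)
m[, "titi"]
```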

BTW your mailer seems to do some strange reformatting to the output
of my code snippets making it hard to see the "formatting" problem
that I'm trying to show.

Cheers,

H.
#
On Fri, 3 Nov 2006, Herve Pages wrote:

            
It is possible that this decision was made because one use of
as.matrix(mixed-mode-data-frame) was for printing data.frames and
format(numeric-column) would line up the decimal points and
make one scientific-notation/or-not decision for the entire column.
(Splus3.4, c. 1996, printed data.frames with print(as.matrix(x)),
but by 1999 Splus4.7 was using print(format(x)).)

   R> df6<- data.frame(toto=c("a","bb","ccc"), titi=c(.9e-20,.99999,999.9))
   R> as.matrix(df6)
     toto  titi
   1 "a"   "9.0000e-21"
   2 "bb"  "9.9999e-01"
   3 "ccc" "9.9990e+02"
   R> df7<-data.frame(toto=c("a","bb","ccc"), titi=c(.9,.99999,999.9))
   R> as.matrix(df7)
     toto  titi
   1 "a"   "  0.90000"
   2 "bb"  "  0.99999"
   3 "ccc" "999.90000"

Once such a decision is made it is hard to change things, especially
when the benefit is slight.
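
A short sketch of the "one decision for the entire column" point, using the numbers from df6 above. Formatting each element separately is shown only for contrast.

```r
x <- c(.9e-20, .99999, 999.9)

# Formatting the whole vector: one notation and one width for all elements.
whole <- format(x)
grepl("e", whole)        # scientific notation is used for every element
nchar(whole)             # all strings have the same width

# Formatting element by element: each value gets its own decision.
each <- vapply(x, format, character(1))
grepl("e", each)         # only the tiny value is in scientific notation
```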

Bill Dunlap
Insightful Corporation
#
    Herve> Hi Martin,
    Herve> Thanks for the answer!
    Herve> OK I can use data.matrix to convert a data frame to a numeric
    Herve> matrix but that's another story. Basically I'm reporting 2
    Herve> problems with 'as.matrix' when applied to a data frame:

yes, indeed, and I was mixing them up partly.

    Herve> 1) A documentation problem:

    Herve> "The method for data frames will convert any
    Herve> non-numeric/complex column into a character vector
    Herve> using 'format'"

    >> df5 <- data.frame(toto=c("a","bb"), titi=c(9,999))
    >> as.matrix(df5)

    Herve> toto titi
    Herve> 1 "a"  "  9"
    Herve> 2 "bb" "999"

    Herve> As I said, it seems to be the other way around: it's not the
    Herve> "non-numeric" column that is converted to a character vector,
    Herve> it's the "numeric" column.

Indeed, and I now agree that the documentation is more wrong
than I first acknowledged.  I'm changing it currently.

    Herve> 2) the questionable decision to do this conversion using 'format'
    Herve> (leading to the addition of unnecessary white space) and not
    Herve> simply 'as.character'

as Bill remarked, 
- there were good reasons for such a decision,
- to change such decisions without necessity is a problem for
  back-compatibility

    Herve> BTW your mailer seems to do some strange reformatting to the output
    Herve> of my code snippets making it hard to see the "formatting" problem
    Herve> that I'm trying to show.

yes. Part of it was my mistake; sorry.

Best regards to Seattle,
Martin