Hi again,
The man page for 'as.matrix' says:
'as.matrix' is a generic function. The method for data frames will
convert any non-numeric/complex column into a character vector
using 'format' and so return a character matrix, except that
all-logical data frames will be coerced to a logical matrix.
It's true that "all-logical data frames will be coerced to a logical
matrix":
> fourLogicals <- 2:5>3
> df1 <- data.frame(a=fourLogicals)
> storage.mode(as.matrix(df1))
[1] "logical"
Otherwise it's not true that 'as.matrix' will return a character matrix:
> fourInts <- 2:-1
> df2 <- data.frame(a=fourLogicals, b=fourInts)
> storage.mode(as.matrix(df2))
[1] "integer"
> fourDoubles <- rep(pi,4)
> df3 <- data.frame(c=fourDoubles, a=fourLogicals, b=fourInts)
> storage.mode(as.matrix(df3))
[1] "double"
> fourComplexes <- (-1:2)+3i
> df4 <- data.frame(a=fourLogicals, d=fourComplexes, b=fourInts, c=fourDoubles)
> storage.mode(as.matrix(df4))
[1] "complex"
If one column is of mode character, then 'as.matrix' will indeed
return a character matrix:
> df5 <- data.frame(toto=c("a","bb"), titi=c(9,999))
> storage.mode(as.matrix(df5))
[1] "character"
Note that the doc says that "any non-numeric/complex column" will
be passed thru 'format' which seems to be exactly the other way
around:
> as.matrix(df5)
  toto titi
1 "a"  "  9"
2 "bb" "999"
Anyway, why would one want the numeric values passed
thru 'format' to start with?
This is in R-2.4.0 and recent R-devel.
Best,
H.
man page for as.matrix for data frames outdated?
5 messages · Hervé Pagès, Bill Dunlap, Martin Maechler
"Herve" == Herve Pages <hpages at fhcrc.org>
on Thu, 02 Nov 2006 20:46:01 -0800 writes:
Herve> Hi again, The man page for 'as.matrix' says:
>> 'as.matrix' is a generic function. The method
>> for data frames will convert any non-numeric/complex
>> column into a character vector using 'format' and so
>> return a character matrix, except that all-logical
>> data frames will be coerced to a logical matrix.
In very old versions of R (e.g. 0.3 from March 1996),
is.numeric(<logical>) was TRUE
and there the help page was entirely correct.
I think if you replace [in the above paragraph]
"non-numeric/complex"
by "non-(logical/numeric/complex)"
the help page is correct again.
Herve> If one column is of mode character, then 'as.matrix'
Herve> will indeed return a character matrix:
(as it says in the man page, and always did in all
implementations of the S language)
Herve> Note that the doc says that "any non-numeric/complex
Herve> column" will be passed thru 'format' which seems to
Herve> be exactly the other way around:
No! You left off the second part of the statement cited initially.
Slightly reformulated:
Iff there's any non-(logical/numeric/complex) column,
that will have to be passed through format and hence
the result must be a character matrix
and hence every other column also needs to be "formatted"
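The reformulated rule can be sketched as a small check. This is only an illustration, assuming a current R session; has_non_numeric() is a hypothetical helper, not part of R, and the two example data frames are made up for the purpose.

```r
# Hypothetical helper (not part of R): TRUE if any column is something other
# than logical, numeric (integer/double) or complex.
has_non_numeric <- function(df) {
  basic <- vapply(df,
                  function(col) is.logical(col) || is.numeric(col) || is.complex(col),
                  logical(1))
  !all(basic)
}

# Illustrative data frames (assumptions, not taken from the help page)
df_num   <- data.frame(a = c(TRUE, FALSE), b = 1:2, c = c(1.5, 2.5))
df_mixed <- data.frame(toto = c("a", "bb"), titi = c(9, 999))

# The rule: a character matrix results exactly when such a column is present.
has_non_numeric(df_num)            # FALSE
is.character(as.matrix(df_num))    # FALSE
has_non_numeric(df_mixed)          # TRUE
is.character(as.matrix(df_mixed))  # TRUE
```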
>> as.matrix(df5)
Herve> toto titi 1 "a" " 9" 2 "bb" "999"
Herve> Anyway, why would one want the numeric values
Herve> passed thru 'format' to start with?
Recall: the result must be a matrix!
If it can't be a numeric matrix, the decision was that it must
be character.
BTW: exactly because of problems like yours,
the function data.matrix() was devised (long ago, in
pre-R times),
and data.matrix *is* the first entry in "See Also" on the
help page for as.matrix.
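A minimal sketch of the difference between the two functions, assuming a current R session. The data frame is the df5 from earlier in the thread; stringsAsFactors = TRUE is an added assumption so that 'toto' is a factor, as it would have been by default in R 2.4.0.

```r
# 'toto' is made a factor explicitly (assumption: mimics the old default).
df5 <- data.frame(toto = c("a", "bb"), titi = c(9, 999),
                  stringsAsFactors = TRUE)

m_chr <- as.matrix(df5)    # character matrix; 'titi' goes through format()
m_num <- data.matrix(df5)  # numeric matrix; the factor becomes its integer codes

storage.mode(m_chr)  # "character"
storage.mode(m_num)  # "double"
m_num[, "toto"]      # the internal codes 1 and 2, not the labels "a" and "bb"
```

Note that data.matrix() keeps the result numeric at the cost of replacing factor labels with their codes, so it is only a substitute when the labels themselves are not needed.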
Martin
Hi Martin,
Thanks for the answer!
OK I can use data.matrix to convert a data frame to a numeric
matrix but that's another story. Basically I'm reporting 2
problems with 'as.matrix' when applied to a data frame:
1) A documentation problem:
"The method for data frames will convert any
non-numeric/complex column into a character vector
using 'format'"
> df5 <- data.frame(toto=c("a","bb"), titi=c(9,999))
> as.matrix(df5)
  toto titi
1 "a"  "  9"
2 "bb" "999"
As I said, it seems to be the other way around: it's not the
"non-numeric" column that is converted to a character vector,
it's the "numeric" column.
2) the questionable decision to do this conversion using 'format'
(leading to the addition of unnecessary white space) and not
simply 'as.character'
BTW your mailer seems to do some strange reformatting to the output
of my code snippets making it hard to see the "formatting" problem
that I'm trying to show.
Cheers,
H.
On Fri, 3 Nov 2006, Herve Pages wrote:
> df5 <- data.frame(toto=c("a","bb"), titi=c(9,999))
> as.matrix(df5)
  toto titi
1 "a"  "  9"
2 "bb" "999"
...
2) the questionable decision to do this conversion using 'format'
(leading to the addition of unnecessary white space) and not
simply 'as.character'
It is possible that this decision was made because one use of
as.matrix(mixed-mode-data-frame) was for printing data.frames and
format(numeric-column) would line up the decimal points and
make one scientific-notation/or-not decision for the entire column.
(Splus3.4, c. 1996, printed data.frames with print(as.matrix(x)),
but by 1999 Splus4.7 was using print(format(x)).)
R> df6<- data.frame(toto=c("a","bb","ccc"), titi=c(.9e-20,.99999,999.9))
R> as.matrix(df6)
  toto  titi
1 "a"   "9.0000e-21"
2 "bb"  "9.9999e-01"
3 "ccc" "9.9990e+02"
R> df7<-data.frame(toto=c("a","bb","ccc"), titi=c(.9,.99999,999.9))
R> as.matrix(df7)
  toto  titi
1 "a"   "  0.90000"
2 "bb"  "  0.99999"
3 "ccc" "999.90000"
Once such a decision is made it is hard to change things, especially
when the benefit is slight.
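To make the difference between the two conversions concrete, here is a small sketch. It assumes a current R session with default options and is an illustration, not output from the original exchange; the vectors x and y reuse the numbers from the examples above.

```r
x <- c(9, 999)
format(x)        # "  9" "999"  (padded to a common width)
as.character(x)  # "9"   "999"  (each element converted on its own)

y <- c(0.9, 0.99999, 999.9)
format(y)        # one layout chosen for the whole vector, so all widths match
as.character(y)  # element by element, widths differ
```

The common width is what makes format() convenient for printing a column, and it is also the source of the extra white space discussed here.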
----------------------------------------------------------------------------
Bill Dunlap
Insightful Corporation
bill at insightful dot com
360-428-8146
"Herve" == Herve Pages <hpages at fhcrc.org>
on Fri, 03 Nov 2006 10:50:10 -0800 writes:
Herve> Hi Martin,
Herve> Thanks for the answer!
Herve> OK I can use data.matrix to convert a data frame to a numeric
Herve> matrix but that's another story. Basically I'm reporting 2
Herve> problems with 'as.matrix' when applied to a data frame:
yes, indeed, and I was partly mixing them up.
Herve> 1) A documentation problem:
Herve> "The method for data frames will convert any
Herve> non-numeric/complex column into a character vector
Herve> using 'format'"
>> df5 <- data.frame(toto=c("a","bb"), titi=c(9,999))
>> as.matrix(df5)
Herve>   toto titi
Herve> 1 "a"  "  9"
Herve> 2 "bb" "999"
Herve> As I said, it seems to be the other way around: it's not the
Herve> "non-numeric" column that is converted to a character vector,
Herve> it's the "numeric" column.
Indeed, and I now agree that the documentation is more wrong
than I first acknowledged. I'm changing it currently.
Herve> 2) the questionable decision to do this conversion using 'format'
Herve> (leading to the addition of unnecessary white space) and not
Herve> simply 'as.character'
as Bill remarked,
- there were good reasons for such a decision,
- changing such decisions without necessity is a problem for
back-compatibility
Herve> BTW your mailer seems to do some strange reformatting to the output
Herve> of my code snippets making it hard to see the "formatting" problem
Herve> that I'm trying to show.
yes. Part of it was my mistake; sorry.
Best regards to Seattle,
Martin