scale(x, center=FALSE) (PR#14219)

[cc'ing back to r-devel]
Maria Rizzo wrote:
This is really a disagreement with the way the function is implemented
(and I happen to agree with you), but I would argue that it is *not* a
bug in the strict sense -- I would call it a "misfeature".

  From the R FAQ:
[snip]
"should" according to you ...
Again, I agree with you that the behavior is not optimal, but it is
very hard to make changes in R when the behavior is sub-optimal rather
than actually wrong (by some definition).  R-core is very conservative
about changes that break backward compatibility; I would like it if they
chose to change the function to use standard deviation rather than
root-mean-square, but I doubt it will happen (and it would break things
for any users who are relying on the current definition).
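For readers who want to see the discrepancy concretely, here is a minimal sketch. The matrix `x` and the helper `rms()` are made up for illustration and are not part of R; the helper simply restates the definition discussed in this thread, sqrt(sum(x^2)/(n-1)) over the non-missing values.

```r
# Hypothetical example data: two columns with nonzero means.
x <- cbind(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 100))

# Illustrative helper (not an R function): root-mean-square with an
# n - 1 denominator, computed on the non-missing values.
rms <- function(v) {
  v <- v[!is.na(v)]
  sqrt(sum(v^2) / (length(v) - 1))
}

uncentered <- scale(x, center = FALSE)  # scale = TRUE is the default
centered   <- scale(x)                  # center = TRUE, scale = TRUE

col_rms <- apply(x, 2, rms)
col_sd  <- apply(x, 2, sd)

# Without centering, the divisor stored by scale() matches the RMS ...
rms_match_uncentered <- isTRUE(all.equal(attr(uncentered, "scaled:scale"), col_rms))
# ... and does not match the standard deviation.
sd_match_uncentered  <- isTRUE(all.equal(attr(uncentered, "scaled:scale"), col_sd))
# With centering, the divisor matches sd().
sd_match_centered    <- isTRUE(all.equal(attr(centered, "scaled:scale"), col_sd))
```

The three logicals can be printed to compare the cases; the divisors themselves are available from the "scaled:scale" attribute of each result.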

  It turns out that the documentation for this function was changed on
25 Nov 2009 to clarify this issue, but I don't think the change (which,
among other minor edits, replaced "root mean square" with "standard
deviation") helped much.  I have attached a patch file (and append the
information below as well) that changes "standard deviation" back to
"root mean square" and is much more explicit about this issue.  I hope
R-core will jump in, critique it, and possibly use it in some form to
improve (?) the documentation.

  [PS: I have written that the scaling is equivalent to sd() "if and
only if" centering was done.  Technically it would also be equivalent if
the column already had zero mean ...]
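A small sketch of the zero-mean caveat, plus the workaround that the patch text below suggests. The data `z` and `x` are invented for illustration; only `scale()`, `sd()` and `apply()` are real R functions here.

```r
# Hypothetical column that already has mean zero.
z <- cbind(c(-2, -1, 0, 1, 2))

# No centering requested, but the divisor still coincides with sd()
# because the column mean is already zero.
s <- scale(z, center = FALSE)
zero_mean_equiv <- isTRUE(all.equal(attr(s, "scaled:scale"), sd(z[, 1])))

# Hypothetical data with nonzero column means.
x <- cbind(a = c(1, 2, 3, 4, 5), b = c(10, 20, 30, 40, 100))

# Workaround from the proposed documentation: scale by the standard
# deviations explicitly, without centering.
w <- scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE))

# Each rescaled column should have standard deviation 1, while the
# column means are left nonzero.
unit_sd  <- isTRUE(all.equal(unname(apply(w, 2, sd)), c(1, 1)))
nonzero  <- all(abs(colMeans(w)) > 0)
```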

===================================================================
--- scale.Rd	(revision 51180)
+++ scale.Rd	(working copy)
@@ -41,13 +41,18 @@
   equal to the number of columns of \code{x}, then each column of
   \code{x} is divided by the corresponding value from \code{scale}.  If
   \code{scale} is \code{TRUE} then scaling is done by dividing the
-  (centered) columns of \code{x} by their standard deviations, and if
+  (centered) columns of \code{x} by their root-mean-squares, and if
   \code{scale} is \code{FALSE}, no scaling is done.
-
-  The standard deviation for a column is obtained by computing the
-  square-root of the sum-of-squares of the non-missing values in the
-  column divided by the number of non-missing values minus one (whether
-  or not centering was done).
+
+  The root-mean-square for a (possibly centered)
+  column is defined as
+  \eqn{\sqrt{\sum(x^2)/(n-1)}}{sqrt(sum(x^2)/(n-1))},
+  where \eqn{x} is a vector of the non-missing values
+  and \eqn{n} is the number of non-missing values.
+  If (and only if) centering was done,
+  this is equivalent to \code{sd(x,na.rm=TRUE)}.
+  (To scale by the standard deviations without centering,
+  use \code{scale(x,center=FALSE,scale=apply(x,2,sd,na.rm=TRUE))}.)
 }
 \references{
   Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scale.Rd.patch
Type: text/x-patch
Size: 1340 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20100226/b83e9d8b/attachment.bin>