sweep() and recycling
I think that as the proponents do not agree, we need to leave this as is. BTW, R-devel is the place to discuss patched to R, rather than R-help.
On Tue, 21 Jun 2005, Heather Turner wrote:
Agreed my examples may be trivial and I'm sure there are more efficient ways to do the same thing, but I disagree that my examples are against the spirit of sweep (). In the first example a vector is swept out from the rows, with one value for odd rows and one value for even rows. In the second example an array of values is swept out across the third dimension. In the third example an array of values is swept out from the full array. The first example is a natural use of recycling. E.g. sweep(matrix(1:100, 50, 2), 1, c(1, 1000), "+", give.warning = TRUE) is a quicker way of writing sweep(matrix(1:100, 50, 2), 1, rep(c(1, 1000), 25), "+", give.warning = TRUE) but your code would give a warning in the first case, even though the intent and the result are exactly the same as in the second case. As you say, it is only a warning, that can be ignored. However the warning should at least reflect the warning condition used, i.e. warn that the length of STATS does not equal the extent of MARGIN, rather warning that STATS does not recycle exactly. Heather
Robin Hankin <r.hankin at noc.soton.ac.uk> 06/21/05 02:47pm >>>
Hi On Jun 21, 2005, at 02:33 pm, Heather Turner wrote:
I think the warning condition in Robin's patch is too harsh - the following examples seem reasonable to me, but all produce warnings sweep(array(1:24, dim = c(4,3,2)), 1, 1:2, give.warning = TRUE) sweep(array(1:24, dim = c(4,3,2)), 1, 1:12, give.warning = TRUE) sweep(array(1:24, dim = c(4,3,2)), 1, 1:24, give.warning = TRUE)
The examples above do give warnings (as intended) but I think all three cases above are inimical to the spirit of sweep(): nothing is being "swept" out. So a warning is appropriate, IMO. In any case, one can always suppress (or ignore!) a warning if one knows what one is doing. YMMV, but if I wanted to do the above operations I would replace sweep(array(0, dim = c(4,3,2)), c(1,3), 1:12, "+" , give.warning = FALSE) with aperm(array(1:12,c(4,2,3)),c(1,3,2)) best wishes rksh
I have written an alternative (given below) which does not give warnings in the above cases, but does warn in the following case
sweep(array(1:24, dim = c(4,3,2)), 1:2, 1:3)
, , 1
[,1] [,2] [,3]
[1,] 0 3 6
[2,] 0 3 9
[3,] 0 6 9
[4,] 3 6 9
, , 2
[,1] [,2] [,3]
[1,] 12 15 18
[2,] 12 15 21
[3,] 12 18 21
[4,] 15 18 21
Warning message:
STATS does not recycle exactly across MARGIN
The code could be easily modified to warn in other cases, e.g. when
length of STATS is a divisor of the corresponding array extent (as in
the first example above, with length(STATS) = 2).
The code also includes Gabor's suggestion.
Heather
sweep <- function (x, MARGIN, STATS, FUN = "-", warn =
getOption("warn"), ...)
{
FUN <- match.fun(FUN)
dims <- dim(x)
perm <- c(MARGIN, (1:length(dims))[-MARGIN])
if (warn >= 0) {
s <- length(STATS)
cumDim <- c(1, cumprod(dims[perm]))
if (s > max(cumDim))
warning("length of STATS greater than length of array",
call. = FALSE)
else {
upper <- min(ifelse(cumDim > s, cumDim, max(cumDim)))
lower <- max(ifelse(cumDim < s, cumDim, min(cumDim)))
if (any(upper %% s != 0, s %% lower != 0))
warning("STATS does not recycle exactly across MARGIN",
call. = FALSE)
}
}
FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
}
Gabor Grothendieck <ggrothendieck at gmail.com> 06/21/05 01:25pm >>>
\
Perhaps the signature should be:
sweep(...other args go here..., warn=getOption("warn"))
so that the name and value of the argument are consistent with
the R warn option.
On 6/21/05, Robin Hankin <r.hankin at noc.soton.ac.uk> wrote:
On Jun 20, 2005, at 04:58 pm, Prof Brian Ripley wrote:
The issue here is that the equivalent command array(1:5, c(6,6)) (to matrix(1:5,6,6)) gives no warning, and sweep uses array(). I am not sure either should: fractional recycling was normally allowed in S3 (S4 tightened up a bit). Perhaps someone who thinks sweep() should warn could contribute a tested patch?
OK, modified R code and Rd file below (is this the best way to do
this?)
"sweep" <-
function (x, MARGIN, STATS, FUN = "-", give.warning = FALSE, ...)
{
FUN <- match.fun(FUN)
dims <- dim(x)
if(give.warning & length(STATS)>1 & any(dims[MARGIN] !=
dim(as.array(STATS)))){
warning("array extents do not recycle exactly")
}
perm <- c(MARGIN, (1:length(dims))[-MARGIN])
FUN(x, aperm(array(STATS, dims[perm]), order(perm)), ...)
}
\name{sweep}
\alias{sweep}
\title{Sweep out Array Summaries}
\description{
Return an array obtained from an input array by sweeping out a
summary
statistic.
}
\usage{
sweep(x, MARGIN, STATS, FUN="-", give.warning = FALSE, \dots)
}
\arguments{
\item{x}{an array.}
\item{MARGIN}{a vector of indices giving the extents of \code{x}
which correspond to \code{STATS}.}
\item{STATS}{the summary statistic which is to be swept out.}
\item{FUN}{the function to be used to carry out the sweep. In the
case of binary operators such as \code{"/"} etc., the function
name
must be quoted.}
\item{give.warning}{Boolean, with default \code{FALSE} meaning to
give no warning, even if array extents do not match. If
\code{TRUE}, check for the correct dimensions and if a
mismatch is detected, give a suitable warning.}
\item{\dots}{optional arguments to \code{FUN}.}
}
\value{
An array with the same shape as \code{x}, but with the summary
statistics swept out.
}
\note{
If \code{STATS} is of length 1, recycling is carried out with no
warning irrespective of the value of \code{give.warning}.
}
\references{
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
\emph{The New S Language}.
Wadsworth \& Brooks/Cole.
}
\seealso{
\code{\link{apply}} on which \code{sweep} used to be based;
\code{\link{scale}} for centering and scaling.
}
\examples{
require(stats) # for median
med.att <- apply(attitude, 2, median)
sweep(data.matrix(attitude), 2, med.att)# subtract the column medians
a <- array(0, c(2, 3, 4))
b <- matrix(1:8, c(2, 4))
sweep(a, c(1, 3), b, "+", give.warning = TRUE) # no warning:
all(dim(a)[c(1,3)] == dim(b))
sweep(a, c(1, 2), b, "+", give.warning = TRUE) # warning given
}
\keyword{array}
\keyword{iteration}
--
Robin Hankin
Uncertainty Analyst
National Oceanography Centre, Southampton
European Way, Southampton SO14 3ZH, UK
tel 023-8059-7743
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
-- Robin Hankin Uncertainty Analyst National Oceanography Centre, Southampton European Way, Southampton SO14 3ZH, UK tel 023-8059-7743
______________________________________________ R-help at stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595