I am having a hard time understanding just what 'sweep' does. The documentation states: "Return an array obtained from an input array by sweeping out a summary statistic." So what does "sweeping out a summary statistic" mean? Thank you. Kevin
sweep?
5 messages · rkevinburton at charter.net, David Winsemius, Wacek Kusnierczyk
A (possibly varying) argument vector is applied, with an operator, to
either the rows or the columns. The default operator is "-".
> dtest <- matrix(1:25, nrow=5)
> dtest
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    6   11   16   21
[2,]    2    7   12   17   22
[3,]    3    8   13   18   23
[4,]    4    9   14   19   24
[5,]    5   10   15   20   25
> sweep(dtest, 2, 1)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    5   10   15   20   # -1 argument recycling
[2,]    1    6   11   16   21
[3,]    2    7   12   17   22
[4,]    3    8   13   18   23
[5,]    4    9   14   19   24
> sweep(dtest, 2, 1:5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    4    8   12   16   # -1 -2 -3 -4 -5
[2,]    1    5    9   13   17   #  |
[3,]    2    6   10   14   18   #  |
[4,]    3    7   11   15   19   #  \/
[5,]    4    8   12   16   20
> sweep(dtest, 1, 1:5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    5   10   15   20   # -1 --->
[2,]    0    5   10   15   20   # -2
[3,]    0    5   10   15   20   # -3
[4,]    0    5   10   15   20   # -4
[5,]    0    5   10   15   20   # -5
> sweep(dtest, 1, 1:5, FUN="+")
     [,1] [,2] [,3] [,4] [,5]
[1,]    2    7   12   17   22   # +1
[2,]    4    9   14   19   24   # etc
[3,]    6   11   16   21   26
[4,]    8   13   18   23   28
[5,]   10   15   20   25   30
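To tie these examples back to the help page's wording: a "summary statistic" is typically something computed from the data itself, such as the column means, and "sweeping it out" means removing it from every element along that margin. A minimal sketch, reusing dtest from above (the names col_means and centered are only illustrative):

```r
dtest <- matrix(1:25, nrow = 5)

# The column means play the role of the "summary statistic"
col_means <- colMeans(dtest)

# Sweep them out of the columns (MARGIN = 2) with the default FUN = "-"
centered <- sweep(dtest, 2, col_means)

# Every column of the result now has mean zero
colMeans(centered)
```

Using MARGIN = 1 with rowMeans(dtest) would centre the rows in the same way.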
On Mar 16, 2009, at 10:25 PM, <rkevinburton at charter.net> wrote:
I am having a hard time understanding just what 'sweep' does. The documentation states: "Return an array obtained from an input array by sweeping out a summary statistic." So what does "sweeping out a summary statistic" mean? Thank you. Kevin
David Winsemius, MD Heritage Laboratories West Hartford, CT
rkevinburton at charter.net wrote:
I am having a hard time understanding just what 'sweep' does. The documentation states: "Return an array obtained from an input array by sweeping out a summary statistic." So what does "sweeping out a summary statistic" mean?
from both the text and the examples in that help page, it seems that
both 'sweep' and 'summary statistics' are misleading. the argument
STATS is just about any value, vector of values, array of values, etc.,
you might like, and these values are combined, using whatever function is
passed as the argument FUN, with the values in the input data. by
default the combinator function FUN is '-', hence 'sweep'.
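As a small illustration of STATS and FUN being open-ended, the sketch below (not from the help page; the variable names are illustrative only) computes STATS from the data and uses '/' as FUN, so each column is divided by its own total:

```r
A <- array(1:16, dim = c(4, 4))

# Column totals, one value per column
col_totals <- colSums(A)

# Divide every column (MARGIN = 2) by its own total
props <- sweep(A, 2, col_totals, "/")

# Each column of props now sums to 1
colSums(props)
```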
in this example (from ?sweep, simplified), you're sweeping arbitrary
values ('summary statistics'):
A <- array(1:16, dim = c(4,4))
# sweep 1:2, with recycling
sweep(A, 1, 1:2)
in this example, you're multiplying ('sweeping') the data by some
arbitrary values ('summary statistics'):
A <- array(1:16, dim = c(4, 4))
# sweep by * 1:4, with recycling
sweep(A, 1, 1:4, '*')
be careful to note that here '1' means that the operation is performed
*columnwise*, unlike in the case of apply, where '1' means *rowwise*:
sweep(A, 1, 1:4, '*')
apply(A, 1, '*', 1:4)
(to make sense of the output, note that apply has implicitly transposed
the matrix).
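A small check may help in reading the two outputs (a sketch, not part of the original post; the variable names are illustrative). With apply, each row of A is handed to '*' together with 1:4, so the row is multiplied element by element and the results are returned as columns. That appears to make the apply result equal to the transpose of a sweep along margin 2, not margin 1:

```r
A <- array(1:16, dim = c(4, 4))

by_sweep_rows <- sweep(A, 1, 1:4, "*")   # row i multiplied by i
by_sweep_cols <- sweep(A, 2, 1:4, "*")   # column j multiplied by j
by_apply      <- apply(A, 1, "*", 1:4)   # each row times 1:4, returned as a column

# apply's result matches the transposed column-wise sweep
all.equal(by_apply, t(by_sweep_cols))

# the row-wise sweep, written out one row at a time for comparison
all.equal(t(sapply(1:4, function(i) A[i, ] * i)), by_sweep_rows)
```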
be careful to note that the documentation is *wrong* wrt. the type of
input and output:
"
Arguments:
x: an array.
Value:
An array with the same shape as 'x', but with the summary
statistics swept out.
"
d = data.frame(x=rnorm(10), y = rnorm(10))
is.array(d)
# FALSE
d = sweep(d, 1, 0)
is.array(d)
# FALSE
no error reported, however.
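If an actual array is needed, one option (a sketch, assuming every column of the data frame is numeric) is to convert with as.matrix before sweeping:

```r
d <- data.frame(x = rnorm(10), y = rnorm(10))

# as.matrix() turns an all-numeric data frame into a numeric matrix
m <- sweep(as.matrix(d), 2, colMeans(d))

is.array(m)   # a matrix is a two-dimensional array
```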
vQ
On Mar 17, 2009, at 4:59 AM, Wacek Kusnierczyk wrote:
A <- array(1:16, dim = c(4, 4))
# sweep by * 1:4, with recycling
sweep(A, 1, 1:4, '*')
be careful to note that here '1' means that the operation is performed
*columnwise*, unlike in the case of apply, where '1' means *rowwise*:
The sweep operation is really being done by first lining up the STATS argument (the statistic vector) with either the rows or the columns of the first argument (the matrix), in the same sense as with apply. The sweeping is then done in the remaining direction(s). The confusion arises because there are really two (or more) directions of the operation and you are focussing on the second. I have no argument with your assertion that the documentation was not clear in this regard or in the meaning of "summary statistic".
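One way to see this alignment concretely is the sketch below (not from the original thread; the variable names are illustrative). The statistic is computed with apply over a given MARGIN, and sweep with the same MARGIN lines it up again with the rows or columns it came from:

```r
A <- array(1:16, dim = c(4, 4))

row_max <- apply(A, 1, max)   # one value per row
col_max <- apply(A, 2, max)   # one value per column

# Same MARGIN in both calls: each row is divided by its own maximum
by_row <- sweep(A, 1, row_max, "/")
# ... and each column by its own maximum
by_col <- sweep(A, 2, col_max, "/")

# The same arithmetic written with R's column-major recycling
all.equal(by_row, A / row_max)
all.equal(by_col, t(t(A) / col_max))
```

In both cases the MARGIN passed to sweep is the one that was used to compute the statistic.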
David Winsemius
David Winsemius wrote:
A <- array(1:16, dim = c(4, 4))
# sweep by * 1:4, with recycling
sweep(A, 1, 1:4, '*')
be careful to note that here '1' means that the operation is performed
*columnwise*, unlike in the case of apply, where '1' means *rowwise*:
The sweep operation is really being done by first lining up the STATS argument (the statistic vector) with either the rows or the columns of the first argument (the matrix), in the same sense as with apply. The sweeping is then done in the remaining direction(s). The confusion arises because there are really two (or more) directions of the operation and you are focussing on the second.
hmm, that's ok, but i find the meaning of the MARGIN argument a little bit confusing, esp. when the STATS argument is a vector -- as in the example above. you could equally well argue that in the case of apply, the vector is aligned with rows, and the applications of FUN are done in the other (column) direction.
vQ