sapply puzzlement

Hi,

I have this data.frame with two variables in it,
z
  V1 V2
1 10  8
2 NA 18
3  9  7
4  3 NA
5 NA 10
6 11 12
7 13  9
8 12 11

and a vector of means,
means <- apply(z, 2, function (col) mean(na.omit(col)))
means
       V1        V2 
 9.666667 10.714286 
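For reference, base R's colMeans() with na.rm = TRUE should give the same named vector in a single call. A minimal sketch, assuming z is rebuilt by hand from the values printed above (means2 is just an illustrative name):

```r
# Rebuild the example data.frame from the printed values (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))

# The apply()/na.omit() version from the post
means <- apply(z, 2, function(col) mean(na.omit(col)))

# colMeans() computes the column means directly; na.rm = TRUE drops the NAs
means2 <- colMeans(z, na.rm = TRUE)

all.equal(means, means2)  # TRUE
```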

My intention was to subtract the means from z, so instinctively I tried
z-means
          V1         V2
1  0.3333333 -1.6666667
2         NA  7.2857143
3 -0.6666667 -2.6666667
4 -7.7142857         NA
5         NA  0.3333333
6  0.2857143  1.2857143
7  3.3333333 -0.6666667
8  1.2857143  0.2857143

But this is completely wrong. sapply() gives the same result:
sapply(z, function(row) row - means)
             V1         V2
[1,]  0.3333333 -1.6666667
[2,]         NA  7.2857143
[3,] -0.6666667 -2.6666667
[4,] -7.7142857         NA
[5,]         NA  0.3333333
[6,]  0.2857143  1.2857143
[7,]  3.3333333 -0.6666667
[8,]  1.2857143  0.2857143

So, what is going on here?
The following appears to work:
z-matrix(means,ncol=2)[rep(1, dim(z)[1]),]
          V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143

but I think it's rather cumbersome; surely there must be a cleaner way
to do it.
Ernest
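
R works by going down the columns: in z - means, the length-2 vector means is recycled down each column of z, so consecutive rows alternate between the two means. If you make the rows into columns (with t()), each column has length 2 and the recycling then does what you want. You just have to transpose back to get the original shape (note that the result is a matrix, not a data.frame).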

So the code, in one line, is:

t(t(z) - means)
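A sketch of why the double transpose works, rebuilding z by hand from the printed values (an assumption); the names tz, centered and centered_df are only illustrative:

```r
# Rebuild the example data.frame from the printed values (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

tz <- t(z)      # a 2 x 8 matrix: each column of tz is one row of z
dim(tz)         # 2 8

# Recycling now runs down each length-2 column of tz, so the V1 value
# always meets means["V1"] and the V2 value always meets means["V2"]
centered <- t(tz - means)   # back to 8 x 2, but as a matrix

# Convert back if a data.frame is needed
centered_df <- as.data.frame(centered)
```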

---- Original message ----
Date: Fri, 28 Jan 2011 01:16:45 +0100
From: r-help-bounces at r-project.org (on behalf of nfdisco at gmail.com (Ernest Adrogué i Calveras))
Subject: [R] sapply puzzlement  
To: r-help at r-project.org

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia

sapply(z, function(row) ...) does not actually take one row at a time from
'z'. It takes one column at a time, because 'z' is a data.frame, which is a
list of columns.
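A small check of this, with z rebuilt by hand from the original post (an assumption; res is an illustrative name):

```r
# Rebuild the example data.frame from the printed values (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

is.list(z)          # TRUE: a data.frame is a list of columns
sapply(z, length)   # V1 = 8, V2 = 8: FUN saw two columns, not eight rows

# So inside FUN the argument is a whole column of length 8, and
# col - means recycles the length-2 means vector along that column
res <- sapply(z, function(col) col - means)
res[1, "V2"]        # 8 - means[1], not 8 - means[2]
```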

You may want:
t(apply(z, 1, function(row) row - means))

or:

t(t(z) - means)
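A sketch comparing the two suggestions, assuming the same hand-built z; they should agree in their values (the comparison ignores attributes such as dimnames). The names by_row, a and b are illustrative:

```r
# Rebuild the example data.frame from the printed values (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

# apply(z, 1, ...) really does pass one row (a length-2 vector) at a time,
# so row - means lines up V1 with V1 and V2 with V2
by_row <- apply(z, 1, function(row) row - means)
dim(by_row)      # 2 8: apply() stores each row's result as a column ...
a <- t(by_row)   # ... so t() restores the 8 x 2 shape

b <- t(t(z) - means)
all.equal(a, b, check.attributes = FALSE)   # TRUE
```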

Hope that helps,

-David Johnston
View this message in context: http://r.789695.n4.nabble.com/sapply-puzzlement-tp3243520p3243534.html
Sent from the R help mailing list archive at Nabble.com.

Two methods:

A) Use sweep (which subtracts by default, since its FUN argument defaults to "-")

 > sweep(z, 2, means)
          V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143
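Spelling out the default may make the call easier to read. A minimal sketch with z rebuilt by hand from the post's values (an assumption; s1 and s2 are illustrative names):

```r
# Rebuild the example data.frame from the printed values (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

# MARGIN = 2 sweeps along columns: means[j] is subtracted from column j
s1 <- sweep(z, 2, means)              # FUN defaults to "-"
s2 <- sweep(z, 2, means, FUN = "-")   # the same call with FUN spelled out
identical(s1, s2)                     # TRUE
```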

B) Use the scale function, whose "whole purpose in life" is to subtract
the column means and optionally divide by the standard deviation; the
scale=FALSE argument suppresses that division here.

 > scale(z, scale=FALSE)
          V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143
attr(,"scaled:center")
       V1        V2 
 9.666667 10.714286 
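scale() computes the column means itself (with NAs removed), so the precomputed means vector is not needed. A sketch to confirm this, assuming the same hand-built z (sc and sc_df are illustrative names):

```r
# Rebuild the example data.frame from the printed values (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

sc <- scale(z, scale = FALSE)   # centre only, no division

# The centring values are stored in the "scaled:center" attribute
all.equal(attr(sc, "scaled:center"), means)   # TRUE

# The result is a matrix; convert it if a data.frame is needed
sc_df <- as.data.frame(sc)
```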
David.


David Winsemius, MD
West Hartford, CT

In addition to what has already been suggested, you could use:

mapply(function(x,y) x-y, z,means)

which returns:

             V1         V2
[1,]  0.3333333 -2.7142857
[2,]         NA  7.2857143
[3,] -0.6666667 -3.7142857
[4,] -6.6666667         NA
[5,]         NA -0.7142857
[6,]  1.3333333  1.2857143
[7,]  3.3333333 -1.7142857
[8,]  2.3333333  0.2857143
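A sketch of what mapply() pairs up, plus a Map() variant that keeps the result as a data.frame; z is rebuilt by hand (an assumption) and m and df are illustrative names:

```r
# Rebuild the example data.frame from the printed values (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

# mapply() walks the columns of z and the elements of means in parallel,
# so column j is paired with means[j]; the two length-8 results are
# simplified into an 8 x 2 matrix
m <- mapply(function(x, y) x - y, z, means)
dim(m)   # 8 2

# The same pairing with Map(), which returns a list of columns
df <- as.data.frame(Map(function(x, y) x - y, z, means))
```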

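The results you see with the z-means approach are caused by the two objects
having different lengths: z holds 16 values and means only 2, so means is
recycled (repeated) along z in column-major order. As a result, both V1 and
V2 alternate between means[1] and means[2] from one row to the next.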

Phil Spector's book gives an example that illustrates this behaviour
nicely:

nums = 1:10
nums + c(1,2)
# c(1,2) is recycled: 2 4 4 6 6 8 8 10 10 12
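Applied to z itself, the recycling can be made visible, and fixed by repeating each mean nrow(z) times so that means[j] lines up with column j. A sketch under the same assumption that z is rebuilt from the printed values; recycled and fixed are illustrative names:

```r
# Rebuild the example data.frame from the printed values (an assumption)
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
means <- colMeans(z, na.rm = TRUE)

# z - means fills the 16 cells column by column with means[1], means[2],
# means[1], ... so even within V1 the subtracted value alternates
recycled <- matrix(rep_len(unname(means), 16), nrow = 8)
recycled[, 1]

# Repeating each mean nrow(z) times gives the intended column-wise layout
fixed <- z - rep(means, each = nrow(z))
```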

HTH

Pete
View this message in context: http://r.789695.n4.nabble.com/sapply-puzzlement-tp3243520p3243583.html