
sapply puzzlement

5 messages · Ernest Adrogué, Dario Strbenac, David A. Johnston +2 more

#
Hi,

I have this data.frame with two variables in it,
V1 V2
1 10  8
2 NA 18
3  9  7
4  3 NA
5 NA 10
6 11 12
7 13  9
8 12 11

and a vector of means,
V1        V2 
 9.666667 10.714286 

My intention was to subtract means from z, so instinctively I tried z - means:
V1         V2
1  0.3333333 -1.6666667
2         NA  7.2857143
3 -0.6666667 -2.6666667
4 -7.7142857         NA
5         NA  0.3333333
6  0.2857143  1.2857143
7  3.3333333 -0.6666667
8  1.2857143  0.2857143

But this is completely wrong. sapply() gives the same result:
V1         V2
[1,]  0.3333333 -1.6666667
[2,]         NA  7.2857143
[3,] -0.6666667 -2.6666667
[4,] -7.7142857         NA
[5,]         NA  0.3333333
[6,]  0.2857143  1.2857143
[7,]  3.3333333 -0.6666667
[8,]  1.2857143  0.2857143

So, what is going on here? 
The following appears to work
V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143

but I think it's rather cumbersome, surely there must be a cleaner way
to do it.
#
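R recycles a shorter vector down the columns, not across the rows, so z - means does not line up one mean per column. If you transpose z so that its rows become columns, the recycling does what you want. You then just have to transpose the result back to get the original shape (note that the result is a matrix, not a data.frame).

So the code in one line is: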

t(t(z) - means)
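
As a rough sketch of why the double transpose works, here is the one-liner applied to the data from the first message. The values of z are reconstructed from the printed table, means is assumed to have been computed with colMeans(z, na.rm = TRUE), and the name centered is only illustrative.

```r
# Data reconstructed from the tables printed in the original post.
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
# Assumption: the poster's 'means' are the column means with NAs dropped.
means <- colMeans(z, na.rm = TRUE)

# t(z) is a 2 x 8 matrix. Each of its columns has length 2, the same
# length as 'means', so recycling pairs means[1] with V1 and means[2]
# with V2 in every column.
centered <- t(t(z) - means)

# First row of the result: 10 - mean(V1) and 8 - mean(V2).
centered[1, ]
```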

--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia
#
sapply(z, function(row) ...) does not actually grab a row at a time out of
'z'. Because 'z' is a data.frame, which is a list of columns, it grabs one
column at a time.

You may want:
t(apply(z, 1, function(row) row - means))

or:

t(t(z) - means)
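
A small sketch of the difference, assuming the same z as in the first message (reconstructed from the printed table) and means taken as the NA-dropped column means. The name by_row is only illustrative.

```r
# Data reconstructed from the tables printed in the original post.
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
# Assumption: 'means' are the column means with NAs dropped.
means <- colMeans(z, na.rm = TRUE)

# sapply() walks the data.frame as a list of columns: the function is
# called twice, each time with a vector of length 8.
sapply(z, length)

# apply(z, 1, ...) calls the function once per row (a vector of length 2)
# and binds the results as columns, hence the outer t().
by_row <- t(apply(z, 1, function(row) row - means))

# Row 4 of the original data is (3, NA).
by_row[4, ]
```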


Hope that helps,

-David Johnston
#
On Jan 27, 2011, at 7:16 PM, Ernest Adrogué i Calveras wrote:

            
Two methods:

A) use sweep (whose FUN argument defaults to "-", so it subtracts)

 > sweep(z, 2, means)
           V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143


B) use the scale function (whose "whole purpose in life" is to
subtract the mean and possibly divide by the standard deviation, which
we suppressed in this case with the scale=FALSE argument)

 > scale(z, scale=FALSE)
           V1         V2
1  0.3333333 -2.7142857
2         NA  7.2857143
3 -0.6666667 -3.7142857
4 -6.6666667         NA
5         NA -0.7142857
6  1.3333333  1.2857143
7  3.3333333 -1.7142857
8  2.3333333  0.2857143
attr(,"scaled:center")
        V1        V2
  9.666667 10.714286
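
To check that the two methods agree, a minimal sketch along these lines should do. It assumes z as reconstructed from the table in the first message and means computed with colMeans(z, na.rm = TRUE); the names swept and scaled are only illustrative.

```r
# Data reconstructed from the tables printed in the original post.
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
# Assumption: 'means' are the column means with NAs dropped.
means <- colMeans(z, na.rm = TRUE)

# MARGIN = 2 sweeps 'means' out of the columns; FUN defaults to "-".
swept <- sweep(z, 2, means)

# scale() computes the column means itself (ignoring NAs) and returns a
# matrix carrying them in the "scaled:center" attribute.
scaled <- scale(z, scale = FALSE)

# Same numbers; only the class and attributes differ.
all.equal(as.matrix(swept), scaled, check.attributes = FALSE)
```

Note that sweep() keeps the data.frame class here, while scale() always returns a matrix.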
#
In addition to what has already been suggested, you could use mapply:

mapply(function(x,y) x-y, z,means)

which returns ....

             V1         V2
[1,]  0.3333333 -2.7142857
[2,]         NA  7.2857143
[3,] -0.6666667 -3.7142857
[4,] -6.6666667         NA
[5,]         NA -0.7142857
[6,]  1.3333333  1.2857143
[7,]  3.3333333 -1.7142857
[8,]  2.3333333  0.2857143

The results you see with the z - means approach are caused by the two
operands having different lengths. The shorter one (means) is recycled, i.e.
repeated down each column until it matches the longer one.

Phil Spector's book has a nice example that illustrates this behaviour.

nums <- 1:10
nums + c(1, 2)
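
A slightly longer sketch of the same recycling rule, first on a plain vector and then on the data from the first message. Here z is reconstructed from the printed table, means is assumed to be colMeans(z, na.rm = TRUE), and the name wrong is only illustrative.

```r
# Recycling on plain vectors: c(1, 2) is repeated five times.
nums <- 1:10
nums + c(1, 2)
# 2 4 4 6 6 8 8 10 10 12

# Data reconstructed from the tables printed in the original post.
z <- data.frame(V1 = c(10, NA, 9, 3, NA, 11, 13, 12),
                V2 = c(8, 18, 7, NA, 10, 12, 9, 11))
# Assumption: 'means' are the column means with NAs dropped.
means <- colMeans(z, na.rm = TRUE)

# With 8 rows, 'means' is repeated four times down each column, so odd
# rows get mean(V1) subtracted and even rows get mean(V2), in both columns.
wrong <- z - means

# Row 1 of V2 is 8 - mean(V1), not 8 - mean(V2).
wrong$V2[1]
```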

HTH

Pete