
dividing a matrix by positive sum or negative sum depending on the sign

5 messages · Dimitris Rizopoulos, David Winsemius, Hao Cen

#
Hi,

I have a matrix with positive numbers, negative numbers, and NAs. An
example of the matrix is as follows

-1 -1 2 NA
3 3 -2 -1
1 1 NA -2

I need to compute a scaled version of this matrix. The scaling method is
to divide each positive number in a row by the sum of the positive numbers
in that row, and to divide each negative number in a row by the sum of the
absolute values of the negative numbers in that row.

So the resulting matrix would be

-1/2 -1/2 2/2 NA
3/6 3/6 -2/3 -1/3
1/2 1/2 NA -2/2

Is there an efficient way to do that in R? One way I am using is

1. rowSums for positive numbers in the matrix
2. rowSums for negative numbers in the matrix
3. sweep(mat, 1, posSumVec, posDivFun)
4. sweep(mat, 1, negSumVec, negDivFun)

posDivFun = function(x,y) {
        xPosId = x>0 & !is.na(x)
        x[xPosId] = x[xPosId]/y[xPosId]
        return(x)
}

negDivFun = function(x,y) {
        xNegId = x<0 & !is.na(x)
        x[xNegId] = -x[xNegId]/y[xNegId]
        return(x)
}

It is not fast enough though. This scaling is to be applied to large data
sets repeatedly, so I would like to make it as fast as possible. Any
thoughts on improving it would be appreciated.
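
For readers following along, the four steps listed above can be put together roughly as follows. This is only a sketch: the post does not show how posSumVec and negSumVec were built, so the two rowSums() lines are an assumption. In particular, negSumVec is assumed to hold the signed (negative) row sums, which is what makes negDivFun return negative values.

```r
# Example matrix from the post
mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

# The two functions from the post; sweep() passes y as an array
# with the same dimensions as x, so y[id] lines up with x[id]
posDivFun <- function(x, y) {
  xPosId <- x > 0 & !is.na(x)
  x[xPosId] <- x[xPosId] / y[xPosId]
  x
}
negDivFun <- function(x, y) {
  xNegId <- x < 0 & !is.na(x)
  x[xNegId] <- -x[xNegId] / y[xNegId]
  x
}

# Steps 1 and 2 (assumed construction): per-row sums of the
# positive and of the negative entries, ignoring NAs
posSumVec <- rowSums(mat * (mat > 0), na.rm = TRUE)
negSumVec <- rowSums(mat * (mat < 0), na.rm = TRUE)  # signed, so negative

# Steps 3 and 4: scale the positives, then the negatives
scaled <- sweep(mat, 1, posSumVec, posDivFun)
scaled <- sweep(scaled, 1, negSumVec, negDivFun)
scaled
```

Applying the second sweep() to the result of the first is safe here because the scaled positive entries stay positive and are therefore skipped by negDivFun.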

Thanks

Jeff
#
one approach is the following:

mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

mat / ave(abs(mat), row(mat), sign(mat), FUN = sum)
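
To see why the one-liner works, it may help to look at the denominator on its own. The sketch below uses the same example matrix; the name denom is introduced here only for illustration. ave() groups the cells by row and by sign, and replaces each cell with the sum of absolute values in its group. NA cells belong to no group and stay NA.

```r
mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

# Each cell is replaced by the sum of |values| sharing its row and sign
denom <- ave(abs(mat), row(mat), sign(mat), FUN = sum)
denom

# Dividing by that denominator gives the scaled matrix
scaled <- mat / denom
scaled
```

One thing to keep in mind: a cell that is exactly 0 forms its own sign group with sum 0, so it would come out as NaN (0/0) rather than 0.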


I hope it helps.

Best,
Dimitris
#
On Nov 11, 2009, at 10:36 AM, Dimitris Rizopoulos wrote:

            
Very elegant. My solution was a bit more pedestrian, but may have some  
speed advantage:

t(apply(mat, 1, function(x)
    ifelse(x < 0, -x/sum(x[x<0], na.rm=TRUE), x/sum(x[x>0], na.rm=TRUE))))


 > system.time(replicate(10000, t(apply(mat, 1, function(x)
       ifelse(x < 0, -x/sum(x[x<0], na.rm=TRUE), x/sum(x[x>0], na.rm=TRUE))))))
    user  system elapsed
   5.958   0.027   5.977

 > system.time(replicate(10000,
       mat / ave(abs(mat), row(mat), sign(mat), FUN = sum)))
    user  system elapsed
  12.886   0.064  12.886
#
On Nov 11, 2009, at 10:57 AM, David Winsemius wrote:

            
I am wondering if there might be further performance improvements if  
sums were pre-calculated before the ifelse scaling step.

Perhaps:
 > mat <- matrix(sample(-4:4, 100, replace=TRUE), ncol=10)
 > system.time(replicate(10000, t(apply(mat, 1, function(x) {
       negs <- sum(x[x<0], na.rm=TRUE)
       poss <- sum(x[x>0], na.rm=TRUE)
       ifelse(x < 0, -x/negs, x/poss)}))))
    user  system elapsed
   9.420   0.103   9.619
 > system.time(replicate(10000, t(apply(mat, 1, function(x)
       ifelse(x < 0, -x/sum(x[x<0], na.rm=TRUE), x/sum(x[x>0], na.rm=TRUE))))))
    user  system elapsed
   8.206   0.035   8.231

Pre-calculating the sums made only about a 15% difference (and in the run
above it was the slower of the two), but I got a 50% improvement by
replacing the ifelse() with its Boolean algebra equivalent:

 > t(apply(mat, 1, function(x)
       -x*(x<0)/sum(x[x<0], na.rm=TRUE) + x*(x>0)/sum(x[x>0], na.rm=TRUE)))
      [,1] [,2]       [,3]       [,4]
[1,] -0.5 -0.5  1.0000000         NA
[2,]  0.5  0.5 -0.6666667 -0.3333333
[3,]  0.5  0.5         NA -1.0000000
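
As a quick check, the ifelse() form and the Boolean form can be compared directly. The helper names scale_ifelse and scale_bool below are made up for this sketch and do not appear in the thread. On the example matrix the two agree, but they differ when a row has no negative (or no positive) entries: the Boolean form then computes 0/0 for every element and returns NaN, while ifelse() only uses the branch it needs.

```r
mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

# Hypothetical helper names wrapping the two row-wise expressions
scale_ifelse <- function(x)
  ifelse(x < 0, -x / sum(x[x < 0], na.rm = TRUE), x / sum(x[x > 0], na.rm = TRUE))
scale_bool <- function(x)
  -x * (x < 0) / sum(x[x < 0], na.rm = TRUE) +
   x * (x > 0) / sum(x[x > 0], na.rm = TRUE)

a <- t(apply(mat, 1, scale_ifelse))
b <- t(apply(mat, 1, scale_bool))
all.equal(a, b)          # TRUE on this example

# Edge case: a row without negatives
x <- c(1, 3)
r_ifelse <- scale_ifelse(x)  # 0.25 0.75
r_bool   <- scale_bool(x)    # NaN NaN, because the negative term is 0/0
```

If rows of a single sign can occur in the real data, the Boolean form would need a guard for zero sums before it could be used as a drop-in replacement.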


 > system.time(replicate(10000, t(apply(mat, 1, function(x)
       -x*(x<0)/sum(x[x<0], na.rm=TRUE) + x*(x>0)/sum(x[x>0], na.rm=TRUE)))))
    user  system elapsed
   4.805   0.041   4.839

I could not figure out how Jeff applies the two functions he presented,
so I am unable to compare any of these methods to his strategy.
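
One further variant that may be worth timing avoids apply() altogether and works on the whole matrix at once. It was not proposed in the thread, and no timings are claimed for it; the names posSum, negSum and denom are illustrative only. The idea is to compute both row sums with rowSums() and then pick the matching denominator for each cell.

```r
mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

# Per-row sums of positive entries and of |negative| entries, ignoring NAs
posSum <- rowSums(pmax(mat, 0), na.rm = TRUE)
negSum <- rowSums(pmax(-mat, 0), na.rm = TRUE)

# row(mat) gives each cell's row index, so posSum[row(mat)] expands
# the per-row sums to one value per cell (column-major order)
denom <- ifelse(mat < 0, negSum[row(mat)], posSum[row(mat)])

scaled <- mat / denom
scaled
```

Whether this is faster than the apply() versions would have to be measured with system.time() on data of realistic size.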
#
Hi David and Dimitris,

Thanks for your suggestions. They are very helpful.

Jeff
On Wed, November 11, 2009 12:12 pm, David Winsemius wrote: