Hi,
I have a matrix with positive numbers, negative numbers, and NAs. An
example of the matrix is as follows
-1 -1 2 NA
3 3 -2 -1
1 1 NA -2
I need to compute a scaled version of this matrix. The scaling method is
to divide each positive number in a row by the sum of the positive numbers
in that row, and to divide each negative number in a row by the sum of the
absolute values of the negative numbers in that row.
So the resulting matrix would be
-1/2 -1/2 2/2 NA
3/6 3/6 -2/3 -1/3
1/2 1/2 NA -2/2
Is there an efficient way to do that in R? One way I am using is
1. rowSums for positive numbers in the matrix
2. rowSums for negative numbers in the matrix
3. sweep(mat, 1, posSumVec, posDivFun)
4. sweep(mat, 1, negSumVec, negDivFun)
posDivFun = function(x,y) {
  # scale only the positive, non-NA entries
  xPosId = x>0 & !is.na(x)
  x[xPosId] = x[xPosId]/y[xPosId]
  return(x)
}
negDivFun = function(x,y) {
  # scale only the negative, non-NA entries
  xNegId = x<0 & !is.na(x)
  x[xNegId] = -x[xNegId]/y[xNegId]
  return(x)
}
It is not fast enough, though. This scaling has to be applied to large data
sets repeatedly, so I would like to make it as fast as possible. Any
thoughts on improving it would be appreciated.
Thanks
Jeff
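The four steps in the post above could be wired together roughly as follows. This is only a sketch: the post does not show how posSumVec and negSumVec are computed, so the two rowSums() calls here are an assumption. Note that negDivFun as posted only reproduces the expected output if negSumVec holds the signed (negative) row sums; with sums of absolute values the leading minus would flip the sign of the result.

```r
mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

# The two helper functions as given in the post
posDivFun <- function(x, y) {
  xPosId <- x > 0 & !is.na(x)
  x[xPosId] <- x[xPosId] / y[xPosId]
  x
}
negDivFun <- function(x, y) {
  xNegId <- x < 0 & !is.na(x)
  x[xNegId] <- -x[xNegId] / y[xNegId]
  x
}

# Assumption: per-row sums over the positive and the negative entries only.
# Multiplying by the logical mask zeroes out the other entries; NAs are dropped.
posSumVec <- rowSums(mat * (mat > 0), na.rm = TRUE)
negSumVec <- rowSums(mat * (mat < 0), na.rm = TRUE)  # signed, i.e. negative

# sweep() expands each vector to the shape of the matrix (one value per row)
# and passes it to the function as its second argument.
scaled <- sweep(mat, 1, posSumVec, posDivFun)
scaled <- sweep(scaled, 1, negSumVec, negDivFun)
scaled
```

The second sweep() is applied to the result of the first; the entries scaled in the first pass stay positive, so the negative pass leaves them alone.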
dividing a matrix by positive sum or negative sum depending on the sign
5 messages · Dimitris Rizopoulos, David Winsemius, Hao Cen
One approach is the following:

mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))
mat / ave(abs(mat), row(mat), sign(mat), FUN = sum)

I hope it helps.

Best,
Dimitris
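To see why the ave() one-liner works, it may help to look at the denominator on its own. The sketch below uses the example matrix from the thread; the names denom and scaled are introduced here only for illustration. ave() groups the cells by row index and by sign, and replaces each cell with the sum of abs(mat) over its group, so every cell ends up divided by the total for its own row and sign.

```r
mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))

# Grouping variables: row(mat) gives the row index of every cell,
# sign(mat) gives -1, 0, 1 or NA. Cells whose sign is NA belong to
# no group and keep their original value from abs(mat), which is NA.
denom <- ave(abs(mat), row(mat), sign(mat), FUN = sum)
denom

# Element-wise division then scales each cell by its own group total
scaled <- mat / denom
scaled
```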
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Dimitris Rizopoulos Assistant Professor Department of Biostatistics Erasmus University Medical Center Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands Tel: +31/(0)10/7043478 Fax: +31/(0)10/7043014
On Nov 11, 2009, at 10:36 AM, Dimitris Rizopoulos wrote:
> one approach is the following:
> mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))
> mat / ave(abs(mat), row(mat), sign(mat), FUN = sum)
Very elegant. My solution was a bit more pedestrian, but may have some
speed advantage:

t(apply(mat, 1, function(x)
    ifelse(x < 0, -x/sum(x[x < 0], na.rm=TRUE), x/sum(x[x > 0], na.rm=TRUE))))

> system.time(replicate(10000, t(apply(mat, 1, function(x)
+     ifelse(x < 0, -x/sum(x[x < 0], na.rm=TRUE), x/sum(x[x > 0], na.rm=TRUE))))))
   user  system elapsed
  5.958   0.027   5.977
> system.time(replicate(10000, mat / ave(abs(mat), row(mat),
+     sign(mat), FUN = sum)))
   user  system elapsed
 12.886   0.064  12.886
David
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
On Nov 11, 2009, at 10:57 AM, David Winsemius wrote:
> Very elegant. My solution was a bit more pedestrian, but may have some
> speed advantage:
I am wondering if there might be further performance improvements if
sums were pre-calculated before the ifelse scaling step.
Perhaps:
> mat <- matrix(sample(-4:4, 100, replace=TRUE), ncol=10)
> system.time(replicate(10000, t(apply(mat, 1, function(x) {
+     negs <- sum(x[x < 0], na.rm=TRUE); poss <- sum(x[x > 0], na.rm=TRUE)
+     ifelse(x < 0, -x/negs, x/poss)}))))
   user  system elapsed
  9.420   0.103   9.619
> system.time(replicate(10000, t(apply(mat, 1, function(x)
+     ifelse(x < 0, -x/sum(x[x < 0], na.rm=TRUE), x/sum(x[x > 0], na.rm=TRUE))))))
   user  system elapsed
  8.206   0.035   8.231
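Going by the timings above, pre-calculating the sums did not help: it ran
about 15% slower (9.4 versus 8.2 seconds of user time). I did get a
reduction of roughly 40% by replacing the ifelse() with its Boolean
algebra equivalent (the output below is for the original 3 x 4 example
matrix):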
> t(apply(mat, 1, function(x) -x*(x < 0)/sum(x[x < 0], na.rm=TRUE) +
+     x*(x > 0)/sum(x[x > 0], na.rm=TRUE)))
     [,1] [,2]       [,3]       [,4]
[1,] -0.5 -0.5  1.0000000         NA
[2,]  0.5  0.5 -0.6666667 -0.3333333
[3,]  0.5  0.5         NA -1.0000000
> system.time(replicate(10000, t(apply(mat, 1, function(x)
+     -x*(x < 0)/sum(x[x < 0], na.rm=TRUE) + x*(x > 0)/sum(x[x > 0], na.rm=TRUE)))))
   user  system elapsed
  4.805   0.041   4.839
I could not figure out how Jeff applies the two functions he presented,
so I am unable to compare any of these methods to his strategy.
David.
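One caveat with the Boolean-algebra form above: for a row that has no negative entries (or no positive ones) the corresponding sum is 0, so that term evaluates to 0/0 = NaN and the addition turns the non-NA entries of the row into NaN. The sketch below is one possible way around this that also avoids apply() by working on the whole matrix with rowSums(). The function name scale_by_sign is hypothetical, not something from the thread, and it has not been benchmarked against the versions timed above.

```r
# Hypothetical helper: scale positives by the row sum of positives and
# negatives by the row sum of absolute negatives; zeros and NAs are kept.
scale_by_sign <- function(mat) {
  pos <- mat > 0 & !is.na(mat)
  neg <- mat < 0 & !is.na(mat)
  posSum <- rowSums(mat * pos, na.rm = TRUE)
  negSum <- -rowSums(mat * neg, na.rm = TRUE)  # sum of absolute values

  # Per-cell denominator; cells that are zero or NA keep a denominator of 1,
  # so no 0/0 can occur.
  denom <- matrix(1, nrow(mat), ncol(mat))
  denom[pos] <- posSum[row(mat)[pos]]
  denom[neg] <- negSum[row(mat)[neg]]
  mat / denom
}

mat <- rbind(c(-1, -1, 2, NA), c(3, 3, -2, -1), c(1, 1, NA, -2))
scaled <- scale_by_sign(mat)
scaled

# A row without negative entries is scaled as usual instead of becoming NaN
scale_by_sign(rbind(c(1, 3, 0, NA)))
```

Whether this is actually faster on large matrices would have to be checked with system.time() on data of realistic size.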
Hi David and Dimitris, Thanks for your suggestions. They are very helpful. Jeff