Possible Improvement of the R code
Unlike C or C++, subscripting vector or matrix is not almost free.
Also, subscripting a matrix tends to make more time than subscripting a
vector so it can help to pull out the previous row of params once
per iteration, use vector subscripting on the that row to build the new
row, and then insert the new row back into params with one matrix
subscripting call:
f5 <- function(param) {
for(i in 2:11) {
# do matrix subscripting only twice per iteration
prev <- param[i-1, ]
param[i,] <- c(
prev[3] + prev[4]/2,
prev[4]/2 + prev[5],
prev[1] * (prev[3] + prev[4]/2),
prev[1] * (prev[4]/2 + prev[5]) + prev[2] * (prev[3] + prev[4]/2),
prev[2] * (prev[4]/2 + prev[5]))
}
param
}
You can also do as much as possible to not repeat any subscripting or
sums:
f6 <- function(param) {
for(i in 2:11) {
# do any subscripting only twice/iteration and
# repeated sums only once/iteration.
prev <- param[i-1, ]
p1 <- prev[1]
p2 <- prev[2]
p42 <- prev[4]/2
p342 <- prev[3] + p42
p425 <- p42 + prev[5]
param[i,] <- c(
p342,
p425,
p1 * p342,
p1 * p425 + p2 * p342,
p2 * p425)
}
param
}
I was curious about how much the compiler package could replace hand-optimization
so I timed these with and without compiling. (f1 and f3 are Berend's f1 and f3; I renamed
his f2 and f4 to f1c and f3c to indicate they were the compiled versions.)
funs <- list(f1=f1, f1c=f1c, f3=f3, f3c=f3c, f5=f5, f5c=f5c, f6=f6, f6c=f6c) z <- vapply(funs, function(f)system.time(for(i in 1:10000)f(param))[1:3], numeric(3)) z
f1 f1c f3 f3c f5 f5c f6 f6c user.self 5.03 2.82 3.74 1.89 2.73 1.78 1.54 0.9 sys.self 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0 elapsed 5.03 2.83 3.75 1.90 2.76 1.77 1.54 0.9
matrix(z["user.self", ], byrow=TRUE, ncol=2, dimnames=list(c("f1","f3","f5","f6"), c("raw","compiled")))
raw compiled f1 5.03 2.82 f3 3.74 1.89 f5 2.73 1.78 f6 1.54 0.90 It looks like both hand- and machine-optimization help quite a bit here. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com
-----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Berend Hasselman Sent: Sunday, September 16, 2012 9:28 PM To: li li Cc: r-help Subject: Re: [R] Possible Improvement of the R code On 17-09-2012, at 00:51, li li wrote:
Dear all,
In the following code, I was trying to compute each row of the "param"
iteratively based
on the first row.
This likely is not the best way. Can anyone suggest a simpler way to
improve the code.
Thanks a lot!
Hannah
param <- matrix(0, 11, 5)
colnames(param) <- c("p", "q", "r", "2s", "t")
param[1,] <- c(0.5, 0.5, 0.4, 0.5, 0.1)
for (i in 2:11){
param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
param[i,3] <- param[(i-1),1]*(param[(i-1),3]+param[(i-1),4]/2)
param[i,4] <- param[(i-1),1]*(param[(i-1),4]/2+param[(i-1),5])+param[(i-1),2
]*(param[(i-1),3]+param[(i-1),4]/2)
param[i,5] <- param[(i-1),2]*(param[(i-1),4]/2+param[(i-1),5])
}
You can use the compiler package.
It also helps if you don't repeat certain calculations. For example (param[(i-
1),3]+param[(i-1),4]/2) is computed three times.
Once is enough.
See this example where your code has been put in function f1. The simplified code is in
function f3.
Functions f2 and f4 are the compiled versions of f1 and f3.
library(compiler)
library(rbenchmark)
param <- matrix(0, 11, 5)
colnames(param) <- c("p", "q", "r", "2s", "t")
param[1,] <- c(0.5, 0.5, 0.4, 0.5, 0.1)
# your calculation
f1 <- function(param) {
for (i in 2:11){
param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
param[i,3] <- param[(i-1),1]*(param[(i-1),3]+param[(i-1),4]/2)
param[i,4] <- param[(i-1),1]*(param[(i-1),4]/2+param[(i-1),5])+param[(i-
1),2]*(param[(i-1),3]+param[(i-1),4]/2)
param[i,5] <- param[(i-1),2]*(param[(i-1),4]/2+param[(i-1),5])
}
param
}
f2 <- cmpfun(f1)
# modified by replacing identical sub-expressions with result
f3 <- function(param) {
for (i in 2:11){
param[i,1] <- param[(i-1),3]+param[(i-1),4]/2
param[i,2] <- param[(i-1),4]/2+param[(i-1),5]
param[i,3] <- param[(i-1),1]*param[i,1]
param[i,4] <- param[(i-1),1]*param[i,2]+param[(i-1),2]*param[i,1]
param[i,5] <- param[(i-1),2]*param[i,2]
}
param
}
f4 <- cmpfun(f3)
z1 <- f1(param)
z2 <- f2(param)
z3 <- f3(param)
z4 <- f4(param)
Running in R
all.equal(z2,z1)
[1] TRUE
all.equal(z3,z1)
[1] TRUE
all.equal(z4,z1)
[1] TRUE
benchmark(f1(param), f2(param), f3(param), f4(param),replications=5000,
columns=c("test", "replications", "elapsed", "relative"))
test replications elapsed relative
1 f1(param) 5000 3.748 2.502
2 f2(param) 5000 2.104 1.405
3 f3(param) 5000 2.745 1.832
4 f4(param) 5000 1.498 1.000
f4 is quite an improvement over f1.
It's quite possible that more can be gained but I'm too lazy to investigate further.
Berend
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.