
Optimization in R similar to MS Excel Solver

10 messages · Thomas Schu, Berend Hasselman, Richard James

#
Dear Colleagues,
I am attempting to develop an optimization routine for a river suspended
sediment mixing model. I have Calcium and Magnesium concentrations (%) for
sediments from 4 potential source areas (Topsoil, Channel Banks, Roads,
Drains) and I want to work out, based on the suspended sediment calcium and
magnesium concentrations, what are the optimal contributions from each
source area to river suspended sediments. The dataset is as follows:

              Topsoil   Channel Bank   Roads    Drains
Ca(%)         0.03      0.6            0.2      0.35
Mg(%)         0.0073    0.0102         0.0141   0.012
Contribution  0.25      0.25           0.25     0.25

and

Ca in river (%)    0.33
Mg in river (%)    0.0114

I want to optimize the contribution values (currently set at 0.25) by
minimizing the sum of squares of the relative errors of the mixing model,
calculated as follows:

SSRE = ((x1 - ((a1*c1)+(a2*c2)+(a3*c3)+(a4*c4))/x1)^2) +
       ((x2 - ((b1*c1)+(b2*c2)+(b3*c3)+(b4*c4))/x2)^2)
 
Where:
 x1 = calcium in river;
 x2 = magnesium in river;
a1:a4 = Ca in topsoil, channel banks, roads, drains
b1:b4 = Mg in topsoil, channel banks, roads, drains
c1:c4 = Contributions to be optimized

I can generate a solution very quickly using the MS Excel Solver function;
however, later I want to be able to run a Monte-Carlo simulation on it to
generate confidence intervals based on the variance in the source area
concentrations, hence my desire to use R to develop the mixing model.

So far I have been using the 'optim' function, however I'm confused as to
what form the 'par' and 'fn' arguments should take from my data above. I am
also unsure of how to write the model constraints, i.e. the total contribution
should sum to 1, and all values must be non-negative.

Any help, advice, or suggestions on this problem would be very much
appreciated.




#
I do not know what algorithms the Excel solver function uses.

See inline for how to do what you want in R.
Forgive me if I have misinterpreted your request.
On 19-10-2012, at 16:25, Richard James wrote:

Read data (it would have been a lot more helpful if you had provided the result of dput for the data and the target).

basedat <- read.table(text="Topsoil         Channel-Bank          Roads          Drains
Ca(%)         0.03             0.6                0.2             0.35
Mg(%)         0.0073           0.0102             0.0141          0.012
Contribution  0.25             0.25               0.25            0.25", header=TRUE)

basedat
target <- c("Ca in river(%)"=0.33,"Mg in river (%)"=0.0114)

# convert to matrix and vector for later use

bmat   <- as.matrix(basedat[1:2,])
pstart <- as.numeric(basedat["Contribution",1:3])
I do not understand why you are dividing by x1 and x2. It makes no sense to me to calculate (xi - (xi_calculated)/xi)^2.
Given your stated purpose, (xi - xi_calculated)^2 or something like (1 - xi/xi_calculated)^2 seems more appropriate.
The 'par' should be a vector of starting values of the parameters for your objective function.
The 'fn' is the function that calculates a scalar given the parameter values.

Your 'par' is a vector with all elements between 0 and 1 and with a sum == 1.
That can't be done with optim directly, but you can simulate the requirements by letting optim work with a three-element vector and defining the fourth value as
1 - sum(first three params). The requirement that all params must lie between 0 and 1 can be met by making 'fn' return a large value when the requirement is not met.

Some code:

SSRE <- function(parx) {
    par <- c(parx, 1 - sum(parx))
    if (all(par > 0 & par < 1)) {  # parameters meet the requirements
        sum((target - (bmat %*% par))^2)  # a linear algebra version of your objective, without the division by xi
        # or, if you want to divide by the target (see above), use the next line;
        # see the benchmark section below for a comment on it
        # sum(((target - (bmat %*% par))/target)^2)
    } else 1e7  # penalty for parameters not satisfying the constraints
}

SSRE(pstart)

z <- optim(pstart,SSRE)
z
c(z$par, 1-sum(z$par))  # final contributions


Results:

# > z <- optim(pstart,SSRE)
# > z
# $par
# [1] 0.1492113 0.2880078 0.2950191
# 
# $value
# [1] 2.157446e-12
# 
# $counts
# function gradient 
#      108       NA 
# 
# $convergence
# [1] 0
# 
# $message
# NULL

You can also try this:

z <- optim(pstart,SSRE,method="BFGS")
z
c(z$par, 1-sum(z$par))

z <- nlminb(pstart,SSRE)
z
c(z$par, 1-sum(z$par))

If you intend to run these optimizations many times I wouldn't use optim without specifying the method argument.
See this benchmark (the call uses the rbenchmark package):

library(rbenchmark)
benchmark(optim(pstart,SSRE), optim(pstart,SSRE,method="BFGS"), nlminb(pstart,SSRE),
          replications=1000, columns=c("test","replications","elapsed","relative"))
                                  test replications elapsed relative
3                 nlminb(pstart, SSRE)         1000   0.829    1.264
1                  optim(pstart, SSRE)         1000   1.647    2.511
2 optim(pstart, SSRE, method = "BFGS")         1000   0.656    1.000


If you use sum(((target - (bmat %*% par))/target)^2) as the objective you'll see different timings; nlminb then turns out to be the fastest.

good luck,

Berend
1 day later
#
Dear Berend,

Many thanks for taking the time to assist with this optimization problem.
I'll work on the data this week and let you know how I get on.

Again, many thanks

Richard




#
Dear Richard,

It is funny: I have to perform the same sediment fingerprinting approach for
my master's thesis. Mr. Hasselman advised me a few days ago to take a closer
look at the limSolve package.
http://cran.r-project.org/web/packages/limSolve/index.html

I guess the lsei function of this package is exactly what you and I are
looking for. I tried it out today and compared the lsei results with the
results of the MS Excel Solver. They are the same!
The manual (Chemtax example) shows how to define the constraints that every
unknown is >= 0 and that the unknowns always sum to 1.
http://cran.r-project.org/web/packages/limSolve/limSolve.pdf
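
As a minimal sketch (my own example, not the exact Chemtax code from the manual): lsei takes the equalities as E %*% X = F and the inequalities as G %*% X >= H, so for our 4-source case the constraint matrices could look like this:

library(limSolve)

# E %*% X = F : the four contributions sum to 1
E <- matrix(1, nrow=1, ncol=4)
F <- 1
# G %*% X >= H : every contribution is non-negative
G <- diag(4)
H <- rep(0, 4)

# bmat and target as defined earlier in the thread
fit <- lsei(A=bmat, B=target, E=E, F=F, G=G, H=H)
fit$X  # the estimated source contributions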

Try to get familiar with the lsei function, and if you still need help, I can
send you my code.

Bearing the Monte-Carlo simulation in mind, the package offers some nice
functions too.

Have a nice day and best regards
Thomas



#
On 21-10-2012, at 13:37, Thomas Schu wrote:

Excellent idea. I hadn't actually tried limSolve.
Using the data for bmat and target from my previous post, limSolve provides an excellent alternative and is fast(est):
You don't need to define any special functions.

library(limSolve)
lsei(A=bmat,B=target,E=matrix(rep(1,4),ncol=4),F=1,G=matrix(rep(1,4),ncol=4),H=0)

Berend
4 days later
#
Dear Berend and Thomas,

thank you for suggesting the lsei function. I found that the tlsce {BCE}
function also works very well:

library("BCE")
tlsce(A=bmat,B=target)

The limSolve package has an 'xsample' function for generating uncertainty
values via Monte-Carlo simulation, however it only works when specifying the
standard deviation on the target data (B). In my situation I have standard
deviations for the source areas (A) only. Therefore I need to generate the
uncertainty values manually.

I've created a matrix of 1000 normally distributed random numbers for each of
the source area (A) properties:
 
TSCa<-matrix(rnorm(1000, mean=0.03, sd=0.005),ncol=1)
TSMg<-matrix(rnorm(1000, mean=0.0073, sd=0.002),ncol=1)
CBCa<-matrix(rnorm(1000, mean=0.6, sd=0.1),ncol=1)
CBMg<-matrix(rnorm(1000, mean=0.0102, sd=0.005),ncol=1)
RCa<-matrix(rnorm(1000, mean=0.2, sd=0.05),ncol=1)
RMg<-matrix(rnorm(1000, mean=0.0141, sd=0.005),ncol=1)
DCa<-matrix(rnorm(1000, mean=0.35, sd=0.1),ncol=1)
DMg<-matrix(rnorm(1000, mean=0.012, sd=0.004),ncol=1)

DistAll<-cbind(TSCa,TSMg,CBCa,CBMg,RCa,RMg,DCa,DMg)
colnames(DistAll)<-c("TopSoilCa","TopSoilMg","ChannelBankCa","ChannelBankMg","RoadCa","RoadMg","DrainCa","DrainMg")

I now want to run the lsei model again:

lsei(A=bmat,B=target,E=matrix(rep(1,4),ncol=4),F=1,G=matrix(rep(1,4),ncol=4),H=0,
fulloutput=TRUE)

but with A=bmat replaced by the appropriate random values in DistAll. I
assume I could then use the function "replicate" to run the model 1000
times and generate the uncertainty values, e.g.

replicate(n=1000,lsei(A=bmat,B=target,E=matrix(rep(1,4),ncol=4),F=1,G=matrix(rep(1,4),ncol=4),H=0,
fulloutput=TRUE))

Would anyone be able to help me write a function to replace bmat with new
values from DistAll each time the lsei model is run?


Many thanks in advance

Richard



#
On 26-10-2012, at 12:50, Richard James wrote:

I'm not sure if you can do that with replicate.
However, the first matter to be resolved is: how is bmat related to DistAll? Which rows/columns should go into a single instance of bmat?
In addition you will need to store the output of lsei somewhere.

So it would seem to be a good idea to make a function that runs a single instance of lsei given the input parameters DistAll, ..., so that it can generate a bmat, run lsei and return the output in a form usable for further analysis.

When that is available you can start thinking about your simulation.

Berend
#
On 26-10-2012, at 12:50, Richard James wrote:

Actually the G and H arguments are wrong. They should be

G=diag(1,4)
H=rep(0,4)

since the weights should be >= 0
Maybe this will do what you need.

### start code

Nrep <- 10  # for testing

kRow <- 1
bmat <- matrix(DistAll[kRow,],ncol=4,byrow=FALSE)
bmat
target <- c("Ca in river(%)"=0.33,"Mg in river (%)"=0.0114)
target 

lsei(A=bmat,B=target,E=matrix(rep(1,4),ncol=4),F=1,G=diag(1,4),H=rep(0,4),fulloutput=TRUE)

gen.single <- function(k, DistAll, target) {
    bmat <- matrix(DistAll[k,], ncol=4, byrow=FALSE)
    z <- lsei(A=bmat, B=target, E=matrix(rep(1,4),ncol=4), F=1, G=diag(1,4), H=rep(0,4), fulloutput=TRUE)
    # insert tests of the output of lsei here to check that all is ok, etc.
    z$X  # weights
}

set.seed(2001)
t(sapply(1:Nrep, FUN=gen.single, DistAll=DistAll, target=target))

### end of code


And now you can do whatever you wish with the columns of the output matrix
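For instance (just one possibility), store the result and take column quantiles to get simple Monte-Carlo intervals for the contributions:

res <- t(sapply(1:Nrep, FUN=gen.single, DistAll=DistAll, target=target))
colnames(res) <- c("Topsoil","ChannelBank","Roads","Drains")
# 2.5%, 50% and 97.5% quantiles per source
apply(res, 2, quantile, probs=c(0.025, 0.5, 0.975))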

Berend
#
That solution works very well. 

The only issue is that 'rnorm' occasionally generates negative values which
aren't logical in this situation. 

Is there a way to set a lower limit of zero?



#
On 26-10-2012, at 21:41, Richard James wrote:

Try another random generator.
Lognormal, uniform, ...
Truncated normal perhaps?
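
For example, rtruncnorm in the truncnorm package draws directly from a normal truncated at zero, or you can use a simple rejection loop in base R (rnorm_pos below is just an illustrative helper, not a standard function):

rnorm_pos <- function(n, mean, sd) {
    # redraw any non-positive values until all n draws are > 0
    x <- rnorm(n, mean, sd)
    while (any(bad <- x <= 0)) {
        x[bad] <- rnorm(sum(bad), mean, sd)
    }
    x
}

TSCa <- matrix(rnorm_pos(1000, mean=0.03, sd=0.005), ncol=1)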

Berend