Skip to content

Excel can do what R can't?????

10 messages · Spencer Graves, Jerome Asselin, Michael Rennie +2 more

#
I've programmed many things like this in both Excel and R.  When I 
did not get the same answer from both, it was because I had an error in 
one (or both).  I do this routinely as part of debugging:  I catch many 
mistakes this way, and I often feel I can not trust my answers without 
this level of checking.

	  I use Excel with Solver if I only need one solution or if I'm working 
with someone who doesn't have R or S-Plus.   Otherwise, I prefer S-Plus 
of R.

	  First forget about "optim":  Do you get the same numbers from your 
function "f" and from Excel?  Have you plotted the function to be sure 
you even have local minima?  Naively, I would expect Excel to be more 
likely to get stuck in local minima than "optim".

	  I'm sorry you've had such a frustrating experience with R.  The S 
language is very powerful but does have a steep learning curve.  I had 
S-Plus for 3-5 years before I was finally able to do anything useful 
with it.  Now, it is an integral part of how I do much of what I do.

hope this helps.
spencer graves
Michael Rennie wrote:
#
Mike,

The definition of your function f() seems quite inefficient. You could 
vectorize it, which would shorten and speed up your code, especially if M 
is large. See the R introduction file available online to learn how to do 
it if you don't already know how. Also, you have to return only one 
argument. Unless I'm wrong, your function wants to return Wtmod, Hgtmod, q 
and f. I'm don't think this would change anything in this case, but you 
should definitely clean this up!

Another advice... If you can simplify your example into a few lines of 
"ready-to-execute" code with a toy dataset, then it's easy for everyone to 
try it out and you can get more feedback. The code you've included is 
quite large and cumbersome. For one thing, you could easily have removed 
the lines of code that were commented out.

Meanwhile, I would suggest that you go back to the basics of R to clean up 
your code.

Sorry I can't be more helpful.
Jerome
On July 15, 2003 10:46 am, Michael Rennie wrote:
#
At 11:47 AM 7/15/03 -0700, Jerome Asselin wrote:

            
Hi, Jerome

I don;t think I can vectorize it, since in the iteration loop, the value 
for each [i] is dependent on the value of [i-1], so I require the loop to 
go through each [i] before I can get my values for any particular vector 
(variable).  I actually had my program operating this way in the first 
place, but I get all sorts of warnings and the 'optim' function especially 
doesn't seem to appreciate it.
The calls to Wtmod, q, and Hgtmod are all just residual from the 
development of the loop inside function f.  I would like to get the last 
line of 'bioday' reported from within the loop, had I figured out the 
optimization, but that point is rather moot unless I can get the 
optimization functioning.
Thanks for the advice- every bit helps if I eventually get this thing to 
work.....

Mike
Michael Rennie
M.Sc. Candidate
University of Toronto at Mississauga
3359 Mississauga Rd. N.
Mississauga, ON  L5L 1C6
Ph: 905-828-5452  Fax: 905-828-3792
#
I thought that you could simplify your code by using something like 
c(0,W[-length(W)]) as opposed to W[i-1] in a loop, but now I understand 
it's not that easy. Unless you can analytically simplify the calculation 
of W in order to vectorize it, it's going to be slow.

However, many of the lines don't depend on [i] and not on [i-1]. Therefore 
you could simplify those as they don't need to be calculated within the 
loop.

HTH,
Jerome
On July 15, 2003 01:24 pm, Michael Rennie wrote:
#
The phrase:

    f <- 1000000000*(((((Wt-Wtmod)^2)/Wt) + (((Hgt-Hgtmod)^2)/Hgt))2) ; f

is an immediate computation, not a function.  If you want a function, 
try something like the following:

    f <- function(x){
	  Wt <- x[1]
	  Wtmod <- x[2]
	  Hgt <- x[3]
	  Hgtmod <- x[4]
      1000000000*(((((Wt-Wtmod)^2)/Wt) + (((Hgt-Hgtmod)^2)/Hgt))2)
    }

OR

    f <- function(x, X){
	  Wt <- X[,1]
	  Hgt <- X[,2]
	  Wtmod <- x[1]
	  Hgtmod <- x[2]
	1000000000*(((((Wt-Wtmod)^2)/Wt) + (((Hgt-Hgtmod)^2)/Hgt))2)
    }

"par" in "optim" is the starting values for "x".  Pass "X" to "f" via 
"..." in the call to "optim".

	  If you can't make this work, please submit a toy example with the 
code and error messages.  Please limit your example to 3 observations, 
preferably whole numbers so someone else can read your question in 
seconds.  If it is any longer than that, it should be ignored.

hope this helps.
Spencer Graves
M.Kondrin wrote:
#
Hi, Spencer

I know I submitted a beastly ammount of code, but I'm not sure how to simplify 
it much further, and still sucessfully address the problem that i am having.  
The reason being is that the funciton begins

f<- function (q) 

At the top of the iterative loop.  This is what takes q and generates Wtmod, 
Hgtmod at the end of the iterative loop. the assignment to f occurs at the 
bottom of the iterative loop. So, yes, the call to f is performing an immediate 
computation, but based on arguments that are coming out of the iterative loop 
above it, arguments which depend on q<-(p, ACT).  Maybe this is the problem; 
I've got too much going on between my function defenition and it's assignment, 
but I don't know how to get around it.

So, I'm not sure if your example will work- the output from the iterative 
process is Wtmod, Hgtmod, and I want to minimize the difference between them 
and my observed endpoints (Wt, Hgt).  The numbers I am varying to reach this 
optimization are in the iterative loop (p, ACT), so re-defining these outputs 
as x's and getting it to vary these doesn't do me much good unless they are 
directly linked to the output of the iterative loop above it.  

Last, it's not even that I'm getting error messages anymore- I just can't get 
the solution that I get from Excel.  If I try to let R find the solution, and 
give it starting values of c(1,2), it gives me an optimization sulution, but an 
extremely poor one.  However, if I give it the answer I got from excel, it 
comes right back with the same answer and solutions I get from excel.  

Using the 'trace' function, I can see that R gets stuck in a specific region of 
parameter space in looking for the optimization and just appears to give up.  
Even when it re-set itself, it keeps going back to this region, and thus 
doesn't even try a full range of the parameter space I've defined before it 
stops and gives me the wrong answer.

I can try cleaning up the code and see if I can re-submit it, but what I am 
trying to program is so parameter heavy that 90% of it is just defining these 
at the top of the file.

Thank you for the suggestions,

Mike
  

Quoting Spencer Graves <spencer.graves at PDF.COM>:

  
    
#
1.  If I have an answer that works, I typically go to something else.

2.  If your Excel solution is still inadequate, have you considered 
trying to modularize your function "f", so "f" is 5-10 lines, several of 
which call other functions, f1, f2, ..., f6, say, and each of these do a 
piece of the computations that can be checked by comparison with 
intermediate results in Excel.

hope this helps.  spencer graves
Michael Rennie wrote:
#
optim(par, fn, gr = NULL,
            method = c("Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN"),
            lower = -Inf, upper = Inf,
            control = list(), hessian = FALSE, ...)

.....
       fn: A function to be minimized (or maximized), with first
           argument the vector of parameters over which minimization is
           to take place. It should return a scalar result.

Your fn defined as:
f <- 1000000000*(((((Wt-Wtmod)^2)/Wt) + (((Hgt-Hgtmod)^2)/Hgt))2) ; f
What is its first argument I wonder?
I think you have just an ill-defined R function (although for Excel it 
may be OK - do not know) and optim just chokes on it.
#
Michael Rennie wrote:
1. Either your function or the Excel solver is wrong. I executed your
source code (which defines f), then ran it over a grid of points, and
plotted the answer, using this code:

xvals <- seq(.2,2,by=.2)
yvals <- seq(1,3,by=.2)
z <- matrix(NA,nrow=length(xvals),ncol=length(yvals))
for (i in 1:length(xvals)) for (j in 1:length(yvals)) {
  x <- xvals[i]
  y <- yvals[j]
  z[i,j] <- f(c(x,y))
  }
filled.contour(x=xvals,y=yvals,z=log(z))

Your "solution" from Excel evaluates to
  f(c(.558626306252032,1.66764519286918)) == 0.3866079
while I easily found a point which was much better,
  f(c(.4,1)) = 7.83029e-05

You should have tried executing your function over a grid of points, and
plotting the results in a contour plot, to see if optim was working
sensibly. You could do the same grid in Excel and R to verify that the
function you've defined does the same thing in each.

Since your optimization is only over a 2D parameter space, it is easy for
you to plot the results, to see at a glance what the optimum is, and to
work out what is going wrong.

2. Your code executes very slowly because it is programmed inefficiently.
You need to iterate a function to get your final solution, but you don't
need to keep track of all the states you visit on the way. The way R
works, whenever you assign a value to a certain index in a vector, as in 
  A[i] <- 10,
the system actually copies the entire vector. So, in every iteration, you
are copying very many vectors, and this is needlessly slowing down the
program. Also, at the end of each iteration, you define
  bio <- cbind(W, C, ASMR, SMR, A, F, U, SDA, Gr, Ed, GHg, EGK, Hg)
which creates a matrix. But you only ever use this matrix right at the
end, and so there is no need to create this 365*14 matrix at every single
iteration.

It looks to me as if you took some Excel code and translated it directly
into R. This will not produce efficient R code. Your iterative loop would
be more naturally expressed in R as

f <- function(q) {
  p <- q[[1]]
  ACT <- q[[2]]
  # cat(paste("Trying p=",p," ACT=",ACT,"\n",sep=""))
  state <- c(W=Wo,Hg=Hgo)
  numdays <- length(temps)
  for (i in 1:numdays)
    state <- updateState(state,
                         jday=temps$jday[i],temp=temps$Temp[i],M=numdays,
                         p=p,ACT=ACT)
  Wtmod <- state[["W"]]
  Hgtmod <- state[["Hg"]]
  (Wt-Wtmod)^2/Wt + (Hgt-Hgtmod)^2/Hgt
  }

updateState <- function(state,jday,temp,M,p,ACT) {
  # Given W[i-1] and Hg[i-1], want to compute W[i] and Hg[i]
  W <- state[["W"]]
  Hg <- state[["Hg"]]
  # First compute certain parameters: Vc[i-1] ... Expegk[i-1]
  Vc <- (CTM-temp)/(CTM-CTO)
  Vr <- (RTM-temp)/(RTM-RTO)
  C <-      p * CA * W^CB * Vc^Xc * exp(Xc*(1-Vc)) * Pc
  ASMR <- ACT * RA * W^RB * Vr^Xa * exp(Xa*(1-Vr))
  ...
  # Now find W[i] and Hg[i]
  Wnew <- if (!(jday==121 && Mat==1)) W+Gr/Ef
          else                        W * (1-GSI*1.2)
  Hgnew <- a*Hgp*C*(1-Expegk)/(Pc*W*EGK) + Hg*Expegk
  c(W=Wnew,Hg=Hgnew)
  }

In this code, I do not attempt to keep the entire array in memory. All I
need to know at each iteration is the value of state=(W,Hg) at time i-1,
and from this I compute the new value at time i.

3. You use some thoroughly weird code to read in a table. You should add a
row to the top of your table with variable names, then just use
  temps <- read.table("TEMP.DAT", header=TRUE)
  temps$Vc <- (CTM-temps$temp)/(CTM-CTO)
This would also avoid leaving global variables (like Day) hanging around
the place. Global variables cause confusion: see the next point.

4. Here are some lines taken from your code.

p <- NULL
ACT <- NULL

#starting values for p, ACT
p <- 1
ACT <- 2

f <- function (q)
  {
  F[i]<- (FA*((comp[i,3])^FB)*(exp(FG*p))*C[i])
  # (and ACT is never referred to)
  }

Why did you define p<-NULL and ACT<-NULL at the top? Those definitions are
irrelevant, because they are overridden by p<-1 and ACT<-2.

In the body of your function f, in defining F[i], you refer to the
variable p. The only assignment to p is in the line p<-1. I strongly
suspect this is an error. Probably you want to refer to q[1]. The best way
to do this (as you can see from my code above) is to define p and ACT at
the beginning of f.

5. Some minor comments on code. It's unwise to use T or F as variable
names in R, because of the potential for confusion with S-Plus, which uses
them for TRUE and False. Also, you don't need all those brackets: A*(B*C)
is the same as A*B*C, and ((A/B)/C) is more transparently written as
A/(B*C). Also, you should indent your code, since otherwise you'll just
confuse yourself and other people.

6. I've written a version of the code which takes all these comments into
account. It doesn't agree with your Excel solution. You haven't given us
enough real data for me to work out if there's a bug in my code or if the
Excel solution is wrong. Once you have worked out a function f which you
know to be correct (checked by drawing a contour plot), if you have any
more problems, share it with us and we may be able to help. 

Damon Wischik.