Skip to content
Prev 307477 / 398506 Next

problem with convergence in mle2/optim function

Dear R help,

Thanks again for the responses.  I increased the lower constraint to:

lower = list(p1 = 0.0001, p2 = 0.0001, mu1 = 0.0001, mu2 = 0.0001).

I also included an upper box constraint of:

upper = list(p1 = Inf, p2 = Inf, mu1 = p1t, mu2 = p2t).

Making these changes improved the rate of convergence among stochastic 
simulation runs, but I still had convergence problems.

I found success in switching from mle2/optim to spg (BB package).  So 
far, spg has produced similarly precise estimates as L-BFGS-B and 
consistently provides parameter estimates.

If anyone is interested, here is the new objective function and spg 
call, instead of my previous objective function and mle2 call.  All 
other parts of my reproducible code are the same as I've previously 
supplied:

######################################################################
library(BB)

# Objective function for spg()
NLL2 <- function(par, y){
   p1 <- par[1]
   p2 <- par[2]
   mu1 <- par[3]
   mu2 <- par[4]
   t <- y$tv
   n1 <- y$n1
   n2 <- y$n2
   n3 <- y$n3
   P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
     mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))) -
     exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
     mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
     2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
     exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))) 

   P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
     mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))) -
     exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
     mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
     2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
     exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
     4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
     sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
   P3 <- 1 - P1 - P2
   p.all <- c(P1, P2, P3)
   #cat("NLL.free p.all {P1,P2,P3}\n")
   #print(matrix(p.all, ncol=3))
   -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
}

par <- c(p1t, p2t, mu1t, mu2t)

spg.fit <- spg(par = par, fn = NLL2, y = yt,
             lower = c(0.001, 0.001, 0.001, 0.001),
             control = list(maxit = 5000))

########################################################################

My next problem is that spg takes about twice as long as L-BFGS-B to 
converge.  The spg help file strongly suggests the use of an exact 
gradient function to improve speed.  But I am having trouble writing a 
gradient function.  Here is what I have so far:

I derived the gradient function by taking the derivative of my NLL 
equation with respect to each parameter.  My NLL equation is the 
probability mass function of the trinomial distribution.  Here is some 
reproducible code:

#########################################################################
library(Ryacas)

p1 <- Sym("p1"); p2 <- Sym("p2"); mu1 <- Sym("mu1"); mu2 <- Sym("mu2")
t <- Sym("t"); n1 <- Sym("n1"); n2 <- Sym("n2"); n3 <- Sym("n3")

P1.symb <- ((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
   mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))) -
   exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
   mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
   2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
   exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))

P2.symb <- ((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
   mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))) -
   exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
   mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
   2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
   exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
   4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
   sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))

P3.symb <- 1 - P1.symb - P2.symb

# gradient equation with respect to parameter p1 for probability mass
# function of the trinomial distribution with probabilities P1, P2, P3

gr.p1 <- deriv(log(P1.symb^n1), p1) + deriv(log(P2.symb^n2), p1) + 
deriv(log(P3.symb^n3), p1)

######################################################################

gr.p1 is very large equation, which I won't reproduce here. Let's say 
that the four gradient equations for the four parameters are defined as 
gr.p1, gr.p2, gr.mu1, gr.mu2, and all are derived as described above for 
gr.p1.  These gradient equations are functions of p1, p2, mu1, mu2, t, 
n1, n2, and n3.  My current gradient function is:

grr <- function(par, y){
   p1 <- par[1]
   p2 <- par[2]
   mu1 <- par[3]
   mu2 <- par[4]
   t <- y[,1]
   n1 <- y[,2]
   n2 <- y[,3]
   n3 <- y[,4]
   gr.p1 <- ....
   gr.p2 <- ....
   gr.mu1 <- ....
   gr.mu2 <- ....
   c(gr.p1, gr.p2, gr.mu1, gr.mu2)
}

The problem is that I need to supply values for t, n1, n2, and n3 to the 
gradient function, which are from the dataset yt, above.  When I supply 
the dataset yt, the grr function produces a vector of length 4*nrow(yt) 
= 80.  However, spg requires a gradient function that returns a vector 
of length(par) = 4.  When I include this gradient function in my spg 
function, I get an error that the gradient function is incorrect.

Does anyone have any suggestions on how to write my gradient function? 
Am I calculating the gradient equation, gr.p1 incorrectly?  As always, 
any help would be much appreciated.

Adam Zeilinger
On 10/8/2012 3:44 AM, Berend Hasselman wrote: