scaling problems in "optim"

4 messages · kathie, Peter Dalgaard, Spencer Graves

#
Dear R users,

I am trying to understand the control parameters of "optim", especially
"fnscale" and "parscale".

The R documentation says:

------------------------------------------------------
fnscale

    An overall scaling to be applied to the value of fn and gr during
optimization. If negative, turns the problem into a maximization problem.
Optimization is performed on fn(par)/fnscale.

parscale

    A vector of scaling values for the parameters. Optimization is performed
on par/parscale and these should be comparable in the sense that a unit
change in any element produces about a unit change in the scaled value.
------------------------------------------------------

I cannot understand these two statements.

"Optimization is performed on fn(par)/fnscale." and

"Optimization is performed on par/parscale and these should be comparable in
the sense that a unit change in any element produces about a unit change in
the scaled value."

Would you please explain these things?  

Thank you in advance.

Kathryn Lord
#
kathie wrote:
Well, the gist is that optim is happiest when the function values 
f(beta) are neither too large nor too small, and ditto for the gradient 
df/dbeta. You may, e.g., get convergence issues if your data or your 
"covariates" are expressed as molar concentrations when the actual values 
are on the order of micromolar. "Covariates" is in quotes because the 
problem is not linear, but the gradient df/dbeta plays that part in the 
local linearization. So you get the opportunity to rescale function 
values and parameters.
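
A minimal sketch of what the two documentation sentences amount to, assuming a made-up quadratic objective (the function fn, the start values and the scales below are illustrative only, not from this thread). Letting optim apply parscale and fnscale should give essentially the same answer as rescaling the problem by hand and mapping the result back to the original units:

```r
# Toy objective (hypothetical): two parameters on very different scales,
# with the minimum at c(1e-6, 1000).
fn <- function(p) ((p[1] - 1e-6) / 1e-6)^2 + ((p[2] - 1000) / 1000)^2

parscale <- c(1e-6, 1000)  # rough magnitude of each parameter
fnscale  <- 1              # the objective is already of order 1

# By-hand version of the scaled problem: a function of q = par / parscale
# that returns fn(par) / fnscale.
fn_scaled <- function(q) fn(q * parscale) / fnscale

start <- c(5e-6, 200)

# (a) let optim do the scaling; 'par' is given and returned in original units
fit_a <- optim(start, fn, method = "BFGS",
               control = list(parscale = parscale, fnscale = fnscale))

# (b) scale by hand, optimize in q, then convert back to original units
fit_b <- optim(start / parscale, fn_scaled, method = "BFGS")
par_b <- fit_b$par * parscale

print(fit_a$par)
print(par_b)
```

In the scaled coordinates q both parameters are of order 1, which is the "comparable" condition the documentation asks for.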
#
ALL: 

      Can anyone explain why optim returns c(0.75, 2) for what I think 
should be the maximum of a bivariate normal density with mean = 1:2? 


KATHIE: 

      Apart from 'optim' giving an answer I don't understand, the 
following should illustrate the use of 'fnscale' and 'parscale' -- while 
perhaps illustrating the need to use these parameters. 

      Hope this helps. 
      Spencer Graves

library(mvtnorm)
(mle2 <- optim(rep(0, 2), dmvnorm, method='CG',
               control=list(fnscale=-10, parscale=c(.3, 3), trace=9),
              mean=1:2, hessian=TRUE, log=TRUE))
  Conjugate gradients function minimizer
Method: Fletcher Reeves
tolerance used in gradient test=3.63798e-12
0 1 0.433788
parameters    0.00000    0.00000
 i> 1 3 0.234892
parameters    0.03000    0.60000

<snip>

i> 100 201 0.187167
parameters    2.46674    0.66667
$par
[1] 0.7352784 2.0000004 # = approximately c(2.46674, 0.66667) * parscale

$value
[1] -1.871671 # = 0.187167 * fnscale

$counts
function gradient
     201      101

$convergence
[1] 1

$message
NULL

$hessian
             [,1]         [,2]
[1,] -1.00000e+00 -9.94586e-11
[2,] -9.94586e-11 -1.00000e+00

# Checks:  
 > dmvnorm(c( 0.7352784, 2.0000004), mean=1:2, log=TRUE)
[1] -1.872916
 > dmvnorm(1:2, mean=1:2, log=TRUE)
[1] -1.837877
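
One thing worth checking in the output above: $convergence is 1, which the 'optim' documentation defines as "the iteration limit maxit had been reached", and the trace stops at iteration 100, the default maxit for "CG". So the reported $par may simply be where the search was cut off. The sketch below is one way to check this; it assumes a base-R stand-in for dmvnorm (so that it runs without mvtnorm) and an arbitrarily chosen larger maxit. Whether the longer run actually reaches c(1, 2) has to be seen by running it.

```r
# Base-R stand-in (an assumption, not the original call) for
# dmvnorm(x, mean = 1:2, log = TRUE) with the default identity covariance.
logdens <- function(x, mean = 1:2) sum(dnorm(x, mean = mean, sd = 1, log = TRUE))

ctrl <- list(fnscale = -10, parscale = c(0.3, 3))

# Default iteration limit for "CG" (maxit = 100).
fit_default <- optim(rep(0, 2), logdens, method = "CG", control = ctrl)

# Same problem with a larger, arbitrarily chosen iteration limit.
fit_longer <- optim(rep(0, 2), logdens, method = "CG",
                    control = c(ctrl, list(maxit = 5000)))

# convergence: 0 = converged, 1 = 'maxit' reached before convergence.
print(c(default = fit_default$convergence, longer = fit_longer$convergence))
print(rbind(default = fit_default$par, longer = fit_longer$par))
```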
#
p.s.  With proper scaling, 'optim' gives the correct answer in this case: 

(mle2 <- optim(rep(0, 2), dmvnorm, method='CG',
               control=list(fnscale=-10, trace=9),
              mean=1:2, hessian=TRUE, log=TRUE))
$par
[1] 0.9999917 1.9999833

      However, as noted in my previous message, with parscale = c(0.3, 3), 
a difference of only a factor of 10 in the scaling between the two 
parameters, the first parameter comes out as 0.735 rather than 1;  the 
second is identified correctly. 

      Spencer