scaling problems in "optim" - R-help

kathie

Sun, Mar 23, 2008 12:27 AM #

Dear R users,

I am trying to understand the control parameters of "optim", especially
"fnscale" and "parscale".

The R documentation says:

------------------------------------------------------
fnscale

    An overall scaling to be applied to the value of fn and gr during
optimization. If negative, turns the problem into a maximization problem.
Optimization is performed on fn(par)/fnscale.

parscale

    A vector of scaling values for the parameters. Optimization is performed
on par/parscale and these should be comparable in the sense that a unit
change in any element produces about a unit change in the scaled value.
------------------------------------------------------

I cannot understand these two statements.

"Optimization is performed on fn(par)/fnscale." and

"Optimization is performed on par/parscale and these should be comparable in
the sense that a unit change in any element produces about a unit change in
the scaled value."

Would you please explain these things?  

Thank you in advance.

Kathryn Lord


Peter Dalgaard

Sun, Mar 23, 2008 3:35 AM #

Well, the gist is that optim is happiest when the function values 
f(beta) are neither too large nor too small, and ditto for the gradient 
df/dbeta. You may, for example, get convergence issues if your data or 
your "covariates" are expressed as molar concentrations when the actual 
values are on the order of micromolar. ("Covariates" is in quotes because 
the problem is not linear, but the gradient df/dbeta plays that part in 
the local linearization.) So fnscale and parscale give you the 
opportunity to rescale the function values and the parameters.
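
As a minimal sketch of what the two documentation sentences mean in practice: a negative fnscale makes optim maximize, and parscale gives the typical magnitude of each parameter, so that the optimizer works on par/parscale, where all components are of order one. The objective functions 'logdens' and 'badly_scaled' below are hypothetical illustrations, not taken from this thread.

```r
# Hypothetical example 1: fnscale.
# Log-density of two independent unit-variance normals with means 1 and 2.
logdens <- function(x) sum(dnorm(x, mean = c(1, 2), log = TRUE))

# optim minimizes fn(par)/fnscale; with fnscale = -1 that is -logdens,
# so logdens itself is maximized.
fit_max <- optim(c(0, 0), logdens, method = "BFGS",
                 control = list(fnscale = -1))
fit_max$par    # near c(1, 2)
fit_max$value  # reported on the original scale of logdens

# Hypothetical example 2: parscale.
# The optimum is at a = 1e-6 and b = 1000, so the raw parameters differ
# by nine orders of magnitude.
badly_scaled <- function(p) {
  ((p[1] - 1e-6) / 1e-6)^2 + ((p[2] - 1000) / 1000)^2
}

# parscale holds the typical size of each parameter; internally optim
# works on p / parscale, whose components are both of order one here.
fit_scaled <- optim(c(2e-6, 500), badly_scaled, method = "BFGS",
                    control = list(parscale = c(1e-6, 1000)))
fit_scaled$par / c(1e-6, 1000)  # near c(1, 1)
```

Note that the results in $par and $value are always reported on the original scales; the scaling only affects what the optimizer sees internally.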

O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907

Spencer Graves

Sun, Mar 23, 2008 10:05 AM #

ALL: 

      Can anyone explain why optim returns approximately c(0.735, 2) for 
what I think should be the maximum of a bivariate normal density with 
mean = 1:2? 


KATHIE: 

      Apart from 'optim' giving an answer I don't understand, the 
following should illustrate the use of 'fnscale' and 'parscale' -- while 
perhaps illustrating the need to use these parameters. 

      Hope this helps. 
      Spencer Graves

library(mvtnorm)
(mle2 <- optim(rep(0, 2), dmvnorm, method='CG',
               control=list(fnscale=-10, parscale=c(.3, 3), trace=9),
               mean=1:2, hessian=TRUE, log=TRUE))
  Conjugate gradients function minimizer
Method: Fletcher Reeves
tolerance used in gradient test=3.63798e-12
0 1 0.433788
parameters    0.00000    0.00000
 i> 1 3 0.234892
parameters    0.03000    0.60000

<snip>

i> 100 201 0.187167
parameters    2.46674    0.66667
$par
[1] 0.7352784 2.0000004 # = approximately c(2.46674, 0.66667) * parscale

$value
[1] -1.871671 # = 0.187167 * fnscale

$counts
function gradient
     201      101

$convergence
[1] 1

$message
NULL

$hessian
             [,1]         [,2]
[1,] -1.00000e+00 -9.94586e-11
[2,] -9.94586e-11 -1.00000e+00

# Checks:  
 > dmvnorm(c( 0.7352784, 2.0000004), mean=1:2, log=TRUE)
[1] -1.872916
 > dmvnorm(1:2, mean=1:2, log=TRUE)
[1] -1.837877
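
One thing worth noting in the output above: $convergence is 1, which the 'optim' documentation defines as the iteration limit maxit having been reached, and the trace stops at iteration 100, the default maxit for the derivative-based methods. So the reported $par is where CG stopped, not necessarily an optimum. A minimal sketch of checking the code before trusting the result follows. The helper 'describe_convergence' and the base-R stand-in 'logdens' for dmvnorm are illustrative assumptions, and the thread does not establish whether a larger maxit is enough for this scaling.

```r
# Hypothetical helper: translate the most common optim() convergence codes.
describe_convergence <- function(fit) {
  switch(as.character(fit$convergence),
         "0" = "converged",
         "1" = "iteration limit maxit reached",
         paste("other code:", fit$convergence))
}

# Base-R stand-in for dmvnorm(x, mean = 1:2, log = TRUE) with the
# default identity covariance matrix.
logdens <- function(x) sum(dnorm(x, mean = c(1, 2), log = TRUE))

fit <- optim(rep(0, 2), logdens, method = "CG",
             control = list(fnscale = -10, parscale = c(0.3, 3)))
describe_convergence(fit)

# If the limit was hit, restart from the last point with more iterations
# allowed, then check the code again.
if (fit$convergence == 1) {
  fit <- optim(fit$par, logdens, method = "CG",
               control = list(fnscale = -10, parscale = c(0.3, 3),
                              maxit = 5000))
}
describe_convergence(fit)
```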

Spencer Graves

Sun, Mar 23, 2008 10:17 AM #

p.s.  With proper scaling, 'optim' gives the correct answer in this case: 

(mle2 <- optim(rep(0, 2), dmvnorm, method='CG',
               control=list(fnscale=-10, trace=9),
               mean=1:2, hessian=TRUE, log=TRUE))
$par
[1] 0.9999917 1.9999833

      However, as noted in my previous message, with parscale = c(0.3, 3), 
a difference of only a factor of 10 in the scaling between the two 
parameters, the first parameter is estimated as 0.735 rather than 1;  the 
second is identified correctly. 

      Spencer
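
The annotations in the earlier output can be checked by hand: the trace prints the scaled quantities par/parscale and fn(par)/fnscale, so multiplying back by parscale and fnscale should recover, approximately, the reported $par and $value. A small sketch using only numbers copied from that output (the variable names are illustrative):

```r
parscale <- c(0.3, 3)
fnscale  <- -10

# Last line of the trace: scaled parameters and scaled function value.
scaled_par   <- c(2.46674, 0.66667)
scaled_value <- 0.187167

scaled_par * parscale    # compare with $par   = c(0.7352784, 2.0000004)
scaled_value * fnscale   # compare with $value = -1.871671

# Ratio of the two parscale entries, i.e. the relative scaling of the
# two parameters.
parscale[2] / parscale[1]
```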

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.