Parameter scaling problems with optim and Nelder-Mead method (bug?)

4 messages · Karl Ove Hufthammer, Bert Gunter

#
Dear all,

I'm having some problems getting optim with method="Nelder-Mead" to work
properly. It seems like there is no way of controlling the step size,
and the step size seems to depend on the *difference* between the
initial values, which makes no sense. Example:

    f=function(xy, mu1, mu2) {
      print(xy)
      dnorm(xy[1]-mu1)*dnorm(xy[2]-mu2)
    }
    f1=function(xy) -f(xy, 0, 0)
    optim(c(1,1), f1)

The first four values evaluated are

    1.0, 1.0
    1.1, 1.0
    1.0, 1.1
    0.9, 1.1

which is reasonable (step size of 0.1) for this function. And if I
translate both the function and the initial values

    f2=function(xy) -f(xy, 5000, 5000)
    optim(c(5001,5001), f2)

the first four values are

    5001.0, 5001.0
    5501.1, 5001.0
    5001.0, 5501.1
    4500.9, 5501.1

With

    f3=function(xy) -f(xy, 0, 5000)
    optim(c(1,5001), f3)

they are

       1.0, 5001.0
     501.1, 5001.0
       1.0, 5501.1
    -499.1, 5501.1
      
and with

    f4=function(xy) -f(xy, -3000, 50000)
    optim(c(-2999,50001), f4)
    
    -2999.0, 50001.0
     2001.1, 50001.0
    -2999.0, 55001.1
    -7999.1, 55001.1
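
The four runs above are at least consistent with one simple rule: the first step is 0.1 times the largest absolute starting value. The sketch below checks that guess against the printed points. The rule and the helper name `guess_step` are assumptions inferred from the output shown here, not something quoted from ?optim.

```r
# Hypothetical helper: guessed first step, 0.1 times the largest |start|.
guess_step <- function(par) 0.1 * max(abs(par))

# The four starting points used for f1 to f4.
starts <- list(c(1, 1), c(5001, 5001), c(1, 5001), c(-2999, 50001))

# Observed step: first coordinate of the second point evaluated minus the
# first coordinate of the start, read off the output above.
observed <- c(1.1 - 1, 5501.1 - 5001, 501.1 - 1, 2001.1 - (-2999))

guessed <- sapply(starts, guess_step)
print(cbind(guessed, observed))
```

The second run, where both starting values are equal, still takes a step of 500.1, so the magnitude of the starting values fits this output better than the difference between them does.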
  
However, the function to optimise is the same in all cases, only
translated, not scaled, so the step size *should* be the same. From
reading the documentation, it looks like changing the parscale should
work, and *relative* changes have the intended effect. Example:

    optim(c(1,1), f1, control=list(parscale=c(1,5)))

gives the function evaluations

    1.0, 1.0
    1.1, 1.0
    1.0, 1.5
    1.1, 0.5

But changing both values, e.g.,

    optim(c(1,1), f1, control=list(parscale=c(500,500)))
   
gives the same first four values. There *are* eventually some
differences in the values tried, but these don't seem to correspond to
parscale as described in ?optim. For example, for parscale=c(1,1), the
parameter values tried are

1: 1, 1
2: 1.1, 1
3: 1, 1.1
4: 0.9, 1.1
5: 0.95, 1.075
6: 0.9, 1
7: 0.85, 0.95
8: 0.95, 0.85
9: 0.9375, 0.9125
10: 0.8, 0.8
11: 0.7, 0.7
12: 0.8, 0.6
13: 0.8125, 0.6875
14: 0.55, 0.45

while for parscale=c(500,500) they are

1: 1, 1
2: 1.1, 1
3: 1, 1.1
4: 0.9, 1.1
5: 0.95, 1.075
6: 0.9, 1
7: 0.85, 0.95
8: 0.95, 0.85
9: 0.975, 0.725
10: 0.825, 0.675
11: 0.7375, 0.5125
12: 0.8625, 0.2875
13: 0.859375, 0.453125
14: 0.625000000000001, 0.0750000000000004
   
for parscale=1/c(50000,50000) they are

1: 1, 1
2: 1.1, 1
3: 1, 1.1
4: 0.9, 1.1
5: 0.95, 1.075
6: 0.9, 1
7: 0.85, 0.95
8: 0.95, 0.85
9: 0.9375, 0.9125
10: 0.8, 0.8
11: 0.7, 0.7
12: 0.8, 0.6
13: 0.8125, 0.6875
14: 0.55, 0.45

And there seems to be no way of actually changing the step size to
reasonable values (i.e., the same values for optimising f1 to f4).
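
One possible reading of the parscale results for the initial simplex, sketched under an assumption: if the first step is 0.1 times the largest absolute value of par/parscale, and is then converted back to original units by multiplying by parscale, a common factor in parscale cancels out and only the ratio between its entries matters. The helper `guess_step_scaled` is hypothetical and only encodes that guess; it says nothing about why the later iterations differ.

```r
# Hypothetical helper: per-parameter first step in original units, assuming
# the guessed rule 0.1 * max(|par / parscale|) is applied to scaled parameters.
guess_step_scaled <- function(par, parscale) {
  parscale * 0.1 * max(abs(par / parscale))
}

s_unit  <- guess_step_scaled(c(1, 1), c(1, 1))
s_ratio <- guess_step_scaled(c(1, 1), c(1, 5))
s_big   <- guess_step_scaled(c(1, 1), c(500, 500))
s_small <- guess_step_scaled(c(1, 1), 1 / c(50000, 50000))

print(rbind(s_unit, s_ratio, s_big, s_small))
```

Under this guess the three runs with equal parscale entries all get the same first step, and parscale=c(1,5) gets a step five times larger in the second parameter, which matches the first simplex points listed above.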

Is there something I have missed in how one is supposed to use optim
with Nelder-Mead? Or is this actually a bug in the implementation?


$ sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-suse-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=nn_NO.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=nn_NO.UTF-8        LC_COLLATE=nn_NO.UTF-8    
 [5] LC_MONETARY=nn_NO.UTF-8    LC_MESSAGES=nn_NO.UTF-8   
 [7] LC_PAPER=C                 LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=nn_NO.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
#
Well, I'm no optimization guru, but a quick reading of Wikipedia said
that step size depends on the initial value configuration and is then
"adjusted" by the algorithm using alpha, beta and gamma scaling
parameters thru the optimization. So it seems that it is supposed to
work exactly as you describe. Why do you expect something else?

-- Bert
On Sat, Aug 18, 2012 at 2:30 AM, Karl Ove Hufthammer <karl at huftis.org> wrote:

  
    
#
You're right that the step size should be effectively adjusted using
alpha, beta and gamma in later iterations, but the problem is that the
values used for the first simplex generated depend on the differences
between the initial values, which makes no sense, as this makes the
optimisation problem not invariant to translations.

Here's an analogy. Think of the function to maximise as a mountain
placed somewhere on Earth. If you start 1 km east and 1 km north of the
mountain, and try to find its peak, the values you sample *relative to
the peak's position* should not depend on whether the mountain is
situated on the Equator, in Australia or in North America, as long as the
actual mountain is identical (i.e., there is no *scaling* of the
function, only a translation). But for optim with method="Nelder-Mead"
they seem to do so.
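
One workaround that does not rely on knowing how the first simplex is built is to optimise over offsets from a fixed origin, so that optim only ever sees small numbers. This is a minimal sketch for the translated function f2 from the first message (with the print call dropped to keep the output quiet); `origin` and `f2_offset` are names made up for illustration.

```r
f <- function(xy, mu1, mu2) dnorm(xy[1] - mu1) * dnorm(xy[2] - mu2)
f2 <- function(xy) -f(xy, 5000, 5000)

# Hypothetical wrapper: optimise over offsets d from a fixed origin, so the
# values optim works with are small even though the real parameters are
# near 5000.
origin <- c(5000, 5000)
f2_offset <- function(d) f2(origin + d)

fit <- optim(c(1, 1), f2_offset)   # default method is Nelder-Mead
est <- fit$par + origin            # back on the original scale
print(est)
```

The wrapped problem is the same as optimising f1 from c(1,1), so the points sampled relative to the peak should be the ones listed for f1.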

Also, the values of parscale seem to have a rather mysterious effect on
the values chosen for later iterations, while their (absolute) values
seem to have *no* effect on the initial simplex (but their relative
values do have an effect, and a correct effect, AFAICS).


Karl Ove Hufthammer



On Sat, 18 Aug 2012 at 07:32 (-0700), Bert Gunter wrote:
#
Karl:

My ignorance of optimization makes any further comments hazardous.
Indeed, my initial reply may already have gone too far, as I may not
understand either you or NM. So I'm just going to shut up.

-- Bert
On Sat, Aug 18, 2012 at 9:33 AM, Karl Ove Hufthammer <karl at huftis.org> wrote: