Minimizing Multimodal Functions of Continuous Variables with the "Simulated Annealing" Algorithm

A. CORANA, M. MARCHESI, C. MARTINI, and S. RIDELLA
Istituto per i Circuiti Elettronici-C.N.R.

A new global optimization algorithm for functions of continuous variables is presented, derived from the "Simulated Annealing" algorithm recently introduced in combinatorial optimization. The algorithm is essentially an iterative random search procedure with adaptive moves along the coordinate directions. It permits uphill moves under the control of a probabilistic criterion, thus tending to avoid the first local minima encountered.
The algorithm has been tested against the Nelder and Mead simplex method and against a version of Adaptive Random Search. The test functions were Rosenbrock valleys and multiminima functions in 2, 4, and 10 dimensions.
The new method proved to be more reliable than the others, being always able to find the optimum, or at least a point very close to it. It is quite costly in terms of function evaluations, but its cost can be predicted in advance, depending only slightly on the starting point.

Categories and Subject Descriptors: G.1.6 [Numerical Analysis]: Optimization-nonlinear programming; G.3 [Mathematics of Computing]: Probability and Statistics-probabilistic algorithms (including Monte Carlo); G.4 [Mathematics of Computing]: Mathematical Software-certification and testing

General Terms: Algorithms, Performance

Additional Key Words and Phrases: Global optimization, stochastic optimization, test functions

Authors' address: Istituto per i Circuiti Elettronici-C.N.R., via all'Opera Pia 11, 16145 Genova, Italy.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1987 ACM 0098-3500/87/0900-0262 $01.50
ACM Transactions on Mathematical Software, Vol. 13, No. 3, September 1987, Pages 262-280.

1. INTRODUCTION

The problem of determining the position in n-space of the minimum of a given function of n variables has been tackled using nonlinear programming methods for many years, in practice since digital computers have been available.
If the cost function is unimodal in the domain of interest, one can choose among many good alternatives. Some of the available minimization algorithms (direct methods) involve only function evaluations, such as those of Rosenbrock [16], Hooke and Jeeves [7], and Nelder and Mead [1, 11]. Others also use the evaluation of the derivatives of the cost function [4, 5, 17] and are considered to be more efficient than direct methods; but they are also more complicated and more inclined to terminate far from the minimum if the cost function is very ill-conditioned.
However, if the cost function is multimodal within the domain of interest, the number of available algorithms is reduced to very few. In this case, the algorithms quoted above tend to stop at the first minimum encountered and cannot be used easily for finding the global one.
Since a systematic search in the function domain, in nontrivial cases, requires so many function evaluations as to be impracticable, one must rely on a limited search, either systematic or random. A simple and widely used technique is to generate a given number of different points inside the function domain, performing unimodal searches starting from each of them and retaining the best result (a sketch of this baseline is given below). Other methods for random global optimization are reported in [2, 3, 9, 13, 14]. Under mild conditions on the test functions, these stochastic methods guarantee asymptotic convergence to the global optimum as the number of sample points increases. All of these techniques are efficient in the case of functions with a few local minima; but, in practice, many optimization problems deal with a large number of variables (up to tens or hundreds) and/or a very large number of local minima that is often an increasing function of the number of variables. In this situation, traditional methods offer low efficiency and limited reliability.
Recently, a global optimization algorithm called Simulated Annealing (SA) [8] has been proposed in the area of combinatorial optimization, that is, when the cost function is defined in a discrete domain. This method is reported to perform well in the presence of a very high number of variables (even tens of thousands) [8, 15, 18]. It is based on random evaluations of the cost function, in such a way that transitions out of a local minimum are possible. It does not guarantee, of course, to find the global minimum, but if the function has many good near-optimal solutions, it should find one. In particular, this method is able to discriminate between the "gross behavior" of the function and its finer "wrinkles." First, it reaches an area in the function domain where a global minimum should be present, following the gross behavior irrespective of small local minima found on the way. It then develops finer details, finding a good, near-optimal local minimum, if not the global minimum itself.
In the present work we propose some modifications to this algorithm in order to apply it to the optimization of functions defined in a continuous domain. It is worth noting that these functions do not need to be smooth or even continuous in their domain. The algorithm is characterized by the need for a number of function evaluations usually about three orders of magnitude greater than that commonly required for a single run of unimodal algorithms. Nevertheless, in the case of very ill-conditioned cost functions with thousands or millions of local minima, such a big computational effort leads to better results than the random or systematic use of a traditional optimization algorithm for as many times as it takes to make the total number of function evaluations equal to SA. Moreover, the cost of automatic computation is becoming lower and lower, and such computational tasks are becoming affordable.
In Section 3, we introduce some multimodal test functions that are difficult to minimize, having a number of local minima greater than 10” in their domain of interest. In Section 4, our SA algorithm is tested against the Nelder and Mead simplex method and against the "Adaptive Random Search" stochastic method. The test functions used are both the traditional Rosenbrock valleys and the multimodal functions quoted above. The results obtained are critically analyzed, and suggestions are made for future research.
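As a point of reference, the multistart baseline mentioned earlier in this Introduction (random starting points, a unimodal search from each, keeping the best result) can be sketched as follows. This is an illustrative Python sketch, not code from the paper; it assumes SciPy's Nelder-Mead routine as the unimodal local search and a user-supplied objective f.

```python
import numpy as np
from scipy.optimize import minimize

def multistart(f, bounds, n_starts=20, seed=0):
    """Baseline strategy: run a unimodal (local) search from several random
    starting points inside the domain and retain the best result found."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best_x, best_f = None, np.inf
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)                      # random point in the domain
        res = minimize(f, x0, method="Nelder-Mead")   # derivative-free local search
        if res.fun < best_f:
            best_x, best_f = res.x, res.fun
    return best_x, best_f
```

As discussed above, such a strategy works well when the local minima are few, but its reliability degrades quickly as their number grows.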
2. METHOD

Let x be a vector in R^n and (x_1, x_2, ..., x_n) its components. Let f(x) be the function to minimize and let a_1 < x_1 < b_1, ..., a_n < x_n < b_n be its n variables, each ranging in a finite, continuous interval. f does not need to be continuous, but it must be bounded.
Our SA algorithm is schematically shown in Figure 1. It proceeds iteratively: starting from a given point x_0, it generates a succession of points x_0, x_1, ..., x_i, ... tending to the global minimum of the cost function. New candidate points are generated around the current point x_i by applying random moves along each coordinate direction, in turn. The new coordinate values are uniformly distributed in intervals centered around the corresponding coordinate of x_i. Half the size of these intervals along each coordinate is recorded in the step vector v. If the point falls outside the definition domain of f, a new point is randomly generated until a point belonging to the definition domain is found. A candidate point x' is accepted or rejected according to the Metropolis criterion [10]:

    If Δf ≤ 0, then accept the new point: x_{i+1} = x'
    else accept the new point with probability p(Δf) = exp(-Δf / T)

where Δf = f(x') - f(x_i) and T is a parameter called temperature.
At a fixed value of T, the succession of points x_0, x_1, ..., x_i, ... is not downhill, except when T = 0. For values of T large compared to the mean value of |f(x_h) - f(x_k)| (x_h and x_k being points randomly chosen inside the definition domain of f), almost all new points are accepted and the succession is a random sampling of f.
The SA algorithm starts at some "high" temperature T_0 given by the user. A sequence of points is then generated until a sort of "equilibrium" is approached; that is, a sequence of points x_i whose average value of f reaches a stable value as i increases. During this phase the step vector v_m is periodically adjusted to better follow the function behavior. The best point reached is recorded as x_opt.
After thermal equilibration, the temperature T is reduced and a new sequence of moves is made starting from x_opt, until thermal equilibrium is reached again, and so on. The process is stopped at a temperature low enough that no more useful improvement can be expected, according to a stopping criterion that we will describe later.
The SA optimization algorithm can be considered analogous to the physical process by which a material changes state while minimizing its energy [8, 18]. A slow, careful cooling brings the material to a highly ordered, crystalline state of lowest energy. A rapid cooling instead yields defects and glass-like intrusions inside the material. From an optimization point of view, an iterative search accepting only new points with lower function values is like rapidly quenching a physical system at zero temperature: it is very likely to get stuck in a metastable, local minimum. On the contrary, SA permits uphill moves under the control of a temperature parameter. At higher temperatures only the gross behavior of the cost function is relevant to the search. As the temperature decreases, finer details can be developed to get a good final point.

Fig. 1. The SA minimization algorithm. [Flowchart omitted: initialize parameters; perform a cycle of random moves, each along one coordinate direction, accepting or rejecting each point according to the Metropolis criterion and recording the optimum point reached so far; set the current point to the optimum.]
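In code, the acceptance rule above reads roughly as follows. This is a minimal Python sketch of the Metropolis criterion only; the function name and the use of Python's random module are ours, not part of the paper.

```python
import math
import random

def metropolis_accept(delta_f, T, rng=random):
    """Metropolis criterion: always accept a downhill (or equal) move;
    accept an uphill move with probability exp(-delta_f / T)."""
    if delta_f <= 0.0:
        return True
    return rng.random() < math.exp(-delta_f / T)
```

At high T the exponential is close to 1 and almost every move is accepted; as T approaches 0 the rule degenerates to accepting only downhill moves, which corresponds to the rapid-quenching behavior described above.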
While the optimality of the final point cannot be guaranteed, the method is able to proceed toward better minima even in the presence of many local minima.
A detailed description of the algorithm follows.

Step 0 (Initialization)
Choose
A starting point x_0.
A starting step vector v_0.
A starting temperature T_0.
A terminating criterion ε and a number of successive temperature reductions to test for termination, N_ε.
A test for step variation N_s and a varying criterion c.
A test for temperature reduction N_T and a reduction coefficient r_T.
Set i, j, m, k to 0. i is the index denoting successive points, j denotes successive cycles along every direction, m describes successive step adjustments, and k covers successive temperature reductions.
Set h to 1. h is the index denoting the direction along which the trial point is generated, starting from the last accepted point.
Compute f_0 = f(x_0).
Set x_opt = x_0, f_opt = f_0.
Set n_u = 0, u = 1, ..., n.
Set f*_u = f_0, u = 0, -1, ..., -N_ε + 1.

Step 1
Starting from the point x_i, generate a random point x' along the direction h:

    x' = x_i + r v_{m,h} e_h

where r is a random number generated in the range [-1, 1] by a pseudorandom number generator, e_h is the vector of the h-th coordinate direction, and v_{m,h} is the component of the step vector v_m along the same direction.

Step 2
If the h-th coordinate of x' lies outside the definition domain of f, that is, if x'_h < a_h or x'_h > b_h, then return to Step 1.

Step 3
Compute f' = f(x'). If f' ≤ f_i, then accept the new point:
    set x_{i+1} = x', set f_{i+1} = f', add 1 to i, add 1 to n_h;
    if f' < f_opt, then set x_opt = x', f_opt = f';
    endif;
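Putting Steps 1-3 together, one pass over the coordinate directions can be sketched as below. This is an illustrative Python sketch under the notation above, not the authors' implementation; the bookkeeping indices i, j, m, k are reduced to the essentials, and the trial generation, bound check, and Metropolis acceptance are combined into a single loop.

```python
import math
import random

def sa_cycle(f, x, fx, v, T, bounds, x_opt, f_opt, rng=random):
    """One cycle of Steps 1-3: a trial move along each coordinate direction h,
    accepted or rejected according to the Metropolis criterion."""
    n = len(x)
    for h in range(n):
        # Step 1: trial point uniform in [x[h] - v[h], x[h] + v[h]] along direction h
        # Step 2: if the h-th coordinate leaves [a_h, b_h], generate a new trial point
        while True:
            x_trial = list(x)
            x_trial[h] = x[h] + rng.uniform(-1.0, 1.0) * v[h]
            if bounds[h][0] <= x_trial[h] <= bounds[h][1]:
                break
        # Step 3: accept downhill moves; accept uphill moves with probability exp(-df/T)
        f_trial = f(x_trial)
        df = f_trial - fx
        if df <= 0.0 or rng.random() < math.exp(-df / T):
            x, fx = x_trial, f_trial
            if f_trial < f_opt:                    # record the best point reached so far
                x_opt, f_opt = list(x_trial), f_trial
    return x, fx, x_opt, f_opt
```

In the full algorithm (which continues beyond this excerpt) such cycles are repeated; the parameters N_s, N_T, and r_T defined in Step 0 control how often the step vector is adjusted and the temperature reduced, with each new temperature stage restarted from x_opt as described in Section 2.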