A restarted and modified simplex search for unconstrained optimization

Computers & Operations Research 36 (2009) 3263 -- 3271

Qiu Hong Zhao a,b, Dragan Urosevic c, Nenad Mladenovic d,*, Pierre Hansen b

a School of Economics and Management, Beihang University, Beijing, China
b GERAD and HEC Montreal, Montreal, Quebec, Canada
c Mathematical Institute SANU, Belgrade, Serbia
d School of Mathematics, Brunel University, West London, UK

Article info

Available online 24 March 2009

Keywords: Unconstrained optimization; Global optimization; Direct search methods; Nelder–Mead method; Restarted modified simplex search; Metaheuristics

Abstract

In this paper we propose a simple but efficient modification of the well-known Nelder–Mead (NM) simplex search method for unconstrained optimization. Instead of moving all n simplex vertices at once in the direction of the best vertex, our "shrink" step moves them in the same direction but one by one, until an improvement is obtained. In addition, for solving non-convex problems, we simply restart the so-modified NM (MNM) method by constructing an initial simplex around the solution obtained in the previous phase. We repeat restarts until there is no improvement in the objective function value. Thus, our restarted modified NM (RMNM) is a descent and deterministic method and may be seen as an extended local search for continuous optimization. In order to improve computational complexity and efficiency, we use the heap data structure for storing and updating the simplex vertices. Extensive empirical analysis shows that our modified method outperforms on average the original version as well as some other recent successful modifications, and that, in solving global optimization problems, it is comparable with state-of-the-art heuristics. Crown Copyright © 2009 Published by Elsevier Ltd. All rights reserved.

1. Introduction

Let us consider the unconstrained continuous optimization problem

min { f(x) | x ∈ R^n },   (1)

where f : R^n → R is a real valued function, also called the objective function. The Nelder–Mead (NM) method [27] for solving unconstrained minimization problems belongs to the class of direct search methods, i.e., methods that do not use derivatives (for a recent survey of direct search methods, see e.g. [21]). In the case where the gradient of f is not available, some direct search methods try to estimate it by evaluating f at several points. For that purpose a simplex is usually used, since it contains the minimum number (n + 1) of such points. At each iteration one of these points is dropped and a new point is added, thus defining a new simplex. Using the direction defined by the line that joins the centroid of the n best simplex points and the worst point in order to select a new point was first suggested in [31]. However, those authors choose only one point in that direction, in order to keep the shape of the simplex unchanged. In NM, three


points along that line are considered: the reflection, contraction, and expansion points. One of them may then replace the worst point, and in such a way a new simplex is obtained. As a consequence, the simplex changes its shape during the iterations. If none of those new points deserves to replace the worst one, the n worst points are moved toward the best one, thus reducing (or shrinking) the volume of the simplex. This procedure is iterated until the volume of the simplex is sufficiently small or another termination criterion is met.

Since its publication in 1965, the NM algorithm has been used in an extraordinarily wide variety of contexts, especially in chemistry, chemical engineering, and medicine. Despite its wide use, until quite recently the NM method was ignored by almost all of the mainstream optimization community. The reasons for NM's unpopularity during a long period are discussed in [37]. According to this author, the negative attitude arises in part as a reaction to users who might choose NM simply for its ease of use, even when minimizing a smooth and inexpensive function well suited to a more reliable gradient-based method. In addition, it is remarkable that for 20 years after the publication of NM, no analysis of its theoretical properties was published. The first theoretical result concerning the NM method appeared in the Ph.D. thesis of Woods [36], and in fact does not apply to the original NM. Moreover, mainstream optimization researchers were not impressed by NM's practical performance, which can, in some cases, be appallingly poor [32]. The NM simplex method was originally designed for finding minima of real valued convex functions. Recently, NM was also used for finding



global minima of non-convex functions. In that case, either the search process is restarted (see e.g. [18,20,21]) or it is used as a local search (LS) subroutine within a so-called metaheuristic method (see e.g. [9,7,16,19,25,28]). In recent years, theoretical analysis has led to modifications of the NM algorithm with a stronger mathematical background. For example, in [30] the convergence of the proposed NM modification is shown under mild conditions, including smoothness of f (f ∈ C¹). In [5] a convergent variant of NM is based on the principle of grid restraint.

In this paper, a heap data structure based modified NM (MNM) is first presented. Instead of moving all n simplex vertices at once in the direction of the best vertex, our "shrink" step moves them in the same direction but one at a time. Such replacements of the worst simplex vertices with new ones continue until a vertex with an objective function value better than the worst one is obtained. We then suggest a simple extension that allows us to continue the search after the first local minimum is found: we simply restart the MNM method by constructing an initial simplex around the solution obtained in the previous phase. It has already been shown in the literature that restarting NM is beneficial (see e.g. [18,20,21,29]). Hence, with that extension, we are able to get heuristic (near-optimal) global minima of functions that are neither convex nor concave as well. Therefore, our method is called restarted modified NM (RMNM).

The rest of the paper is organized as follows. In the next section we briefly outline the steps of NM and some properties that will be used later. In Section 3 we give details of our two modified versions of NM. Extensive computational analysis is performed in Section 4. Finally, conclusions are given in Section 5.

2. NM simplex method

The term direct search method appears to have originated in the 1961 paper by Hooke and Jeeves [17], but since then it has become a catch-all phrase applied usually to any optimization method that does not require an explicit representation of the gradient of f. The important class of simplex-based direct search methods was introduced in 1962 by Spendley et al. [31]. A simplex-based method constructs an evolving pattern of n + 1 points in R^n that are viewed as the vertices of a simplex (a simplex in two dimensions is a triangle; a simplex in three dimensions is a tetrahedron). The most popular simplex-based search method was proposed by Nelder and Mead in 1965 [27]. NM is based on the idea of creating a sequence of changing simplices, deliberately modified so that the simplex adapts itself to the local landscape. It should be noted that there are other, non-simplex, direct search algorithms in the literature; they do not find the search direction by using n + 1 points in R^n. Such methods are pattern search algorithms [33] and mesh adaptive direct search algorithms [3,2]. In this paper we consider only simplex type direct search methods.

One NM iteration. We first give the steps of one NM iteration. It is assumed that the initial simplex X = {x_1, x_2, . . . , x_{n+1}} is given. The output, i.e., the new simplex, is also denoted by X. Note that X is an n × (n + 1) matrix, where each column represents a simplex vertex.

Algorithm 1. One NM iteration.

function NM-Iteration(X, f);
1 Order. Order the n + 1 vertices of X to satisfy f(x_1) ≤ f(x_2) ≤ · · · ≤ f(x_{n+1}).
2 Reflect. Compute the reflection point x_r = x̄ + α(x̄ − x_{n+1}), where x̄ = (1/n) Σ_{i=1}^{n} x_i is the centroid of the n best points (all vertices except x_{n+1}). If f(x_1) ≤ f(x_r) < f(x_n), accept the reflected point x_r and terminate the iteration.
3 Expand. If f(x_r) < f(x_1), calculate the expansion point x_e = x̄ + γ(x_r − x̄). If f(x_e) ≤ f(x_r), accept the expanded point x_e and terminate; otherwise accept x_r and terminate the iteration.
4 Contract. If f(x_r) ≥ f(x_n), perform a contraction between x̄ and the better of x_{n+1} and x_r.
  (a) Outside. If f(x_r) < f(x_{n+1}), then outside contraction: x_c = x̄ + β(x_r − x̄). If f(x_c) ≤ f(x_r), accept x_c and terminate; otherwise go to the Shrink step.
  (b) Inside. If f(x_r) ≥ f(x_{n+1}), then inside contraction: x_c = x̄ − β(x̄ − x_{n+1}). If f(x_c) < f(x_{n+1}), accept x_c and terminate; otherwise go to the Shrink step.
5 Shrink. Evaluate f at the n points v_i = x_1 + δ(x_i − x_1), i = 2, . . . , n + 1. The (unordered) vertices of the simplex at the next iteration are V = {x_1, v_2, . . . , v_{n+1}}; set X = V.
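To make the flow of Algorithm 1 concrete, the following is a minimal C++ sketch of one iteration under the standard coefficients α = 1, γ = 2, β = 1/2, δ = 1/2. It is only an illustration of the steps above, not the authors' code: the data layout (a vector of vertices), the function names, and the repeated evaluation of f inside the sorting comparator (instead of caching objective values, as the heap-based implementation of Section 3 does) are our own simplifications.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

using Point = std::vector<double>;
using Objective = std::function<double(const Point&)>;

// One NM iteration on a simplex X of n + 1 vertices in R^n (Algorithm 1).
void nm_iteration(std::vector<Point>& X, const Objective& f,
                  double alpha = 1.0, double gamma = 2.0,
                  double beta = 0.5, double delta = 0.5) {
  const std::size_t n = X.size() - 1;

  // 1. Order: f(x_1) <= f(x_2) <= ... <= f(x_{n+1}).
  std::sort(X.begin(), X.end(),
            [&](const Point& a, const Point& b) { return f(a) < f(b); });

  // Centroid of the n best vertices.
  Point xbar(n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < n; ++i) xbar[i] += X[j][i] / double(n);

  // Returns from + t * (to - from).
  auto along = [n](const Point& from, const Point& to, double t) {
    Point p(n);
    for (std::size_t i = 0; i < n; ++i) p[i] = from[i] + t * (to[i] - from[i]);
    return p;
  };

  const double f1 = f(X[0]), fn = f(X[n - 1]), fworst = f(X[n]);

  // 2. Reflect: x_r = xbar + alpha * (xbar - x_{n+1}).
  Point xr = along(xbar, X[n], -alpha);
  const double fr = f(xr);
  if (f1 <= fr && fr < fn) { X[n] = xr; return; }

  // 3. Expand: x_e = xbar + gamma * (x_r - xbar).
  if (fr < f1) {
    Point xe = along(xbar, xr, gamma);
    X[n] = (f(xe) <= fr) ? xe : xr;
    return;
  }

  // 4. Contract: outside (x_c = xbar + beta*(x_r - xbar)) if f(x_r) < f(x_{n+1}),
  //    inside (x_c = xbar - beta*(xbar - x_{n+1})) otherwise.
  Point xc = (fr < fworst) ? along(xbar, xr, beta) : along(xbar, X[n], beta);
  const double fc = f(xc);
  if ((fr < fworst && fc <= fr) || (fr >= fworst && fc < fworst)) {
    X[n] = xc;
    return;
  }

  // 5. Shrink: v_i = x_1 + delta * (x_i - x_1) for every vertex except the best.
  for (std::size_t j = 1; j <= n; ++j) X[j] = along(X[0], X[j], delta);
}

A real implementation would evaluate f once per vertex and store the values alongside the points; Section 3 replaces the full ordering by a heap for exactly this reason.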

The meaning of "point x is accepted" in Algorithm 1 is: X ← (X ∪ {x}) \ {x_{n+1}}, i.e., x replaces the worst point of X. According to the original NM paper [27], the coefficients in the NM iteration should satisfy α > 0, γ > 1, 0 < β < 1, and 0 < δ < 1. The standard, nearly universal, choices for these values are α = 1, γ = 2, β = 1/2, and δ = 1/2. We also use these values in the two modifications given in this paper.

The NM algorithm. To specify a complete NM algorithm, we need to define the initial simplex and the termination criterion. In the absence of information about the particular function being minimized, it is customary to specify a starting point in R^n that is taken as one of the initial simplex vertices. The other n vertices are then generated in one of two ways (for some problems, of course, it may be possible for the user to specify n + 1 suitable starting vertices): by perturbing the starting point by a specified step along the n coordinate directions, or by creating a regular simplex with specified edge length and orientation. The first way is adopted in Algorithm 2, with specified step λ · m · e_i, where λ is a parameter with a predefined value, e_i is the n-dimensional unit vector with a one in the i-th component (and zeros elsewhere), and m is the largest absolute value among the coordinates of the starting point x. Regarding termination, most implementations of direct search methods, including NM, use one of the following two criteria: either the function values at all vertices are close, or the volume of the simplex becomes very small (Volume(X) < ε, where ε is an arbitrarily small number). However, since NM does not always converge, an additional termination criterion (a maximum number of iterations k_max) is introduced as well in Algorithm 3.

Algorithm 2. Creating initial simplex.
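The body of Algorithm 2 is not reproduced in this extracted text, so the following is a small C++ sketch of the construction just described, i.e., perturbing the starting point by λ · m along each coordinate direction. The function name and container choices are ours, and the guard for a starting point at the origin (where m would be zero) is our own assumption, since the original pseudo-code is not available.

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Point = std::vector<double>;

// Build an (n+1)-vertex simplex around the starting point x1: vertex i+1 is
// obtained by stepping lambda * m along the i-th coordinate direction.
std::vector<Point> make_initial_simplex(const Point& x1, double lambda = 4.0) {
  const std::size_t n = x1.size();

  // m = largest absolute coordinate of the starting point.
  double m = 0.0;
  for (double c : x1) m = std::max(m, std::fabs(c));
  if (m == 0.0) m = 1.0;  // assumed guard so that the simplex is not degenerate

  std::vector<Point> simplex(n + 1, x1);
  for (std::size_t i = 0; i < n; ++i)
    simplex[i + 1][i] += lambda * m;  // step lambda * m * e_i
  return simplex;
}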

Algorithm 3. NM method.


Clearly, the best solution found by the NM procedure is x* = x_1, since the points of the simplex are ranked by their objective function values.

Computational complexity of NM. Let us denote by g(n) the number of operations necessary to calculate the objective function value f(x) for a given x ∈ R^n. We have the following property.

Property 1. The worst-case behavior of one iteration of NM is:

(i) max{O(n log n), O(g(n))}, if the shrinking step is not used;
(ii) O(n · g(n)), if the shrinking step is performed.

Proof. It is clear from Algorithm 1 that the ordering of vertices is performed in each iteration; its complexity is O(n log n). Also, it is easy to see that finding the reflection, contraction or expansion point is in O(g(n)). So, if the shrinking step is not performed, the number of numerical operations is bounded either by the ordering step or by g(n). However, if the shrinking step is performed, the number of function evaluations in each iteration is n, i.e., its complexity is O(n · g(n)). □

Since evaluating the value of the objective function at a given point x usually requires at least O(n) calculations, the worst-case number of calculations in each iteration of NM is greater than or equal to n² operations. So, ordering a sequence of n + 1 numbers is usually less time consuming than finding the objective function value, even in cases where the shrinking step is not performed. Consequently, the usual measure for comparing different NM variants is the number of function evaluations, and researchers have not made any effort to replace the O(n log n) ordering step by some updating step. In our modifications, however, the complexity of one NM iteration can be reduced and the computational efficiency improved by using a heap data structure.

3. Modified and restarted NM method

Since it is easy to understand and code, the NM method has been extensively used to solve parameter estimation and other problems. However, several studies have shown that the method can fail to converge or may converge to a non-minimum solution on certain classes of problems [22,23,36]. As a result, numerous modifications to the NM algorithm have been proposed in the literature, e.g. [26,30,34]. Their purpose is mainly to make NM's performance stable and/or to extend it for solving global optimization problems. Many users consider that the easiest fix is to restart the search with a new simplex, which can lead to substantial improvements in efficiency [20,21]. When it stops, the revised NM method can simply build a new larger simplex around the local optimum (i.e., a simplex whose volume is larger than ε) and continue. Building a new simplex with the size of each edge equal to half the size it had in the previous one gave very good results according to [18]. A new way of generating n simplex points around the given (n + 1)st one is also suggested there. The overall method is called revised simplex search (RSS). Since RSS is probably the most efficient modification of NM that allows solving global optimization problems as well, we compare it with our new methods in Section 4.

The combination of metaheuristic based global optimization heuristics (Tabu search; genetic algorithm, GA; etc.) with the NM method (or some modified version of it) has recently attracted considerable attention. For example, in [13] NM is hybridized with GA and particle swarm optimization (PSO). The hybrid methods are tested on some non-linear continuous functions and then compared with the best known heuristics in the literature. There are several other studies that follow such lines [9,35,7,15,16]. Most of these approaches will be discussed and compared with our methods in Section 4. In what follows, we gradually introduce our two modifications.

3.1. The MNM

Our first modification of the original NM is called MNM. Basically, we only modify the shrinking step. However, we suggest using a heap data structure to update the sequence of simplex vertices more efficiently.

Modification of the shrink step. Simply, instead of shrinking all n vertices of the simplex X along the direction of the best one (Step 5 in Algorithm 1), we move only the worst one (with maximal function value). If there is no improvement in the next iteration, we move the second worst one of the original X, and so on. If all vertices in the original simplex X have been shrunk, a new simplex X_1 has been obtained and replaces X. Our shrinking step is listed later in the MNM pseudo-code (Algorithm 4) within a heap data structure. The possible advantages of such a modification are as follows: (i) the classical shrinking step is the main reason for divergence of the NM method, since the simplex greatly reduces its volume in the classical shrink step [37]; (ii) the chance to jump over (or miss) the local minimum is larger with the classical shrink step, and in such a case, if NM converges, possibly more function evaluations are needed to reach the local minimum; (iii) using an appropriate data structure, i.e., a heap, the worst-case complexity of one iteration can be reduced (see Property 2).

Heap structure within the simplex method. A heap is a complete binary tree with the following properties [1]: (i) every node, except nodes in the last (lowest) level (or the two last levels, depending on the number of nodes), has two children; (ii) all nodes on the lowest level are in left-most positions; (iii) values are stored at the nodes; (iv) the value stored at any node is greater than the values at its child nodes. A simple illustrative example is given in Appendix A. In the MNM, the vertices of the current simplex are stored in the heap such that the vertex at the root of the heap has the maximal (worst) function value f, and each node of the heap satisfies the following condition: if y_0 is a simplex vertex stored in any node and y_1 and y_2 are the vertices stored in its children, then f(y_0) ≥ f(y_1) and f(y_0) ≥ f(y_2). In that way we can efficiently find the two vertices with maximal function values (see Appendix A for the pseudo-code of the heap). In each iteration, we replace at least one vertex and update the heap automatically, in O(log n) time. Therefore, the following holds.

Property 2. The worst-case behavior of one iteration of MNM is max{O(n), O(g(n))}.

The initial simplex. The initial vertex x_1 of the simplex is chosen at random. As in [18], the remaining vertices of the initial simplex are obtained using Algorithm 2, where λ = 4.0 initially.

Termination criterion. The termination criterion is a combination of the number of iterations without improvement (itmax is set to 10,000) and the following condition:

|f(x_max) − f(x_min)| / (|f(x_max) − f(x_min)| + 10^o) ≤ 10^o,   (2)

where x_min and x_max denote the simplex vertices with the smallest and the largest function values, respectively, and o = −10.

Thus, the first modification of NM (the MNM method) is suggested in order to get an O(1) number of function evaluations in the worst (shrinking) case. Its pseudo-code is given in Algorithm 4. Pseudo-codes for the MakeHeap and UpdateHeap subroutines mentioned in Algorithm 4 are given in Appendix A.
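As an illustration of the data structure just described, here is a brief C++ sketch of the basic MNM operation: pop the worst vertex from a max-heap of (objective value, vertex index) pairs, shrink that single vertex toward the best one, and push the re-evaluated vertex back in O(log n). It uses the standard std::pop_heap/std::push_heap utilities; the pairing of values with indices and the function name are our own choices, and the bookkeeping that walks through the worst, second worst, etc. vertex of the original simplex (Algorithm 4) is omitted.

#include <algorithm>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Point = std::vector<double>;
using Objective = std::function<double(const Point&)>;
// Max-heap entry: (objective value, index of the vertex in `vertices`).
using Entry = std::pair<double, std::size_t>;

// Shrink the current worst vertex toward x_best and restore the heap;
// returns true if the new value improves on the value it replaces.
bool shrink_worst(std::vector<Point>& vertices, std::vector<Entry>& heap,
                  const Point& x_best, const Objective& f, double delta = 0.5) {
  std::pop_heap(heap.begin(), heap.end());  // worst entry moves to the back
  const Entry worst = heap.back();
  heap.pop_back();

  Point& xw = vertices[worst.second];
  for (std::size_t i = 0; i < xw.size(); ++i)
    xw[i] = x_best[i] + delta * (xw[i] - x_best[i]);  // v = x_1 + delta (x - x_1)

  const double fv = f(xw);
  heap.push_back({fv, worst.second});
  std::push_heap(heap.begin(), heap.end());  // O(log n) update
  return fv < worst.first;
}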


Algorithm 4. Pseudo-code for MNM.

Table 1. Standard test instances.

Fun.              Abbr.   n                     fmin
Dixon and Price   DP      10, 15, . . . , 100   0
Griewank          GR      10, 15, . . . , 100   0
Powell            PO      8, 12, . . . , 100    0
Rosenbrock        RO      10, 15, . . . , 100   0
Schwefel          SC      10, 15, . . . , 100   0
Zakharov          ZA      10, 15, . . . , 100   0
Rastrigin         RA      10, 15, . . . , 100   0

3.2. The RMNM

It has already been observed in [18,20,21,29] that it is beneficial to restart the whole method (in the second phase) from the solution obtained in the first phase. For example, in [20] a reoriented restart is suggested, which reinitializes the simplex to a smaller one around its best vertex. The RSS procedure [18] consists of a three-phase application of the NM method in which: (a) the ending values for one phase become the starting values for the next phase; (b) the step size for the initial simplex (respectively, the shrink coefficient) decreases geometrically (respectively, increases linearly) over successive phases; and (c) the final estimated optimum is the best of the ending values of the three phases. We instead take the same size (volume) of the simplex in the succeeding phases as in the first one. We finish the procedure when the solution f(x_1) of the current phase is not better than the best one (denote it by f(x)) obtained in the previous phase; i.e., the termination criterion is f(x_1) ≥ f(x). We call this procedure RMNM; it is strictly descending and thus may be seen as an extended LS as well. In Algorithm 5 we present the pseudo-code of RMNM.

Algorithm 5. Pseudo-code of RMNM.
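Since the body of Algorithm 5 is not reproduced in this extracted text, the restart logic described above can be sketched in C++ as follows; mnm stands for one complete MNM phase (build an initial simplex of the original size around the given point and iterate until criterion (2) or itmax), and both that signature and the function names are our own illustrative choices.

#include <functional>
#include <vector>

using Point = std::vector<double>;
using Objective = std::function<double(const Point&)>;
// One full MNM phase: builds the initial simplex around the given point and
// returns the best vertex found when the phase terminates.
using MnmPhase = std::function<Point(const Objective&, const Point&)>;

// Restarted MNM: repeat phases from the incumbent until no improvement.
Point rmnm(const Objective& f, const MnmPhase& mnm, const Point& start) {
  Point x = mnm(f, start);          // first phase
  double fx = f(x);
  for (;;) {
    Point x1 = mnm(f, x);           // new simplex of the initial size around x
    const double f1 = f(x1);
    if (f1 >= fx) break;            // stopping rule: f(x_1) >= f(x)
    x = x1;                         // strictly better incumbent; restart again
    fx = f1;
  }
  return x;
}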

4. Computational results

In this section we analyze the quality of our modifications of NM. The algorithms for the NM, MNM, RSS, and RMNM methods are coded in C++ and run on a Pentium 4 computer with a 1400 MHz processor and 256 MB of RAM. As in [6,15,16], we use the following criterion to judge the success (i.e., convergence to a global minimum) of a trial:

|f̃ − fmin| < ε₁ |fmin| + ε₂,   (3)

where f̃ refers to the best function value obtained by the algorithm and fmin is the exact global minimum. As is usual in the literature, ε₁ and ε₂ are set to 10⁻⁴ and 10⁻⁶, respectively. To conduct the overall analysis, four sets of test problems are considered. The first two sets are solved by both new versions of NM and compared with the original NM and RSS. The new RMNM heuristic is compared with two types of metaheuristic based methods on the last two sets. The initial simplex of each method is generated in the same way as for the MNM method described earlier. The termination criterion for NM and for each phase of RSS is defined in (2). In addition, the values of the parameters α, γ, β, δ are the same for all methods.

4.1. Standard test problems

In this subsection, the two modified versions of NM suggested in this paper are compared with the original NM as well as with RSS on the test instances listed in Table 1. The name and the abbreviation of each test function are given in the first two columns of this table, respectively. The third column, n, denotes the dimension and the last column, fmin, gives the known minimum objective function value taken from the literature. These test functions (or some of them) can be found, for example, in [9,6,15], and represent well the diversity of difficulties arising in global optimization problems. As presented in Table 1, in order to conduct a comparative analysis among the different methods, the dimension of these functions is varied within some interval. For all functions except Powell's, the dimension is changed from 10 to 100 with step 5, while for Powell's function it is changed from 8 to 100 with step 4. Thus, each test function gives rise to 19 instances ((100 − 10)/5 + 1 = 19; (100 − 8)/4 + 1 = 24 instances for Powell's test function). The code for each function under each dimension is run one time only. Computational results are reported in Tables 2 and 3, respectively. In Table 2, the column "# of Glob." reports how many times global minima are reached, out of the 19 or 24 instances, while the column "# of Loc." lists accordingly the number of times local minima are attained, i.e., when the current solution x satisfies the following condition (we use the fact that all standard test functions in Table 1 are differentiable):

|∇f(x)| < 10 ε₃,   (4)

where ε₃ is set to 10⁻⁴. We take "# of Loc." as an additional index for evaluating the performance of the different methods and, further, as a reference for future research on the relationship between the stopping condition and the optimality condition. We report in the column "# of Div." how many times a method converges to a non-stationary point (within the given maximum number of iterations). However, when we removed the maximum number of iterations condition, the methods almost always converged to global or local optima, but much more computing time was spent. In Table 3 we report the average objective function values and average computing times obtained in the same set of runs as given in Table 2. As noted by an anonymous referee, averaging the computation time across the various dimensions may not be the best idea, because the computation for a 100-dimensional problem takes significantly more time than the computation for a 10-dimensional problem and the average is strongly biased toward the performance of the 100-dimensional problem.
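For completeness, the two bookkeeping tests (3) and (4) are trivial to code; the C++ sketch below is ours, and the forward-difference gradient is only a stand-in for whatever derivative evaluation was actually used, since the paper does not state how ∇f was computed.

#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Point = std::vector<double>;
using Objective = std::function<double(const Point&)>;

// Criterion (3): success, i.e., convergence to the known global minimum fmin.
bool is_global_success(double f_best, double f_min,
                       double eps1 = 1e-4, double eps2 = 1e-6) {
  return std::fabs(f_best - f_min) < eps1 * std::fabs(f_min) + eps2;
}

// Criterion (4): the gradient norm at x is below the threshold grad_tol.
bool is_local_minimum(const Objective& f, const Point& x, double grad_tol,
                      double h = 1e-8) {
  const double fx = f(x);
  double norm2 = 0.0;
  Point xp = x;
  for (std::size_t i = 0; i < x.size(); ++i) {
    xp[i] = x[i] + h;
    const double gi = (f(xp) - fx) / h;  // forward-difference partial derivative
    xp[i] = x[i];
    norm2 += gi * gi;
  }
  return std::sqrt(norm2) < grad_tol;
}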


Table 2. Comparison of methods on standard test instances (for each method, the columns give # of Glob., # of Loc., and # of Div.).

Fun.   Classical NM           Modified NM            Revised SS             Restart MNM
       Glob.  Loc.  Div.      Glob.  Loc.  Div.      Glob.  Loc.  Div.      Glob.  Loc.  Div.
DP      0      2    17         0      3    16         0     12     7         3     15     1
GR      0      0    19         0      0    19         8      0    11         9      1     9
PO      3      0    21         3      0    21        23      1     0        16      8     0
RO      0      1    18         0      1    18         1      2    16        18      0     1
SC      0      0    19         0      1    18         0      1    18         3      8     8
ZA      2      0    17         2      1    16         3      0    16         8     11     0
RA      0      2    17         0      1    18         0      4    15         4      2    13

Table 3. Average objective function values and running time for standard test functions.

       Average values                                     Average time
Fun.   NM          MNM        RSS        RMNM             NM      MNM     RSS      RMNM
DP     209,732.08  4237.74    1.83       0.56             19.00    8.87    45.64    19.60
GR     61.88       6.81       0.04       0.02             57.41   19.08   107.40    22.28
PO     38.71       47.78      0.00       0.00             40.40   10.56    71.29    15.96
RO     832.14      19,795.45  58.88      0.00             11.28    6.43    17.67    17.73
SC     12,091.18   11,691.08  10,836.81  6659.31          46.11   33.64    39.22   220.06
ZA     37.76       210.00     0.26       0.00              3.36    1.45    16.20    17.53
RA     300.94      387.77     208.59     127.44           29.30   11.28    24.15   213.39

Table 4. The first computational results for the test set of Section 4.2 (for each method, the columns give # of Glob., # of Loc., and # of Div.).

Fun.   Classical NM           Modified NM            Revised SS             Restart MNM
       Glob.  Loc.  Div.      Glob.  Loc.  Div.      Glob.  Loc.  Div.      Glob.  Loc.  Div.
f1     10      0    39         7      0    42        25      0    24        49      0     0
f2     28      0    21        12      0    37        49      0     0        49      0     0
f3      0      0    49         2      0    47         0      0    49        48      0     1
f4     21      1     2        23      1     0        24      0     0        24      0     0
f5     12      3    34         9      0    40        40      9     0        35     12     2
f6      0     49     0         0     49     0         0     49     0         0     49     0

Table 5. Average objective function values and running times (in seconds) for the instances from [18].

       Average values                          Average time
Fun.   NM      MNM     RSS     RMNM            NM      MNM    RSS     RMNM
f1     11.13   25.97   1.07    1.00            1.24    1.06   3.57    7.01
f2     35.41   487.47  1.00    1.00            160.96  53.35  335.67  64.61
f3     606.00  15.29   606.00  1.00            88.58   4.00   387.40  51.44
f4     1.01    1.01    1.00    1.00            20.00   5.97   72.05   5.97
f5     3.70    187.68  2.38    2.32            19.14   2.25   29.74   16.83
f6     6.32    6.32    6.32    6.32            0.04    0.48   0.23    0.48

Therefore, beside the average values we give results for every n = 1, . . . , 100 at our web site, http://www.mi.sanu.ac.rs/∼nenad/rmnm. The following observations can be made: (i) the best method on average is our RMNM: it has the smallest number of instances in the divergence column (see Table 2), and its average objective function values are better for all instances (see Table 3); (ii) NM and MNM have similar performance in terms of solution quality; however, as expected, MNM is faster.

4.2. Instances from Humphrey and Wilson [18]

In this subsection, MNM and RMNM, along with NM and RSS, are tested on the instances used in [18]. For the functions f1–f6, the dimension of the problem may be 1, 2, . . . , ∞, while the dimension of f4 must be a multiple of 4. For all these functions (with finite dimension), the minimum objective function value equals 1 (fmin = 1).

In our computational analysis the dimension of the functions fi(x), i = 1, . . . , 6, i ≠ 4, is set from 4 to 100 with step 2, while the dimension of f4 is set from 4 to 96 with step 4 (i.e., 4, 8, . . . , 96). Therefore, the total numbers of test instances are 49 and 24, respectively. Tables 4 and 5 give the same type of information as Tables 2 and 3, respectively. It appears that: (i) our restarted MNM dominates the RSS method in terms of solution quality; the largest difference is in solving problem f3, where RMNM found all 49 global minima while RSS failed to find any. The reason for such a difference in solution quality is the shape of the f3 function: by performing a classical shrinking step within RSS, global minima are missed. (ii) While both RSS and RMNM found all global minima in solving f2, RMNM did so in less computing time on average (compare 64.61 s for RMNM with 335.67 s for RSS).


(iii) It is also interesting to note that all methods behave the same in solving the f6 instance. Function f6 has a deep valley with a local minimum that is not the global one, and it is not easy to get out of it with the general parameter values. Of course, increasing the size of the initial simplex around that local minimum (instead of keeping the same value for it as in RMNM, or a smaller one as in RSS) may allow us to escape this local optimum trap more efficiently.

4.3. Global optimization instances

The RMNM is in fact a deterministic LS heuristic. In this subsection we compare RMNM with methods that follow some metaheuristic rules (or frameworks for building heuristics); such methods usually use an LS as part of them. Thus, it is clear that RMNM may also be used as a subroutine of some metaheuristic, such as genetic search (GS) or variable neighborhood search (VNS) [24]. Nevertheless, here we want to check how our RMNM compares with heuristics from the literature that are based on metaheuristic principles. The functions tested in this section can be found in [11,15]. Their names (Fun.), abbreviations (Abbr.), dimensions (n), and optimal values (fmin) are given in Table 6. In the last column of Table 6 the right-hand side values of formula (3) are given.

In order to compare our RMNM heuristic with several recent solvers, we divide them into two groups: eight methods that do not use NM as a subroutine and eight methods that use it. In both groups there are heuristics that are based on metaheuristics; for example, in [7] the following metaheuristics are included: simulated annealing (SA), GA, VNS, etc. The methods that do not use NM as a local optimizer, and their sources, are given in Table 7. The usual way to compare non-linear programming methods is to consider the number of function evaluations before the global minimum is reached. The comparison of the average computational efforts over 100 runs is given in Table 8, where the number in bold in each row of the published table is the best result found by these methods. Columns contain the average number of function evaluations needed to find the first global minimum satisfying |f̃ − fmin| < ε₁|fmin| + ε₂. The numbers in parentheses denote the number of runs for which the method found the global minimum; in the case of 100% no number is reported. For all methods except RMNM, we take the published results from the references given in Table 7. Since some results are not available for some instances, Table 8 contains empty entries.

In Table 8 there are nine global optimizers and they are tested on 11 functions (or a subset of them). For five functions RMNM reaches the global optimal solution with the minimum average number of function evaluations, when compared with the other eight methods. It appears that RMNM and GVNS [25] perform better than the other solvers, on average.

In the last few years some researchers have used the NM method within different metaheuristics, such as GA, VNS, or Tabu search. Non-linear solvers that belong to this group are listed in Table 9. Table 10 displays the average computational efforts of the methods listed in Table 9. In each row of Table 10, the number in bold in the published table is the best result found by these methods. Averages are obtained over 100 runs for all methods except for [4], where the methods are restarted 1000 times. It appears that RMNM was able to find six (out of nine) global minima with the smallest computational effort.

Table 6. Global optimization test problems.

Fun.                  Abbr.   n    fmin        10⁻⁴|fmin| + 10⁻⁶
Branin RCOS           BR      2    0.3979      0.00004079
Goldstein and Price   GP      2    3.0000      0.000301
Griewank              GR      10   0.0000      0.000001
Hartman               HT3     3    −3.8628     0.00038728
Hartman               HT6     6    −3.3224     0.00033324
Martin and Gaddy      MG      2    0.0000      0.000001
Rosenbrock            RO2     2    0.0000      0.000001
Rosenbrock            RO10    10   0.0000      0.000001
Shekel                SH5     4    −10.1532    0.00101632
Shekel                SH10    4    −10.5364    0.00105464
Shubert               SB      2    −186.7309   0.01867409

Table 7. The global minimizers without NM as a subroutine.

Method                                                       Reference
General variable neighborhood search (GVNS)                  Mladenovic et al. [25]
Enhanced continuous Tabu search (ECTS)                       Chelouah and Siarry [8]
Continuous reactive Tabu search (CRTS) (a)                   Battiti and Tecchiolli [4]
Taboo search (TS)                                            Cvijovic and Klinowski [10]
Continuous genetic algorithm (CGA)                           Chelouah and Siarry [6]
Continuous interacting ant colony algorithm (CIAC)           Dreo and Siarry [11]
Hybrid continuous interacting ant colony (HCIAC)             Dreo and Siarry [12]
Directed Tabu search with adaptive pattern search (DTS2)     Hedar and Fukushima [15]

(a) The best results of the variants CRTSmin and CRTSave are chosen.

Table 9. The global minimizers with NM as a subroutine.

Method                                          Reference
Genetic and Nelder–Mead (GNM)                   Chelouah and Siarry [9]
Genetic with Nelder–Mead (GANM)                 Fan et al. [13]
Swarm with Nelder–Mead (SNM)                    Fan et al. [13]
Niche hybrid genetic algorithm (NHGA)           Wei and Zhao [35]
Tabu search and Nelder–Mead (TSNM)              Chelouah and Siarry [7]
Directed Tabu search with Nelder–Mead (DTS1)    Hedar and Fukushima [15]
Continuous scatter search (CSS)                 Herrera et al. [16]
Revised Nelder–Mead simplex (RSS)               Humphrey and Wilson [18]

Table 8. Comparison of RMNM with methods that do not use NM as a subroutine.

Function   GVNS     ECTS          CRTS   TS     CGA        CIAC          HCIAC         DTS2         RMNM
BR         45       245           38     492    –          –             –             212          60
GP         150      231           171    486    410        23,391 (56)   34,533        230          69 (80)
HT3        416      548           513    508    –          –             –             438          67
HT6        225      1520          750    2845   –          –             –             1787 (83)    398 (50)
SB         529      370           –      727    –          –             –             274 (92)     275 (40)
RO2        152      480           –      –      960        11,797        18,747        254          224
RO10       72,689   15,720 (85)   –      –      –          –             –             9037 (85)    5946 (95)
GR         8360     –             –      –      –          50,121 (52)   23,206 (18)   –            18,972
SH5        432      825 (75)      664    –      610 (76)   39,311 (5)    17,761        819 (75)     912 (90)
SH10       627      898 (75)      693    –      –          –             –             828 (52)     318 (75)
MG         260      –             –      –      –          11,751 (20)   24,596        –            76


Table 10. Comparison of RMNM with heuristics that use NM as a subroutine.

Function   GNM           GANM   SNM    NHGA   TSNM       DTS1          CSS         RSS         RMNM
BR         295           356    230    –      125        274           65          61          60
GP         259           422    304    –      151        293 (88)      108         75 (75)     69 (80)
HT3        492           688    436    –      698        670 (97)      –           70          67
HT6        930           –      –      –      2638       3978 (68)     –           354 (40)    398 (50)
SB         345           1009   753    –      279        298 (44)      762         280 (29)    275 (40)
RO2        459           738    440    239    369        358           292         234         224
RO10       14,563 (83)   5194   3303   6257   –          19,139 (78)   5847 (75)   2056 (90)   5946 (95)
SH5        698 (85)      2366   850    –      545 (69)   1426 (39)     1197        251 (5)     912 (90)
SH10       635 (85)      –      –      –      589 (65)   1438 (22)     –           330         318 (75)

5. Conclusions

Unlike gradient-type optimization methods, the NM direct search allows continuation of the search after the first local minimum is found: one can simply build a new initial simplex around the incumbent solution. This step may be repeated until there is no improvement in the objective function value. Thus, the NM restart procedure exploits in the simplest possible way the empirical observation that most local optima are relatively close to one another. In this paper we investigate such a restarting possibility of the NM method. Moreover, we suggest a simple modification of NM's shrink step, so that we are able to reduce the complexity of one NM iteration by using a heap data structure. Extensive empirical analysis shows that: (i) our modified method outperforms on average the original version as well as some other recent successful modifications; (ii) it is comparable with the state-of-the-art heuristics for solving global optimization problems; (iii) the restart procedure with the new simplex length equal to its initial value seems to be a better choice than reducing it in each new restart as in [18]. It should be noted that our modifications may be implemented within recent convergent variants of NM as well. Therefore, future research could include a theoretical investigation of the convergence properties of an extended (convergent) modified version. Use of our MNM within metaheuristics, such as VNS [24], could be tried out as well.

Acknowledgments

The work of the first author was partially supported by the National Natural Science Foundation of China (under Project nos. 70771001 and 70821061) and by the New Century Excellent Talents in University of China under Project no. NCET-07-0049. The second and third authors are partially supported by the Serbian Ministry of Science, Project no. 144007.

Appendix A. Heap data structure

A heap may be represented as a one-dimensional array h(1), h(2), . . . , h(m) that keeps a partial order of the numbers stored in it (see Fig. 1):

• the value stored in the root is the element h(1) of the heap;
• if the index of an element in the heap is k, then the indices of its children are 2k (left child) and 2k + 1 (right child), respectively;
• if a value is stored in any non-root node h(k), then its parent value is stored in h(⌊k/2⌋).

We use the heap data structure for storing the objective function values of the current simplex vertices (m = n + 1). In each iteration the worst simplex vertex is replaced with a new one. Finding its position in the given ordered set and updating the heap takes O(log n) time, instead of O(n) without using the heap (see Fig. 2). The detailed pseudo-codes are given in Algorithms 6–9.

Fig. 1. Illustration of a heap data structure.

Algorithm 6. Pseudo-code of creating heap.

Algorithm 7. Pseudo-code of updating heap.

Algorithm 8. Pseudo-code of moveup in updating RMNM.

Algorithm 9. Pseudo-code of movedown in updating RMNM.
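Since the bodies of Algorithms 6–9 are not reproduced in this extracted text, the following C++ sketch shows the corresponding array-based max-heap operations in the usual way: move_down and move_up restore the heap property below or above a node, make_heap_values builds the heap bottom-up, and replacing the root corresponds to updating the heap after a vertex replacement. It uses 0-based indexing (children of k are 2k + 1 and 2k + 2) rather than the 1-based indexing of the text, stores only the objective values rather than value–vertex pairs, and all names are ours.

#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property below node k (movedown, cf. Algorithm 9).
void move_down(std::vector<double>& h, std::size_t k) {
  const std::size_t m = h.size();
  for (;;) {
    std::size_t child = 2 * k + 1;                           // left child
    if (child >= m) return;
    if (child + 1 < m && h[child + 1] > h[child]) ++child;   // larger child
    if (h[k] >= h[child]) return;                            // property holds
    std::swap(h[k], h[child]);
    k = child;
  }
}

// Restore the max-heap property above node k (moveup, cf. Algorithm 8).
void move_up(std::vector<double>& h, std::size_t k) {
  while (k > 0) {
    const std::size_t parent = (k - 1) / 2;
    if (h[parent] >= h[k]) return;
    std::swap(h[parent], h[k]);
    k = parent;
  }
}

// Build a max-heap over the m = n + 1 objective values (cf. Algorithm 6).
void make_heap_values(std::vector<double>& h) {
  for (std::size_t k = h.size() / 2; k-- > 0; ) move_down(h, k);
}

// Replace the worst (root) value by the value of a new vertex and restore the
// heap in O(log m) (cf. Algorithm 7).
void update_root(std::vector<double>& h, double new_value) {
  h[0] = new_value;
  move_down(h, 0);
}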



Fig. 2. Inserting a new value in the heap.

Appendix B. Stopping criteria for methods compared in Tables 7 and 9

The first column of Table 11 gives the abbreviations of the methods compared in Tables 7 and 9, while the second column lists the criteria used for judging the success of one trial by the corresponding method.

Table 11. Criteria for judging success of the trials of the methods in Tables 7 and 9.

Method   Criterion for judging success         Appellation
GVNS     |f̃ − fmin| < 10⁻⁴|fmin| + 10⁻⁶        Criterion 1
ECTS     |f̃ − fmin| < 10⁻⁴|finit| + 10⁻⁶       Criterion 2
CRTS     The same as GVNS                      Criterion 1
TS       |f̃ − fmin| < 0.01                     Criterion 3
CGA      The same as GVNS                      Criterion 1
CIAC     |f̃ − fmin| < 10⁻⁴|fmin| + 10⁻⁴        Criterion 4
HCIAC    The same as CIAC                      Criterion 4
DTS2     The same as GVNS                      Criterion 1
GNM      The same as ECTS                      Criterion 2
GANM     The same as ECTS                      Criterion 2
SNM      The same as ECTS                      Criterion 2
NHGA     |f̃ − fmin| < 0.001                    Criterion 5
TSNM     The same as ECTS                      Criterion 2
DTS1     The same as GVNS                      Criterion 1
CSS      The same as GVNS                      Criterion 1
RSS      The same as GVNS                      Criterion 1

We now comment on the five different criteria used in the literature.

• finit is an empirical average of the objective function values, calculated over 100 randomly selected feasible points obtained before running the algorithm.
• The criterion for judging CRTS is not shown in the original paper. However, in papers where its two variants were cited for comparative analysis, various criteria are given (e.g., see [13,16]). We use here the strictest one.
• It is obvious that criterion 1 is stricter than criteria 2 and 4.
• Criterion 3 was used in [10]. For the five functions whose computational results are available in [10] (see Table 8), only SB gets a slightly more relaxed criterion when criterion 1 is used instead of criterion 3 (0.01867409 vs. 0.01; see the last column of Table 6).
• Criterion 5 was used in [35]. The computational results for its two test instances are available (see Table 10). It is clear that criterion 1 is stricter than criterion 5 (see the last column of Table 6).

References

[1] Aho AV, Ullman JD, Hopcroft JE. Data structures and algorithms. Reading, MA: Addison-Wesley; 1983.
[2] Audet C, Béchard V, Le Digabel S. Nonsmooth optimization through mesh adaptive direct search and variable neighborhood search. Journal of Global Optimization 2008;41:299–318.
[3] Audet C, Dennis Jr. JE. Mesh adaptive direct search algorithms for constrained optimization. SIAM Journal on Optimization 2006;17:188–217.
[4] Battiti R, Tecchiolli G. The continuous reactive tabu search: blending combinatorial optimization and stochastic search for global optimization. Annals of Operations Research 1996;63:153–88.
[5] Burmen A, Puhan J, Tuma T. Grid restrained Nelder–Mead algorithm. Computational Optimization and Applications 2006;34:359–75.
[6] Chelouah R, Siarry P. A continuous genetic algorithm designed for the global optimization of multimodal functions. Journal of Heuristics 2000;6:191–213.
[7] Chelouah R, Siarry P. A hybrid method combining continuous tabu search and Nelder–Mead simplex algorithms for the global optimization of multiminima functions. European Journal of Operational Research 2005;161:636–54.
[8] Chelouah R, Siarry P. Tabu search applied to global optimization. European Journal of Operational Research 2000;123:256–70.
[9] Chelouah R, Siarry P. Genetic and Nelder–Mead algorithms hybridized for a more accurate global optimization of continuous multiminima functions. European Journal of Operational Research 2003;148:335–48.
[10] Cvijovic D, Klinowski J. Taboo search: an approach to the multiple minima problem. Science 1995;267:664–6.
[11] Dreo J, Siarry P. Continuous interacting ant colony algorithm based on dense heterarchy. Future Generation Computer Systems 2004;20:841–56.


[12] Dreo J, Siarry P. Hybrid continuous interacting ant colony aimed at enhanced global optimization. Algorithmic Operations Research 2007;2:52–64.
[13] Fan SKS, Liang YC, Zahara E. A genetic algorithm and a particle swarm optimizer hybridized with Nelder–Mead simplex search. Computers and Industrial Engineering 2006;50:401–25.
[15] Hedar AR, Fukushima M. Tabu search directed by direct search methods for nonlinear global optimization. European Journal of Operational Research 2006;170:329–49.
[16] Herrera F, Lozano M, Molina D. Continuous scatter search: an analysis of the integration of some combination methods and improvement strategies. European Journal of Operational Research 2006;169:450–76.
[17] Hooke R, Jeeves TA. "Direct search" solution of numerical and statistical problems. Journal of the ACM 1961;8:212–29.
[18] Humphrey DG, Wilson JR. A revised simplex search procedure for stochastic simulation response surface optimization. INFORMS Journal on Computing 2000;12:272–83.
[19] Karam A, Hansen P. Arbitrary-norm hyperplane separation by variable neighbourhood search. IMA Journal of Management Mathematics 2007;18:173–89.
[20] Kelley CT. Detection and remediation of stagnation in the Nelder–Mead algorithm using a sufficient decrease condition. SIAM Journal on Optimization 1999;10:33–45.
[21] Kolda TG, Lewis RM, Torczon V. Optimization by direct search: new perspectives on some classical and modern methods. SIAM Review 2003;45:385–482.
[22] Lagarias JC, Reeds JA, Wright MH, Wright PE. Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM Journal on Optimization 1998;9:112–47.
[23] McKinnon KIM. Convergence of the Nelder–Mead simplex method to a nonstationary point. SIAM Journal on Optimization 1998;9:148–58.
[24] Mladenovic N, Hansen P. Variable neighborhood search. Computers and Operations Research 1997;24:1097–100.
[25] Mladenovic N, Drazic M, Kovacevic-Vujcic V, Cangalovic M. General variable neighborhood search for the continuous optimization. European Journal of Operational Research 2008;191:753–70.
[26] Nazareth L, Tseng P. Gilding the lily: a variant of the Nelder–Mead algorithm based on golden-section search. Computational Optimization and Applications 2002;22:133–44.
[27] Nelder JA, Mead R. A simplex method for function minimization. Computer Journal 1965;7:308–13.
[28] Pacheco J, Casado S, Nuñez L. Use of VNS and TS in classification: variable selection and determination of the linear discrimination function coefficients. IMA Journal of Management Mathematics 2007;18:191–206.
[29] Parkinson JM, Hutchinson D. An investigation into the efficiency of variants on the simplex method. In: Lootsma FA, editor. Numerical methods for non-linear optimization. London: Academic Press; 1972. p. 115–35.
[30] Price CJ, Coope ID, Byatt D. A convergent variant of the Nelder–Mead algorithm. Journal of Optimization Theory and Applications 2002;113:5–19.
[31] Spendley W, Hext GR, Himsworth FR. Sequential application of simplex designs in optimisation and evolutionary operation. Technometrics 1962;4:441–61.
[32] Torczon V. Multi-directional search: a direct search algorithm for parallel machines. PhD thesis, Rice University, Houston, TX, USA; 1989.
[33] Torczon V. On the convergence of pattern search algorithms. SIAM Journal on Optimization 1997;7:1–25.
[34] Tseng P. Fortified-descent simplicial search method: a general approach. SIAM Journal on Optimization 1999;10:269–88.
[35] Wei LY, Zhao M. A niche hybrid genetic algorithm for global optimization of continuous multimodal functions. Applied Mathematics and Computation 2005;160:649–61.
[36] Woods DJ. An interactive approach for solving multi-objective optimization problems. PhD thesis, Rice University, Houston, TX, USA; 1985.
[37] Wright MH. Direct search methods: once scorned, now respectable. In: Griffiths DF, Watson GA, editors. Numerical analysis. Harlow, UK: Addison-Wesley Longman; 1996. p. 191–208.


 [25] Mladenovic N, Draic M, Kovacevic-Vuj cic V, Cangalovic M. General variable  neighborhood search for the continuous optimization. European Journal of Operational Research 2008;191:753–70. [26] Nazareth L, Tseng P. Gilding the Lily: a variant of the Nelder–Mead algorithm based on golden-section search. Computational Optimization and Applications 2002;22:133–44. [27] Nelder JA, Mead R. A simplex method for function minimization. Computer Journal 1965;7:308–13. [28] Pacheco J, Casado S, Nuñez L. Use of VNS and TS in classification: variable selection and determination of the linear discrimination function coefficients. IMA Journal of Management Mathematics 2007;18:191–206. [29] Parkinson JM, Hutchinson D. An investigation into the efficiency of variants on the simplex method. In: Lootsma FA, editor. Numerical methods for non-linear optimization. London: Academic Press; 1972. p. 115–35. [30] Price CJ, Coope ID, Byatt D. A convergent variant of the Nelder–Mead algorithm. Journal of Optimization Theory and Applications 2002;113:5–19. [31] Spendley W, Hext GR, Himsworth FR. Sequential application of simplex designs in optimisation and evolutionary operation. Technometrics 1962;4:441–61. [32] V. Torczon, Multi-directional search: a direct search algorithm for parallel machines. PhD thesis, Rice University, Houston, Texas, USA. 1989. [33] Torczon V. On the convergence of pattern search algorithms. SIAM Journal on Optimization 1997;7:1–25. [34] Tseng P. Fortified-descent simplicial search method: a general approach. SIAM Journal on Optimization 1999;10:269–88. [35] Wei LY, Zhao M. A niche hybrid genetic algorithm for global optimization of continuous multimodal functions. Applied Mathematics and Computation 2005;160:649–61. [36] D.J. Woods, An interactive approach for solving multi-objective optimization problems. PhD thesis, Rice University, Houston, Texas, USA. 1985. [37] Wright MH. Direct search methods once scorned now respectable. In: Griffiths DF, Watson GA, editors. Numerical analysis. Harlow, United Kingdom: AddisonWesley Longman; 1996. p. 191–208.