Article
PDF Available

Using Sampling and Simplex Derivatives in Pattern Search Methods


Abstract

Pattern search methods can be made more efficient if past function evaluations are appropriately reused. In this paper we will introduce a number of ways of reusing previous evaluations of the objective function based on the computation of simplex derivatives (e.g., simplex gradients) to improve the efficiency of a pattern search iteration. At each iteration of a pattern search method, one can attempt to compute an accurate simplex gradient by identifying a sampling set of previous iterates with good geometrical properties. This simplex gradient computation can be done using only past successful iterates or by considering all past function evaluations. The simplex gradient can then be used, for instance, to reorder the evaluations of the objective function associated with the positive spanning set or positive basis used in the poll step. But it can also be used to update the mesh size parameter according to a sufficient decrease criterion. None of these modifications demands new function evaluations. A search step can also be tried along the negative simplex gradient at the beginning of the current pattern search iteration. We will present these procedures in detail and show how promising they are to enhance the practical performance of pattern search methods.
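As a concrete illustration of the basic building block, the simplex gradient from a set of n + 1 sample points is just the solution of a small linear system in the displacement vectors and function-value differences. The following NumPy sketch is illustrative only (the function name and interface are not from the paper); in the setting above, the function values would be reused from past iterates rather than freshly evaluated.

```python
import numpy as np

def simplex_gradient(f, y0, Y):
    """Simplex gradient from a poised set {y0} plus n extra points.

    y0 : base point, shape (n,)
    Y  : n additional sample points, one per row, shape (n, n)
    Solves S g = df, where row i of S is (Y[i] - y0) and
    df[i] = f(Y[i]) - f(y0); S must be nonsingular (good geometry).
    """
    S = Y - y0
    df = np.array([f(y) for y in Y]) - f(y0)
    return np.linalg.solve(S, df)

# Quick check on a quadratic: the true gradient at y0 = (1, 2) is (2, 4).
f = lambda x: x[0]**2 + x[1]**2
y0 = np.array([1.0, 2.0])
Y = y0 + 0.01 * np.eye(2)            # two nearby points along the coordinate axes
print(simplex_gradient(f, y0, Y))    # roughly [2.01, 4.01]
```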
... In the three simplex derivatives mentioned above, it is also possible to use a different number of sample points. In [6,8,9,16], the authors define the generalized simplex gradient (GSG), which covers the cases where either fewer or more than n + 1 sample points are provided, and provide the corresponding error bounds. Hare et al. [10] define and study the generalized centred simplex gradient (GCSG), which does not require exactly 2n+1 points. ...
... The generalized simplex gradient generalizes the simplex gradient by allowing the number of sample points to be not exactly n + 1. This definition can be found in multiple papers, e.g., [6,8,9,16]. ...
... Error bounds with floating point errors are established in later sections. The error bounds of the GSG can be found in, e.g., [6,8,9,16]. Essentially, the error bound shows that the GSG provides an accuracy of O(∆) where ∆ is the approximate diameter of the sample set. ...
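A hedged sketch of the generalized simplex gradient discussed in these excerpts: writing it through the Moore-Penrose pseudoinverse covers the determined, underdetermined, and overdetermined cases in one formula (names and interface below are illustrative, not taken from any of the cited papers).

```python
import numpy as np

def generalized_simplex_gradient(y0, f0, Y, fY):
    """Generalized simplex gradient (GSG) from m sample points.

    y0, f0 : base point and its function value
    Y, fY  : m sample points (rows of Y) and their function values
    Returns the minimum-norm least-squares solution of S g = df; when
    m = n and S is square and nonsingular this reduces to the ordinary
    simplex gradient. For smooth f and good geometry the accuracy
    degrades like O(Delta), Delta being the radius of the sample set.
    """
    S = Y - y0
    df = fY - f0
    return np.linalg.pinv(S) @ df
```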
Preprint
Full-text available
Gradient approximations are a class of numerical approximation techniques that are of central importance in numerical optimization. In derivative-free optimization, most of the gradient approximations, including the simplex gradient, centred simplex gradient, and adapted centred simplex gradient, are in the form of simplex derivatives. Owing to machine precision, the approximation accuracy of any numerical approximation technique is subject to the influence of floating point errors. In this paper, we provide a general framework for floating point error analysis of simplex derivatives. Our framework is independent of the choice of the simplex derivative as long as it satisfies a general form. We review the definition and approximation accuracy of the generalized simplex gradient and generalized centred simplex gradient. We define and analyze the accuracy of a generalized version of the adapted centred simplex gradient. As examples, we apply our framework to the generalized simplex gradient, generalized centred simplex gradient, and generalized adapted centred simplex gradient. Based on the results, we give suggestions on the minimal choice of approximate diameter of the sample set.
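As a rough illustration of why a minimal sensible choice of the approximate diameter exists (this is the standard balancing argument with illustrative constants, not the specific bounds of the preprint): if the truncation error of a simplex-gradient-type approximation behaves like $c_1 \Delta$ and the propagated floating point error like $c_2 \varepsilon_f / \Delta$, where $\varepsilon_f$ bounds the error in each computed function value, then the total error $c_1 \Delta + c_2 \varepsilon_f / \Delta$ is minimized at $\Delta^* = \sqrt{c_2 \varepsilon_f / c_1}$, i.e., $\Delta$ on the order of $\sqrt{\varepsilon_f}$; for centred variants with $O(\Delta^2)$ truncation the same argument gives $\Delta$ on the order of $\varepsilon_f^{1/3}$.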
... One recent line of research explores methods of improving or generalizing the simplex gradient for its use in DFO [15,16,18,21,25,28,39,42]. Working from ideas in [16], in [25] the simplex gradient was generalized so as not to require exactly n + 1 points; an error-controlled approximation can now be found using any finite number of properly-spaced points. ...
... In a similar vein, researchers have also explored methods to approximate full Hessians or partial Hessians. In [21], the authors outline an idea for a "simplex Hessian" that is constructed via quadratic interpolation through (n + 1)(n + 2)/2 well-poised sample points. They further suggested that if only a portion of the Hessian were desired (say the diagonal component), then fewer points could be used. ...
... This paper can be viewed as an extension of the work related to simplex Hessians introduced in [17,21]. We introduce an explicit formula based on matrix algebra concepts to approximate the Hessian of a function. ...
Article
Full-text available
This work presents a novel matrix-based method for constructing an approximation Hessian using only function evaluations. The method requires less computational power than interpolation-based methods and is easy to implement in matrix-based programming languages such as MATLAB. As only function evaluations are required, the method is suitable for use in derivative-free algorithms. For reasonably structured sample sets, the method is proven to create an order-1 accurate approximation of the full Hessian. Under more specialized structures, the method is proved to yield order-2 accuracy. The underdetermined case, where the number of sample points is fewer than required for full interpolation, is studied and error bounds are developed for the resulting partial Hessians.
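For contrast with the matrix-based construction summarized above, the interpolation-based alternative it is compared against can be sketched in a few lines: fit a full quadratic model through (n + 1)(n + 2)/2 poised points and read off its Hessian. The Python sketch below is illustrative only (it is not the article's method, and the helper name is made up); it assumes the sample set is poised for quadratic interpolation.

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_interp_hessian(y0, Y, fY):
    """Hessian of the full quadratic interpolation model through Y.

    Y  : (n + 1)(n + 2)/2 sample points, one per row, poised for
         quadratic interpolation; fY : their function values.
    Builds the natural monomial basis of m(y0 + s) = c + g.s + 0.5 s'Hs,
    solves the square interpolation system, and repacks the
    second-order coefficients into a symmetric matrix H.
    """
    n = y0.size
    S = Y - y0
    pairs = list(combinations_with_replacement(range(n), 2))   # (i, j) with i <= j
    rows = []
    for s in S:
        quad = [s[i] * s[j] * (0.5 if i == j else 1.0) for i, j in pairs]
        rows.append(np.concatenate(([1.0], s, quad)))
    coef = np.linalg.solve(np.array(rows), fY)
    H = np.zeros((n, n))
    for (i, j), h in zip(pairs, coef[1 + n:]):
        H[i, j] = H[j, i] = h
    return H
```

For n = 2 this already needs 6 points; that count, and the cost of keeping such a set well poised, is what motivates cheaper constructions such as the one in the article above.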
... This section briefly covers some of the most popular deterministic derivative-free global optimization algorithms based on various techniques such as the DIRECT (DIvide RECTangles) framework [50], [51], the branch-and-bound paradigm [52], [53], Multi-level Coordinate Search (MCS) [17], and Pattern Search Methodology (PSM) [23]. Two primary criteria guided the selection of algorithms. ...
... In the black-box optimization competition at the Genetic and Evolutionary Computation Conference (GECCO'15), NMSO showcased superior performance in addressing problems characterized by separability, multi-modality, limited evaluation budget, and low dimensionality. The third algorithm is SID-PSM [23], a directional optimization technique known as pattern search, guided by Simplex Derivatives (SD). This algorithm employs a limited set of directions with descent properties. ...
Article
This paper addresses the challenge of selecting the most suitable optimization algorithm by presenting a comprehensive computational comparison between stochastic and deterministic methods. The complexity of algorithm selection arises from the absence of a universal algorithm and the abundance of available options. Manual selection without comprehensive studies can lead to suboptimal or incorrect results. In order to address this issue, we carefully selected twenty-five promising and representative state-of-the-art algorithms from both aforementioned classes. The evaluation, in up to twenty dimensions and with large evaluation budgets (10⁵ × n), was carried out in a significantly expanded and improved version of the DIRECTGOLib v2.0 library, which included ten distinct collections of primarily continuous test functions. The evaluation covered various aspects, such as solution quality, time complexity, and function evaluation usage. The rankings were determined using statistical tests and performance profiles. When it comes to the problems and algorithms examined in this study, EA4eig, EBOwithCMAR, APGSK-IMODE, 1-DTC-GL, OQNLP, and DIRMIN stand out as superior to other derivative-free solvers in terms of solution quality. While deterministic algorithms can locate reasonable solutions with comparatively fewer function evaluations, most stochastic algorithms require more extensive evaluation budgets to deliver comparable results. However, the performance of stochastic algorithms tends to excel in more complex and higher-dimensional problems. These research findings offer valuable insights for practitioners and researchers, enabling them to tackle diverse optimization problems effectively.
... Researchers from the DFO community have previously explored methods to approximate full Hessians or some of the entries of the Hessian. In [8], the authors outline an idea for a simplex Hessian that is constructed via quadratic interpolation through (n+1)(n+2)/2 wellpoised sample points. They further posit that if only the diagonal entries are desired, then 2n + 1 sample points are sufficient. ...
... It is shown that the GSH is an order-1 accurate approximation of the full Hessian and that the GCSH is an order-2 accurate approximation of the full Hessian. The GSH can be viewed as a generalization of the simplex Hessian discussed in [6,8]. The simplex Hessian requires (n + 1)(n + 2)/2 sample points poised for quadratic interpolation. ...
Preprint
Full-text available
This paper presents two methods for approximating a proper subset of the entries of a Hessian using only function evaluations. These approximations are obtained using the techniques called generalized simplex Hessian and generalized centered simplex Hessian. We show how to choose the matrices of directions involved in the computation of these two techniques depending on the entries of the Hessian of interest. We discuss the number of function evaluations required in each case and develop a general formula to approximate all order-P partial derivatives. Since only function evaluations are required to compute the methods discussed in this paper, they are suitable for use in derivative-free optimization methods.
... Simplex gradients were first proposed in [19], when defining the implicit filtering method [34] to optimize functions subject to numerical noise. Since then, they have been used to define new classes of simplex-based direct search methods [46], to develop convergent variants of the Nelder-Mead simplex method [32], or as descent indicators for ordering poll directions [27], when exploring opportunistic variants of DDS. ...
... The selected test set includes only smooth objective functions: 22 from CUTEr [31] and 20 from [27]. Additionally, we tested a simple quadratic function named Quadratic, described in Experiment 5.1, and a strictly convex function denoted by Strictly Convex 2 (SC2), which is described in Experiment 5.5. ...
... Besides, quite a few excellent software implementations exist for DFO problems. Examples include CMA-ES [56], DFO [57], HOPSPACK [58], IMFIL [59], PSwarm [60], SID-PSM [61] and SNOBFIT [62]. The late Professor M. J. D. Powell proposed COBYLA [63], UOBYQA [64], NEWUOA [65], BOBYQA [66], LINCOA [67]. ...
Preprint
Full-text available
Derivative-free optimization problems are optimization problems where derivative information is unavailable. The least Frobenius norm updating quadratic interpolation model function is one of the essential under-determined model functions for model-based derivative-free trust-region methods. This article proposes derivative-free optimization with transformed objective functions and gives a trust-region method with the least Frobenius norm model. The model updating formula is based on Powell's formula. The method shares the same framework with those for problems without transformations, and its query scheme is given. We propose the definitions related to optimality-preserving transformations to understand the interpolation model in our method. We prove the existence of model optimality-preserving transformations beyond translation transformation. The necessary and sufficient condition for such transformations is given. The affine transformation with a positive multiplication coefficient is not model optimality-preserving. We also analyze the corresponding least Frobenius norm updating model and its interpolation error when the objective function is affinely transformed. A convergence property of a provable algorithmic framework containing our model is given. Numerical results of solving test problems and a real-world problem with the implementation NEWUOA-Trans show that our method can successfully solve most problems with objective optimality-preserving transformations, even though such transformations will change the optimality of the model function. To the best of our knowledge, this is the first work providing the model-based derivative-free algorithm and analysis for transformed problems with the function evaluation oracle (not the function-value comparison oracle). This article also proposes the "moving-target" optimization problem.
Article
The centred simplex gradient (CSG) is a popular gradient approximation technique in derivative-free optimization. Its computation requires a perfectly symmetric set of sample points and is known to provide an accuracy of $\mathcal {O}(\varDelta ^2)$, where $\varDelta $ is the radius of the sampling set. In this paper, we consider the situation where the set of sample points is not perfectly symmetric. By adapting the formula for the CSG to compensate for the misaligned points, we define a new Adapted-CSG. We study the error bounds and the numerical stability of the Adapted-CSG. We also present numerical examples to demonstrate its properties relative to each new parameter and make a comparison to an alternative method.
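For reference, here is a minimal sketch of the non-adapted CSG on a perfectly symmetric sample set {y0 ± d_i} (names and interface are illustrative; the Adapted-CSG of the article modifies this formula to compensate for misaligned points and is not reproduced here).

```python
import numpy as np

def centred_simplex_gradient(y0, D, f):
    """Centred simplex gradient from the symmetric set {y0 + d_i, y0 - d_i}.

    D : n x n matrix whose rows d_i are the sampling directions.
    Solves D g = dc with dc_i = (f(y0 + d_i) - f(y0 - d_i)) / 2;
    for smooth f this is O(Delta^2) accurate, Delta being the radius.
    """
    dc = np.array([(f(y0 + d) - f(y0 - d)) / 2.0 for d in D])
    return np.linalg.solve(D, dc)

f = lambda x: np.sin(x[0]) + x[1]**3                               # true gradient at the origin: (1, 0)
print(centred_simplex_gradient(np.zeros(2), 0.1 * np.eye(2), f))   # about [0.998, 0.010]
```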
Article
Full-text available
Direct search methods have been an area of active research in recent years. On many real-world problems involving computationally expensive and often noisy functions, they are one of the few applicable alternatives. However, although these methods are usually easy to implement, robust and provably convergent in many cases, they suffer from a slow rate of convergence. Usually these methods do not take the local topography of the objective function into account. We present a new algorithm for unconstrained optimisation which is a modification to a basic generating set search method. The new algorithm tries to adapt its search directions to the local topography by accumulating curvature information about the objective function as the search progresses. The curvature information is accumulated over a region thus smoothing out noise and minor discontinuities. We present some theory regarding its properties, as well as numerical results. Preliminary numerical testing shows that the new algorithm outperforms the basic method most of the time, sometimes by significant relative margins, on noisy as well as smooth problems.
Article
Full-text available
A common question asked by users of direct search algorithms is how to use derivative information at iterates where it is available. This paper addresses that question with respect to Generalized Pattern Search (GPS) methods for unconstrained and linearly constrained optimization. Specifically, this paper concentrates on the GPS pollstep. Polling is done to certify the need to refine the current mesh, and it requires O(n) function evaluations in the worst case. We show that the use of derivative information significantly reduces the maximum number of function evaluations necessary for pollsteps, even to a worst case of a single function evaluation with certain algorithmic choices given here. Furthermore, we show that rather rough approximations to the gradient are sufficient to reduce the pollstep to a single function evaluation. We prove that using these less expensive pollsteps does not weaken the known convergence properties of the method, all of which depend only on the pollstep.
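One way to picture how even a rough gradient approximation shortens opportunistic polling (an illustrative sketch, not the specific GPS variant analyzed in the paper): score each direction of the positive basis by its inner product with the approximate gradient and poll the most promising direction first, so that in the best case the poll step costs a single function evaluation.

```python
import numpy as np

def order_poll_directions(poll_dirs, g):
    """Order poll directions by the descent indicator <g, d>.

    poll_dirs : rows form a positive spanning set (e.g. [I; -I])
    g         : any gradient approximation (e.g. a simplex gradient)
    Directions with the most negative inner product are tried first;
    with opportunistic polling, success on the first one ends the poll.
    """
    return poll_dirs[np.argsort(poll_dirs @ g)]

D = np.vstack([np.eye(2), -np.eye(2)])   # maximal positive basis in R^2
g = np.array([0.3, -1.0])                # rough gradient approximation
print(order_poll_directions(D, g))       # polls (0, 1) first, then (-1, 0), ...
```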
Article
Full-text available
We consider derivative free methods based on sampling approaches for nonlinear optimization problems where derivatives of the objective function are not available and cannot be directly approximated. We show how the bounds on the error between an interpolating polynomial and the true function can be used in the convergence theory of derivative free sampling methods. These bounds involve a constant that reflects the quality of the interpolation set. The main task of such a derivative free algorithm is to maintain an interpolation sampling set so that this constant remains small, and at least uniformly bounded. This constant is often described through the basis of Lagrange polynomials associated with the interpolation set. We provide an alternative, more intuitive, definition for this concept and show how this constant is related to the condition number of a certain matrix. This relation enables us to provide a range of algorithms whilst maintaining the interpolation set so that this condition number or the geometry constant remain uniformly bounded. We also derive bounds on the error between the model and the function and between their derivatives, directly in terms of this condition number and of this geometry constant.
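In practice, one crude proxy for this geometry constant, assuming a sample set intended for linear interpolation or simplex gradients, is the condition number of the displacement matrix after scaling by the sampling radius; the sketch below is illustrative and ignores the sharper machinery (Lagrange polynomials, poisedness constants) developed in the article.

```python
import numpy as np

def geometry_indicator(y0, Y):
    """Crude poisedness indicator for a linear interpolation set.

    Scales the displacements by the sampling radius Delta, so the
    indicator does not change when the whole set is shrunk, and
    returns the condition number; large values flag bad geometry
    and hence large model / simplex-gradient error bounds.
    """
    S = Y - y0
    Delta = np.max(np.linalg.norm(S, axis=1))
    return np.linalg.cond(S / Delta)

y0 = np.zeros(2)
good = np.array([[1.0, 0.0], [0.0, 1.0]])
bad = np.array([[1.0, 0.0], [1.0, 1e-3]])                        # nearly collinear directions
print(geometry_indicator(y0, good), geometry_indicator(y0, bad)) # ~1 vs ~2000
```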
Article
It has been shown recently that the efficiency of direct search methods that use opportunistic polling in positive spanning directions can be improved significantly by reordering the poll directions according to descent indicators built from simplex gradients. The purpose of this paper is twofold. First, we analyze the properties of simplex gradients of nonsmooth functions in the context of direct search methods like the Generalized Pattern Search (GPS) and the Mesh Adaptive Direct Search (MADS), for which there exists a convergence analysis in the nonsmooth setting. Our analysis does not require continuous differentiability and can be seen as an extension of the accuracy properties of simplex gradients known for smooth functions. Secondly, we test the use of simplex gradients when pattern search is applied to nonsmooth functions, confirming the merit of the poll ordering strategy for such problems.
Article
In this paper, we introduce a number of ways of making pattern search more efficient by reusing previous evaluations of the objective function, based on the computation of simplex derivatives (e.g., simplex gradients). At each iteration, one can attempt to compute an accurate simplex gradient by identifying a sampling set of previously evaluated points with good geometrical properties. This can be done using only past successful iterates or by considering all past function evaluations. The simplex gradient can then be used to reorder the evaluations of the objective function associated with the directions used in the poll step or to update the mesh size parameter according to a sufficient decrease criterion, neither of which requires new function evaluations. A search step can also be tried along the negative simplex gradient at the beginning of the current pattern search iteration. We present these procedures in detail and apply them to a set of problems from the CUTEr collection. Numerical results show that these procedures can significantly enhance the practical performance of pattern search methods.
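To make the "sufficient decrease criterion" for the mesh size parameter concrete, here is a loudly hedged sketch of one plausible rule of this general type; the threshold gamma * delta * ||g|| and the update factors below are placeholders for illustration and should not be read as the rule actually used in the paper.

```python
def update_mesh_size(delta, f_old, f_new, g_norm,
                     gamma=0.5, expand=2.0, contract=0.5):
    """Illustrative sufficient-decrease mesh update (not the paper's rule).

    delta  : current mesh size parameter
    g_norm : norm of the current simplex gradient
    Expand the mesh only when the achieved decrease is large relative
    to the step scale and the simplex-gradient norm; keep it on a
    merely simple decrease; contract it after an unsuccessful iteration.
    """
    if f_new < f_old - gamma * delta * g_norm:   # sufficient decrease
        return expand * delta
    if f_new < f_old:                            # simple decrease only
        return delta
    return contract * delta                      # unsuccessful iteration
```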
Article
A direct search method for unconstrained optimization is described. The method makes use of any partial separability structure that the objective function may have. The method uses successively finer nested grids, and minimizes the objective function over each grid in turn. All grids are aligned with the coordinate directions, which allows the partial separability structure of the objective function to be exploited. This has two advantages: it reduces the work needed to calculate function values at the points required and it provides function values at other points as a free by-product. Numerical results show that using partial separability can dramatically reduce the number of function evaluations needed to minimize a function, in some cases allowing problems with thousands of variables to be solved. Results show that the algorithm is effective on strictly C¹ problems and on a class of non-smooth problems.
Article
The initial release of CUTE, a widely used testing environment for optimization software, was described by Bongartz et al. [1995]. A new version, now known as CUTEr, is presented. Features include reorganisation of the environment to allow simultaneous multi-platform installation, new tools for, and interfaces to, optimization packages, and a considerably simplified and entirely automated installation procedure for UNIX systems. The environment is fully backward compatible with its predecessor, and offers support for Fortran 90/95 and a general C/C++ Application Programming Interface. The SIF decoder, formerly a part of CUTE, has become a separate tool, easily callable by various packages. It features simple extensions to the SIF test problem format and the generation of files suited to automatic differentiation packages.