Contexts in source publication

Context 1
... and He successfully used a double-elimination tournament with a quality indicator ensemble to rank MOEAs [12], [13]. The whole process is shown in Figure 1. As input, we give the MOEAs to be ranked. ...
Context 2
... The EARS framework is shown in Figure 2. Before the tournament starts, we have to select the MOEAs that will participate, the QIs for the ensemble, and one or more problems to be solved. This step is similar to that in DET/QIE (Figure 1), except that we are not limited to a single problem. In the next step, the selected MOEAs solve the given problems. ...
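To make the workflow in Contexts 1 and 2 concrete, the following is a minimal Python sketch of a tournament-style ranking. The interfaces are hypothetical (a dict of solver callables and QI callables where a lower value is assumed to be better); it outlines the described steps, not the EARS or DET/QIE implementation.

import random

def rank_moeas(moeas, problems, quality_indicators, runs=10, seed=1):
    """Run each MOEA on each problem, then rank the MOEAs by pairwise wins.

    `moeas` maps a name to a callable that returns an approximation set
    (a list of objective vectors) for a given problem -- a placeholder
    for a real optimizer run.
    """
    rng = random.Random(seed)
    # Step 1: every MOEA solves every selected problem several times.
    results = {name: [solve(problem) for problem in problems for _ in range(runs)]
               for name, solve in moeas.items()}
    # Step 2: pairwise comparisons; each comparison uses one QI drawn from the ensemble.
    wins = {name: 0 for name in moeas}
    names = list(moeas)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            for set_a, set_b in zip(results[a], results[b]):
                qi = rng.choice(quality_indicators)   # QI from the ensemble
                if qi(set_a) < qi(set_b):             # assumed: lower QI value is better
                    wins[a] += 1
                elif qi(set_b) < qi(set_a):
                    wins[b] += 1
    # Step 3: rank by number of wins (a rating system would be used instead).
    return sorted(names, key=lambda n: wins[n], reverse=True)

In the actual frameworks, a rating system such as Glicko-2 replaces the simple win count in the last step.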

Citations

... The resulting Pareto-frontiers were evaluated using three different quality indicators, namely, hypervolume, spread and coverage over Pareto front (CPF). The selected quality indicators assess different aspects of quality, which is very important, since a single quality indicator cannot measure all aspects successfully at once [12,13]. ...
Article
Full-text available
Traveling Salesman Problems (TSPs) have been a long-lasting, interesting challenge to researchers in different areas. The difficulty of such problems scales up further when multiple objectives are considered concurrently. Plenty of work in evolutionary algorithms has been introduced to solve multi-objective TSPs with promising results, and the work in deep learning and reinforcement learning has been surging. This paper introduces a multi-objective deep graph pointer network-based reinforcement learning (MODGRL) algorithm for multi-objective TSPs. MODGRL improves an earlier multi-objective deep reinforcement learning algorithm, called DRL-MOA, by utilizing a graph pointer network to learn the graphical structures of TSPs. These improvements allow MODGRL to be trained on a small-scale TSP yet find optimal solutions for large-scale TSPs. NSGA-II, MOEA/D and SPEA2 were selected for comparison with MODGRL and DRL-MOA. The hypervolume, spread and coverage over Pareto front (CPF) quality indicators were selected to assess the algorithms’ performance. In terms of the hypervolume indicator, which represents the convergence and diversity of Pareto frontiers, MODGRL outperformed all the competitors on the three well-known benchmark problems. Such findings proved that MODGRL, with the improved graph pointer network, indeed performed better than DRL-MOA and the three other evolutionary algorithms, as measured by the hypervolume indicator. MODGRL and DRL-MOA were comparable in the leading group, as measured by the spread indicator. Although MODGRL performed better than DRL-MOA, both of them were only average regarding the evenness and diversity measured by the CPF indicator. Such findings are a reminder that different performance indicators measure Pareto frontiers from different perspectives. Choosing a well-accepted performance indicator suitable for one’s experimental design is critical and may affect the conclusions. The three evolutionary algorithms were also run with extra iterations, to validate whether extra iterations affected the performance. The results show that NSGA-II and SPEA2 were greatly improved as measured by the spread and CPF indicators. Such findings raise fairness concerns about algorithm comparisons that use different fixed stopping criteria for different algorithms, as appeared in the DRL-MOA work and many others. Through these lessons, we concluded that MODGRL indeed performed better than DRL-MOA in terms of hypervolume, and we also urge researchers to adopt fair experimental designs and comparisons in order to derive scientifically sound conclusions.
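Since the hypervolume indicator carries most of the conclusions above, here is a minimal sketch of its computation for the two-objective minimization case, assuming a reference point that is dominated by every solution of interest. Real studies would rely on a vetted library implementation rather than this illustration.

def hypervolume_2d(points, ref):
    """2-D hypervolume for a minimization problem.

    `points`: iterable of (f1, f2) objective vectors.
    `ref`:    reference point dominated by every point of interest.
    Returns the area dominated by the non-dominated subset of `points`.
    """
    # Keep only points that dominate the reference point.
    pts = [p for p in points if p[0] < ref[0] and p[1] < ref[1]]
    if not pts:
        return 0.0
    # Sort by the first objective; keep the non-dominated front
    # (strictly decreasing second objective after sorting).
    pts.sort()
    front = []
    best_f2 = float("inf")
    for f1, f2 in pts:
        if f2 < best_f2:
            front.append((f1, f2))
            best_f2 = f2
    # Sum the rectangular slices between consecutive front points.
    volume = 0.0
    prev_f1 = ref[0]
    for f1, f2 in reversed(front):   # walk from the largest f1 to the smallest
        volume += (prev_f1 - f1) * (ref[1] - f2)
        prev_f1 = f1
    return volume

# Example: three non-dominated points against the reference point (4, 4).
print(hypervolume_2d([(1, 3), (3, 1), (2.5, 2.5)], ref=(4, 4)))  # 5.25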
... When performing benchmarks, there are many aspects to consider, such as which optimization problems should be included in the benchmark, which algorithms should be used in the comparison, how to set the control parameters of the algorithms used in the comparison, which statistical method should be used [1][2][3][4][5], which quality indicators should be considered in the case of multi-objective problems [6], and which stopping criterion should be used. There are many papers [7,8] which try to answer these questions, but they still remain open and require further exploration. ...
... When all algorithms have been added to the benchmark, we can start it. The algorithms are compared using a novel method called CRS4EAs (Chess Rating System for Evolutionary Algorithms), which has been applied successfully in many works [6,29,43,[82][83][84][85][86][87][88][89][90][91]. CRS4EAs is comparable to null hypothesis significance tests but has many advantages. ...
Article
Full-text available
Evolutionary algorithms have been shown to be very effective in solving complex optimization problems. This has driven the research community in the development of novel, even more efficient evolutionary algorithms. The newly proposed algorithms need to be evaluated and compared with existing state-of-the-art algorithms, usually by employing benchmarks. However, comparing evolutionary algorithms is a complicated task, which involves many factors that must be considered to ensure a fair and unbiased comparison. In this paper, we focus on the impact of stopping criteria in the comparison process. Their job is to stop the algorithms in such a way that each algorithm has a fair opportunity to solve the problem. Although they are not given much attention, they play a vital role in the comparison process. In the paper, we compared different stopping criteria with different settings to show their impact on the comparison results. The results show that stopping criteria play a vital role in the comparison, as they can produce statistically significant differences in the rankings of evolutionary algorithms. The experiments have shown that, in one case, an algorithm consumed 50 times more evaluations in a single generation, giving it a considerable advantage when the maximum number of generations (max gen) was used as the stopping criterion, which puts the validity of most published work in question.
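The 50-fold discrepancy mentioned in the abstract is easy to reproduce with a toy sketch: when the stopping criterion counts generations, an algorithm that spends more evaluations per generation silently receives a larger budget. The helper below (a hypothetical interface, not taken from the cited paper) stops on a shared evaluation budget instead.

def run_with_budget(step_generation, max_evaluations):
    """Run an algorithm whose `step_generation()` returns the number of
    objective-function evaluations it consumed in that generation.
    Stops on a shared evaluation budget rather than a generation count."""
    used, generations = 0, 0
    while used < max_evaluations:
        used += step_generation()
        generations += 1
    return generations, used

# Two toy "algorithms": one spends 100 evaluations per generation, the other 5000.
cheap = lambda: 100
greedy = lambda: 5000

print(run_with_budget(cheap, 300_000))   # (3000, 300000)
print(run_with_budget(greedy, 300_000))  # (60, 300000)
# Under a max-generations criterion of, say, 3000 generations, the second
# algorithm would consume 50 times more evaluations than the first.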
... Since a single performance indicator measures only a certain quality aspect, researchers are often interested in comparing the performance of algorithms using a set of performance indicators. To consider the influence of more than one performance indicator, ensembles can be applied [15]. There are many ensembling techniques that can be used, such as voting-based methods, regression-based methods, and simple statistics. ...
Article
Full-text available
DSCTool is a statistical tool for comparing the performance of stochastic optimization algorithms on a single benchmark function (i.e., single-problem analysis) or a set of benchmark functions (i.e., multiple-problem analysis). DSCTool implements a recently proposed approach, called Deep Statistical Comparison (DSC), and its variants. DSC ranks optimization algorithms by comparing distributions of obtained solutions for a problem instead of using a simple descriptive statistic such as the mean or the median. The rankings obtained for an individual problem give the relations between the performance of the applied algorithms. To compare optimization algorithms in the multiple-problem scenario, an appropriate statistical test must be applied to the rankings obtained for a set of problems. The main advantage of DSCTool is its REST web services, which mean that all its functionalities can be accessed from any programming language. In this paper, we present DSCTool in detail with examples of its usage.
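The ensemble idea referenced in the context above can be sketched as a simple voting scheme over quality indicators. The function below is an illustrative majority vote, assuming indicator callables for which lower values are better; it is not DSCTool's ranking procedure, which compares whole solution distributions rather than scalar indicator values.

def majority_vote(set_a, set_b, indicators):
    """Compare two approximation sets with an ensemble of quality indicators.

    `indicators` is a list of callables mapping an approximation set to a
    scalar, where a lower value is assumed to be better (an assumption of
    this sketch; real indicators differ in orientation).
    Returns 'A', 'B', or 'tie' according to a simple majority vote.
    """
    votes_a = sum(1 for qi in indicators if qi(set_a) < qi(set_b))
    votes_b = sum(1 for qi in indicators if qi(set_b) < qi(set_a))
    if votes_a > votes_b:
        return "A"
    if votes_b > votes_a:
        return "B"
    return "tie"

# Hypothetical usage with three indicator callables:
# winner = majority_vote(front_a, front_b, [hypervolume_qi, spread_qi, cpf_qi])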
... Therefore, a fixed number of indicators does not seem sufficient to provide a comprehensive measure for MOEAs [65,66]. To address this issue, we plan in our future work to use ensemble performance indicators [67][68][69][70]. ...
Article
Modern software development builds on the reuse of external Web services as a promising way for developers to deliver feature-rich software by composing existing Web service Application Programming Interfaces, known as APIs. With the overwhelming number of Web services available on the Internet, finding the appropriate Web services for automatic service composition, i.e., mashup creation, has become a time-consuming, difficult, and error-prone task for software designers and developers when done manually. To help developers, a number of approaches and techniques have been proposed to automatically recommend Web services. However, they mostly focus on recommending individual services, whereas, in practice, service APIs are intended to be used together, forming a social network between different APIs, and thus should be recommended collectively. In this paper, we introduce a novel automated approach, called SerFinder, to recommend service sets for automatic mashup creation. We formulate service set recommendation as a multi-objective combinatorial problem and use the non-dominated sorting genetic algorithm (NSGA-II) as a search method to extract an optimal set of services for creating a given mashup. We aim at guiding the search process towards an adequate compromise among three objectives to be optimized: (i) maximize the services' historical co-usage, (ii) maximize the services' functional matching with the mashup requirements, and (iii) maximize the services' functional diversity. We perform a large-scale empirical experiment to evaluate SerFinder on a benchmark of real-world mashups and services. The obtained results demonstrate the effectiveness of SerFinder in comparison with recent existing approaches for mashup creation and service recommendation. The statistical analysis provides empirical evidence that SerFinder significantly outperforms four state-of-the-art, widely used multi-objective search-based algorithms, as well as random search.
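The three-objective formulation described in the SerFinder abstract can be outlined as follows. The data structures and objective callables are hypothetical placeholders meant only to show the shape of the formulation, not SerFinder's actual code.

from typing import Callable, FrozenSet, Tuple

Objective = Callable[[FrozenSet[str]], float]

def evaluate_service_set(services: FrozenSet[str],
                         co_usage: Objective,
                         matching: Objective,
                         diversity: Objective) -> Tuple[float, float, float]:
    """Objective vector for one candidate service set (all to be maximized):
    historical co-usage, functional matching with the mashup requirements,
    and functional diversity. The three callables are caller-supplied
    placeholders; NSGA-II would evolve candidate sets against this vector."""
    return co_usage(services), matching(services), diversity(services)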
... The rating acts as the configuration's fitness. Because different QIs are used in the assessment, the rating (fitness) reflects different aspects of quality [7]. The same Chess Rating System is incorporated in the Evolutionary Algorithms Rating System (EARS) framework [8]. ...
Article
Full-text available
Multi-Objective Evolutionary Algorithms (MOEAs) have been applied successfully for solving real-world multi-objective problems. Their success can depend heavily on the configuration of their control parameters. Different tuning methods have been proposed in order to solve this problem. Tuning can be performed on a set of problem instances in order to obtain robust control parameters. However, for real-world problems, the set of problem instances at our disposal is usually not very plentiful. This raises the question: What is a sufficient number of problems used in the tuning process to obtain robust enough parameters? To answer this question, a novel method called MOCRS-Tuning was applied to problem sets of different sizes for the real-world integration and test order problem. The configurations obtained by the tuning process were compared on all the used problem instances. The results show that tuning greatly improves the algorithms’ performance and that a bigger subset used for tuning does not guarantee better results. This indicates that it is possible to obtain robust control parameters with a small subset of problem instances, which also substantially reduces the time required for tuning.
... The results of the evaluation show that the method performs more or less as a majority vote. The same idea was used by Ravber et al. [25], where, instead of a double-elimination tournament, they used a chess rating system based on the Glicko-2 system [15]. The comparison between two approximation sets was made using a randomly selected quality indicator from the ensemble. ...
Chapter
To find the strengths and weaknesses of a new multi-objective optimization algorithm, we need to compare its performance with the performances of state-of-the-art algorithms. Such a comparison involves the selection of a performance metric, a set of benchmark problems, and a statistical test to ensure that the results are statistically significant. There are also studies in which, instead of using one performance metric, a comparison is made using a set of performance metrics. All these studies assume that all involved performance metrics are equal. In this paper, we introduce a data-driven preference-based approach that is a combination of multiple-criteria decision analysis with deep statistical rankings. The approach ranks the algorithms for each benchmark problem using the preference (the influence) of each performance metric, which is estimated using its entropy. Experimental results show that this approach achieved rankings similar to a previously proposed method that is based on the idea of the majority vote, where all performance metrics are assumed equal. However, as will be shown, this approach can give different rankings because it is based not only on the idea of counting wins, but also includes information about the influence of each performance metric.
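The entropy-based preference described in this chapter can be illustrated with the textbook entropy-weight scheme: metrics whose values differ more across algorithms receive a larger influence. The sketch below is a generic version of that idea and is not necessarily the exact formulation used in the chapter.

import math

def entropy_weights(scores):
    """Entropy-based weights for a set of performance metrics.

    `scores[i][j]` is the (non-negative) value of metric j for algorithm i.
    Metrics whose values vary more across algorithms receive larger weights.
    """
    n = len(scores)
    m = len(scores[0])
    weights = []
    for j in range(m):
        column = [scores[i][j] for i in range(n)]
        total = sum(column) or 1.0
        p = [x / total for x in column]
        entropy = -sum(pi * math.log(pi) for pi in p if pi > 0) / math.log(n)
        weights.append(1.0 - entropy)        # degree of diversification
    s = sum(weights) or 1.0
    return [w / s for w in weights]

# Example: three algorithms, two metrics; the second metric discriminates far more
# between the algorithms, so it receives almost all of the weight.
print(entropy_weights([[0.33, 0.9], [0.33, 0.1], [0.34, 0.5]]))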
... By using a self-adaptive algorithm, we removed the additional parameters needed for the tuning process. The solutions in the population are evaluated with a Chess Rating System with a Quality Indicator Ensemble (CRS4MOEA/QIE) [10]. The Quality Indicator Ensemble ensures that the outcome of each candidate solution is evaluated with different Quality Indicators (QIs), making sure that different aspects of quality are taken into account [11]. ...
... For comparing and evaluating MOEAs we used a novel method called Chess Rating System with a Quality Indicator Ensemble (CRS4MOEA/QIE) [10]. CRS4MOEA/QIE uses the Glicko-2 system [18] to rate and rank players. ...
... The [28]. The diversity of QIs ensures that all aspects of quality are covered [10]. In all experiments, the MOEAs solved the ITO problem on the eight previously mentioned systems, for which the stopping criterion was set to 300,000 evaluations for each system. ...
Chapter
Full-text available
Multi-Objective Evolutionary Algorithms (MOEAs) are one of the most used search techniques in Search-Based Software Engineering (SBSE). However, MOEAs have many control parameters which must be configured for the problem at hand. This can be a very challenging task by itself. To make matters worse, in Multi-Objective Optimization (MOO) different aspects of quality of the obtained Pareto front need to be taken into account. A novel method called MOCRS-Tuning is proposed to address this problem. MOCRS-Tuning is a meta-evolutionary algorithm which uses a chess rating system with a quality indicator ensemble. The chess rating system enables us to determine the performance of an MOEA on different problems easily. The ensemble of quality indicators ensures that different aspects of quality are considered. The tuning was carried out on five different MOEAs on the Integration and Test Order Problem (ITO). The experimental results show a significant improvement after tuning for all five MOEAs used in the experiment.
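CRS4MOEA/QIE rates MOEAs with the Glicko-2 system, whose full update also tracks a rating deviation and volatility per player. As an illustration of how pairwise match outcomes are turned into ratings that can then act as fitness, the sketch below uses the much simpler Elo update; it is an illustrative stand-in, not the rating computation from the cited work.

def elo_update(rating_a, rating_b, score_a, k=32):
    """One Elo-style update after a 'match' between two MOEAs.

    `score_a` is 1.0 if A won the comparison (e.g., a better quality-indicator
    value), 0.5 for a draw, and 0.0 for a loss. Returns the new ratings.
    Glicko-2, used in the cited work, additionally maintains a rating deviation
    and volatility per player; this Elo sketch is only an illustration.
    """
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b

# Example: an 'upset' win of a lower-rated MOEA over a higher-rated one.
print(elo_update(1500, 1600, score_a=1.0))  # approximately (1520.5, 1579.5)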
... The main contributions of this paper are: a detailed analysis of QIs using a novel method called CRS4EAs, and the identification of coherent and robust QIs to increase the reliability of the assessment of Pareto approximation sets and, thereby, also increase the legitimacy of claims about MOEA performance. This paper is an extended version of the conference paper published in [15], where CRS4EAs was first introduced for comparing MOEAs. We extended our previous work with a more detailed analysis of QIs and added a second scenario in which we repeat the experiment on a real-world problem to increase the practical value of the research. ...
Article
Full-text available
Evaluating and comparing multi-objective optimizers is an important issue. However, when doing a comparison, it has to be noted that the results can be highly influenced by the selected Quality Indicator. Therefore, the impact of individual Quality Indicators on the ranking of multi-objective optimizers in the proposed method must be analyzed beforehand. In this paper, several different Quality Indicators were compared with a method called Chess Rating System for Evolutionary Algorithms (CRS4EAs) in order to gain better insight into their characteristics and how they affect the ranking of Multi-objective Evolutionary Algorithms (MOEAs). Although it is expected that Quality Indicators with the same optimization goals would yield a similar ranking of MOEAs, it has been shown that the results can be contradictory and significantly different, revealing that claims about the superiority of one MOEA over another can be misleading.
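One way to quantify how strongly two Quality Indicators disagree about a set of MOEAs is a rank correlation between the rankings they induce. The sketch below computes Kendall's tau on made-up example ranks; the algorithm names and rank values are illustrative only, not results from the paper, which uses CRS4EAs rather than rank correlation.

from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same MOEAs.

    Each argument maps an algorithm name to its rank (1 = best). A value of
    1 means identical orderings, -1 means completely reversed orderings.
    """
    algorithms = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(algorithms, 2):
        da = rank_a[x] - rank_a[y]
        db = rank_b[x] - rank_b[y]
        if da * db > 0:
            concordant += 1
        elif da * db < 0:
            discordant += 1
    pairs = len(algorithms) * (len(algorithms) - 1) / 2
    return (concordant - discordant) / pairs

# Hypothetical rankings produced by two different indicators.
hv_rank = {"NSGA-II": 1, "SPEA2": 2, "MOEA/D": 3, "IBEA": 4}
spread_rank = {"NSGA-II": 3, "SPEA2": 1, "MOEA/D": 4, "IBEA": 2}
print(kendall_tau(hv_rank, spread_rank))  # 0.0: no agreement despite related goals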
Article
Full-text available
Time-consuming objective functions are inevitable in practical engineering optimization problems. This kind of function makes the implementation of metaheuristic methodologies challenging, since the designer must compromise between the quality of the final solution and the overall runtime of the optimization procedure. This paper proposes a novel multi-objective optimization algorithm called FC-MOEO/AEP, appropriate for highly time-consuming objective functions. It is based on an equilibrium optimizer equipped with an archive evolution path (AEP) mechanism. The AEP mechanism considers the evolutionary trajectory of the decision space and anticipates the potential regions for optimal solutions. Besides having a high convergence rate, the newly proposed approach also possesses an intelligent balance between exploration and exploitation capabilities, enabling the algorithm to effectively avoid getting stuck in local Pareto fronts. To assess the efficacy of FC-MOEO/AEP, a range of mathematical optimization problems and three real-world structural design problems were employed. To gauge the method's performance against other approaches, a novel performance indicator known as convergence speed was introduced and utilized, in addition to standard metrics. The numerical findings demonstrate the robust and consistent performance of FC-MOEO/AEP in tackling complex multi-objective problems.
Article
Full-text available
Making a statistical comparison of meta-heuristic multi-objective optimization algorithms is crucial for identifying the strengths and weaknesses of a newly proposed algorithm. Currently, state-of-the-art comparison approaches involve the user-preference-based selection of a single quality indicator or an ensemble of quality indicators as a comparison metric. Using these quality indicators, high-dimensional data is transformed into one-dimensional data. By doing this, information contained in the high-dimensional space can be lost, which affects the results of the comparison. To avoid losing this information, we propose a novel ranking scheme that compares the distributions of the high-dimensional data. Experimental results show that the proposed approach reduces potential information loss when statistical significance is not observed in the high-dimensional data. Consequently, the selection of a quality indicator is required only in cases where statistical significance is observed in the high-dimensional data. With this, the number of cases affected by the user-preference selection is reduced.