Figure 4 - uploaded by Mohammad Azzeh
Boxplot of absolute residuals for COCOMO 

Source publication
Conference Paper
Full-text available
Background: Adaptation is a crucial task for analogy-based estimation. Current adaptation techniques often use linear size or linear similarity adjustment mechanisms, which are often not suitable for datasets that have a complex structure with many categorical attributes. Furthermore, the use of nonlinear adaptation techniques such as neural...

Citations

... The former proposes making use of the experiences of human experts, whereas the latter usually generates estimates based on learning methods. The latter has two distinct advantages over the former: it can model complex sets of relationships between the dependent variable and the independent variables, and it can learn from historical project data [2] [27]. ...
... Since these measures tend to behave differently [32], the final outcome of MOPSO is not a single solution but a set of solutions that make a good trade-off between these objective functions. In this study, each possible solution is composed of three variables: (1) number of nearest analogies (k), (2) ...
... Adaptation is a process that attempts to minimize the difference between the test observation and each nearest observation, and reflects that difference in the derived solution in order to obtain better accuracy. All adapted solutions are then aggregated either by a simple statistical approach such as the mean, median, or Inverse Ranked Weighted Mean (IRWM), as shown in Eqs. (1) and (2), or by more sophisticated approaches such as machine learning algorithms. ...
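The aggregation step in the excerpt above can be sketched in Python. The IRWM formulation below, in which rank i of k receives weight k − i + 1 so that closer analogies count more, is a common one in the ABE literature; the cited paper's exact Eqs. (1) and (2) are not reproduced here, so treat the function as illustrative.

```python
from statistics import mean, median

def irwm(efforts):
    """Inverse Ranked Weighted Mean. `efforts` must be ordered from the
    most similar analogy (rank 1) to the least similar (rank k);
    rank i receives weight k - i + 1."""
    k = len(efforts)
    weights = [k - i for i in range(k)]  # k, k-1, ..., 1
    return sum(w * e for w, e in zip(weights, efforts)) / sum(weights)

# Aggregating three adapted solutions, ordered by similarity:
solutions = [10.0, 20.0, 30.0]
print(mean(solutions))    # simple mean -> 20.0
print(median(solutions))  # median     -> 20.0
print(irwm(solutions))    # (3*10 + 2*20 + 1*30) / 6, approximately 16.67
```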
Article
Full-text available
Analogy Based Effort Estimation (ABE) is one of the prominent methods for software effort estimation. The fundamental concept of ABE is close to the mentality of expert estimation, but with an automated procedure in which the final estimate is generated by reusing similar historical projects. The key issue when using ABE is how to adapt the effort of the retrieved nearest neighbors. The adaptation process is an essential part of ABE for generating more accurate estimates by tuning the selected raw solutions using some adaptation strategy. In this study we show that there are three interrelated decision variables that have a great impact on the success of an adaptation method: (1) the number of nearest analogies (k), (2) the optimum feature set needed for adaptation, and (3) the adaptation weights. To make the right decision regarding these variables, one needs to study all possible combinations and evaluate them individually to select the one that can improve all prediction evaluation measures. Existing evaluation measures usually behave differently, sometimes presenting opposite trends in evaluating prediction methods. This means that changing one decision variable could improve one evaluation measure while decreasing the others. Therefore, the main theme of this research is how to find the best decision variables that improve the adaptation strategy, and thus the overall evaluation measures, without degrading the others. The joint impact of these decisions has not been investigated before; therefore we propose to view the building of the adaptation procedure as a multi-objective optimization problem. The Particle Swarm Optimization (PSO) algorithm is utilized to find the optimum values of these decision variables by optimizing multiple evaluation measures. We evaluated the proposed approaches over 15 datasets using 4 evaluation measures.
After extensive experimentation we found that: (1) the predictive performance of ABE is noticeably improved; (2) optimizing all decision variables together is more efficient than ignoring any one of them; and (3) optimizing the decision variables for each project individually yields better accuracy than optimizing them for the whole dataset.
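As a rough illustration of the optimizer this abstract builds on, here is a minimal single-objective PSO sketch minimizing a toy error function. The paper's method is multi-objective and searches over (k, feature set, weights); the parameter values and names below are illustrative assumptions, not the paper's configuration.

```python
import random

def pso(objective, dim, n_particles=20, iters=100, bounds=(-5.0, 5.0),
        w=0.7, c1=1.5, c2=1.5, seed=42):
    """Minimal particle swarm optimizer (single-objective sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive pull (pbest) + social pull (gbest)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective: sphere function; PSO should drive it close to zero.
best, best_val = pso(lambda x: sum(v * v for v in x), dim=3)
```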
... Developing adjustment methods for EBA was the core of several research studies (Azzeh, 2011;Chiu and Huang, 2007;Kirsopp et al., 2003;Li et al., 2007;Li et al., 2009;Walkerden and Jeffery, 1999). Azzeh (2012) classified the existing adjustment methods into two main categories according to the procedure they follow: linear and nonlinear. ...
... • Similarity based adjustment (AQUA) (Li et al., 2007). • Model tree (Azzeh, 2011). ...
... The fundamental process of CBR is based on the premise that history almost repeats itself, which means that the solution of a new case is generated by reusing the solutions of similar successful historical cases. CBR has been favored over regression methods because software datasets often exhibit complex structure with some non-normal characteristics and discontinuities [1]. It was also remarked that the predictive performance of CBR is dataset dependent and subject to a large space of configuration possibilities induced for each dataset [8]. ...
... Chiu and Huang [4] proposed another adjustment based on a Genetic Algorithm (GA) to optimize the distance weights by minimizing a performance measure, using k = 1…5. Recently, Li et al. [10] used a Neural Network (NN) and Azzeh [1] used a Model Tree to learn the difference between projects and reflect that difference in the final estimate, with k = 1…5. We can notice that these studies use a limited number of features (e.g. ...
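The retrieve-and-adjust cycle described in these excerpts can be sketched as follows. The linear size adjustment and the toy project data are illustrative assumptions, not the cited papers' exact methods (which use GA-optimized weights, neural networks, or model trees for the adjustment step).

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cbr_estimate(target, cases, k=2):
    """Retrieve the k most similar historical cases, adjust each retrieved
    effort by the size ratio (a simple linear size adjustment), and
    average the adapted solutions.
    Each case is (features, size, effort); target is (features, size)."""
    feats_t, size_t = target
    ranked = sorted(cases, key=lambda c: euclidean(feats_t, c[0]))[:k]
    adapted = [effort * (size_t / size) for _, size, effort in ranked]
    return sum(adapted) / len(adapted)

# Hypothetical projects: (features, size in KLOC, effort in person-months)
history = [
    ([1.0, 0.2], 10.0, 30.0),
    ([0.9, 0.3], 12.0, 33.0),
    ([0.1, 0.9], 40.0, 150.0),
]
new_project = ([0.95, 0.25], 11.0)
print(cbr_estimate(new_project, history, k=2))  # approximately 31.63
```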
Conference Paper
Full-text available
Case-Based Reasoning (CBR) is considered one of the efficient methods in the area of software effort estimation because of its outstanding performance and capability of handling noisy datasets. This study examines the performance of a multi-objective Particle Swarm Optimization algorithm in finding the best configuration parameters for the adaptation process. In particular, we propose a new adaptation method whose parameters can be optimized by making a trade-off between multiple accuracy measures. The proposed adaptation is fully automated and able to dynamically adapt each case in the dataset individually. Based on empirical validation over 8 datasets, the performance figures show good improvements over conventional CBR and some adapted versions of CBR. Keywords: Case-Based Reasoning; adaptation method; software effort estimation; multi-objective particle swarm optimization
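Because a multi-objective optimizer returns a set of trade-off solutions rather than a single winner, the defining non-dominance test can be sketched as below. All objectives are assumed to be minimized, and the candidate objective values are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is no worse on every objective (minimized)
    and strictly better on at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Keep only solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Hypothetical (MMRE, 1 - PRED(25)) pairs for candidate configurations;
# (0.40, 0.45) is dominated by (0.30, 0.40) and is filtered out.
candidates = [(0.30, 0.40), (0.25, 0.50), (0.35, 0.35), (0.40, 0.45)]
print(pareto_front(candidates))
```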
Article
Software effort estimation is a vital process in the software industry for successfully administering the 5Ds of the software development life cycle (SDLC): demand, development, direction, deployment, and designated cost of the software. Software development effort estimation (SDEE) is an effort prediction mechanism that calculates the effort for developing a software product in order to minimize challenges in the software field. Academics and practitioners are striving to identify which machine learning estimation technique yields more accurate results based on evaluation metrics, datasets, and other pertinent aspects. Feature selection techniques impact accuracy by selecting the main and relevant features in the dataset and eliminating the redundant and irrelevant ones. To achieve accurate estimations, the paper utilizes feature selection algorithms along with various machine learning techniques to predict the desired effort; the performance of the model is measured in terms of prediction accuracy, value, relative error, and mean absolute error. The China and Maxwell datasets are trained with the relevant features by applying feature selection algorithms, and estimation techniques are applied to predict the effort. The performance is compared with the regression models and feature selection techniques utilized by many authors previously. The proposed methodology gives the best performance with the combination of feature selection and estimation models, outperforming all regression models applied alone, on both datasets. From the results, it is perceptible that random forest performs well with the feature selection techniques and obtains the highest prediction accuracy of 99.33% with the China dataset and 89.47% with the Maxwell dataset.
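As a hedged illustration of filter-style feature selection, the sketch below ranks features by the absolute Pearson correlation between each feature column and effort and keeps the top-k. This is one common filter criterion, not necessarily the algorithms used in the cited paper; the data are hypothetical and the code assumes non-constant columns.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient (assumes non-constant inputs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(rows, effort, top_k):
    """Rank feature indices by |correlation with effort|; keep the top_k."""
    n_feats = len(rows[0])
    scores = []
    for j in range(n_feats):
        col = [r[j] for r in rows]
        scores.append((abs(pearson(col, effort)), j))
    return [j for _, j in sorted(scores, reverse=True)[:top_k]]

# Hypothetical rows: [size, team_size, noise]; effort tracks size closely,
# so the noise column (index 2) should be dropped.
rows = [[10, 3, 7], [20, 4, 1], [30, 5, 9], [40, 4, 2]]
effort = [100, 210, 290, 400]
print(select_features(rows, effort, top_k=2))  # [0, 1]
```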
Article
Full-text available
Analogy-based estimation (ABE) estimates the effort of the current project based on the information of similar past projects. The solution function of ABE provides the final effort prediction of a new project. Many past studies on ABE have provided various solution functions, but their effectiveness can still be enhanced. The present study is an attempt to improve the effort prediction accuracy of ABE by proposing a solution function SABE: Stacking regularization in analogy-based software effort estimation. The core of SABE is stacking, a machine learning technique. Stacking is beneficial as it works on multiple models, harnessing their capabilities, and provides better estimation accuracy compared to a single model. The proposed method is validated on four software effort estimation datasets and compared with the already existing solution functions: closest analogy, mean, median, and inverse distance weighted mean. The evaluation criteria used are mean magnitude of relative error (MMRE), median magnitude of relative error (MdMRE), prediction (PRED), and standardized accuracy (SA). The results suggest that SABE showed promising performance for almost all the evaluation criteria when compared with the results of the earlier studies.
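The evaluation criteria named above have standard definitions in the effort estimation literature; a minimal sketch follows (SA is omitted, since it requires a random-guessing baseline over the dataset).

```python
from statistics import median

def mre(actual, predicted):
    """Magnitude of Relative Error for one project."""
    return abs(actual - predicted) / actual

def mmre(actuals, preds):
    """Mean Magnitude of Relative Error."""
    return sum(mre(a, p) for a, p in zip(actuals, preds)) / len(actuals)

def mdmre(actuals, preds):
    """Median Magnitude of Relative Error."""
    return median(mre(a, p) for a, p in zip(actuals, preds))

def pred(actuals, preds, level=0.25):
    """PRED(25): fraction of projects whose MRE is at most 0.25."""
    hits = sum(1 for a, p in zip(actuals, preds) if mre(a, p) <= level)
    return hits / len(actuals)

# Hypothetical actual vs. predicted efforts:
actuals = [100.0, 200.0, 50.0, 80.0]
preds = [110.0, 150.0, 60.0, 81.0]
print(mmre(actuals, preds))   # 0.140625
print(mdmre(actuals, preds))  # approximately 0.15
print(pred(actuals, preds))   # 1.0 (all MREs <= 0.25)
```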
Article
Full-text available
The immense increase in software technology has resulted in the growing complexity of software projects. Software effort estimation is fundamental to commencing any software project, and inaccurate estimation may lead to several complications and setbacks for present and future projects. Many techniques have been used for software effort estimation over the years. As software applications have grown extensively in size and complexity, the traditional methods are not adequate to meet the requirements. To achieve accurate estimation of software effort, this paper proposes a gradient boosting regressor model as a robust approach. The performance is compared with regression models such as stochastic gradient descent, K-nearest neighbor, decision tree, bagging regressor, random forest regressor, Ada-boost regressor, and gradient boosting regressor, employing COCOMO’81, containing 63 projects, and CHINA, containing 499 projects. The regression models are evaluated by metrics such as MAE, MSE, RMSE, and R2. From the results, it is evident that the gradient boosting regressor model performs well, obtaining an accuracy of 98% with the COCOMO’81 dataset and 93% with the CHINA dataset. The proposed method significantly outperforms all regression models used in the comparison on both datasets.
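For intuition about the model class this abstract proposes, here is a minimal least-squares gradient boosting sketch built from depth-1 regression stumps: each round fits a stump to the current residuals and adds a damped copy of it to the ensemble. It is a from-scratch illustration on toy data, not the paper's implementation.

```python
class Stump:
    """Depth-1 regression tree on a single feature."""
    def fit(self, xs, ys):
        best = None
        for t in sorted(set(xs)):
            left = [y for x, y in zip(xs, ys) if x <= t]
            right = [y for x, y in zip(xs, ys) if x > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((y - lm) ** 2 for y in left)
                   + sum((y - rm) ** 2 for y in right))
            if best is None or err < best[0]:
                best = (err, t, lm, rm)
        _, self.t, self.lm, self.rm = best
        return self

    def predict(self, x):
        return self.lm if x <= self.t else self.rm

def gradient_boost(xs, ys, n_rounds=200, lr=0.1):
    """Least-squares gradient boosting: each stump fits the residuals
    of the ensemble built so far; lr damps each stump's contribution."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        s = Stump().fit(xs, residuals)
        stumps.append(s)
        preds = [p + lr * s.predict(x) for p, x in zip(preds, xs)]
    return lambda x: base + lr * sum(s.predict(x) for s in stumps)

# Toy 1-D training data; the boosted ensemble should fit it closely.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [10.0, 20.0, 30.0, 40.0, 50.0]
model = gradient_boost(xs, ys)
```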
Preprint
Full-text available
It is well recognized that project productivity is a key driver in estimating software project effort from the Use Case Point size metric at early software development stages. Although a few models have been proposed for predicting productivity, there is no consistent conclusion regarding which model is superior. Therefore, instead of building a new productivity prediction model, this paper presents a new ensemble construction mechanism applied to software project productivity prediction. An ensemble is an effective technique when the performance of individual base models is poor. We propose a weighted mean method to aggregate predicted productivities based on the average errors produced during training. The obtained results show that using an ensemble is a good alternative approach when the accuracies of the base models are not consistent across different datasets and when the models behave diversely.
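The weighted mean aggregation described here can be sketched with weights inversely proportional to each base model's mean training error, normalized to sum to one. This inverse-error formulation and the error values are illustrative assumptions, not the paper's exact weighting scheme.

```python
def ensemble_weights(train_errors):
    """Weight each base model by the inverse of its mean absolute
    training error, normalized so the weights sum to 1."""
    inv = [1.0 / e for e in train_errors]
    total = sum(inv)
    return [w / total for w in inv]

def ensemble_predict(predictions, weights):
    """Weighted mean of the base models' productivity predictions."""
    return sum(p * w for p, w in zip(predictions, weights))

# Hypothetical mean training errors of three productivity models:
# the most accurate model (error 2.0) gets the largest weight.
errors = [2.0, 4.0, 8.0]
w = ensemble_weights(errors)  # [4/7, 2/7, 1/7]
print(ensemble_predict([10.0, 12.0, 20.0], w))  # (40 + 24 + 20) / 7 = 12.0
```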
Article
Full-text available
The problem of estimating the effort for software packages is one of the most significant challenges facing software designers. The precision of effort or cost estimates can have a huge impact on software development. Various methods have been investigated in order to discover good enough solutions to this problem; lately, evolutionary intelligent techniques have been explored, such as Genetic Algorithms, Genetic Programming, Neural Networks, and Swarm Intelligence. In this work, Gene Expression Programming (GEP) is investigated to show its efficiency in acquiring equations that best estimate software effort. The datasets employed are taken from previous projects. Comparisons of learning and testing results are carried out with COCOMO, Analogy, GP, and four types of Neural Networks; all show that GEP outperforms these methods in discovering effective estimation functions with robustness and efficiency.