Framework of the proposed algorithm

Source publication
Article
Imbalanced data classification is one of the major problems in machine learning. An imbalanced dataset typically has significant differences in the number of data samples between its classes. In most cases, the performance of a machine learning algorithm such as the Support Vector Machine (SVM) is affected when dealing with an imbalanced dataset. T...

Citations

... The proposed method is more stable in classifying imbalanced data and exhibits better performance in many cases. Hussein et al. [26] proposed a hybrid approach combining data pre-processing techniques and an SVM optimized by an improved Simulated Annealing (SA) algorithm. It first generated new minority samples and removed redundant and duplicated majority samples to equalize the number of samples between classes, and then used the improved SA algorithm to search for the optimum penalty parameters of the SVM. ...
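The parameter search described in this snippet can be illustrated with a minimal sketch of plain simulated annealing over a single SVM penalty parameter C. This is not the improved SA algorithm of the cited work, whose details are not given here. The names `anneal_penalty` and `toy_score`, the log-scale search range, and the cooling schedule are all assumptions made for illustration. In practice `score` would be something like the cross-validated accuracy of an SVM trained on the rebalanced data; a toy function stands in for it so the example runs on its own.

```python
import math
import random


def anneal_penalty(score, log_c0=0.0, bounds=(-3.0, 3.0),
                   t0=1.0, cooling=0.95, steps=200, seed=0):
    """Maximize score(C) over C = 10**log_c with plain simulated annealing.

    All defaults are illustrative assumptions, not values from the cited paper.
    """
    rng = random.Random(seed)
    current = log_c0
    current_score = score(10.0 ** current)
    best, best_score = current, current_score
    temp = t0
    for _ in range(steps):
        # Propose a neighbour: Gaussian step in log10(C), clipped to the bounds.
        cand = min(max(current + rng.gauss(0.0, 0.5), bounds[0]), bounds[1])
        cand_score = score(10.0 ** cand)
        delta = cand_score - current_score
        # Always accept improvements; accept a worse move with
        # probability exp(delta / temp), which shrinks as temp cools.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
        temp *= cooling  # geometric cooling schedule
    return 10.0 ** best, best_score


def toy_score(c):
    # Hypothetical stand-in for a cross-validated SVM score, peaked at C = 10.
    return -(math.log10(c) - 1.0) ** 2


best_c, best_score = anneal_penalty(toy_score)
print(best_c, best_score)
```

Searching in log10(C) rather than in C is a common choice because useful penalty values often span several orders of magnitude.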
... Therefore, they suggest applying the model to standard high-dimensional data. Hussein et al. [37] proposed a hybrid strategy using an enhanced simulated annealing (SA)-based SVM algorithm and a data preprocessing technique, achieving 89.65% accuracy. Their research focused solely on binary classification, without addressing multi-class classification issues. ...
... It generated columns with features obtained from the target values of the dataset and used these values to determine similarities between the validation data and the prediction data. The target value's role was to forecast performance based on these similarities [37]. Since all processing occurred concurrently during testing and involved an iterative process of training samples, it calculated comparable values each time to form the clustering results [47]. ...
... For probability prediction in classification algorithms, BS, ECE, and MCE are considered the most reliable performance indicators. Additionally, the calibration curve, also known as the reliability curve, was utilized to visualize the categorization of the dataset used for probability prediction [17,37]. Equations 6 to 8 provide the mathematical expressions for calculating BS or log-loss, ECE, and MCE, as follows: ...
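Equations 6 to 8 of the citing paper are not reproduced in this snippet, so the sketch below uses common textbook definitions for the binary case, which may differ in detail from that paper's formulas. The Brier score (BS) is taken as the mean squared difference between the predicted positive-class probability and the 0/1 label, and ECE and MCE are the weighted mean and the maximum of the per-bin gap between observed positive frequency and mean predicted probability. The function names and the choice of ten equal-width bins are assumptions.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_errors(y_true, y_prob, n_bins=10):
    """Return (ECE, MCE) for binary positive-class probabilities.

    Uses equal-width bins on [0, 1]; this binning is an assumption.
    """
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        # A probability of exactly 1.0 goes into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))

    n = len(y_true)
    ece, mce = 0.0, 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        frac_pos = sum(y for y, _ in members) / len(members)
        mean_prob = sum(p for _, p in members) / len(members)
        gap = abs(frac_pos - mean_prob)
        ece += len(members) / n * gap  # weight by bin population
        mce = max(mce, gap)
    return ece, mce


# Tiny illustrative input (made up, not data from any cited study).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.6, 0.9]
bs = brier_score(y_true, y_prob)
ece, mce = calibration_errors(y_true, y_prob)
print(bs, ece, mce)
```

The per-bin pairs of mean predicted probability and observed positive frequency computed inside `calibration_errors` are the same points that a calibration (reliability) curve plots.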