Table 3 - uploaded by Jay Lee
Classification accuracy of test sets using ten RBF kernel SVM classifiers

Source publication
Article
Full-text available
Ensemble classification – combining the results of a set of base learners – has received much attention in the machine learning community and has demonstrated promising capabilities in improving classification accuracy. Compared with neural network or decision tree ensembles, there is no comprehensive empirical research in support vector machine (S...

Contexts in source publication

Context 1
... average accuracy and average standard deviation of the testing set over all 20 data sets, for ensembles incorporating from 5 to 50 base SVM learners with different kernel functions, are shown in Figs. 6-11. As an example, the accuracy and standard deviation (in parentheses) of a testing set with 10 base SVM classifiers are shown in Tables 3-5. [Fig. 7: Average standard deviation of RBF SVM ensembles over 20 data sets] ...
Context 2
... shown in Tables 3 to 5, BaggingSVM performed the best in 9, 8, 9 out of 20 data sets for RBF, linear and polynomial kernel functions, respectively. As a general technique for ensembled SVMs, bagging with a polynomial kernel function appears to provide the best performance and generalization. ...
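The bagging result quoted above can be sketched with scikit-learn. This is a minimal illustration on synthetic data; the `BaggingClassifier`/`SVC` configuration, the polynomial degree, and the data set are illustrative assumptions, not the paper's actual experimental protocol:

```python
# Minimal sketch of a bagged SVM ensemble: each base SVC is trained on a
# bootstrap resample and predictions are combined by majority vote.
# All settings here are illustrative, not the paper's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 10 base SVM learners with a polynomial kernel (the kernel family the
# quoted context reports as generalizing best for bagged SVMs)
ensemble = BaggingClassifier(SVC(kernel="poly", degree=3),
                             n_estimators=10, random_state=0)
ensemble.fit(X_tr, y_tr)
print(ensemble.score(X_te, y_te))  # test-set accuracy of the ensemble
```

Each base learner sees a bootstrap resample of the training set, so the ensemble's vote averages away some of the variance of any single SVM.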

Similar publications

Preprint
Full-text available
Current applications of quantum machine learning are emerging, due to the potential benefits that quantum technologies could bring in the near future. One of the most recent developments is tensor network-based architectures. To explore the feasibility of applying this method to healthcare, in this paper tensor networks are applied to the IEE...
Preprint
Full-text available
The accuracy of coronary artery disease (CAD) diagnosis is dependent on a variety of factors, including demographic, symptom, and medical examination, ECG, and echocardiography data, among others. In this context, artificial intelligence (AI) can help clinicians identify high-risk patients early in the diagnostic process, by synthesizing informatio...
Article
Full-text available
This paper estimates the value of information for highly uncertain projects whose decisions have long-term impacts. We present a mathematically consistent framework using decision trees, Bayesian updating, and Monte Carlo simulation to value future information today, even when that future information is imperfect. One drawback of Monte Carlo method...
Preprint
Full-text available
This research paper investigates the effectiveness of simple linear models versus complex machine learning techniques in breast cancer diagnosis, emphasizing the importance of interpretability and computational efficiency in the medical domain. We focus on Logistic Regression (LR), Decision Trees (DT), and Support Vector Machines (SVM) and optimize...
Article
Full-text available
In recent years, with the advances in remote sensing and geospatial technology, various machine learning algorithms found applications in determining potentially flooded areas, which have an important place in basin planning and depend on various environmental parameters. This study uses ensemble models of decision trees (DT), gradient boosting tre...

Citations

... The limitation of the HDP method is that, with the exponential increase in the total number of topics, this scheme becomes unrealistic [19]. ...
Chapter
Full-text available
Text mining is a popular research area in the field of computer science and engineering that enables the processing of natural language, with applications in areas such as aerospace and biomedicine. Text mining uncovers unknown information present in the data so that data extraction becomes effective. Text classification is a subdomain of text mining that plays a major role in labelling documents based on their semantic meaning and context. Different machine learning algorithms are available to classify the available text documents. The main contribution of this paper is the use of semantic analysis with the Lion Optimization Algorithm and a Neural Network architecture. The semantic analysis technique classifies text through semantic keywords rather than treating keywords in the documents as independent features. The Lion Optimization Algorithm is used to adjust the weights of the Neural Network to maximize the efficiency of the classifier. Two well-known open-source datasets, namely 20 Newsgroups and Reuters-21578, are used for the experimentation and to evaluate the performance of the classification algorithms. Significant improvement in all three performance parameters, accuracy, specificity, and sensitivity, is observed. The maximum values observed with our proposed algorithm are 91.86, 95.54, and 84.96 for accuracy, sensitivity, and specificity, respectively.
... Qutub et al. [42] use ensemble learning with a decision tree, logistic regression, and SVM for classifying the IQ on the IBM HR analytics data, and the ensemble learning model outperforms all three individual models. SVM ensembles outperform other approaches with a higher degree of generality [43]. The suggested ensemble technique focuses on a weighted classification model based on individual projected accuracy. ...
... Wang et al. [19] focused on ensemble classification. They conducted an analysis and comparison of SVM ensembles using four different ensemble constructing techniques. ...
Article
Full-text available
Credit scoring models serve as pivotal instruments for lenders and financial institutions, facilitating the assessment of creditworthiness. Traditional models, while instrumental, grapple with challenges related to efficiency and subjectivity. The advent of machine learning heralds a transformative era, offering data-driven solutions that transcend these limitations. This research delves into a comprehensive analysis of various machine learning algorithms, emphasizing their mathematical underpinnings and their applicability in credit score classification. A comprehensive evaluation is conducted on a range of algorithms, including logistic regression, decision trees, support vector machines, and neural networks, using publicly available credit datasets. Within the research, a unified mathematical framework is introduced, which encompasses preprocessing techniques and critical algorithms such as Particle Swarm Optimization (PSO), the Light Gradient Boosting Model, and Extreme Gradient Boosting (XGB), among others. The focal point of the investigation is the LIME (Local Interpretable Model-agnostic Explanations) explainer. This study offers a comprehensive mathematical model using the LIME explainer, shedding light on its pivotal role in elucidating the intricacies of complex machine learning models. This study’s empirical findings offer compelling evidence of the efficacy of these methodologies in credit scoring, with notable accuracies of 88.84%, 78.30%, and 77.80% for the Australian, German, and South German datasets, respectively. In summation, this research not only amplifies the significance of machine learning in credit scoring but also accentuates the importance of mathematical modeling and the LIME explainer, providing a roadmap for practitioners to navigate the evolving landscape of credit assessment.
... Another distinctive feature of our approach is that the SVM is embedded in an ensemble method (see Petrides and Verbeke (2022) and references therein for more details regarding ensemble methods) which, as will be shown, means an improvement in performance (De Bock and Van den Poel (2012); Wang et al. (2009); Benítez-Peña et al. (2021)). It is known that, in order to solve the SVM problem (1), a tuning process concerning the regularization parameters in the grid Θ needs to be performed. ...
... In this section we present our methodology to obtain point estimates for the posterior class probabilities using the SVM classifier together with a bagging procedure (Wang et al. (2009); Kim et al. (2002)). First, in Section 2.1 we explain how to integrate bootstrap sampling into the SVM to produce posterior class probability estimates P(y = +1 | x). ...
Preprint
Full-text available
Support vector machines (SVMs) are widely used and constitute one of the best-examined machine learning models for 2-class classification. Classification in SVM is based on a score procedure, yielding a deterministic classification rule, which can be transformed into a probabilistic rule (as implemented in off-the-shelf SVM libraries), but is not probabilistic in nature. On the other hand, the tuning of the regularization parameters in SVM is known to imply a high computational effort and generates pieces of information which are not fully exploited, and not used to build a probabilistic classification rule. In this paper we propose a novel approach to generate probabilistic outputs for the SVM. The highlights of the paper are: first, an SVM method is designed to be cost-sensitive, and thus the different importance of sensitivity and specificity is readily accommodated in the model. Second, the SVM is embedded in an ensemble method to improve its performance, making use of the valuable information generated in the parameter tuning process. Finally, the probability estimation is done via bootstrap estimates, avoiding the parametric models used by competing probability-estimation approaches for SVM. Numerical tests show the advantages of our procedures.
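The bootstrap idea in this abstract's last highlight can be illustrated with a plain bagging vote. This sketch omits the cost-sensitive formulation and the reuse of tuning information described in the abstract; the number of resamples `B`, the RBF kernel, and the synthetic data are illustrative assumptions:

```python
# Bootstrap estimate of the posterior class probability P(y = 1 | x):
# train one SVM per bootstrap resample and take the fraction of resampled
# SVMs that vote for class 1. Illustrative sketch only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
rng = np.random.default_rng(1)

B = 25  # number of bootstrap resamples (illustrative choice)
votes = np.zeros((B, len(X)))
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    votes[b] = SVC(kernel="rbf").fit(X[idx], y[idx]).predict(X)

p_hat = votes.mean(axis=0)  # per-point probability estimate in [0, 1]
```

Because each resampled SVM yields a hard 0/1 prediction, averaging the votes gives a non-parametric probability estimate without fitting a sigmoid or other parametric link.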
... Previous authors [44] developed the AdaBoost (adaptive boosting) method, which adjusts the weights without requiring prior information on the learners. AdaBoost has been employed in ensembles to increase prediction performance, most notably with neural networks [45], support vector machines [46], and decision trees [47]. The classifier uses an adaptive resampling strategy to select training samples: examples misclassified by a prior classifier are chosen more frequently than correctly classified ones, allowing a new classifier to perform well on a fresh dataset. ...
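The adaptive-resampling behaviour described in this citation context can be sketched with scikit-learn's AdaBoostClassifier; the decision-stump base learner, the number of rounds, and the synthetic data are illustrative assumptions, not the cited works' actual setups:

```python
# AdaBoost sketch: each round reweights the training set so that examples
# the previous weak learner misclassified receive more emphasis in the
# next round. Illustrative configuration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=2)

boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),  # decision stump
                           n_estimators=20, random_state=2)
boost.fit(X, y)
print(boost.score(X, y))  # training accuracy of the boosted ensemble
```

A single stump is a weak learner; boosting many of them, each focused on the previous round's mistakes, typically yields a much stronger combined classifier.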
Article
Full-text available
Gully erosion is a worldwide threat with numerous environmental, social, and economic impacts. The purpose of this research is to evaluate the performance and robustness of six machine learning ensemble models based on the decision tree principle: Random Forest (RF), C5.0, XGBoost, treebag, Gradient Boosting Machines (GBMs) and AdaBoost, in order to map and predict gully erosion-prone areas in a semi-arid mountain context. The first step was to prepare the inventory data, which consisted of 217 gully points. This database was then randomly subdivided into five Train/Test percentages (50/50, 60/40, 70/30, 80/20, and 90/10) to assess the stability and robustness of the models. Furthermore, 17 geo-environmental variables were used as potential controlling factors, and several metrics were examined to evaluate the performance of the six models. The results revealed that all of the models performed well in terms of predicting vulnerability to gully erosion. The C5.0 and RF models had the best prediction performance (AUC = 90.8 and AUC = 90.1, respectively). However, depending on the random subdivision of the database, these models exhibit small but noticeable instability, with high performance for the 80/20% and 70/30% subdivisions. This demonstrates the significance of database refinement and the need to test various data splits in order to ensure efficient and reliable output results.
... Additionally, the application of ML is well-known in assessing various phenomena in relation to natural disasters [30], soil erosion, and crop growth management [31]. Several algorithms were used to perform soil suitability maps based on artificial neural networks (ANNs) [30], random forest (RF) [26,32], support vector machines (SVMs) [31,33], K-Nearest Neighbor (K-NN) [34], and Extreme Gradient Boosting (XgbTree) [35]. Likewise, ML methods have demonstrated greater robustness and stability, making them popular and cost-effective in assessing agricultural land potentiality [27]. ...
Article
Full-text available
Increasing agricultural production is a major concern that aims to increase income, reduce hunger, and improve other measures of well-being. Recently, the prediction of soil suitability has become a primary topic of rising concern among academics, policymakers, and socio-economic analysts for assessing the dynamics of agricultural production. This work aims to use physico-chemical and remotely sensed phenological parameters to produce soil-suitability maps (SSM) based on Machine Learning (ML) algorithms in a semi-arid and arid region. Towards this goal, an inventory of 238 suitability points was carried out, and 14 physico-chemical and 4 phenological parameters were used as inputs to five ML prediction algorithms, namely RF, XgbTree, ANN, KNN, and SVM. The results showed that the phenological parameters were the most influential in soil-suitability prediction. Validation using the Receiver Operating Characteristic (ROC) curve approach indicates an area under the curve (AUC) of more than 0.82 for all models. The best results were obtained using XgbTree, with an AUC = 0.97, in comparison to the other algorithms. Our findings demonstrate an excellent ability of ML models to predict soil suitability using physico-chemical and phenological parameters. The approach developed to map soil suitability is a valuable tool for sustainable agricultural development, and it can play an effective role in ensuring food security and conducting land agriculture assessments.
... It attempts to find the optimal hyperplane to separate out different classes. Researchers have used SVM for topic detection (Wang et al., 2009), opinion summarization, key player and event detection. In our proposed method, SVM is used along with radial basis function (RBF) as kernel. ...
Article
Full-text available
In the current times, where human safety is threatened by man-made and natural calamities, surveillance systems have gained immense importance. But even in the presence of high-definition (HD) security cameras and the manpower to monitor the live feed 24/7, room for missing important information due to human error exists. In addition, employing an adequate number of people for the job is not always feasible either. The solution lies in a system that allows automated surveillance through classification and other data mining techniques, which can be used to extract useful information from these inputs. In this research, a data mining based framework has been proposed for surveillance. The research includes interpretation of data from different networks using a hybrid data mining technique. In order to show the validity of the proposed hybrid data mining technique, an online data set containing the network of a suspicious group has been utilized, and the main leaders of the network have been identified.
... - For the LS dataset, the developed classifier was compared with the work [87]. This work was selected as it achieved higher accuracy than the other works [80, 85, 88-94] on the LS dataset. - Finally, for the ULC dataset, we compared the findings of the proposed classifier with the results of [95]. ...
Article
Full-text available
Agriculture is the economic backbone and the main means of livelihood in numerous developing countries. Numerous challenges related to farming and agriculturists exist. Cultivators face crop loss due to inappropriate selection of crops, inappropriate use of fertilizers, alterations in soil, ambiguous climate conditions, and so on. The type of soil forms a crucial element in agriculture. The class of soil plays an important role in identifying what kind of crop should be planted along with the type of manure to be applied. Classification of soil is essential to make effective use of soil resources. The texture of the soil has a major impact on crop growth and plays a significant role in determining the type of crop to be grown. It is also employed in soil labs for determining the categories of soil. Soil texture plays a major role in determining the suitability of crops and handling famines. Soil chemical properties include “Electrical Conductivity” (\(E_C\)), “Organic Carbon” (\(O_C\)), “Phosphorous” (P), “Potassium” (K), “Power of Hydrogen” (\(P_H\)), “Zinc” (\(Z_n\)), “Boron” (B), and “Sulphur” (S). Crop growth is heavily influenced by the soil’s chemical composition. Keeping these considerations in mind, this work develops a customised decision tree (\(C_{DT}\)) that serves as a soil classifier (SC). A predictive framework is then devised that utilises the \(C_{DT}\) to perform soil classification based on the texture of the soil and its chemical properties. Extensive experiments were conducted on several real-world soil datasets from Karnataka, India, and on benchmark agricultural datasets such as seeds, Urban Land Cover (ULC), Satellite Image of Land Data (LS), and Forest Cover Type (FCT).
The results demonstrated that the designed \(C_{DT}\) classifier outperformed existing classifiers such as k-Nearest Neighbor (\(K_{NN}\)), Logistic Regression (\(L_R\)), Artificial Neural Network (\(A_{NN}\)), Classification and Regression Trees (\(C_{ART}\)), \(C_{4.5}\), traditional SVM (\(S_{VM}\)), and Random Forest (\(R_F\)) in terms of Accuracy (\(A_{cc}\)), Sensitivity (\(S_{ens}\)), Specificity (\(S_{pec}\)), Precision (\(P_{rec}\)), and F-Score (\(F_S\)) on these datasets. The devised SC was deployed on the Heroku (Hk) cloud for effective access, providing end-user availability at all times. An expert system for soil classification was built to provide information about soil classification round the clock via any internet-enabled device to the stakeholders of agriculture, such as cultivators and agricultural organizations. The agricultural raw data was stored in the form of blob objects on Amazon S3 (AS3).
... based models, etc. (Wang et al., 2009). This classifier uses an adaptive resampling technique when selecting training samples. ...
Article
Full-text available
Landslides are an indicator of slope instability, particularly in mountain terrain, and cause different types of damage and threats to life and property. The Himalayan terrains are highly susceptible to different natural hazards and disasters, particularly land-failure activities, mainly due to inherent tectonic activity further enhanced by various Neo-tectonic and Neolithic activities. This scientific study provides an enhanced framework for the assessment of proper and precise landslide susceptibility in two districts of Arunachal Pradesh (Tawang and West Kameng), considering both physical and anthropogenic factors and various machine learning models (SVM, AdaBoost and XGBoost). At first, landslide inventory maps were developed based on previous landslide events. Here, 70% of the data were randomly selected for training and the remainder was used for validation and optimization of the models using statistical implications and validation assessment methods. The results showed that the high and very high landslide-susceptible areas are mainly concentrated in the middle portion along the Bhalukpong-Bomdila road section. Based on the AUC value and other statistical indicators, AdaBoost is the most efficient model here (AUC = 0.92); the AUC values of SVM and XGBoost are 0.85 and 0.89, respectively. The AdaBoost model identifies that the very low susceptibility class occupies 60.22% of the area and the very high susceptibility class occupies 15.51%, and it can be considered a more encouraging method for landslide susceptibility determination in such cases for better accuracy. This high-accuracy susceptibility map will positively help during the execution of various developmental projects.
... Still, there are limitations to defining groups of learning data and having constant parameters specified, causing bias. Researchers, such as Wang, S.-j., et al. [2], advise that "Generally, the effectiveness of integration depends on the diversity and accuracy of agents in data classification". For this reason, various classifications have been tested to help optimize the model's accuracy in providing suggestive information. ...
Article
Full-text available
The decision-making for a suitable area of study at university seems to be a crucial task for students. Machine learning techniques can help provide alternatives based on user profiles. This research proposes an improved predictive model of the subject area for learner groups in higher education. The proposed techniques focus on hybrid ensemble learning to optimize traditional predictor-building practices through dimensionality reduction modeled by Neural Network Autoencoders (NNAE). The results showed that the proposed ensemble NNAE techniques performed better than other ensemble techniques.