Table 3 - uploaded by Jay Lee
Classification accuracy of test sets using ten RBF kernel SVM classifiers

Source publication
Article
Full-text available
Ensemble classification – combining the results of a set of base learners – has received much attention in the machine learning community and has demonstrated promising capabilities in improving classification accuracy. Compared with neural network or decision tree ensembles, there is no comprehensive empirical research in support vector machine (S...

Contexts in source publication

Context 1
... average accuracy and average standard deviation of the testing set over all 20 data sets, for ensembles incorporating from 5 to 50 base SVM learners with different kernel functions, are shown in Figs. 6-11. As an example, the accuracy and standard deviation (in parentheses) of a testing set with 10 base SVM classifiers are shown in Tables 3-5. [Fig. 7: Average standard deviation of RBF SVM ensembles over 20 data sets] ...
Context 2
... shown in Tables 3 to 5, BaggingSVM performed the best in 9, 8, 9 out of 20 data sets for RBF, linear and polynomial kernel functions, respectively. As a general technique for ensembled SVMs, bagging with a polynomial kernel function appears to provide the best performance and generalization. ...
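The bagging result quoted above can be sketched with scikit-learn. This is a minimal illustration on synthetic data; the `BaggingClassifier`/`SVC` configuration, the polynomial degree, and the data set are illustrative assumptions, not the paper's actual experimental protocol:

```python
# Minimal sketch of a bagged SVM ensemble: each base SVC is trained on a
# bootstrap resample and predictions are combined by majority vote.
# All settings here are illustrative, not the paper's setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 10 base SVM learners with a polynomial kernel (the kernel family the
# quoted context reports as generalizing best for bagged SVMs)
ensemble = BaggingClassifier(SVC(kernel="poly", degree=3),
                             n_estimators=10, random_state=0)
ensemble.fit(X_tr, y_tr)
print(ensemble.score(X_te, y_te))  # test-set accuracy of the ensemble
```

Each base learner sees a bootstrap resample of the training set, so the ensemble's vote averages away some of the variance of any single SVM.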

Similar publications

Preprint
Full-text available
Current applications of quantum machine learning are emerging, due to the potential benefits that quantum technologies could bring in the near future. One of the most recent developments is tensor network-based architectures. To explore the feasibility of applying this method to healthcare, in this paper tensor networks are applied to the IEE...
Preprint
Full-text available
The accuracy of coronary artery disease (CAD) diagnosis is dependent on a variety of factors, including demographic, symptom, and medical examination, ECG, and echocardiography data, among others. In this context, artificial intelligence (AI) can help clinicians identify high-risk patients early in the diagnostic process, by synthesizing informatio...
Article
Full-text available
This paper estimates the value of information for highly uncertain projects whose decisions have long-term impacts. We present a mathematically consistent framework using decision trees, Bayesian updating, and Monte Carlo simulation to value future information today, even when that future information is imperfect. One drawback of Monte Carlo method...
Preprint
Full-text available
This research paper investigates the effectiveness of simple linear models versus complex machine learning techniques in breast cancer diagnosis, emphasizing the importance of interpretability and computational efficiency in the medical domain. We focus on Logistic Regression (LR), Decision Trees (DT), and Support Vector Machines (SVM) and optimize...
Article
Full-text available
In recent years, with the advances in remote sensing and geospatial technology, various machine learning algorithms found applications in determining potentially flooded areas, which have an important place in basin planning and depend on various environmental parameters. This study uses ensemble models of decision trees (DT), gradient boosting tre...

Citations

... The limitation of the HDP method is that, with the exponential increase in the total number of topics, this scheme becomes unrealistic [19]. ...
Chapter
Full-text available
Text mining is a popular research area in the field of computer science and engineering that enables the processing of natural language, with applications in areas such as aerospace and biomedicine. Text mining uncovers unknown information present in the data so that data extraction becomes effective. Text classification is a subdomain of text mining that plays a major role in labelling documents based on their semantic meaning and context. Different machine learning algorithms are available to classify the available text documents. The main contribution of this paper is the use of semantic analysis with the Lion Optimization Algorithm and a Neural Network architecture. The semantic analysis technique classifies text through semantic keywords rather than treating keywords in the documents as independent features. The Lion Optimization Algorithm is used to adjust the weights of the Neural Network to maximize the efficiency of the classifier. Two well-known open-source datasets, namely 20 Newsgroups and Reuters-21578, are used for the experimentation and to evaluate the performance of the classification algorithms. Significant improvement in all three performance parameters, accuracy, specificity, and sensitivity, is observed. The maximum values observed with our proposed algorithm are 91.86, 95.54, and 84.96 for accuracy, sensitivity, and specificity, respectively.
... Qutub et al. [42] use ensemble learning with a decision tree, logistic regression, and SVM for classifying the IQ on the IBM HR analytics data, and the ensemble learning model outperforms all three individual models. SVM ensembles outperform other approaches with a higher degree of generality [43]. The suggested ensemble technique focuses on a weighted classification model based on individual projected accuracy. ...
... Wang et al. [19] focused on ensemble classification. They conducted an analysis and comparison of SVM ensembles using four different ensemble constructing techniques. ...
Article
Full-text available
Credit scoring models serve as pivotal instruments for lenders and financial institutions, facilitating the assessment of creditworthiness. Traditional models, while instrumental, grapple with challenges related to efficiency and subjectivity. The advent of machine learning heralds a transformative era, offering data-driven solutions that transcend these limitations. This research delves into a comprehensive analysis of various machine learning algorithms, emphasizing their mathematical underpinnings and their applicability in credit score classification. A comprehensive evaluation is conducted on a range of algorithms, including logistic regression, decision trees, support vector machines, and neural networks, using publicly available credit datasets. Within the research, a unified mathematical framework is introduced, which encompasses preprocessing techniques and critical algorithms such as Particle Swarm Optimization (PSO), the Light Gradient Boosting Model, and Extreme Gradient Boosting (XGB), among others. The focal point of the investigation is the LIME (Local Interpretable Model-agnostic Explanations) explainer. This study offers a comprehensive mathematical model using the LIME explainer, shedding light on its pivotal role in elucidating the intricacies of complex machine learning models. This study’s empirical findings offer compelling evidence of the efficacy of these methodologies in credit scoring, with notable accuracies of 88.84%, 78.30%, and 77.80% for the Australian, German, and South German datasets, respectively. In summation, this research not only amplifies the significance of machine learning in credit scoring but also accentuates the importance of mathematical modeling and the LIME explainer, providing a roadmap for practitioners to navigate the evolving landscape of credit assessment.
... Another distinctive feature of our approach is that the SVM is embedded in an ensemble method (see Petrides and Verbeke (2022) and references therein for more details regarding ensemble methods) which, as will be shown, means an improvement in performance (De Bock and Van den Poel (2012); Wang et al. (2009); Benítez-Peña et al. (2021)). It is known that, in order to solve the SVM problem (1), a tuning process concerning the regularization parameters in the grid Θ needs to be performed. ...
... In this section we present our methodology to obtain point estimates for the posterior class probabilities using the SVM classifier together with a bagging procedure (Wang et al. (2009); Kim et al. (2002)). First, in Section 2.1 we explain how to integrate bootstrap sampling into the SVM to produce posterior class probability estimates P(y = +1 | x). ...
Preprint
Full-text available
Support vector machines (SVMs) are widely used and constitute one of the best-examined machine learning models for 2-class classification. Classification in SVM is based on a score procedure, yielding a deterministic classification rule, which can be transformed into a probabilistic rule (as implemented in off-the-shelf SVM libraries), but is not probabilistic in nature. On the other hand, the tuning of the regularization parameters in SVM is known to imply a high computational effort and generates pieces of information which are not fully exploited, and not used to build a probabilistic classification rule. In this paper we propose a novel approach to generate probabilistic outputs for the SVM. The highlights of the paper are: first, an SVM method is designed to be cost-sensitive, and thus the different importance of sensitivity and specificity is readily accommodated in the model. Second, the SVM is embedded in an ensemble method to improve its performance, making use of the valuable information generated in the parameter tuning process. Finally, the probability estimation is done via bootstrap estimates, avoiding the parametric models used by competing probability-estimation approaches for SVM. Numerical tests show the advantages of our procedures.
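The bootstrap idea in this abstract's last highlight can be illustrated with a plain bagging vote. This sketch omits the cost-sensitive formulation and the reuse of tuning information described in the abstract; the number of resamples `B`, the RBF kernel, and the synthetic data are illustrative assumptions:

```python
# Bootstrap estimate of the posterior class probability P(y = 1 | x):
# train one SVM per bootstrap resample and take the fraction of resampled
# SVMs that vote for class 1. Illustrative sketch only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)
rng = np.random.default_rng(1)

B = 25  # number of bootstrap resamples (illustrative choice)
votes = np.zeros((B, len(X)))
for b in range(B):
    idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
    votes[b] = SVC(kernel="rbf").fit(X[idx], y[idx]).predict(X)

p_hat = votes.mean(axis=0)  # per-point probability estimate in [0, 1]
```

Because each resampled SVM yields a hard 0/1 prediction, averaging the votes gives a non-parametric probability estimate without fitting a sigmoid or other parametric link.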
... Previous authors [44] developed the AdaBoost (adaptive boosting) method, which adjusts the weights without requiring prior information on the learners. AdaBoost has been employed in ensembles to increase prediction performance, most notably with neural networks [45], support vector machines [46], and decision trees [47]. The classifier uses an adaptive resampling strategy to select training samples: examples misclassified by a prior classifier are chosen more frequently than correctly classified ones, allowing a new classifier to perform well on a fresh dataset. ...
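The adaptive-resampling behaviour described in this citation context can be sketched with scikit-learn's AdaBoostClassifier; the decision-stump base learner, the number of rounds, and the synthetic data are illustrative assumptions, not the cited works' actual setups:

```python
# AdaBoost sketch: each round reweights the training set so that examples
# the previous weak learner misclassified receive more emphasis in the
# next round. Illustrative configuration only.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=2)

boost = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),  # decision stump
                           n_estimators=20, random_state=2)
boost.fit(X, y)
print(boost.score(X, y))  # training accuracy of the boosted ensemble
```

A single stump is a weak learner; boosting many of them, each focused on the previous round's mistakes, typically yields a much stronger combined classifier.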
Article
Full-text available
Gully erosion is a worldwide threat with numerous environmental, social, and economic impacts. The purpose of this research is to evaluate the performance and robustness of six machine learning ensemble models based on the decision tree principle: Random Forest (RF), C5.0, XGBoost, treebag, Gradient Boosting Machines (GBMs) and AdaBoost, in order to map and predict gully erosion-prone areas in a semi-arid mountain context. The first step was to prepare the inventory data, which consisted of 217 gully points. This database was then randomly subdivided into five Train/Test percentages (50/50, 60/40, 70/30, 80/20, and 90/10) to assess the stability and robustness of the models. Furthermore, 17 geo-environmental variables were used as potential controlling factors, and several metrics were examined to evaluate the performance of the six models. The results revealed that all of the models performed well in terms of predicting vulnerability to gully erosion. The C5.0 and RF models had the best prediction performance (AUC = 90.8 and AUC = 90.1, respectively). However, depending on the random subdivision of the database, these models exhibit small but noticeable instability, with high performance for the 80/20% and 70/30% subdivisions. This demonstrates the significance of database refinement and the need to test various data splits in order to ensure efficient and reliable output results.
... Additionally, the application of ML is well-known in assessing various phenomena in relation to natural disasters [30], soil erosion, and crop growth management [31]. Several algorithms were used to perform soil suitability maps based on artificial neural networks (ANNs) [30], random forest (RF) [26,32], support vector machines (SVMs) [31,33], K-Nearest Neighbor (K-NN) [34], and Extreme Gradient Boosting (XgbTree) [35]. Likewise, ML methods have demonstrated greater robustness and stability, making them popular and cost-effective in assessing agricultural land potentiality [27]. ...
Article
Full-text available
Increasing agricultural production is a major concern that aims to increase income, reduce hunger, and improve other measures of well-being. Recently, the prediction of soil suitability has become a primary topic of rising concern among academics, policymakers, and socio-economic analysts for assessing the dynamics of agricultural production. This work aims to use physico-chemical and remotely sensed phenological parameters to produce soil-suitability maps (SSM) based on Machine Learning (ML) algorithms in a semi-arid and arid region. Towards this goal, an inventory of 238 suitability points was carried out, and 14 physico-chemical and 4 phenological parameters were used as inputs to five ML prediction algorithms, namely RF, XgbTree, ANN, KNN, and SVM. The results showed that the phenological parameters were the most influential in soil-suitability prediction. Validation using the Receiver Operating Characteristic (ROC) curve approach indicates an area under the curve (AUC) of more than 0.82 for all models. The best results were obtained using XgbTree, with an AUC = 0.97, in comparison to the other algorithms. Our findings demonstrate an excellent ability of ML models to predict soil suitability using physico-chemical and phenological parameters. The approach developed to map soil suitability is a valuable tool for sustainable agricultural development, and it can play an effective role in ensuring food security and conducting land agriculture assessments.
... It attempts to find the optimal hyperplane to separate out different classes. Researchers have used SVM for topic detection (Wang et al., 2009), opinion summarization, key player and event detection. In our proposed method, SVM is used along with radial basis function (RBF) as kernel. ...
Article
Full-text available
In the current times, where human safety is threatened by man-made and natural calamities, surveillance systems have gained immense importance. But even in the presence of high-definition (HD) security cameras and the manpower to monitor the live feed 24/7, room for missing important information due to human error exists. In addition, employing an adequate number of people for the job is not always feasible either. The solution lies in a system that allows automated surveillance through classification and other data mining techniques, which can be used to extract useful information from these inputs. In this research, a data mining based framework has been proposed for surveillance. The research includes interpretation of data from different networks using a hybrid data mining technique. In order to show the validity of the proposed hybrid data mining technique, an online data set containing the network of a suspicious group has been utilized, and the main leaders of the network have been identified.
... - For the LS dataset, the developed classifier was compared with the work [87]. This work was selected as it achieved higher accuracy than the other works [80, 85, 88-94] on the LS dataset. - Finally, for the ULC dataset, we compared the findings of the proposed classifier with the results of [95]. ...
Article
Full-text available
Agriculture is the economic backbone and the main means of livelihood in numerous developing countries. Numerous challenges related to farming and agriculturists exist. Cultivators face crop loss due to inappropriate selection of crops, inappropriate use of fertilizers, alterations in soil, ambiguous climate conditions, and so on. The type of soil forms a crucial element in agriculture. The class of soil plays an important role in identifying what kind of crop should be planted along with the type of manure to be applied. Classification of soil is essential to make effective use of soil resources. The texture of the soil has a major impact on crop growth and plays a significant role in determining the type of crop to be grown. It is also employed in soil labs for determining the categories of soil. Soil texture plays a major role in determining the suitability of crops and handling famines. Soil chemical properties include “Electrical Conductivity” (\(E_C\)), “Organic Carbon” (\(O_C\)), “Phosphorous” (P), “Potassium” (K), “Power of Hydrogen” (\(P_H\)), “Zinc” (\(Z_n\)), “Boron” (B), and “Sulphur” (S). Crop growth is heavily influenced by the soil’s chemical composition. Keeping these considerations in mind, this work develops a customised decision tree (\(C_{DT}\)) that serves as a soil classifier (SC). A predictive framework is then devised that utilises the \(C_{DT}\) to perform soil classification based on the texture of the soil and its chemical properties. Extensive experiments were conducted on several real-world soil datasets from Karnataka, India, and on benchmark agricultural datasets such as seeds, Urban Land Cover (ULC), Satellite Image of Land Data (LS), and Forest Cover Type (FCT).
The results demonstrated that the designed \(C_{DT}\) classifier outperformed existing classifiers such as k-Nearest Neighbor (\(K_{NN}\)), Logistic Regression (\(L_R\)), Artificial Neural Network (\(A_{NN}\)), Classification and Regression Trees (\(C_{ART}\)), \(C_{4.5}\), traditional SVM (\(S_{VM}\)), and Random Forest (\(R_F\)) in terms of Accuracy (\(A_{cc}\)), Sensitivity (\(S_{ens}\)), Specificity (\(S_{pec}\)), Precision (\(P_{rec}\)), and F-Score (\(F_S\)) on these datasets. The devised SC was deployed on the Heroku (Hk) cloud for effective access, providing end-user availability at all times. An expert system for soil classification was built to provide information about soil classification round the clock via any internet-enabled device to the stakeholders of agriculture, such as cultivators and agricultural organizations. The agricultural raw data was stored in the form of blob objects on Amazon S3 (AS3).
... based models, etc. (Wang et al., 2009). This classifier uses an adaptive resampling technique when selecting training samples. ...
Article
Full-text available
Landslides are an indicator of slope instability, particularly in mountain terrain, and cause different types of damage and threats to life and property. The Himalayan terrains are highly susceptible to different natural hazards and disasters, particularly land-failure activities, mainly due to inherent tectonic activity further enhanced by various Neo-tectonic and Neolithic activities. This scientific study provides an enhanced framework for the assessment of proper and precise landslide susceptibility in two districts of Arunachal Pradesh (Tawang and West Kameng), considering both physical and anthropogenic factors and various machine learning models (SVM, AdaBoost and XGBoost). At first, landslide inventory maps were developed based on previous landslide events. Here, 70% of the data were randomly selected for training and the remainder was used for validation and optimization of the models using statistical implications and validation assessment methods. The results showed that the high and very high landslide-susceptible areas are mainly concentrated in the middle portion along the Bhalukpong-Bomdila road section. Based on the AUC value and other statistical indicators, AdaBoost is the most efficient model here (AUC = 0.92); the AUC values of SVM and XGBoost are 0.85 and 0.89, respectively. The AdaBoost model identifies that the very low susceptibility class occupies 60.22% of the area and the very high susceptibility class occupies 15.51%, and it can be considered a more encouraging method for landslide susceptibility determination in such cases for better accuracy. This high-accuracy susceptibility map will positively help during the execution of various developmental projects.
... Still, there are limitations to defining groups of learning data and having constant parameters specified, causing bias. Researchers, such as Wang, S.-j., et al. [2], advise that "Generally, the effectiveness of integration depends on the diversity and accuracy of agents in data classification". For this reason, various classifications have been tested to help optimize the model's accuracy in providing suggestive information. ...
Article
Full-text available
The decision-making for a suitable area of study at university seems to be a crucial task for students. Machine learning techniques can help provide alternatives based on user profiles. This research proposes an improved predictive model of the subject area for learner groups in higher education. The proposed techniques focus on hybrid ensemble learning to optimize traditional predictor-building practices through dimensionality reduction modeled by Neural Network Autoencoders (NNAE). The results showed that the proposed ensemble NNAE techniques performed better than other ensemble techniques.