FIG 2 - uploaded by Johar M. Ashfaque
Diagram of k-fold cross-validation with k = 10. Image from Karl Rosaen Log http://karlrosaen.com/ml/learning-log/2016-06-20/
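For readers who want to reproduce the scheme in the diagram, the sketch below illustrates 10-fold cross-validation with scikit-learn. The dataset and classifier are placeholders chosen only for illustration; they are not taken from the figure or the reviewed article.

```python
# Minimal sketch of 10-fold cross-validation, mirroring the diagram above:
# the data are split into 10 folds, each fold serves once as the test set,
# and the reported score is the average over the 10 held-out folds.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)        # placeholder dataset
kf = KFold(n_splits=10, shuffle=True, random_state=0)

scores = []
for train_idx, test_idx in kf.split(X):
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print(f"mean accuracy over 10 folds: {np.mean(scores):.3f}")
```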


Source publication
Article
Full-text available
We explain the support vector machine algorithm, and its extension the kernel method, for machine learning using small datasets. We also briefly discuss the Vapnik-Chervonenkis theory which forms the theoretical foundation of machine learning. This review is based on lectures given by the second author.
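As a companion to the abstract, here is a minimal sketch of the two ideas it reviews: a soft-margin SVM trained on a small dataset, first with scikit-learn's built-in RBF kernel and then with an explicitly precomputed Gaussian kernel matrix, which makes the kernel method visible. The toy dataset and hyperparameters are assumptions for illustration and do not come from the article.

```python
# Minimal sketch: SVM with a kernel on a small dataset (scikit-learn).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)   # small toy dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# 1) Built-in RBF kernel: k(x, x') = exp(-gamma * ||x - x'||^2)
clf = SVC(kernel="rbf", C=1.0, gamma=1.0)
clf.fit(X_tr, y_tr)
print("RBF-kernel SVM accuracy:", clf.score(X_te, y_te))

# 2) The same model with an explicitly precomputed Gram (kernel) matrix,
#    which is how the kernel method is usually presented in the theory.
def rbf_gram(A, B, gamma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

clf_pre = SVC(kernel="precomputed", C=1.0)
clf_pre.fit(rbf_gram(X_tr, X_tr), y_tr)
print("Precomputed-kernel SVM accuracy:", clf_pre.score(rbf_gram(X_te, X_tr), y_te))
```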

Similar publications

Preprint
Full-text available
As quantum computers become increasingly practical, so does the prospect of using quantum computation to improve upon traditional algorithms. Kernel methods in machine learning is one area where such improvements could be realized in the near future. Paired with kernel methods like support-vector machines, small and noisy quantum computers can eval...
Article
Full-text available
A method for analyzing the feature map for the kernel-based quantum classifier is developed; that is, we give a general formula for computing a lower bound of the exact training accuracy, which helps us to see whether the selected feature map is suitable for linearly separating the dataset. We show a proof of concept demonstration of this method fo...
Article
Full-text available
In this paper, we propose a stochastic gradient descent algorithm, called stochastic gradient descent method-based generalized pinball support vector machine (SG-GPSVM), to solve data classification problems. This approach was developed by replacing the hinge loss function in the conventional support vector machine (SVM) with a generalized pinball...
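The abstract does not state the exact form of the generalized pinball loss, so the sketch below uses the standard pinball loss L_tau(u) = max(u, -tau*u), applied to u = 1 - y(w·x + b), purely as a stand-in, and minimizes the regularized objective with stochastic subgradient descent on a linear model. All names and hyperparameters are illustrative assumptions, not the SG-GPSVM algorithm itself.

```python
# Hedged sketch of a stochastic (sub)gradient descent solver for a linear SVM
# with a pinball-type loss. The "generalized pinball" loss of the cited paper
# is not specified in the abstract, so the standard pinball loss
#   L_tau(u) = max(u, -tau * u),  with u = 1 - y * (w.x + b),
# is used here purely as a stand-in.
import numpy as np

def sgd_pinball_svm(X, y, tau=0.5, lam=0.01, lr=0.01, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            u = 1.0 - y[i] * (X[i] @ w + b)      # margin violation
            g_u = 1.0 if u >= 0 else -tau        # subgradient of the pinball loss w.r.t. u
            # Chain rule: du/dw = -y*x, du/db = -y; plus L2 regularization on w
            w -= lr * (lam * w - g_u * y[i] * X[i])
            b -= lr * (-g_u * y[i])
    return w, b

# Toy usage with labels in {-1, +1}
X = np.vstack([np.random.randn(50, 2) + 2, np.random.randn(50, 2) - 2])
y = np.hstack([np.ones(50), -np.ones(50)])
w, b = sgd_pinball_svm(X, y)
print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```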
Article
Full-text available
We investigated the potential application of quantum computing using the Kronecker kernel to pairwise classification and have devised a way to apply the Harrow-Hassidim-Lloyd (HHL)-based quantum support vector machine algorithm. Pairwise classification can be used to predict relationships among data and is used for problems such as link prediction...
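The quantum (HHL-based) solver cannot be reproduced in a few lines, but the Kronecker pairwise kernel it builds on can: for pairs (a, b) and (c, d), K((a, b), (c, d)) = K1(a, c) · K2(b, d), so the Gram matrix over all ordered pairs is the Kronecker product of the two base Gram matrices. The sketch below checks this classically with NumPy on placeholder data.

```python
# Classical sketch of the Kronecker (tensor-product) pairwise kernel:
#   K((a, b), (c, d)) = K1(a, c) * K2(b, d)
# Over all ordered pairs, the pairwise Gram matrix is the Kronecker product
# of the two base Gram matrices. The quantum (HHL-based) solver of the cited
# work is not reproduced here.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

A = np.random.randn(5, 3)    # objects of the first kind
B = np.random.randn(4, 3)    # objects of the second kind

K1 = rbf_kernel(A, A, gamma=0.5)      # 5 x 5 Gram matrix
K2 = rbf_kernel(B, B, gamma=0.5)      # 4 x 4 Gram matrix
K_pairs = np.kron(K1, K2)             # 20 x 20 Gram matrix over all (a, b) pairs

# Entry for pair (a_i, b_j) vs pair (a_k, b_l) sits at row i*4+j, column k*4+l:
i, j, k, l = 1, 2, 3, 0
assert np.isclose(K_pairs[i * 4 + j, k * 4 + l], K1[i, k] * K2[j, l])
print(K_pairs.shape)
```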

Citations

... K-fold cross-validation technique [37]. ...
Article
Full-text available
Automatic dependent surveillance-broadcast (ADS-B) is the future of aviation surveillance and traffic control, allowing different aircraft types to exchange information periodically. Despite this protocol's advantages, it is vulnerable to flooding, denial-of-service, and injection attacks. In this paper, we join the initiative of securing this protocol and propose an efficient detection method to help detect any attempts to exploit it by injecting messages containing wrong information. This paper focuses mainly on three attacks: path modification, ghost aircraft injection, and velocity drift attacks. It aims to provide a methodology that, even in the face of new attacks (zero-day attacks), can successfully detect injected messages. The main advantage was utilizing a recent dataset to create more reliable and adaptive training and testing material, which was then preprocessed before using different machine learning algorithms to create the most accurate and time-efficient model. The best outcomes of the binary classification were obtained with 99.14% accuracy, an F1-score of 99.14%, and a Matthews correlation coefficient (MCC) of 0.982, while the best outcomes of the multiclass classification were obtained with 99.41% accuracy, an F1-score of 99.37%, and an MCC of 0.988. Our best outcomes outdo existing models, but we believe the model would benefit from testing against other types of attacks and a bigger dataset.
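The metrics quoted in this abstract (accuracy, F1-score, and the Matthews correlation coefficient) can be computed with scikit-learn as in the sketch below; the labels are placeholders, not the paper's ADS-B data or model outputs.

```python
# Hedged sketch: computing the metrics reported in the abstract (accuracy,
# F1-score, Matthews correlation coefficient) with scikit-learn. The labels
# below are placeholders, not the ADS-B data or models from the paper.
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# Binary case: 0 = benign message, 1 = injected message
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]
print("accuracy:", accuracy_score(y_true, y_pred))
print("F1-score:", f1_score(y_true, y_pred))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))

# Multiclass case (e.g., benign / path modification / ghost injection / velocity drift)
y_true_mc = [0, 1, 2, 3, 1, 2, 0, 3]
y_pred_mc = [0, 1, 2, 3, 1, 0, 0, 3]
print("multiclass F1 (weighted):", f1_score(y_true_mc, y_pred_mc, average="weighted"))
print("multiclass MCC:          ", matthews_corrcoef(y_true_mc, y_pred_mc))
```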
... In each iteration, a different test set is used. The final model performance is the average of the scores obtained in each iteration, as shown in Figure 4 [23]. ...
Preprint
Full-text available
Breast cancer is a significant health problem, with about 2 million new cases and 600,000 deaths annually. Early detection and accurate diagnosis are critical to patient prognosis. Machine learning (ML) models show promising results for accurate and efficient diagnosis. In the present work, the performance of different ML models is studied on the publicly accessible Wisconsin Breast Cancer Dataset. The models are based on logistic regression, Random Forest, Naïve Bayes, and Support Vector Machine algorithms, with the SVM performing best. An ensemble model combining the best-performing models is then implemented: an SVM model on the standardized dataset, a logistic regression model on the standardized dataset with a 10-component PCA analysis, and a Random Forest model on the standardized dataset with 60 estimators. All models use a test set formed by 30% of the original dataset. The models are combined using a majority weighted voting system, in which the SVM model has a weight of 0.5 while the logistic regression and Random Forest models have weights of 0.25 each. The ensemble voting model improves on the results of the individual models, with an accuracy of 98%, precision of 97%, recall of 99%, and F1 score of 98%.
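The ensemble described above maps fairly directly onto scikit-learn. The sketch below is a hedged reconstruction based only on the abstract's description (standardized SVM, standardized logistic regression with 10-component PCA, a 60-estimator Random Forest, a 30% test split, and hard voting weighted 0.5/0.25/0.25); the authors' actual preprocessing and hyperparameters may differ.

```python
# Hedged reconstruction of the ensemble described above, using scikit-learn.
# The pipeline choices follow the abstract's description; the exact
# preprocessing and hyperparameters of the original work may differ.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)                       # Wisconsin dataset
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=42)

svm = make_pipeline(StandardScaler(), SVC())                      # standardized SVM
logreg = make_pipeline(StandardScaler(), PCA(n_components=10),    # standardized + 10-PC PCA
                       LogisticRegression(max_iter=1000))
forest = RandomForestClassifier(n_estimators=60, random_state=42)

# Majority (hard) voting with weights 0.5 / 0.25 / 0.25, as stated in the abstract.
ensemble = VotingClassifier(
    estimators=[("svm", svm), ("logreg", logreg), ("rf", forest)],
    voting="hard",
    weights=[0.5, 0.25, 0.25],
)
ensemble.fit(X_tr, y_tr)
print("test accuracy:", ensemble.score(X_te, y_te))
```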
... The GPR model was trained with 10-fold cross-validation, which was reported as the most commonly used value according to Singh et al. (2011). This is to avoid overfitting or underfitting a model with algorithms that are too simple or too complex, and to assist the selection of the best-performing model by using the validation dataset to calculate the error (Refaeilzadeh et al. 2016; Ashfaque and Iqbal 2019). ...
Article
Full-text available
Palm oil mill effluent (POME) contributes 23.7% of the methane emissions in Malaysia. Developing a methane emission prediction tool using machine learning (ML) enables the volume of methane released to be estimated. In this study, Gaussian Process Regression (GPR) along with its respective kernels was explored for the development of the prediction tool. The synthetic minority oversampling technique (SMOTE) was also implemented to study the effect of the training sample size on model validation. The GPR model was trained using synthetic data created with SMOTE, while the measured data from the plant was used to test the reliability of the trained model. The application of SMOTE was capable of producing high model validation performance (R² = 0.98, RMSE = 0.133, MSE = 0.018 and MAE = 0.08) using the common squared exponential kernel GPR model. However, the Matern 5/2 and rational quadratic kernel GPR models had the best model validation performance (R² = 0.98, RMSE = 0.131, MSE = 0.017 and MAE = 0.083). In terms of model testing performance, the rational quadratic kernel had the best performance, with R² = 0.99, RMSE = 0.061, MSE = 0.0037 and MAE = 0.044. The results of this study indicate that the prediction tool developed using the SMOTE-based rational quadratic kernel GPR model can predict methane emissions with high accuracy. The methane emission prediction tool developed is a cost-friendly and reliable alternative to existing methods.
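The kernel comparison described above can be set up with scikit-learn's Gaussian process tools, as in the hedged sketch below. The data are synthetic placeholders, and the SMOTE-based synthetic training set of the original study is not reproduced, since the abstract does not specify how SMOTE was adapted to a regression target.

```python
# Hedged sketch: comparing GPR kernels (squared exponential / RBF, Matern 5/2,
# rational quadratic) with scikit-learn, on placeholder data. The SMOTE-based
# synthetic training data of the original study is not reproduced here.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, RationalQuadratic
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))                          # placeholder POME features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 200)  # placeholder methane target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

kernels = {
    "squared exponential": RBF(),
    "Matern 5/2": Matern(nu=2.5),
    "rational quadratic": RationalQuadratic(),
}
for name, kernel in kernels.items():
    gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-2, normalize_y=True)
    gpr.fit(X_tr, y_tr)
    pred = gpr.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5
    print(f"{name:20s}  R2={r2_score(y_te, pred):.3f}  RMSE={rmse:.3f}  "
          f"MAE={mean_absolute_error(y_te, pred):.3f}")
```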
... K-fold cross-validation technique [40] ...
Preprint
Full-text available
Automatic Dependent Surveillance-Broadcast (ADS-B) is considered the future of aviation surveillance and traffic control, as it allows different types of aircraft to periodically transmit and receive information about their own and other nearby aircraft's positions, velocity, and various other variables. However, as this protocol still lacks security and researchers are still developing methods and frameworks to secure this technology, we decided to join the initiative and propose an efficient detection method to help detect any attempts at injecting these messages, which could pose multiple risks to aircraft such as causing collision avoidance system failure, reporting the wrong status of an aircraft, or even enabling its theft. This paper focuses mainly on three different attacks: path modification, ghost aircraft injection, and velocity drift attacks. The dataset we utilized consisted of authentic messages captured from the OpenSky Network and injected messages generated using PyCharm. This study aims to provide a methodology that, even in the face of new attacks (zero-day attacks), can successfully detect injected messages. The main advantage was utilizing a recent dataset to create more reliable and adaptive training and testing material, which was then preprocessed before using different machine learning algorithms to create the most accurate and time-efficient model. The best outcomes of the binary classification were obtained with 99.14% accuracy, an F1-score of 99.14%, and a Matthews correlation coefficient (MCC) of 0.982, while the best outcomes of the multiclass classification were obtained with 99.41% accuracy, an F1-score of 99.37%, and an MCC of 0.988. The dataset is thought to offer good outcomes, but the model still requires more testing against other types of attacks and a bigger dataset.
... In general, a validation dataset, that is independent of both training and testing sets, is used to validate the GCN model. Then, the test dataset is finally used to evaluate the trained model [55]. Unfortunately, in the case of analog circuits, the luxury of a big dataset is not always available. ...
Article
Full-text available
Analog mixed-signal (AMS) verification is one of the essential tasks in the development process of modern systems-on-chip (SoC). Most parts of the AMS verification flow are already automated, except for stimuli generation, which has been performed manually; it is thus challenging and time-consuming, and automation is a necessity. To generate stimuli, sub-circuits or sub-blocks of a given analog circuit module should be identified/classified. However, there is currently no reliable industrial tool that can automatically identify/classify analog sub-circuits (eventually in the frame of a circuit design process) or automatically classify a given analog circuit at hand. Besides verification, several other processes would profit enormously from the availability of a robust and reliable automated classification model for analog circuit modules (which may belong to different levels). This paper presents how to use a Graph Convolutional Network (GCN) model and proposes a novel data augmentation strategy to automatically classify analog circuits of a given level. Eventually, it can be upscaled or integrated within a more complex functional module (for structure recognition of complex analog circuits), targeting the identification of sub-circuits within a more complex analog circuit module. An integrated novel data augmentation technique is particularly crucial because, in practical settings, generally only a relatively limited dataset of analog circuit schematics (i.e., sample architectures) is available. Through a comprehensive ontology, we first introduce a graph representation framework of the circuit schematics, which consists of converting the circuits' netlists into graphs. Then, we use a robust classifier consisting of a GCN processor to determine the label corresponding to the given input analog circuit schematic. Furthermore, the classification performance is improved and made robust by a novel data augmentation technique. The classification accuracy was enhanced from 48.2% to 76.6% using feature matrix augmentation, and from 72% to 92% using dataset augmentation by flipping. A 100% accuracy was achieved after applying either multi-stage augmentation or hyperphysical augmentation. Overall, extensive tests of the concept were developed to demonstrate high accuracy for the analog circuit classification endeavor. This is solid support for a future up-scaling towards automated analog circuit structure detection, which is one of the prerequisites not only for stimuli generation in the frame of analog mixed-signal verification but also for other critical endeavors related to the engineering of AMS circuits.
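As background for how such a classifier processes a circuit graph, the sketch below implements a single GCN layer with the standard propagation rule H' = ReLU(D^{-1/2}(A + I)D^{-1/2} H W) of Kipf and Welling in plain NumPy, followed by a mean readout that would feed a softmax classifier. The toy graph, features, and weights are placeholders, not the paper's netlist graphs or trained model.

```python
# Background sketch of a single GCN layer (Kipf & Welling propagation rule)
# in plain NumPy:  H' = ReLU( D^{-1/2} (A + I) D^{-1/2} H W ).
# The adjacency matrix, node features, and weights below are placeholders,
# not the circuit graphs or trained model of the cited paper.
import numpy as np

def gcn_layer(A, H, W):
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))         # symmetric normalization
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy "circuit" graph: 4 nodes (e.g., devices/nets), undirected edges.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
H = np.random.randn(4, 8)                          # per-node feature vectors
W1 = np.random.randn(8, 16)
W2 = np.random.randn(16, 4)

H1 = gcn_layer(A, H, W1)                           # first graph convolution
H2 = gcn_layer(A, H1, W2)                          # second graph convolution
graph_embedding = H2.mean(axis=0)                  # simple readout for graph-level classification
print(graph_embedding.shape)                       # would feed a softmax classifier
```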
... Diagram of K-fold cross-validation (modified from Ashfaque, 2018) [17] ...
Chapter
Full-text available
Epilepsy is a type of neurological brain disorder caused by temporary changes in the brain's electrical activity. If diagnosed and treated, seizures can be prevented. Electroencephalography (EEG) is the most common technique used in diagnosing epilepsy to avoid danger and take preventive precautions. This paper applies deep learning and machine learning techniques to detect epileptic seizures, identifies whether machine learning or deep learning classifiers are more pertinent for the purpose, and then tries to improve the present techniques for seizure detection. The best performance of the deep learning models was achieved by implementing the convolutional neural network (CNN) algorithm on the EEG signal dataset, with the following results: accuracy 99.2%, specificity 99.3% and sensitivity 98.7%. For the hybrid deep neural network combining a CNN with long short-term memory (LSTM), the accuracy reached 98.7%. Keywords: Convolutional neural network; Epilepsy; Seizures; Electroencephalography; Long short-term memory
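The chapter reports results for a CNN and a hybrid CNN-LSTM but does not give the layer configuration, so the Keras sketch below is only a generic example of a 1D CNN-LSTM binary seizure detector. The input length of 178 samples (as in the common UCI epileptic seizure dataset) and all layer sizes are assumptions for illustration.

```python
# Hedged, generic sketch of a 1D CNN-LSTM binary seizure detector in Keras.
# The architecture and input length (178 samples per EEG segment) are
# assumptions for illustration; the chapter does not specify its layers.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(178, 1)),                   # one-channel EEG segment
    layers.Conv1D(32, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(64, kernel_size=5, activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(64),                               # temporal modelling on CNN features
    layers.Dense(1, activation="sigmoid"),         # seizure / non-seizure
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Placeholder data just to show the expected shapes.
X = np.random.randn(32, 178, 1).astype("float32")
y = np.random.randint(0, 2, size=(32, 1))
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
model.summary()
```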
... Overall, the proposed methods from various researchers are summarized in the benchmark table above [25][26][27]. Table 1: Benchmark table for all existing research work on mushrooms. ...
... In (2), C > 0 is the regularization constant that controls the trade-off between the minimization of the training errors and the maximization of the margin [21], and ξ_i is the slack variable which indicates the degree to which a data point may lie within the margin [27], [28]. In (2), the ":" should be read as "such that". ...
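For context, the optimization problem "(2)" referred to in this excerpt is presumably the standard soft-margin SVM primal (an assumption about the citing paper's numbering), which reads:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_{i}
\quad:\quad
y_{i}\left(w^{\top}x_{i} + b\right) \ge 1 - \xi_{i},\qquad \xi_{i}\ge 0,\qquad i = 1,\dots,n
```

Here the ":" is read as "such that", C > 0 trades off margin maximization against training error, and ξ_i measures how far point i is allowed to lie inside the margin.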
... A representation of 10-fold cross-validation [83] ...
Thesis
The main cause of blindness in the population may above all be the deterioration of the retina caused by diabetes-related problems and complications of aging. Diabetic retinopathy (DR) and diabetic macular edema (DME) are the main direct causes of vision problems among working-age citizens in most advanced countries. The high number of people with diabetes worldwide indicates that DME and DR will remain the main factors of partial or total vision loss, which affects patients' quality of life for many years and threatens their lives. Therefore, early detection followed by prompt treatment of people with diabetes-related diseases is important to prevent optical problems and can reduce the risk of blindness. In addition, people over 50 are exposed to age-related macular degeneration (AMD), which attacks the retina. Consequently, researchers around the world are drawn to the differences associated with several retinal diseases. Several automated methods using AI have been applied to the detection and testing of retinal diseases. Unfortunately, these models can be hampered by computational limitations, which requires additional intervention from specialists. This thesis presents an automatic method, based on deep learning neural network algorithms, to detect DME and DR, which makes it possible to go beyond the subjective practical assessment of ophthalmologists. Based on a convolutional neural network, a proposed model is presented with a soft-max classifier and trained end-to-end for the automatic classification of optical coherence tomography (OCT) retinal images. This model has the ability to detect features for identifying DR and DME in these retinal images with improved accuracy and sensitivity. In addition, a pre-trained model was fine-tuned and retrained using a dataset enriched with Generative Adversarial Networks (GANs). Unlike manual diagnosis of retinal disease based on personal clinical examination and analysis of OCT images, this method has shown the ability to automatically predict cases with DME versus healthy cases. The experiments were evaluated on several datasets provided by different institutions. The model, compared to other CNN models trained end-to-end or pre-trained and fine-tuned, shows efficient feature extraction, in less time, based on an effective data preprocessing step. The experimental results showed higher classification accuracy, which is promising for the early detection of diabetic diseases to assist ophthalmologists through biomedical technologies.
... The latter is more flexible but also requires more computational resources and becomes less straightforward to explain. Readers can refer to (Ng, 2000) for more details. We build the support vector machine model in Matlab using the function fitcsvm. ...
Article
Full-text available
Urban pluvial flooding is a threatening natural hazard in urban areas all over the world, especially in recent years given its increasing frequency of occurrence. In order to prevent flood occurrence and mitigate the subsequent aftermath, urban water managers aim to predict precipitation characteristics, including peak intensity, arrival time and duration, so that they can further warn inhabitants in risky areas and take emergency actions when forecasting a pluvial flood. Previous studies that dealt with the prediction of urban pluvial flooding are mainly based on hydrological or hydraulic models, requiring a large volume of data for simulation accuracy. These methods are computationally expensive. Using a rainfall threshold to predict flooding based on a data-driven approach can decrease the computational complexity to a great extent. In order to prepare cities for frequent pluvial flood events – especially in the future climate – this paper uses a rainfall threshold for classifying flood vs. non-flood events, based on machine learning (ML) approaches, applied to a case study of Shenzhen city in China. In doing so, ML models can determine several rainfall threshold lines projected in a plane spanned by two principal components, which provides a binary result (flood or no flood). Compared to the conventional critical rainfall curve, the proposed models, especially the subspace discriminant analysis, can classify flooding and non-flooding by different combinations of multiple-resolution rainfall intensities, greatly raising the accuracy to 96.5% and lowering the false alert rate to 25%. Compared to the conventional model, the critical indices of accuracy and true positive rate (TPR) were 5%-15% higher in ML models. Such models are applicable to other urban catchments as well. The results are expected to be used to assist early warning systems and provide rational information for contingency and emergency planning.
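As a rough, hedged approximation of the approach described above, the sketch below projects multi-resolution rainfall features onto two principal components and fits a linear discriminant, which yields a linear threshold line in the principal-component plane for classifying flood versus non-flood events. The paper's best model, subspace discriminant analysis, is an ensemble variant not reproduced here, and the data are synthetic placeholders.

```python
# Hedged approximation of the rainfall-threshold classifier: project
# multi-resolution rainfall features onto two principal components and fit a
# linear discriminant, giving a linear "threshold line" in the PC plane.
# (Plain LDA is used as a simpler stand-in for subspace discriminant analysis;
# the data are synthetic.)
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Placeholder features: rainfall intensities aggregated at several resolutions.
X_no_flood = rng.gamma(shape=2.0, scale=2.0, size=(300, 6))
X_flood = rng.gamma(shape=2.0, scale=2.0, size=(100, 6)) + 6.0
X = np.vstack([X_no_flood, X_flood])
y = np.hstack([np.zeros(300, dtype=int), np.ones(100, dtype=int)])   # 1 = flood event
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = make_pipeline(StandardScaler(), PCA(n_components=2),
                    LinearDiscriminantAnalysis())
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("TPR (recall on flood class):", recall_score(y_te, pred))
```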