Fig. 2 - uploaded by Samy Missoum
SVM boundary based on a polynomial kernel separating disjoint classes (+1 and −1)

Source publication
Article
Full-text available
This article presents an improved adaptive sampling scheme for the construction of explicit decision functions (constraints or limit state functions) using Support Vector Machines (SVMs). The proposed work presents substantial modifications to an earlier version of the scheme (Basudhar and Missoum, Comput Struct 86(19–20):1904–1917, 2008). The impr...

Contexts in source publication

Context 1
... p is the degree of the polynomial kernel. An example of SVM classification with a polynomial kernel is depicted in Fig. 2. ...
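For illustration, the sketch below trains a polynomial-kernel SVM on a synthetic two-class problem whose +1 region is split into two disjoint clusters, in the spirit of Fig. 2; the data set, the kernel degree p = 3, and the scikit-learn SVC interface are assumptions made for this example, not details taken from the source.

```python
# Minimal sketch (assumed setup): a polynomial-kernel SVM separating
# two classes whose +1 region is disjoint, in the spirit of Fig. 2.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two disjoint +1 clusters with a -1 cluster in between (synthetic data).
x_pos = np.vstack([rng.normal([-2.0, 0.0], 0.4, size=(50, 2)),
                   rng.normal([2.0, 0.0], 0.4, size=(50, 2))])
x_neg = rng.normal([0.0, 0.0], 0.4, size=(100, 2))
X = np.vstack([x_pos, x_neg])
y = np.hstack([np.ones(len(x_pos)), -np.ones(len(x_neg))])

# Polynomial kernel K(x, x') = (gamma <x, x'> + r)^p with degree p = 3.
svm = SVC(kernel="poly", degree=3, coef0=1.0, C=10.0)
svm.fit(X, y)

# The explicit boundary is the zero level set of the decision function.
print(svm.decision_function(X[:5]))
```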
Context 2
... 2006). These samples are then classified based on the corresponding response values, followed by the construction of a boundary that separates the distinct classes of samples. The use of SVMs for the definition of explicit boundaries has been found to be flexible due to their ability to represent highly nonlinear boundaries and disjoint regions (Fig. 2) (Basudhar et al. 2008). The main steps of EDSD using an SVM classifier are listed in Algorithm ...
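The basic EDSD sequence described above (sample the design space, evaluate and classify the responses, then train an SVM whose zero level set is the explicit boundary) can be sketched as follows; the threshold response value, the uniform sampler, and the RBF kernel are illustrative assumptions rather than the authors' exact choices.

```python
# Sketch of the basic EDSD steps described above (threshold, sampler,
# and kernel are illustrative assumptions, not the authors' exact setup).
import numpy as np
from sklearn.svm import SVC

def response(x):
    # Placeholder for an expensive simulation response.
    return np.sin(3.0 * x[:, 0]) + x[:, 1] ** 2

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(60, 2))     # initial design of experiments
labels = np.where(response(X) > 0.5, 1, -1)  # classify samples by a threshold

# Explicit decision boundary: zero level set of the SVM decision function.
svm = SVC(kernel="rbf", C=100.0).fit(X, labels)
```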

Citations

... This could reduce the number of biophysical model evaluations while building the GPR model and improve the model accuracy. Adaptive sampling can also be used to improve the classifier accuracy, as suggested in Ref. [63]. Performing this type of adaptive sampling will add points in the region close to the decision boundary (i.e., near the UR − AR boundary in this case) and potentially remove the jump in the model prediction. ...
... The SVM algorithm is a model-based learning technique that is now one of the most effective classifiers, according to [44]. In Hitam and Ismail [31], the Support Vector Machine (SVM) classifier was adopted to avoid over-fitting when modeling the training dataset, and it is known for its flexibility in creating clear and accurate frontiers [45,46]. SVM is effective in a wide range of applications due to its ease of use and the quick training results it provides [47]. ...
Article
Full-text available
Cryptocurrencies like Bitcoin are among today's financial system's most contentious and challenging technological advances. This study aims to evaluate the performance of three different Machine Learning (ML) algorithms, namely the Support Vector Machine (SVM), the K Nearest Neighbor (KNN), and the Light Gradient Boosted Machine (LGBM), in accurately estimating the price movement of Bitcoin, Ethereum, and Litecoin. To test these algorithms, we used an existing continuous dataset extracted from Kaggle and coinmarketcap.com. We implemented the models using the Knime platform. We used an auto-binner for volume and market capitalization. Sensitivity analysis was performed to match different parameters. The F and accuracy statistics were used to evaluate algorithm performance. Empirical findings reveal that the KNN has the highest forecasting performance on the overall dataset in the first investigation phase. On the other hand, in the second investigation phase on the individual datasets, the SVM has the highest forecasting performance for Bitcoin and the LGBM for Ethereum and Litecoin.
... Other alternatives include the utilization of metamodels, such as Kriging (Kaymaz 2005;Xiao et al. 2020;Zhou and Lu 2020), neural networks (Gondal and Lee 2012;Papadopoulos et al. 2012), support vector regression (Basudhar and Missoum 2010), radial basis functions (Zhou et al. 2019a, b), and polynomial chaos expansions (Diaz et al. 2018;Zhou et al. 2019a, b). Owing to the extremely high computational cost of implicit functions in certain problems, surrogate models are widely used as a replacement (Jiang et al. 2019;Chojaczyk et al. 2015). ...
Article
Full-text available
Asymptotic sampling (AS) is an efficient simulation-based technique for estimating the small failure probabilities of structures. AS utilizes the asymptotic behavior of the reliability index with respect to the standard deviations of random variables. In this method, the standard deviations of the random variables are progressively inflated using a scale parameter to obtain a set of scaled reliability indices. The pairs of standard deviation scale parameters and corresponding scaled reliability indices are called support points. Least squares regression is then performed on these support points to establish a relationship between the scale parameter and the scaled reliability indices. Finally, extrapolation is performed to estimate the actual reliability index. Various extrapolation models have been used in AS to improve accuracy. Moreover, a mean extrapolation formulation using the average value of different extrapolation models was proposed to further improve its accuracy. Although the mean extrapolation formulation protects against using the wrong extrapolation model, it does not guarantee a reliability estimate better than that of the best available extrapolation model. In this paper, we propose a weighted average AS formulation in which the weight factors are optimized to minimize the variance of the reliability index estimate through the bootstrapping method. In the weight factor determination, both convex and affine formulations are considered and the results are compared. The performance of the proposed method is evaluated using six benchmark example problems and a complicated engineering problem. It is found that the proposed weighted average formulation has higher accuracy than the mean extrapolation formulation. For weight factor optimization, the affine formulation yields more accurate results than the convex formulation in most cases.
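As a rough sketch of the regression and extrapolation steps described in this abstract, the snippet below fits one commonly used extrapolation model, beta(f) = A*f + B/f, to a set of support points by least squares and extrapolates to f = 1; the model form and the numerical values are assumptions for illustration, not the paper's data or its full set of extrapolation models.

```python
# Sketch of the AS regression/extrapolation step described above.
# The extrapolation model beta(f) = A*f + B/f is one common choice and is
# used here purely as an assumption; the paper compares several models.
import numpy as np

# Support points: standard-deviation scale parameters f and the
# corresponding scaled reliability indices beta_f (illustrative values).
f = np.array([0.4, 0.5, 0.6, 0.7, 0.8])
beta_f = np.array([1.1, 1.4, 1.7, 2.0, 2.3])

# Least-squares fit of beta(f) = A*f + B/f.
G = np.column_stack([f, 1.0 / f])
A, B = np.linalg.lstsq(G, beta_f, rcond=None)[0]

# Extrapolate to f = 1 to estimate the actual reliability index.
beta_hat = A * 1.0 + B / 1.0
print(beta_hat)
```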
... Basudhar and Missoum [60] proposed an algorithm that selects a new training sample having the highest probability of being misclassified by the SVM decision function, constrained to lie at a minimum distance from the existing training samples; this distance is determined by a function of the hypervolume of the design space, the problem dimensionality, and the number of training samples. Basudhar and Missoum [61] proposed an improved adaptive sampling scheme for the construction of explicit boundaries. Basically, they presented substantial modifications to their previous adaptive scheme [60]. ...
... Basically, they presented substantial modifications to their previous adaptive scheme [60]. Basudhar and Missoum [61] further improved the choice of a new sample so as to remove the "locking" of the SVM, a phenomenon that was not addressed in the previous version of the algorithm. In the previous scheme, locking of the SVM refers to new samples being selected only on the SVM boundary; when the margin is thin, the modification of the SVM boundary due to such a sample may be negligible. ...
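A minimal sketch of the sample selection idea described in these excerpts is given below: among a pool of candidates (approximately) on the SVM boundary, pick the one farthest from the existing training samples. The candidate-pool search, the tolerance value, and the function name are assumptions made for illustration; the original work formulates this as a constrained optimization problem.

```python
# Sketch of the max-min-distance sample selection described above
# (a candidate-pool search is used instead of the authors' formal
# optimization; this simplification is an assumption for illustration).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.svm import SVC

def select_new_sample(svm: SVC, X_train, candidates, tol=0.05):
    # Keep candidates (approximately) on the SVM boundary, |s(x)| <= tol.
    on_boundary = candidates[np.abs(svm.decision_function(candidates)) <= tol]
    if len(on_boundary) == 0:
        on_boundary = candidates
    # Among those, pick the one farthest from the existing training samples.
    d_min = cdist(on_boundary, X_train).min(axis=1)
    return on_boundary[np.argmax(d_min)]
```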
Article
Support vector machine (SVM) is a powerful machine learning technique relying on the structural risk minimization principle. SVM has seen numerous applications in structural reliability analysis (SRA) in the recent past. There are review articles on machine learning-based methods that partly discuss the development of SVM for SRA applications along with other machine learning methods; however, there is no dedicated review on SVM for SRA applications. Thus, a review article on the implementation of various SVM approaches for SRA applications will be useful. The present article provides a synthesis of and roadmap to the growing and diverse literature, specifically the classification- and regression-based support vector algorithms in SRA applications. In doing so, different advanced variants of SVM in SRA applications and hyperparameter tuning algorithms are also briefly discussed. Following the detailed review, future opportunities and challenges in this area of application are summarized. The review reveals that SVM in SRA applications is gaining momentum, as it has an excellent capability of handling high-dimensional problems using relatively little training data. The review article is expected to enhance the state-of-the-art development of support vector algorithms for SRA applications.
... The Support Vector Machine (SVM) method, or classifier, is a machine learning algorithm that separates a data set by finding a separating hyperplane. It is regarded as the most adaptable method for creating unambiguous and precise boundaries [10,11]. It is capable of both linear and non-linear classification, and the two are often compared on the same problem. ...
Article
Full-text available
With the rapid growth of technology, cryptocurrencies like Bitcoin are attracting more and more attention. The high volatility of their prices makes prediction difficult, and there has been much work on this. This paper aims to provide a comparison of various machine learning models, such as linear regression, SVM, random forest, and neural networks, for predicting the direction of the Bitcoin close price. The dataset used spans January 2012 to March 2021, and all four prices are used for prediction: Close, Open, High, and Low. Two different methods are used to fit the different types of machine learning algorithms: for regressors, the close price is predicted first and the direction is then derived from the prediction; for classifiers, the direction is predicted directly. Accuracy, the percentage of correct direction predictions made by the algorithm, is used for the comparison. It is shown that the LSTM, a neural network algorithm, generates the highest accuracy of about 58%, while the random forest classifier has the lowest accuracy of about 55.47%.
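The evaluation route described above for regressors (predict the close price, then derive the direction and score it by accuracy) can be sketched as follows; the numerical values and the helper function are illustrative assumptions, not data from the study.

```python
# Sketch of the regressor-derived direction evaluation described above;
# data and the helper function are illustrative assumptions.
import numpy as np

def direction_accuracy(prev_close, true_close, pred_close):
    """Fraction of days on which the predicted up/down move is correct."""
    true_dir = np.sign(true_close - prev_close)
    pred_dir = np.sign(pred_close - prev_close)
    return np.mean(true_dir == pred_dir)

prev_close = np.array([100.0, 101.0, 102.5])
true_close = np.array([101.0, 102.5, 101.8])
pred_close = np.array([100.6, 103.0, 102.9])   # regressor output (assumed)
print(direction_accuracy(prev_close, true_close, pred_close))  # 2/3 correct
```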
... The second one is defined based on the confidence bound of the estimated failure probability, considering the statistical uncertainty of surrogate predictions, such as [22,25]. The third one is defined according to the stabilization of the failure probability estimates within several consecutive iterations, such as [13,26]. ...
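A minimal sketch of the third family of stopping criteria (stabilization of the failure probability estimate over several consecutive iterations) is given below; the window length, tolerance, and example history are assumptions for illustration.

```python
# Sketch of a stabilization-based stopping criterion of the third kind
# described above: stop when the failure-probability estimate has changed
# by less than a tolerance over several consecutive iterations (the
# window length and tolerance are assumptions for illustration).
import numpy as np

def has_stabilized(pf_history, window=3, tol=1e-2):
    if len(pf_history) < window + 1:
        return False
    recent = np.asarray(pf_history[-(window + 1):])
    rel_change = np.abs(np.diff(recent)) / np.maximum(recent[:-1], 1e-12)
    return np.all(rel_change < tol)

print(has_stabilized([1.2e-3, 1.1e-3, 1.09e-3, 1.088e-3, 1.087e-3]))
```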
Article
Full-text available
A failure-informed enrichment algorithm is devised to improve the performance of the existing adaptive Kriging-probability density evolution method (AK-PDEM) for reliability analysis. This improved method is named the AK-PDEMi. Contrary to empirically prescribing the sample size of representative points in the existing AK-PDEM, the representative point set in the AK-PDEMi is sequentially enriched by new sets of representative points generated by a failure-informed enrichment scheme, which aims at sequentially making finer partitions of the key sub-regions where the representative points make critical contributions to the failure probability. In this regard, a double-loop configuration is devised: the inner loop adaptively refines the accuracy of the Kriging model to reduce the Kriging-induced error, and the outer loop involves the failure-informed enrichment process to alleviate the PDEM-associated discretization error. The outer and inner loops are complementary and proceed sequentially until both of their convergence criteria are satisfied. Three numerical examples are studied and comprehensive comparisons are made between the proposed AK-PDEMi and other conventional reliability algorithms. Results show that the AK-PDEMi has a remarkable advantage over the existing AK-PDEM.
... A common aspect of these approaches is to encourage exploration of the physical space by adding points to the training data that are separate from the existing labelled samples. For instance, Basudhar and Missoum [31,32] introduced an optimisation-based adaptive sampling method for constructing an estimate of the LSF using SVMs. Informative samples are found by maximising an objective function that depends on the nearest-neighbour distance of the candidate from the training data in the physical space. ...
... However, the non-linear nature of the LSF can make it challenging to identify parameter points in such a subspace. One solution is to search for candidate parameter points based purely on their proximity to the learned LSF; however, this can lead to a divergent search which may not converge to the true LSF, referred to as 'SVM locking' in Basudhar and Missoum [32]. Secondly, while SVMs may generally scale more ...
[Figure caption spilled into the text: A trained Support Vector Machine (SVM) finds a separating hyperplane that maximises the width of the volume that separates the 2 classes of training data, the LSV.]
... If an SVM meta-model is employed, then these points will be those that satisfy {x : s(x) = 0}. For numerical reasons this equality is typically relaxed to the inequality constraint {x : |s(x)| ≤ ε}, where ε is a tolerance on the distance of the candidate from the separating hyperplane in feature space (see, e.g., [32]). Identifying points close to the LSF by exploring its physical space representation might be difficult. ...
Article
Full-text available
Given an expensive computational model of a system subject to reliability requirements, this work shows how to approximate the failure probability by learning adaptively the high-likelihood regions of the Limit State Function using Support Vector Machines. To this end, an algorithm is proposed that selects informative parameter points to add to training data at each iteration to improve the accuracy of the approximation. Furthermore, we provide a means to quantify the uncertainty in the Limit State Function, using geometrical arguments to estimate an upper bound to the failure probability.
... The main questions to be answered are: how many numerical simulations should we include, and which ones? Which information is needed in order to devise a systematic strategy? This work is devoted to the investigation of possible answers to these questions, in the spirit of [BM10], in which adaptive sampling is used to improve the performance of an SVM classifier. The selection of the samples aims at improving the position of the support vectors and the margin. ...
Thesis
As a branch of pharmacology, cardiac safety pharmacology aims at investigating compound side effects on the cardiac system at therapeutic doses. These investigations, made through in silico, in vitro and in vivo experiments, allow a compound to be selected or rejected at each step of the drug development process. A large subdomain of cardiac safety pharmacology is devoted to the study of the electrical activity of cardiac cells based on in silico and in vitro assays. This electrical activity is the consequence of exchanges of polarised structures (mainly ions) between the extracellular and intracellular media. A modification of the ionic exchanges induces changes in the electrical activity of the cardiac cell, which can be pathological (e.g. by generating arrhythmia). Strong knowledge of these electrical signals is therefore essential to prevent the risk of lethal events. Patch-clamp techniques are the most common methods for recording the electrical activity of a cardiac cell. Although the resulting electrical signals are well understood, these recordings are slow and tedious to perform, and therefore expensive. A recent alternative is to consider microelectrode array (MEA) devices. Originally developed for the study of neurons, their extension to cardiac cells allows a high-throughput screening that was not possible with patch-clamp techniques. An MEA consists of a plate with wells in which cardiac cells (forming a tissue) cover some electrodes. The extension of these devices to cardiac cells therefore makes it possible to record the electrical activity of the cells at the tissue level (before and after compound addition into the wells). As this is a new type of signal, many studies have to be done to understand how ionic exchanges induce the recorded electrical activity and, finally, to proceed with the selection/rejection of a compound. Although these signals are still not well understood, recent studies have shown promising results for the use of MEA in cardiac safety pharmacology. The automation of the compound selection/rejection is still challenging and far from industrial application, which is the final goal of this manuscript. Mathematically, the selection/rejection process can be seen as a binary classification problem. As in any supervised classification task (and machine learning tasks more generally), an input has to be defined. In our case, the time series of cardiac electrical activity can be long (minutes or hours) with a high sampling rate (∼ kHz), leading to an input living in a high-dimensional space (hundreds, thousands or even more dimensions). Moreover, the number of available data points is still low (at most hundreds). This critical regime, named high dimension/low sample size, makes the context challenging. The aim of this manuscript is to provide a systematic strategy to select/reject compounds in an automated way, under the following constraints:
• Deal with the high dimension/low sample size regime.
• Make no assumptions on the data distributions.
• Exploit in silico models to improve the classification performance.
• Require no or few parameters to tune.
The first part of the manuscript is devoted to the context, followed by a description of the patch-clamp and MEA technologies. This part ends with a description of the action potential and field potential models used to perform in silico experiments. In a second part, two methodological aspects are developed, trying to comply, as far as possible, with the constraints of the industrial application.
The first one describes a double greedy goal-oriented strategy to reduce the input space based on a score function related to the classification success rate. Comparisons with classical dimension reduction methods such as PCA and PLS (with default parameters) are performed, showing that the proposed method leads to better results. The second method consists in the construction of an augmented training set based on a reservoir of simulations, by considering the Hausdorff distance between sets and the maximisation of the same score function as in the first method. The proposed strategy makes it possible to automatically reject biased and/or wrongly labelled data when constructing the augmented training set. A numerical experiment is performed on in silico action potentials, and comparisons with SVM and KNN (with default parameters) show that the proposed method globally leads to higher classification success rates. In the third part, two applications to patch-clamp data are presented. The first study is a regression problem to estimate ion channel activity from in silico action potential signals. Coupling the proposed goal-oriented double greedy dimension reduction method with an unscented Kalman filter improves the ion channel activity estimation in terms of computational cost and accuracy. A second study is devoted to the Hit/No-hit classification of compounds based on automated patch-clamp signals. The goal-oriented dimension reduction methods lead to a better classification of the compounds (particularly at intermediate concentrations) than the classification strategy proposed by the company with whom we collaborated. Finally, the fourth part of the manuscript is devoted to MEA signals. In a first study, a goal-oriented double greedy dimension reduction method is applied to in vitro experiments, showing qualitatively good ion channel classification results. The second study highlights the improvement of the classification success rate obtained using the augmented training set construction strategy. This part ends with an application to a larger dataset in order to assess the two proposed methods in an industrial context.
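Since the augmented training set construction above relies on the Hausdorff distance between sets, the following minimal sketch computes the symmetric Hausdorff distance between two point sets; the use of scipy's directed_hausdorff and the example coordinates are assumptions for illustration, not the thesis implementation.

```python
# Minimal sketch of the (symmetric) Hausdorff distance between two point
# sets, as used above to compare sets when building the augmented training
# set; scipy's directed_hausdorff is an assumed implementation choice.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = np.array([[0.1, 0.1], [1.2, 0.0]])

d_h = max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])
print(d_h)
```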
... SVM parameters are often selected by cross-validation, as Long Zhang [6] demonstrated in a paper published in 2015 [7]. Cross-validation, however, takes a long time and requires considerable expertise to arrive at effective SVM parameters. ...
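A typical cross-validated selection of SVM hyperparameters, as referred to above, might look like the following sketch; the RBF kernel, the grid values, and the synthetic data set are assumptions chosen for illustration.

```python
# Sketch of SVM hyperparameter selection by cross-validation (the grid
# values, RBF kernel, and synthetic data are assumptions for illustration).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

param_grid = {"C": [0.1, 1.0, 10.0, 100.0],
              "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```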
... Support vector machines for classification were first introduced by Hurtado (2004). Basudhar and Missoum (2008) proposed an adaptive scheme combining SVM classification and Monte Carlo simulation. The enrichment scheme is based on finding the point belonging to the limit-state surface approximation that is the furthest from the existing training points. ...
... Some authors have tried softening it, either by considering the whole candidate set ... Finally, the third family of stopping criteria has been built using the stabilization of either the limit-state surface or the failure probability estimates within enrichment iterations. Basudhar and Missoum (2008) tracked the fraction of some predefined convergence points that changed sign between two updates of an SVM model and assumed convergence when this fraction was relatively small. This criterion, often with slight adjustments, has been used in a number of SVM-based active learning schemes (Bourinet, 2018). ...
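The sign-change convergence measure attributed to Basudhar and Missoum (2008) in this excerpt can be sketched as follows; the candidate convergence points and the threshold value are assumptions for illustration.

```python
# Sketch of the sign-change convergence measure described above: the
# fraction of predefined convergence points whose SVM classification
# changes between two consecutive SVM updates (point set is assumed).
import numpy as np
from sklearn.svm import SVC

def changed_sign_fraction(svm_old: SVC, svm_new: SVC, conv_points: np.ndarray):
    s_old = np.sign(svm_old.decision_function(conv_points))
    s_new = np.sign(svm_new.decision_function(conv_points))
    return np.mean(s_old != s_new)

# Convergence is assumed when this fraction stays below a small threshold,
# e.g. changed_sign_fraction(svm_k, svm_k_plus_1, P) < 1e-3.
```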
... A workaround consists in tracking the convergence over several iterations, typically 2 to 3 and in some contributions up to 10 iterations (Bourinet, 2017). An alternative is to smooth out the convergence criterion, as in Basudhar and Missoum (2008), by fitting an exponential curve to the convergence criterion. ...
Article
Full-text available
Active learning methods have recently surged in the literature due to their ability to solve complex structural reliability problems within an affordable computational cost. These methods are designed by adaptively building an inexpensive surrogate of the original limit-state function. Examples of such surrogates include Gaussian process models, which have been adopted in many contributions, the most popular ones being the efficient global reliability analysis (EGRA) and the active Kriging Monte Carlo simulation (AK-MCS), two milestone contributions in the field. In this paper, we first conduct a survey of the recent literature, showing that most of the proposed methods actually stem from modifying one or more aspects of the two aforementioned methods. We then propose a generalized modular framework to build on-the-fly efficient active learning strategies by combining the following four ingredients or modules: surrogate model, reliability estimation algorithm, learning function and stopping criterion. Using this framework, we devise 39 strategies for the solution of 20 reliability benchmark problems. The results of this extensive benchmark (more than 12,000 reliability problems solved) are analyzed under various criteria, leading to a synthesized set of recommendations for practitioners. These may be refined with a priori knowledge about the features of the problem to solve, i.e., its dimensionality and the magnitude of the failure probability. Ultimately, this benchmark highlighted the importance of using surrogates in conjunction with sophisticated reliability estimation algorithms as a way to enhance the efficiency of the latter.
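To make the modular framework described in this abstract concrete, the sketch below assembles a strategy from the four ingredients named in the paper; the dataclass layout and the particular labels (including the approximate descriptions of AK-MCS and EGRA) are illustrative assumptions, not the paper's implementation.

```python
# Sketch of the modular view described above: an active learning strategy
# is assembled from four interchangeable ingredients. The dataclass and
# the string labels are assumptions made for illustration only.
from dataclasses import dataclass

@dataclass
class Strategy:
    surrogate: str
    reliability_estimator: str
    learning_function: str
    stopping_criterion: str

# Two milestone methods expressed in this modular form (approximate labels).
ak_mcs = Strategy("Kriging", "Monte Carlo simulation", "U", "min U >= 2")
egra = Strategy("Kriging", "Monte Carlo simulation", "EFF", "max EFF small")
print(ak_mcs, egra, sep="\n")
```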