Figure - available from: Artificial Intelligence Review
Illustration diagram of linear SVM

Source publication
Article
Full-text available
Twin support vector machines (TWSVM), a machine learning algorithm developed from traditional support vector machines (SVM), is a typical nonparallel support vector machine. Because TWSVM offers a simple model, high training speed and good performance, it has drawn extensive attention. The initial TWSVM ca...

Citations

... In SVR, the kernel function is applied to simplify nonlinear approximation [29]. If the kernel function k(x, x′) satisfies k(x, x′) = ⟨ϕ(x), ϕ(x′)⟩, the following equation is obtained. ...
Article
Full-text available
In the moist soil-water-air environment, monitoring volatile organic compounds (VOCs) with a photoionization detector (PID) suffers from low accuracy. This study uses the PID water-soil-gas VOC online monitor developed by the authors' group to monitor online the concentrations of different VOC constituents at production enterprises of the petroleum and chemical industries in Shandong Province, and builds a model relating these readings to laboratory-tested concentrations. The correlation coefficient between the PID test concentration and the actual concentration was obtained by training on a large amount of collected data. Because PIDs are applied in VOC monitoring, establishing a high-precision PID calibration model is important for precise monitoring of VOCs. In this paper, multiple quantitative analyses based on SVM regression of the PID response to VOC signals were conducted to study a high-precision VOC monitoring method. Using the PID response signals measured by the research group at different environmental VOC concentrations, the PID response to VOC signals was first modeled with the support vector machine principle to verify the effect of traditional SVM regression. To address redundancy in the raw data, the time-domain and frequency-domain characteristics of the PID signal were calculated, and principal component analysis was conducted on the time-domain PID signal. To make the SVM regression more general and robust, the kernel function parameters and penalty factor of the SVM were optimized with a genetic algorithm. A comparison of the accuracy of PID calibration models (PID signal feature extraction, SVM regression, and principal component analysis SVM regression) verifies the superiority of the signal feature extraction PCA-GA-SVM method for monitoring VOCs with a photoionization detector.
... In SVR, the kernel function is employed to simplify nonlinear approximation [25]. If the kernel function k(x, x′) satisfies k(x, x′) = ⟨ϕ(x), ϕ(x′)⟩, the following equation is formulated. ...
Preprint
Full-text available
In the moist soil-water-air environment, monitoring Volatile Organic Compounds (VOCs) with a Photoionization Detector (PID) suffers from low accuracy. This paper analyzes the reasons for the low accuracy of the traditional Support Vector Machine (SVM) regression method. To address the issue, the PID signal is subjected to feature extraction and Principal Component Analysis (PCA) to reduce the data dimensionality. Moreover, the optimal SVM parameters are selected using a Genetic Algorithm (GA), and a combined approach of SVM regression with PCA and GA is used for PID signal regression analysis. The effectiveness of the method is validated through extensive experiments and simulations. Furthermore, the influence of the sample quantity on the regression accuracy is analyzed, enabling accurate monitoring of VOC concentration in a moist environment.
... However, recent studies have shown that some binary techniques cannot effectively address the multi-class problem. K-SVCR [23] and Twin-KSVC [24], as novel multi-class classification algorithms, produce better prediction performance because they evaluate all the training points in a "one-versus-one-versus-rest" structure [25]. In addition, least squares versions were also presented in the literature [26,27]. ...
Article
Full-text available
Cucumber is one of the most popular vegetables, but cucumber leaf disease is the key factor restricting yield increases. Common cucumber diseases include downy mildew, powdery mildew and gray mold. The plant electronic medical records (PEMRs) produced by the “plant clinic” are diagnosis records of real disease occurrences issued by plant doctors, and they provide a new approach to cucumber disease diagnosis. Efficiently mining prescription big data to enable precise diagnosis of crop pests and diseases is an emerging challenge. Data mining technology, represented by machine learning, has attracted wide attention. Therefore, 15 diagnosis models are proposed to deal with this problem. Because SVM has many advantages, including implementing the structural risk minimization principle and handling small-sample data effectively, five SVM-based algorithms achieve better diagnosis performance than the others. Moreover, the highest prediction accuracy exceeds 80%. In addition, prediction performance improves after undersampling is applied to the imbalanced data. This means the models are suitable for cucumber disease diagnosis.
... Several extensions have been proposed for TWSVM. For instance, Ding et al. [8,9] provided a review on TWSVM and proposed Wavelet TWSVM, which employs the Glowworm Swarm Optimization method to optimize model parameters [10]. Nasiri and Mir [11] proposed an enhanced regularized K-nearest neighbor (KNN) TWSVM, which reduces the computational cost of finding KNNs for all samples and addresses the impact of noise and outliers on the model's output. ...
Article
Full-text available
In practical applications, supervised learning algorithms, including support vector machine (SVM), rely heavily on precise labeling to train predictive models. Nonetheless, real-world datasets often contain mislabeled samples, which can considerably affect the performance of these algorithms. In addition, SVM incurs high computational costs on large-scale datasets. Twin support vector machine (TWSVM) tackles this issue by solving two models that are smaller than the SVM model, finding two nonparallel hyperplanes such that each one is closer to one of the two classes and is at least a unit distance away from the samples of the other class. In this paper, to address label noise in datasets, we propose a TWSVM-based mixed-integer programming model for relabeling instances directly, while inheriting the advantages of TWSVM. Each model decides whether the samples of one class should be considered among instances that are as close as possible to its corresponding hyperplane. Therefore, each model can recognize instances that closely resemble one class while their assigned labels belong to the other, prompting their reclassification. Conversely, instances showing lower similarity to the other class retain their original labels. To show the efficiency of the proposed models, experiments are conducted on 12 UCI datasets.
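Once the two nonparallel hyperplanes are available, a TWSVM typically labels a new point by the hyperplane it lies closest to. The following is a minimal sketch of that decision rule only. It assumes the hyperplanes have already been obtained, and the coefficients below are hypothetical values rather than the output of any solver or of the relabeling model described above.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def twsvm_predict(x, plane_pos, plane_neg):
    """Assign +1 or -1 according to the nearer of the two hyperplanes."""
    d_pos = distance_to_hyperplane(x, *plane_pos)
    d_neg = distance_to_hyperplane(x, *plane_neg)
    return 1 if d_pos <= d_neg else -1


# Hypothetical (w, b) pairs; in practice each comes from one of the two
# smaller optimization problems, which are not solved here.
plane_pos = ((1.0, -1.0), 0.0)   # x1 - x2 = 0, proximal to the +1 class
plane_neg = ((1.0, 1.0), -4.0)   # x1 + x2 = 4, proximal to the -1 class

print(twsvm_predict((1.0, 1.0), plane_pos, plane_neg))
print(twsvm_predict((3.0, 1.0), plane_pos, plane_neg))
```

The first point lies on the +1 hyperplane and the second on the -1 hyperplane, so the rule returns the corresponding labels.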
... However, in real-world applications there may be more than two categories to classify. In the literature there exist two groups of approaches to tackle the problem of multiclass classification: all-together methods and decomposition-reconstruction methods (Ding et al. (2019)). In the first approach all data are considered in one large optimization model and the classifier is derived accordingly (see Bredensteiner and Bennett (1999), Yajima (2005), Zhong and Fukushima (2007)). ...
... Each of them is solved independently and the binary classifiers are then combined into a multiclass decision function. The decomposition-reconstruction procedure is considered to be the most effective way to achieve multiclass separation (Du et al. (2021)), especially due to the high complexity of the all-together methods with large datasets (Ding et al. (2019)). Within the decomposition-reconstruction paradigm, different formulations have been designed. ...
... Other decomposition-reconstruction approaches are directed acyclic graph (Platt et al. (1999)), all-versus-one and binary tree SVM structure. A review on multiclass models especially designed for TWSVM can be found in Ding et al. (2019). ...
Preprint
In this paper we present a Twin Parametric-Margin Support Vector Machine (TPMSVM) model to tackle the problem of multiclass classification. In the spirit of the one-versus-all paradigm, for each class we construct a classifier by solving a TPMSVM-type model. Once all classifiers have been determined, they are combined into an aggregate decision function. We consider the cases of both linear and nonlinear kernel-induced classifiers. In addition, we robustify the proposed approach through robust optimization techniques. Indeed, in real-world applications observations are subject to measurement errors and noise, affecting the quality of the solutions. Consequently, data uncertainties need to be included within the model in order to prevent low accuracies in the classification process. Preliminary computational experiments on real-world datasets show the good performance of the proposed approach.
... Some background information was introduced in Section 2, including the TWSVM algorithm and the PSO algorithm. To overcome the shortcomings of TWSVM, an improved PSO-TWSVM algorithm was proposed, and its recognition steps for gene splicing sites were shown in Section 3. Comparative experiments on the proposed PSO-TWSVM algorithm, the traditional TWSVM algorithm and the Least Squares Support Vector Machine (LSSVM) algorithm were performed, and the experimental results were analyzed in Section 4. Finally, the paper was concluded and some future research work was given in Section 5. [19]. Different from the traditional SVM algorithm, the TWSVM algorithm aims to find a pair of uneven 2 ...
Article
Full-text available
Gene splicing site recognition is a very important research topic in smart healthcare. It is of great significance not only for the large-scale, high-quality computational annotation of genomes but also for the analysis and recognition of the evolutionary process of gene sequences. A reliable and effective algorithm for gene splice site recognition is urgently needed. The traditional Twin Support Vector Machine (TWSVM) algorithm has advantages in solving small-sample, nonlinear, and high-dimensional problems, but it cannot deal with parameter selection well. To avoid the blindness of parameter selection, the particle swarm optimization algorithm was used to find the optimal parameters of the twin support vector machine. Therefore, a Particle Swarm Optimization Twin Support Vector Machine (PSO-TWSVM) algorithm for gene splicing site recognition was proposed in this paper. The proposed algorithm was compared with the traditional Support Vector Machine algorithm, the TWSVM algorithm, and the Least Squares Support Vector Machine algorithm. The comparison results show that the positive sample recognition rate, negative sample recognition rate, and correlation coefficient (CC) of the proposed algorithm are the best among the four support vector machine algorithms. The proposed algorithm effectively improves the recognition rate and the accuracy of splice sites. The comparison experiments verify the feasibility of the proposed algorithm.
... Besides, another important issue is to extend TSVM from binary classification to multi-class classification. Ding et al. [3] reviewed the development of multi-class TSVM and provided a specific analysis with respect to the basic theories and geometric meaning, which can help readers better understand the essential differences between different multi-class TSVMs and select a suitable one for the classification task. Although the multi-class classification performance and robustness have been promisingly improved, memory consumption becomes heavy with very large datasets, and directly adopting SVM- and TSVM-based methods places high demands on computer configuration. ...
... As can be seen from Tables 5-7, our method has higher robustness to the noisy datasets. 3 http://rgbd-dataset.cs.washington.edu/dataset/ ...
Article
Full-text available
Twin support vector machine (TSVM) improves computational speed compared with the classical SVM, and has been widely used in classification and regression problems. However, two problems arise. First, since the convex hinge loss function of TSVM is unbounded, the generalization performance of TSVM declines in noisy environments. Second, TSVM has difficulty handling large-scale data. To handle these problems, in this paper, we propose a new method named Safe Sample Screening for robust TSVM (SSS-RTSVM). As the ramp loss is bounded, robust TSVM clips the hinge loss in the traditional soft margin twin support vector machine to the ramp loss, and provides a pair of nonparallel proximal hyperplanes that are robust to noisy data and outliers. However, the non-convex problem of robust TSVM can be considered as a DC programming problem, which is computationally inefficient. We therefore integrate safe sample screening rules for RTSVM based on the framework of the concave-convex procedure (CCCP) to delete most training samples, i.e., a subset of the samples called support vectors (SVs) is selected to reduce the computational cost without sacrificing the optimal accuracy. Notably, for the proposed SSS-RTSVM, a safety guarantee is provided for the sample screening rule. Extensive experiments are conducted on several benchmark datasets to demonstrate the robustness and acceleration of the proposed method.
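The contrast between the unbounded hinge loss and the bounded ramp loss can be seen in a few lines. This sketch uses a commonly cited form of the ramp loss, written as the difference of two hinge functions, which is also what makes a DC (difference of convex) treatment natural. The clipping parameter `s = -1` and the use of a scalar margin `z` are assumptions for illustration; the exact formulation inside SSS-RTSVM may differ.

```python
def hinge(z, s=1.0):
    """Shifted hinge H_s(z) = max(0, s - z); s = 1 gives the usual hinge loss."""
    return max(0.0, s - z)


def ramp(z, s=-1.0):
    """Ramp loss R_s(z) = H_1(z) - H_s(z): the hinge loss clipped at 1 - s."""
    return hinge(z, 1.0) - hinge(z, s)


# z plays the role of a margin value; very negative z mimics an outlier.
for z in (-100.0, -5.0, 0.0, 0.5, 2.0):
    print(z, hinge(z), ramp(z))
```

For strongly negative margins the hinge loss keeps growing while the ramp loss stays at 1 - s, so a single outlier cannot dominate the objective. The price is that the ramp loss is non-convex.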
... TWSVMs are effective in solving binary classification problems, but usually cannot be used directly to solve multi-class classification problems [14], [15]. A common approach to solving multi-class classification problems using TWSVMs is to combine multiple TWSVMs to form an integrated classification model. ...
... A common approach to solving multi-class classification problems using TWSVMs is to combine multiple TWSVMs to form an integrated classification model. A number of combinatorial strategies have been proposed for individual TWSVMs [15], [16], such as one-against-rest methods, one-against-one methods, and decision tree-based methods. One-against-rest TWSVM generates a hyperplane for each class of the sample data, so that each binary TWSVM classifier corresponds to a particular class. ...
... Decision tree TWSVM organizes binary TWSVM classifiers into a reasonable decision tree structure, where each non-leaf node is a binary TWSVM and the leaf nodes are labeled by the classes so that the final decision can be achieved directly. The decision tree TWSVM is faster than the one-against-rest TWSVM and the one-against-one TWSVM, and can solve the unclassifiable region problem well [15]. Therefore, in this paper, the decision tree TWSVM is applied to design a network intrusion detection method. ...
Article
Full-text available
Network intrusion detection is an important technology in national cyberspace security strategy and has become a research hotspot in cyberspace security in recent years. The development of effective and efficient intelligent network intrusion detection methods using advanced machine learning algorithms is of great importance for defending against various network intrusions in complex network environments. In this study, a network intrusion detection method based on decision tree twin support vector machine and hierarchical clustering, named HC-DTTWSVM, is proposed, which can effectively detect different categories of network intrusion. First, the hierarchical clustering algorithm is applied to construct the decision tree for network traffic data, where the bottom-up merging approach is used to maximize the separation of the upper nodes of the decision tree, which reduces the error accumulation in the construction of the decision tree. Then, twin support vector machines are embedded in the constructed decision tree to implement the network intrusion detection model, which can effectively detect the network intrusion category in a top-down manner. The detection performance of the proposed HC-DTTWSVM method is evaluated on the NSL-KDD and UNSW-NB15 intrusion detection benchmark datasets. Experimental results show that HC-DTTWSVM can effectively detect different categories of network intrusion and achieves detection performance comparable to some of the recently proposed network intrusion detection methods.
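The top-down use of a decision tree whose internal nodes are binary classifiers can be sketched as a simple traversal. This is an illustration of the general structure only, not the HC-DTTWSVM implementation. The `Node` type, the class names, and the threshold functions standing in for trained binary TWSVMs are all hypothetical, and the hierarchical clustering step that builds the tree is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Node:
    """Leaf if `label` is set; otherwise an internal node holding a binary classifier."""
    label: Optional[str] = None
    classifier: Optional[Callable[[Sequence[float]], bool]] = None
    left: Optional["Node"] = None    # followed when classifier(x) is True
    right: Optional["Node"] = None   # followed when classifier(x) is False


def predict(node: Node, x: Sequence[float]) -> str:
    """Walk from the root to a leaf, applying one binary decision per level."""
    while node.label is None:
        node = node.left if node.classifier(x) else node.right
    return node.label


# Hypothetical tree: simple thresholds stand in for trained binary classifiers.
tree = Node(
    classifier=lambda x: x[0] < 0.5,
    left=Node(label="normal"),
    right=Node(
        classifier=lambda x: x[1] < 0.5,
        left=Node(label="dos"),
        right=Node(label="probe"),
    ),
)

print(predict(tree, (0.1, 0.9)))
print(predict(tree, (0.9, 0.1)))
print(predict(tree, (0.9, 0.9)))
```

Each sample passes through at most one classifier per tree level and always ends at a labeled leaf, so it always receives exactly one class.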
... Class prediction is done by either the argmax rule, i.e., ŷ = arg max_{1 ≤ j ≤ K} p_j(x), or the max voting algorithm (Kallas et al., 2012; Tomar and Agarwal, 2015; Ding et al., 2019). Wang et al. (2019) showed that the wSVMs can outperform benchmark methods like kernel multi-category logistic regression (Zhu and Hastie, 2005), random forest, and classification trees. ...
Article
Full-text available
Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for K-class problems (Wu et al., 2010; Wang et al., 2019), where K is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in K. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in K. Though not the most efficient in computation, the OVA is found to have the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite sample performance.
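The citing passage for this entry mentions two ways to turn per-class outputs into a single prediction: the argmax rule over estimated class probabilities and max voting over pairwise classifiers. Both can be sketched briefly. The probability vector, the class centers, and the nearest-center pairwise classifiers below are hypothetical stand-ins chosen for illustration, not wSVM estimators.

```python
from collections import Counter
from itertools import combinations


def argmax_rule(probabilities):
    """Return the (0-based) class index with the largest estimated probability."""
    return max(range(len(probabilities)), key=lambda j: probabilities[j])


def max_voting(x, pairwise_classifiers):
    """Each pairwise classifier votes for one class; the most voted class wins."""
    votes = Counter(clf(x) for clf in pairwise_classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical 1-D problem with three classes, each summarized by a center.
centers = {0: 0.0, 1: 5.0, 2: 10.0}


def make_pairwise(i, j):
    """Toy classifier for the pair (i, j): vote for the class with the nearer center."""
    def classify(x):
        return i if abs(x - centers[i]) <= abs(x - centers[j]) else j
    return classify


# One classifier per unordered pair of classes: K(K-1)/2 in total.
pairwise = [make_pairwise(i, j) for i, j in combinations(sorted(centers), 2)]

print(argmax_rule([0.2, 0.5, 0.3]))
print(max_voting(4.0, pairwise))
```

With K classes the pairwise scheme needs K(K-1)/2 classifiers, which is one way to see why pairwise coupling grows polynomially in K while schemes that train one model per class grow linearly.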
... A TWSVM method has emerged in recent years, which finds two hyperplanes, one for each class. It classifies a point according to the hyperplane closest to that point (Ding et al., 2019). The TWSVM method does not include a proper mapping based on reproducing kernel Hilbert space (RKHS) within its optimization. ...
Article
Full-text available
Diarrhetic shellfish poisoning (DSP) toxins are potent marine biotoxins. Consuming mussels contaminated by DSP toxins can cause severe gastrointestinal illness. New methods for effectively and rapidly detecting DSP toxin-contaminated mussels are required. In this study, we used near-infrared (NIR) reflection spectroscopy combined with pattern recognition methods to detect DSP toxins. In the range of 950-1700 nm, the spectral data of healthy mussels and DSP toxin-contaminated mussels were acquired. To select optimal waveband subsets, a waveband selection algorithm with a Gaussian membership function based on fuzzy rough set theory was applied. Considering that distinguishing DSP toxin-contaminated mussels from healthy mussels is an imbalanced classification problem, an improved twin support vector machine (TWSVM) approach based on centered kernel alignment was explored. The influences of the parameters of the waveband selection algorithm and the regularization hyperparameters of the improved TWSVM (ITWSVM) on model performance were analyzed. Compared to conventional SVM, TWSVM, and other state-of-the-art algorithms (such as multi-layer perceptron, extreme gradient boosting and adaptive boosting), our proposed model exhibited better performance in detecting DSP toxins and was little affected by the imbalance ratio. For the proposed model, the F-measure reached 0.9886, and detection accuracy reached 98.83%. We explored the physical basis for the detection model by analyzing the relationship between the occurrence of overtone and combination bands and the selected wavebands. This study supports NIR spectroscopy as an innovative, rapid, and convenient analytical method to detect DSP toxins in mussels.