Illustration diagram of linear SVM

A High-Precision Monitoring Method Based on SVM Regression for Multivariate Quantitative Analysis of PID Response to VOC Signals

Article

May 2024

In the moist soil-water-air environment, photoionization detectors (PIDs) monitor volatile organic compounds (VOCs) with low accuracy. This study uses the PID water-soil-gas VOC online monitor developed by the authors' group to monitor online the concentrations of different VOC constituents at petroleum and chemical production enterprises in Shandong Province, and builds a model relating these readings to laboratory-tested concentrations. The correlation coefficient between the PID test concentration and the actual concentration was obtained by training on a large amount of collected data. Because PIDs are applied in VOC monitoring, establishing a high-precision PID calibration model is important for precise VOC monitoring. In this paper, multivariate quantitative analysis based on SVM regression of the PID response to VOC signals is used to study a high-precision VOC monitoring method. Using the PID response signals measured by the research group under different environmental VOC concentrations, the PID response to VOC signals was first modeled with the support vector machine principle to verify the effect of traditional SVM regression. To address redundancy in the raw data, the time-domain and frequency-domain characteristics of the PID signal were calculated, and principal component analysis was conducted on the time-domain PID signal. To make the SVM regression more generalizable and robust, the kernel function parameters and penalty factor of the SVM were optimized with a genetic algorithm. Comparing the accuracy of PID calibration models, including PID signal feature extraction, SVM regression, and principal component analysis SVM regression, verifies the superiority of the signal-feature-extraction PCA-GA-SVM method for monitoring VOCs with a photoionization detector.

Monitoring Method of VOCs Based on PID in Soil-water-gas Environment

Preprint

Jan 2024

In the moist soil-water-air environment, Photoionization Detectors (PIDs) monitor Volatile Organic Compounds (VOCs) with low accuracy. This paper analyzes the reasons for the low accuracy of the traditional Support Vector Machine (SVM) regression method. To address the issue, the PID signal is subjected to feature extraction and Principal Component Analysis (PCA) to reduce the data dimensionality. Moreover, the optimal SVM parameters are selected using a Genetic Algorithm (GA), and a combined approach of SVM regression with PCA and GA is used for PID signal regression analysis. The effectiveness of the method is validated through extensive experiments and simulations. Furthermore, the influence of the sample quantity on the regression accuracy is analyzed, enabling accurate monitoring of VOC concentration in a moist environment.
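
The abstract does not describe how the genetic algorithm is configured, so the following is only a minimal sketch of GA-based parameter selection under stated assumptions. The function `placeholder_cv_error` is a hypothetical stand-in for the cross-validated SVM regression error as a function of log-scaled penalty factor and kernel parameter; the population size, mutation scale, and search bounds are likewise illustrative choices, not values from the paper.

```python
import random


def placeholder_cv_error(log_c, log_gamma):
    """Hypothetical stand-in for a cross-validated SVR error surface."""
    return (log_c - 1.0) ** 2 + (log_gamma + 2.0) ** 2


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_scale=0.3, seed=0):
    """Minimise `fitness` over box `bounds` with a simple elitist GA."""
    rng = random.Random(seed)
    # Random initial population inside the search box.
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
           for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(*ind))
        history.append(fitness(*pop[0]))
        elite = pop[: pop_size // 2]  # selection: keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation, clipped to bounds.
            child = tuple(
                max(lo, min(hi, rng.choice((ga, gb)) + rng.gauss(0.0, mutation_scale)))
                for ga, gb, (lo, hi) in zip(a, b, bounds)
            )
            children.append(child)
        pop = elite + children
    best = min(pop, key=lambda ind: fitness(*ind))
    return best, history


best, history = genetic_search(placeholder_cv_error, [(-3.0, 3.0), (-5.0, 1.0)])
print("best (log C, log gamma):", best)
```

In a real pipeline the placeholder would be replaced by an actual cross-validation score computed on the PCA-reduced PID features.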

Cucumber diseases diagnosis based on multi-class SVM and electronic medical record

Article

Dec 2023
NEURAL COMPUT APPL

Cucumber is one of the most popular vegetables, but cucumber leaf disease is a key factor limiting yield increases. Common cucumber diseases include downy mildew, powdery mildew and gray mold. The plant electronic medical records (PEMRs) produced by “plant clinics” are diagnosis records of real disease occurrences issued by plant doctors, and they offer a new approach to cucumber disease diagnosis. Efficiently mining prescription big data to support precise diagnosis of crop pests and diseases is an emerging challenge, and data mining technology represented by machine learning has attracted wide attention. Therefore, 15 diagnosis models are proposed to address this problem. Because SVM implements the structural risk minimization principle and handles small-sample data effectively, the five SVM-based algorithms achieve better diagnosis performance than the others, with the highest prediction accuracy exceeding 80%. In addition, prediction performance improves after undersampling is applied to the imbalanced data. These results indicate that the SVM-based models are suitable for cucumber disease diagnosis.
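
The abstract does not say which undersampling scheme was used. As a minimal sketch, assuming plain random undersampling down to the size of the smallest class, the idea can be illustrated as follows; the function name and the toy records are hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n_min = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, n_min):  # sample without replacement
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy imbalanced data: record ids with illustrative disease labels.
labels = ["downy mildew"] * 8 + ["powdery mildew"] * 3 + ["gray mold"] * 2
samples = list(range(len(labels)))
bal_x, bal_y = random_undersample(samples, labels)
print(Counter(bal_y))
```

The balanced subset would then be passed to the classifier in place of the original training set.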

Relabeling Noisy Labels: A Twin SVM Approach

Article

Nov 2023

In practical applications, supervised learning algorithms, including support vector machine (SVM), rely heavily on precise labeling to train predictive models. Nonetheless, real-world datasets often contain mislabeled samples, which can considerably affect the performance of these algorithms. In addition, SVM incurs high computational costs on large-scale datasets. Twin support vector machine (TWSVM) tackles this issue by solving two smaller models than SVM to find two nonparallel hyperplanes, such that each one is closer to one of the two classes and is at least a unit distance away from the samples of the other class. In this paper, to address label noise in datasets, we propose a TWSVM-based mixed-integer programming model for relabeling instances directly, while inheriting the advantages of TWSVM. Each model decides whether the samples of one class should be considered among instances that are as close as possible to its corresponding hyperplane. Therefore, each model can recognize instances that closely resemble one class while their assigned labels belong to the other one, prompting their reclassification. Conversely, instances demonstrating lower similarities to the other class retain their original labels. To show the efficiency of the proposed models, experiments are conducted on 12 UCI datasets.
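
To make the two-hyperplane idea concrete, here is a minimal sketch of the usual TWSVM prediction rule, which assigns a point to the class whose hyperplane is nearer. It does not implement the mixed-integer relabeling model from the abstract, and the hyperplane coefficients below are hypothetical values rather than trained ones.

```python
import math


def distance_to_plane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def twsvm_predict(x, plane_pos, plane_neg):
    """Return +1 if x is nearer the positive-class plane, otherwise -1."""
    d_pos = distance_to_plane(x, *plane_pos)
    d_neg = distance_to_plane(x, *plane_neg)
    return 1 if d_pos <= d_neg else -1


# Hypothetical nonparallel hyperplanes in 2-D, each given as (w, b).
plane_pos = ((1.0, -1.0), 0.0)  # the line x1 = x2
plane_neg = ((1.0, 1.0), 0.0)   # the line x1 = -x2

print(twsvm_predict((2.0, 1.9), plane_pos, plane_neg))
print(twsvm_predict((2.0, -2.1), plane_pos, plane_neg))
```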

A Robust Twin Parametric Margin Support Vector Machine for Multiclass Classification

Preprint

Jun 2023

In this paper we present a Twin Parametric-Margin Support Vector Machine (TPMSVM) model to tackle the problem of multiclass classification. In the spirit of the one-versus-all paradigm, for each class we construct a classifier by solving a TPMSVM-type model. Once all classifiers have been determined, they are combined into an aggregate decision function. We consider the cases of both linear and nonlinear kernel-induced classifiers. In addition, we robustify the proposed approach through robust optimization techniques. Indeed, in real-world applications observations are subject to measurement errors and noise, which affect the quality of the solutions. Consequently, data uncertainties need to be included within the model in order to prevent low accuracies in the classification process. Preliminary computational experiments on real-world datasets show the good performance of the proposed approach.
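
The abstract does not specify the form of the aggregate decision function. A common one-versus-all choice, assumed here purely for illustration, is to pick the class whose linear classifier gives the largest score; the weights below are hypothetical and are not TPMSVM solutions.

```python
def ova_predict(x, classifiers):
    """Pick the class whose linear score w.x + b is largest (assumed rule)."""
    scores = {
        label: sum(wi * xi for wi, xi in zip(w, x)) + b
        for label, (w, b) in classifiers.items()
    }
    return max(scores, key=scores.get)


# Hypothetical per-class linear classifiers, each given as (w, b).
classifiers = {
    "A": ((1.0, 0.0), 0.0),
    "B": ((0.0, 1.0), 0.0),
    "C": ((-1.0, -1.0), 0.0),
}

print(ova_predict((3.0, 1.0), classifiers))
print(ova_predict((-2.0, -2.0), classifiers))
```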

Study on Gene Splicing Site Recognition Based on Particle Swarm Optimization Twin Support Vector Machine Algorithm for Smart Healthcare

Article

Apr 2023
WIREL COMMUN MOB COM

Gene splicing site recognition is a very important research topic in smart healthcare. It is of great significance, not only for the large-scale and high-quality computational annotation of genomes but also for the analysis and recognition of the evolutionary process of gene sequences. A reliable and effective algorithm for gene splice site recognition is urgently needed. The traditional Twin Support Vector Machine (TWSVM) algorithm has advantages in solving small-sample, nonlinear, and high-dimensional problems, but it cannot deal with parameter selection well. To avoid blind parameter selection, a particle swarm optimization algorithm was used to find the optimal parameters of the twin support vector machine. Therefore, a Particle Swarm Optimization Twin Support Vector Machine (PSO-TWSVM) algorithm for gene splicing site recognition is proposed in this paper. The proposed algorithm was compared with the traditional Support Vector Machine algorithm, the TWSVM algorithm, and the Least Squares Support Vector Machine algorithm. The comparison results show that the positive sample recognition rate, negative sample recognition rate, and correlation coefficient (CC) of the proposed algorithm are the best among the four support vector machine algorithms. The proposed algorithm effectively improves the recognition rate and the accuracy of splice sites. The comparison experiments verify the feasibility of the proposed algorithm.

Safe sample screening for robust twin support vector machine

Article

Mar 2023
APPL INTELL

Twin support vector machine (TSVM) improves computational speed compared with the classical SVM and has been widely used in classification and regression problems. However, two problems arise. First, since the convex hinge loss function of TSVM is unbounded, the generalization performance of TSVM declines in noisy environments. Second, TSVM struggles with large-scale data. To handle these problems, in this paper we propose a new method named Safe Sample Screening for robust TSVM (SSS-RTSVM). As the ramp loss is bounded, robust TSVM clips the hinge loss in the traditional soft margin twin support vector machine to the ramp loss, and provides a pair of nonparallel proximal hyperplanes that resist noisy data and outliers. However, the non-convex problem of robust TSVM can be considered as a DC programming problem, which is computationally inefficient. We therefore integrate safe sample screening rules for RTSVM based on the framework of the concave-convex procedure (CCCP) to delete most training samples, i.e., a subset of the samples called support vectors (SVs) is selected to reduce the computational cost without sacrificing the optimal accuracy. Notably, for the proposed SSS-RTSVM, a safety guarantee is provided for the sample screening rule. Extensive experiments are conducted on several benchmark datasets to fully demonstrate the robustness and acceleration of the proposed method.
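
The contrast between the unbounded hinge loss and the bounded ramp loss can be sketched numerically. The version below writes the ramp loss as a difference of two hinge functions, which is the difference-of-convex (DC) form that CCCP-style solvers exploit; the clipping parameter `s = -1` is an illustrative assumption, not a value taken from the paper.

```python
def hinge(z, s=1.0):
    """Shifted hinge H_s(z) = max(0, s - z); s = 1 gives the usual hinge loss."""
    return max(0.0, s - z)


def ramp(z, s=-1.0):
    """Ramp loss as a difference of two convex hinges; bounded above by 1 - s."""
    return hinge(z, 1.0) - hinge(z, s)


# z plays the role of the margin y * f(x); very negative z mimics an outlier.
for z in (2.0, 0.0, -1.0, -100.0):
    print(z, hinge(z), ramp(z))
```

An outlier with a very negative margin contributes a loss that grows without limit under the hinge, but never more than `1 - s` under the ramp.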

HC-DTTSVM: A Network Intrusion Detection Method Based on Decision Tree Twin Support Vector Machine and Hierarchical Clustering

Article

Mar 2023

Network intrusion detection is an important technology in national cyberspace security strategy and has become a research hotspot among cyberspace security issues in recent years. Developing effective and efficient intelligent network intrusion detection methods with advanced machine learning algorithms is of great importance for defending against network intrusions in complex network environments. In this study, a network intrusion detection method based on decision tree twin support vector machine and hierarchical clustering, named HC-DTTWSVM, is proposed, which can effectively detect different categories of network intrusion. First, the hierarchical clustering algorithm is applied to construct the decision tree for network traffic data, where the bottom-up merging approach is used to maximize the separation of the upper nodes of the decision tree, which reduces the error accumulation in the construction of the decision tree. Then, twin support vector machines are embedded in the constructed decision tree to implement the network intrusion detection model, which can effectively detect the network intrusion category in a top-down manner. The detection performance of the proposed HC-DTTWSVM method is evaluated on the NSL-KDD and UNSW-NB15 intrusion detection benchmark datasets. Experimental results show that HC-DTTWSVM can effectively detect different categories of network intrusion and achieves detection performance comparable to some recently proposed network intrusion detection methods.
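
The bottom-up construction can be sketched as follows, assuming each traffic category is summarised by a centroid and the two closest groups are merged at every step, so the most separable groups end up split near the root. The abstract does not give the paper's distance measure or linkage rule; the centroid linkage, category names, and coordinates here are illustrative assumptions.

```python
import math


def build_class_tree(centroids):
    """Merge the two closest class groups repeatedly; return a nested-tuple tree."""
    # Each cluster is (subtree, centroid, number_of_classes_merged).
    clusters = [(label, tuple(c), 1) for label, c in centroids.items()]
    while len(clusters) > 1:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda p: math.dist(clusters[p[0]][1], clusters[p[1]][1]))
        (ta, ca, na), (tb, cb, nb) = clusters[i], clusters[j]
        # Size-weighted mean of the two merged centroids.
        centre = tuple((x * na + y * nb) / (na + nb) for x, y in zip(ca, cb))
        merged = ((ta, tb), centre, na + nb)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Hypothetical 2-D centroids for four traffic categories.
centroids = {
    "normal": (0.0, 0.0),
    "dos": (1.0, 0.0),
    "probe": (5.0, 5.0),
    "r2l": (5.5, 5.0),
}
print(build_class_tree(centroids))
```

Each internal node of the resulting tree would host one binary classifier, and a sample would be routed from the root downward until it reaches a single category.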

Linear Algorithms for Robust and Scalable Nonparametric Multiclass Probability Estimation

Article

Nov 2022
J Data Sci

Multiclass probability estimation is the problem of estimating conditional probabilities of a data point belonging to a class given its covariate information. It has broad applications in statistical analysis and data science. Recently a class of weighted Support Vector Machines (wSVMs) has been developed to estimate class probabilities through ensemble learning for K-class problems (Wu et al., 2010; Wang et al., 2019), where K is the number of classes. The estimators are robust and achieve high accuracy for probability estimation, but their learning is implemented through pairwise coupling, which demands polynomial time in K. In this paper, we propose two new learning schemes, the baseline learning and the One-vs-All (OVA) learning, to further improve wSVMs in terms of computational efficiency and estimation accuracy. In particular, the baseline learning has optimal computational complexity in the sense that it is linear in K. Though not the most efficient in computation, the OVA is found to have the best estimation accuracy among all the procedures under comparison. The resulting estimators are distribution-free and shown to be consistent. We further conduct extensive numerical experiments to demonstrate their finite sample performance.
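
The computational contrast in the abstract can be illustrated by counting binary subproblems: pairwise coupling trains one per unordered pair of classes, while one-vs-all trains one per class. This is only a counting sketch with hypothetical function names; it does not implement the wSVM estimators, and the abstract does not state the exact subproblem count for baseline learning, so that scheme is left out.

```python
def n_pairwise_problems(k):
    """Binary subproblems under pairwise coupling: one per unordered class pair."""
    return k * (k - 1) // 2


def n_ova_problems(k):
    """Binary subproblems under one-vs-all: one per class."""
    return k


for k in (3, 10, 100):
    print(k, n_pairwise_problems(k), n_ova_problems(k))
```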

Fast Detection of Diarrhetic Shellfish Poisoning Toxins in Mussels Using NIR Spectroscopy and Improved Twin Support Vector Machines

Article

Jun 2022

Diarrhetic shellfish poisoning (DSP) toxins are potent marine biotoxins. Consuming mussels contaminated by DSP toxins can cause severe gastrointestinal illness. New methods for effectively and rapidly detecting DSP toxin-contaminated mussels are required. In this study, we used near-infrared (NIR) reflection spectroscopy combined with pattern recognition methods to detect DSP toxins. Spectral data of healthy mussels and DSP toxin-contaminated mussels were acquired in the range of 950-1700 nm. To select optimal waveband subsets, a waveband selection algorithm with a Gaussian membership function based on fuzzy rough set theory was applied. Because distinguishing DSP toxin-contaminated mussels from healthy mussels is an imbalanced classification problem, an improved twin support vector machine (TWSVM) approach based on centered kernel alignment was explored. The influences of the parameters of the waveband selection algorithm and the regularization hyperparameters of the improved TWSVM (ITWSVM) on model performance were analyzed. Compared to conventional SVM, TWSVM, and other state-of-the-art algorithms (such as multi-layer perceptron, extreme gradient boosting and adaptive boosting), our proposed model exhibited better performance in detecting DSP toxins and was little affected by the imbalance ratio. For the proposed model, the F-measure reached 0.9886, and detection accuracy reached 98.83%. We explored the physical basis for the detection model by analyzing the relationship between the occurrence of overtone and combination bands and the selected wavebands. This study supports NIR spectroscopy as an innovative, rapid, and convenient analytical method to detect DSP toxins in mussels.
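
For readers unfamiliar with the metric, the F-measure is sketched below as the harmonic mean of precision and recall (the F1 form, which is assumed here because the abstract does not state a weighting). The confusion-matrix counts are hypothetical and are not the paper's data.

```python
def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for an imbalanced problem with many negative samples.
tp, tn, fp, fn = 90, 880, 10, 20
print(f_measure(tp, fp, fn), accuracy(tp, tn, fp, fn))
```

Because the F-measure ignores true negatives, it is less flattered by a large majority class than accuracy is, which is why it is often reported for imbalanced problems.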
