Visualization of the third SVM model (polynomial kernel) on the example of the two circles.

Source publication
Article
Full-text available
Problem setting: Support vector machines (SVMs) are very popular tools for classification, regression and other problems. Thanks to the large choice of kernels they can be applied with, a wide variety of data can be analysed using these tools. Machine learning owes its popularity to the good performance of the resulting models. However, interpreting...
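For readers who want to reproduce the kind of picture shown in the figure above, the following minimal sketch (not the article's own code; it assumes scikit-learn and its make_circles toy generator) fits a polynomial-kernel SVM to the two-circles problem:

```python
# Minimal sketch (not the source article's code): fit a polynomial-kernel SVM
# on the classic two-circles toy problem, as in the figure above.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

# A degree-2 polynomial kernel is enough to separate the concentric circles.
clf = SVC(kernel="poly", degree=2, coef0=1.0, C=1.0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```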

Similar publications

Article
Full-text available
In this paper, we investigated the possibility of classifying different performers playing the same melodies in the same manner, who are subjectively quite similar and very difficult to distinguish even for musically skilled listeners. To resolve this problem, we propose the use of multifractal (MF) analysis, which has proven to be an efficient method for de...

Citations

... The CD8+ T cell exhaustion genes with prognostic potential in the TARGETs dataset were identified through univariate Cox regression analysis (P<0.05). Subsequently, a combination of six machine learning algorithms was employed, which included the least absolute shrinkage and selection operator (LASSO) Cox regression algorithm (14), the Boruta feature selection algorithm (15), survival support vector machine (survival-SVM) based on 10-fold cross-validation (16), boosting in Cox regression (Cox-boost) (17), Extreme Gradient Boosting (XG-boost) (18), and generalized boosted regression modeling (GBM) (19), to further refine the valuable T cell exhaustion signature. In constructing the model, the biomarkers output by the machine learning models were intersected, followed by multiple Cox regression to calculate the weight of each gene. ...
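As a rough illustration of the final step described in this excerpt, the sketch below intersects the genes selected by the individual algorithms and weights them with a multivariable Cox model. It is only a sketch under stated assumptions: a pandas DataFrame df with survival columns time and event plus gene expression columns, per-algorithm selections given as Python sets, and the lifelines package for the Cox fit; the six selectors themselves are not reproduced, and the function name exhaustion_signature is hypothetical.

```python
# Sketch of the "intersect selected genes, then weight them with multivariable Cox"
# step. Assumes df has survival columns 'time' and 'event' plus gene columns,
# and that each algorithm's selected genes are already available as a set.
from functools import reduce

import pandas as pd
from lifelines import CoxPHFitter


def exhaustion_signature(df: pd.DataFrame, selections: list[set[str]]) -> pd.Series:
    # Keep only genes chosen by every algorithm.
    genes = sorted(reduce(set.intersection, selections))
    cph = CoxPHFitter()
    cph.fit(df[["time", "event"] + genes], duration_col="time", event_col="event")
    # The Cox coefficients serve as the per-gene weights of the signature.
    return cph.params_
```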
Article
Full-text available
Background: T cell exhaustion in the tumor microenvironment has been demonstrated as a substantial contributor to tumor immunosuppression and progression. However, the correlation between T cell exhaustion and osteosarcoma (OS) remains unclear. Methods: In our present study, single-cell RNA-seq data for OS from the GEO database were analysed to identify CD8+ T cells and discern CD8+ T cell subsets objectively. The subgroup differentiation trajectory was then used to pinpoint genes altered in response to T cell exhaustion. Subsequently, six machine learning algorithms were applied to develop a prognostic model linked with T cell exhaustion. This model was subsequently validated in the TARGETs and Meta cohorts. Finally, we examined disparities in immune cell infiltration, immune checkpoints, immune-related pathways, and the efficacy of immunotherapy between high and low TEX score groups. Results: The findings unveiled differential exhaustion in CD8+ T cells within the OS microenvironment. Three genes related to T cell exhaustion (RAD23A, SAC3D1, PSIP1) were identified and employed to formulate a T cell exhaustion model. This model exhibited robust predictive capabilities for OS prognosis, with patients in the low TEX score group demonstrating a more favorable prognosis, increased immune cell infiltration, and heightened responsiveness to treatment compared to those in the high TEX score group. Conclusion: In summary, our research elucidates the role of T cell exhaustion in the immunotherapy and progression of OS; the prognostic model constructed from T cell exhaustion-related genes holds promise as a potential method for prognostication in the management and treatment of OS patients.
... These methods provide justification for the predictions made by ensemble models. Explanation by simplification, feature attribution, and visualizations have been commonly used to shed light on SVM models and understand how they make decisions (Van Belle et al., 2016; Shakerin and Gupta, 2020). Similarly, deep learning models, including Deep Neural Networks (DNN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), require specialized explainability methods. ...
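As a hedged illustration of "explanation by simplification" only (not the specific methods of Van Belle et al. or Shakerin and Gupta), the sketch below distills an RBF-kernel SVM into a shallow surrogate decision tree trained on the SVM's own predictions; the breast-cancer dataset is just a stand-in.

```python
# Illustration of explanation by simplification: approximate a black-box SVM
# with a shallow decision tree fitted to the SVM's own predictions.
from sklearn.datasets import load_breast_cancer
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# The tree mimics the SVM's decisions, giving a readable approximation.
surrogate = DecisionTreeClassifier(max_depth=3).fit(X, svm.predict(X))
print("fidelity to SVM:", surrogate.score(X, svm.predict(X)))
print(export_text(surrogate, feature_names=list(X.columns)))
```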
Article
Full-text available
Explainable Artificial Intelligence (XAI) has gained significant attention as a means to address the transparency and interpretability challenges posed by black-box AI models. In the context of the manufacturing industry, where complex problems and decision-making processes are widespread, the XMANAI platform emerges as a solution to enable transparent and trustworthy collaboration between humans and machines. By leveraging advancements in XAI and fostering prompt collaboration between data scientists and domain experts, the platform enables the construction of interpretable AI models that offer high transparency without compromising performance. This paper introduces the approach to building the XMANAI platform and highlights its potential to resolve the “transparency paradox” of AI. The platform not only addresses technical challenges related to transparency but also caters to the specific needs of the manufacturing industry, including lifecycle management, security, and trusted sharing of AI assets. The paper provides an overview of the XMANAI platform's main functionalities, addresses the challenges faced during development, and presents the evaluation framework used to measure the performance of the delivered XAI solutions. It also demonstrates the benefits of the XMANAI approach in achieving transparency in manufacturing decision-making, fostering trust and collaboration between humans and machines, improving operational efficiency, and optimizing business value.
... The results and performance of the following data mining techniques were compared: Naive Bayes (NB) (Chhogyal & Nayak, 2016), Generalized Linear Model (GLM) (Dobson & Barnett, 2008), Logistic Regression (LR), Random Forest (RF) (Couronné et al., 2018), Deep Learning (DL) (Guo et al., 2016), Decision Tree (DT) (Jahan et al., 2018), and Support Vector Machine (SVM) (Van et al., 2016), as these techniques are the most used for classification problems in the related literature. The algorithms were configured with the default parameters provided by the data mining system. ...
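A minimal scikit-learn analogue of such a default-parameter comparison is sketched below; the dataset is a placeholder (the cited study used HR turnover data and a data mining system's defaults), and the GLM and Deep Learning models are omitted.

```python
# Sketch: compare several classifiers with default parameters via cross-validation.
# The dataset is a stand-in; the cited study used HR turnover data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=5000),
    "RF": RandomForestClassifier(),
    "DT": DecisionTreeClassifier(),
    "SVM": SVC(),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```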
Article
Full-text available
Data mining can be applied to seek hidden information in large volumes of data. In Human Resources Management, it helps to identify the reasons behind turnover and employee behavior. That knowledge helps to identify unwanted employee profiles and to improve personnel selection processes, which are a means of reducing the turnover rate in companies. In this paper we analyzed the situation of a Human Resources Outsourcing company and tested various data mining techniques to compare which performed better and was more suitable for classifying labor turnover among low-skill employees of an outsourcing company. A limitation of this research was the partial absence of sociodemographic data in the employee databases, as well as of variables related to organizational climate and culture. Through the CRISP-DM methodology we created and evaluated different classification models and obtained a list of relevant characteristics of employee profiles prone to turnover. The results showed that Age, Salary, Location, and Work Experience in Time and Area are key factors that help to classify turnover and can be used to suggest personnel selection policies to the company. This research involved the analysis of a Human Resources Outsourcing company and of low-skill employees' data, two settings on which little research has been done. The results obtained can help other companies with low-skill employees, or other Human Resources Outsourcing companies, as a framework for where to start gathering employee data and analysing the profiles prone to turnover.
... Our chosen technique is fast to implement intra-operatively and straightforwardly resolves the superposition of intervals due to increased variability assumptions. The SVM classifier is a boundary-decision method (binary class label prediction) that finds a hyperplane maximizing the margin between the nearest data points (support vectors) and that hyperplane [33]. These characteristics provide an advantage for better pivot-shift grading based on the accelerations previously classified by expert surgeons, as Fig. 1 of this study illustrates. ...
Article
Introduction: Rotatory laxity acceleration still lacks an objective classification due to the superposition of grading intervals, resulting in biased pivot-shift grading prior to Anterior Cruciate Ligament (ACL) reconstruction. However, data analysis might help improve grading in the operating room. Therefore, we describe the improvement of the pivot-shift categorization of Gerdy's acceleration under anesthesia prior to ACL surgery using a support vector machine (SVM) classification, surgeon grading, and literature reference values. Methods: Seventy-five patients (aged 30.3 ± 10.2 years, IKDC 52.0 ± 16.5 points) with acute ACL rupture were analyzed under anesthesia prior to ACL surgery. Patients were graded for the pivot-shift sign as glide (+), clunk (++), and gross (+++) by senior orthopedic surgeons. At the same time, the tri-axial tibial plateau acceleration was measured. Categorical data were described statistically, and the accelerometry and categorical data were associated (α = 5%). A multiclass SVM with the best-accuracy kernel, trained on the orthopedic surgeons' gradings and assisted by literature values for missing data, was compared with the experienced surgeons' grading and with literature interval grading. The cubic SVM classifier achieved the best grading. Results: The intra-group proportions were different for each grade in the three compared strategies (p<0.001). The inter-group proportions were different for all comparisons (p<0.001). There were significant (p<0.001) associations (Tau: 0.69, -0.28, and -0.50) between the surgeon and the SVM, the surgeon and interval grading, and the interval grading and the SVM, respectively. Conclusion: The multiclass SVM classifier improves the acceleration-based categorization of the (+), (++), and (+++) pivot-shift sign prior to ACL surgery, in agreement with surgeon criteria.
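The sketch below is only a schematic of a multiclass cubic-kernel (degree-3 polynomial) SVM of the kind described above; the acceleration features and pivot-shift grades are synthetic placeholders, not the study's data.

```python
# Sketch: multiclass SVM with a cubic (degree-3 polynomial) kernel for
# three pivot-shift grades. Features and labels are synthetic placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(75, 3))      # e.g. tri-axial acceleration features
y = rng.integers(0, 3, size=75)   # grades: 0 = (+), 1 = (++), 2 = (+++)

clf = make_pipeline(StandardScaler(), SVC(kernel="poly", degree=3, C=1.0))
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```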
... However, it is difficult to derive them from data in a computationally efficient manner. Some advances have been made in generating nomograms for flexible models applied to tabular data [41]. This has also been pursued constructively, inferring from a trained MLP a model with univariate and bivariate effects, in the form of partial response networks [42,43]. ...
... An SVM model represents the different classes by a hyperplane in a multidimensional space and divides the data into classes by maximizing the margin of the hyperplane. Methods have been developed to facilitate the interpretability of SVMs by visualizing results as nomograms, such as the nomogram method using decomposable kernels in SVMs [38] and a nomogram representation that replaces the lines with color bars, where the colors offer the same interpretation as the line lengths in conventional nomograms [39]. In addition, non-image features such as clinical findings can easily be integrated into conventional machine learning systems, which is particularly important in the development of medical AI solutions. ...
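The sketch below is not the decomposable-kernel method of [38] or the color-bar representation of [39]; it only illustrates, for a plain linear kernel, the per-feature contributions to the decision value that a nomogram visualizes feature by feature. The dataset is a placeholder.

```python
# Illustration (linear kernel only): per-feature contributions to the SVM
# decision value, the quantity a nomogram visualizes feature by feature.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
Xs = StandardScaler().fit_transform(X)

svm = LinearSVC(C=1.0, max_iter=20000).fit(Xs, y)

# Contribution of each feature to the decision value of the first sample
# (the intercept adds a constant offset on top of these terms).
contrib = svm.coef_.ravel() * Xs[0]
for name, c in sorted(zip(X.columns, contrib), key=lambda t: -abs(t[1]))[:5]:
    print(f"{name}: {c:+.3f}")
print("decision value:", svm.decision_function(Xs[:1])[0])
```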
Article
Full-text available
Purpose: To compare a deep learning model with a radiomics model in differentiating high-grade (LR-3, LR-4, LR-5) liver imaging reporting and data system (LI-RADS) liver tumors from low-grade (LR-1, LR-2) LI-RADS tumors based on contrast-enhanced magnetic resonance images. Methods: Magnetic resonance imaging scans of 361 suspected hepatocellular carcinoma patients were retrospectively reviewed. Lesion volume segmentation was performed manually by two radiologists, resulting in 426 lesions from the training set and 83 lesions from the test set. The radiomics model was constructed using a support vector machine (SVM) with pre-defined features, which were first selected using a Chi-square test and then refined using binary least absolute shrinkage and selection operator (LASSO) regression. The deep learning model was established based on DenseNet. Performance of the models was quantified by the area under the receiver-operating characteristic curve (AUC), accuracy, sensitivity, specificity and F1-score. Results: A set of the 8 most informative features was selected from 1049 features to train the SVM classifier. The AUCs of the radiomics model were 0.857 (95% confidence interval [CI] 0.816–0.888) for the training set and 0.879 (95% CI 0.779–0.935) for the test set. The deep learning method achieved AUCs of 0.838 (95% CI 0.799–0.871) for the training set and 0.717 (95% CI 0.601–0.814) for the test set. The performance difference between these two models was assessed by t-test, which showed that the results in both the training and test sets were statistically significant. Conclusion: The deep learning based model can be trained end-to-end with little extra domain knowledge, while the radiomics model requires complex feature selection. However, this process makes the radiomics model achieve better performance in this study with smaller computational cost and more potential for model interpretability.
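A minimal sketch of such a radiomics pipeline under stated assumptions is given below: scikit-learn throughout, a placeholder dataset instead of radiomics features, and an L1-penalized logistic regression standing in for the binary LASSO refinement step.

```python
# Sketch of the radiomics pipeline: chi-square filter -> L1 (LASSO-type)
# selection for a binary outcome -> SVM, evaluated by AUC. Data are placeholders.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectFromModel, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(
    MinMaxScaler(),      # chi2 requires non-negative inputs
    SelectKBest(chi2, k=10),
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
    SVC(probability=True),
)
model.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"test AUC: {auc:.3f}")
```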
... Four machine learning algorithms, Boruta [37], eXtreme Gradient Boosting (XG-Boost) [38], support vector machine (SVM) [39] and Random Forest [40], were used to screen out the most valuable ICD-related genes for predicting prognosis at a pan-cancer level. Next, univariate Cox regression analysis was used to screen out the prognostic ICD-related genes, considering important clinical characteristics including age, gender, T stage, N stage and M stage. ...
Article
Full-text available
Immunogenic cell death (ICD), a form of regulated cell death, is related to anticancer therapy. Due to the absence of widely accepted markers, characterizing ICD-related phenotypes across cancer types has remained unexplored. Here, we defined the ICD score to delineate the ICD landscape across 33 cancerous types and 31 normal tissue types based on transcriptomic, proteomic and epigenetic data from multiple databases. We found that the ICD score showed cancer type-specific associations with genomic and immune features. Importantly, the ICD score had the potential to predict therapy response and patient prognosis in multiple cancer types. We also developed an ICD-related prognostic model by machine learning and Cox regression analysis. Single-cell level analysis revealed intra-tumor ICD state heterogeneity and communication between ICD-based clusters of T cells and other immune cells in the tumor microenvironment in colon cancer. For the first time, we identified IGF2BP3 as a potential ICD regulator in colon cancer. In conclusion, our study provides a comprehensive framework for evaluating the relation between ICD and clinical relevance, gaining insights into the identification of ICD as a potential cancer-related biomarker and therapeutic target.
... The SVM learning process was performed in two steps. First, cross-validation was used to optimize two radial basis function (RBF) kernel-based hyperparameters, namely the overlapping penalty (C) between 0.1 and 10.0 and gamma (γ) between 0.1 and 5.0 [21,25,41], using the Bayesian tree-structured Parzen estimator (TPE) optimizer [42]; this Bayesian optimization technique has been used in other studies as well [43,44]. This was done to find the hyperparameter values that yield the maximum accuracy as the objective function over 100 iterations. ...
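A hedged sketch of this kind of TPE search (not the authors' implementation; it assumes the Optuna library, a placeholder dataset, and cross-validated accuracy as the objective) could look as follows:

```python
# Sketch: TPE (Bayesian) search over the RBF-SVM hyperparameters C in [0.1, 10]
# and gamma in [0.1, 5], maximizing cross-validated accuracy for 100 trials.
import optuna
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

def objective(trial: optuna.Trial) -> float:
    c = trial.suggest_float("C", 0.1, 10.0)
    gamma = trial.suggest_float("gamma", 0.1, 5.0)
    clf = SVC(kernel="rbf", C=c, gamma=gamma)
    return cross_val_score(clf, X, y, cv=10).mean()

study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)
```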
Article
Full-text available
Impact hammer testing is a routine structural inspection method for detecting surface and internal damage. Inspectors use the sound from impact hammer testing to locate the damaged area. However, manual impact hammer testing cannot achieve reliable accuracy for small damage, such as concrete cracks, and owing to the shortage of experienced workers, a reliable tool is needed to evaluate the hammering sound. Therefore, to improve detection accuracy, this study proposes an automatic crack identification process for impact hammer testing. Three approaches are used to identify crack characteristics, such as width, depth, and location, based on the fast Fourier transform of the hammering sound. To determine the relationship between damaged and intact information values, the first and second approaches use the dominant frequency (\(D_{f}\)) and a frequency feature value (\(V_{f}\)), respectively, whereas the last one uses Mel-frequency cepstral coefficients (MFCCs). Six concrete specimens with different crack widths and depths were fabricated to validate the three approaches. The experimental results reveal that although \(D_{f}\) can detect the damage, it cannot classify its depth and width. Furthermore, \(V_{f}\) indicates cracks that are 20 mm deep. Three different artificial-intelligence classification algorithms were used to validate the MFCC approach: fuzzy rules, gradient boosted trees, and support vector machine (SVM). The three algorithms are applied and evaluated to enhance acoustic impact hammer testing. The results reveal that the SVM algorithm confirms the ability to accurately identify fine concrete cracks that are 0.2 mm wide and 40 mm deep.
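As a minimal sketch of the first approach only, the snippet below estimates the dominant frequency \(D_{f}\) of a (here synthetic) hammering-sound signal via the FFT; the \(V_{f}\), MFCC, and SVM steps are not reproduced.

```python
# Sketch: dominant frequency D_f of a hammering-sound signal via the FFT.
# The signal here is synthetic; a real recording would be loaded from a file.
import numpy as np

sr = 44100                                   # sampling rate in Hz
t = np.arange(0, 0.1, 1.0 / sr)
signal = (np.sin(2 * np.pi * 3200 * t)
          + 0.3 * np.random.default_rng(0).normal(size=t.size))

# Window the signal, take the magnitude spectrum, and pick the peak frequency.
spectrum = np.abs(np.fft.rfft(signal * np.hanning(signal.size)))
freqs = np.fft.rfftfreq(signal.size, d=1.0 / sr)
dominant_freq = freqs[np.argmax(spectrum)]
print(f"dominant frequency: {dominant_freq:.0f} Hz")
```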
... New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall. It was implemented, in this work, based on [34]. ...
... We used four datasets; two for testing sarcasm detection and two for testing the overall system. The iSarcasm [27] and Rillof [34] datasets are used for sarcasm evaluation. iSarcasm contains 4,484 tweets, of which 777 are labeled as sarcastic and 3,707 as non-sarcastic. ...
Article
Full-text available
Sentiment analysis (SA) is one of the most important tasks in the natural language processing (NLP) field. Many researchers have tried to build efficient SA systems for applications such as the detection of terrorist activities, customer support management, analysis of customer feedback, market research, competitive research, and many others. Almost all of these studies deal with a classification task, and they tried to solve one or two of the challenges that face this task. In this research, a complete SA system was constructed to handle five main challenges in SA and to study the effect of each one on the system. These challenges are the processing of negation, multi-polarity of words, text multi-polarity, semantically ambiguous words (exact meaning), and sarcasm. Solutions were introduced for these challenges to achieve high accuracy. This is the first work to study the effect of these five challenges collectively, with novel solutions to some of them. For aspect-based SA, three different classifiers were used: support vector machine (SVM), maximum entropy (MaxEnt), and long short-term memory (LSTM). The best results for the F-measure were 0.833, 0.852, and 0.875 using the LSTM method on the Foursquare ABSA and SemEval2014 (laptop and restaurant) datasets, respectively. From many test scenarios, we found that negation processing had the greatest effect on the SA system, followed by multi-polarity of words, semantically ambiguous words, text multi-polarity, and sarcasm, respectively. Also, the proposed sarcasm processing technique was evaluated on two corpora annotated for sarcasm, the iSarcasm and Rillof datasets, before applying it to SA.
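A hedged sketch of the SVM branch only is shown below (TF-IDF features plus a linear SVM scored with the F-measure); the texts and labels are placeholders, and the negation and sarcasm processing described above is not reproduced.

```python
# Sketch of the SVM branch of a sentiment classifier: TF-IDF features plus a
# linear SVM, evaluated with the F-measure. Texts and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["great food and service", "terrible laptop battery", "loved it", "awful"]
train_labels = [1, 0, 1, 0]
test_texts = ["the battery is awful", "service was great"]
test_labels = [0, 1]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)
print("F1:", f1_score(test_labels, clf.predict(test_texts)))
```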
... (i) hklncRNAs significantly upregulated in immune cell lines and downregulated in GBM cell lines were defined as TIIClncRNAs. (ii) Six machine learning algorithms, including least absolute shrinkage and selection operator regularized logistic regression (LassoLR) [21,22], Boruta [23], Xgboost [24], support vector machine (SVM) [25], Random Forest [26] and prediction analysis for microarrays (pamr) [27], were then used to screen out the most valuable TIIClncRNAs by taking the intersection of the TIIClncRNAs identified by the six algorithms. (iii) Univariate Cox regression analysis was further used to screen out the prognostic TIIClncRNAs in the TCGA GBM dataset. ...
Article
Long noncoding ribonucleic acids (lncRNAs) have been associated with the regulation of cancer immunity. However, the roles of immune cell-specific lncRNAs in glioblastoma (GBM) remain largely unknown. In this study, a novel computational framework was constructed to screen the tumor-infiltrating immune cell-associated lncRNAs (TIIClnc) for developing the TIIClnc signature by integratively analyzing the transcriptome data of purified immune cells, GBM cell lines and bulk GBM tissues using six machine learning algorithms. As a result, the TIIClnc signature could distinguish survival outcomes of GBM patients across four independent datasets, including the Xiangya in-house dataset, and, more importantly, showed superior performance compared with 95 previously established signatures in gliomas. The TIIClnc signature was revealed to be an indicator of the infiltration level of immune cells and predicted the response outcomes of immunotherapy. The positive correlation between the TIIClnc signature and CD8, PD-1 and PD-L1 was verified in the Xiangya in-house dataset. As a newly demonstrated predictive biomarker, the TIIClnc signature enables a more precise selection of the GBM population who would benefit from immunotherapy and should be validated and applied in the near future.