Figure 6 - available via license: CC BY
The effect of the degree of a polynomial kernel.
The polynomial kernel of degree 1 leads to a linear separation (A). Higher-degree polynomial kernels allow a more flexible decision boundary (B,C). The style follows that of Figure 3.
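The effect described in the caption can be sketched with a toy example. This is a minimal illustration, assuming scikit-learn is available; the dataset and hyperparameters are chosen only for demonstration:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Toy two-class problem that is not linearly separable.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

scores = {}
for degree in (1, 2, 3):
    # degree=1 yields a linear decision boundary; higher degrees
    # allow the boundary to curve around the interleaved classes.
    clf = SVC(kernel="poly", degree=degree, coef0=1.0, C=1.0).fit(X, y)
    scores[degree] = clf.score(X, y)
    print(degree, round(scores[degree], 2))
```

Plotting the decision boundary for each fitted model would reproduce the qualitative pattern in panels A-C.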

Similar publications

Article
Full-text available
Geographic Information Retrieval (GIR) has emerged as a new and promising tool for representation, storage, organisation of and access to geographic information. One of the current issues in GIR research is ranking of retrieved documents by both textual and geographic similarity measures. This paper describes an approach that learns GIR ranking fun...
Article
Full-text available
One central problem of information retrieval (IR) is to determine which documents are relevant and which are not to the user information need. This problem is practically handled by a ranking function which defines an ordering among documents according to their degree of relevance to the user query. This paper discusses work on using machine learni...
Article
Full-text available
With the growth of digital lifelogging technologies there are challenges in terms of detecting and annotating real world events from this multimedia lifelog data. In this paper we use the SenseCam, a passively capturing wearable camera, worn around the neck, which captures about 3,000 photos per day, thereby creating a personal lifelog or visual re...
Article
Full-text available
We present a new algorithm for multiple approximate string matching. It is based on reading backwards enough ℓ-grams from text windows so as to prove that no occurrence can contain the part of the window read, and then shifting the window. We show analytically that our algorithm is optimal on average. Hence our first contribution is to fill an impo...
Article
Full-text available
As the cost of sequencing continues to decrease and the amount of sequence data generated grows, new paradigms for data storage and analysis are increasingly important. The relative scaling behavior of these evolving technologies will impact genomics research moving forward.

Citations

... One of the key strengths of SVM lies in its ability to efficiently handle non-linear classification problems using what is known as the kernel trick [42]. This technique enables SVM to implicitly map inputs into high-dimensional feature spaces, facilitating the separation of classes that might not be linearly separable in the original feature space [44]. The selection of kernel functions plays a crucial role in determining the characteristics of SVM. ...
... The selection of kernel functions plays a crucial role in determining the characteristics of SVM. Different kernel functions effectively calculate inner products in various feature spaces [44]. Selecting the most suitable kernel function is not straightforward, and there is no intuitive method for making such a choice; instead, the optimal kernel function is typically chosen through heuristic methods [45]. ...
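The kernel trick mentioned in these excerpts can be made concrete with a small worked example. For the homogeneous degree-2 polynomial kernel k(x, z) = (x·z)², the explicit feature map φ(x) = (x₁², √2·x₁x₂, x₂²) yields the same inner product, so the kernel evaluates the high-dimensional inner product without ever constructing φ(x). The function names below are illustrative:

```python
import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map for 2-D inputs.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(x, z):
    # The kernel computes the same quantity directly in input space.
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# Both routes give the same inner product: (x . z)^2 = 4^2 = 16.
print(np.dot(phi(x), phi(z)), poly2_kernel(x, z))
```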
Article
Full-text available
Diabetic retinopathy (DR) is a leading cause of blindness among diabetic patients worldwide. Early detection and timely intervention are crucial for preventing vision loss. In this paper, we propose a novel approach for the automated detection of diabetic retinopathy utilizing segmentation-based fractal texture analysis (SFTA) and center symmetric local binary pattern (CSLBP) features with support vector machine (SVM) classification. The proposed methodology begins with the preprocessing stage aimed at enhancing the image quality. This stage includes Gaussian and median filtering to reduce noise, contrast limited adaptive histogram equalization (CLAHE) to improve local contrast, and unsharp filtering to enhance image sharpness. Following preprocessing, we extract the SFTA features that provide a comprehensive analysis of the fractal properties of different image segments and encode the complex geometric structures present in retinal images. We also extract the CSLBP features, which provide discriminative texture features capturing local patterns within the retinal image. The integration of SFTA and CSLBP features offers a robust representation of retinal texture characteristics, enhancing the detection accuracy of abnormalities associated with DR. We employ the SVM classifier, utilizing three different kernels: linear, quadratic, and Gaussian. This enables us to explore the effectiveness of various kernel functions in distinguishing between healthy and DR retinal images based on the extracted features. Experimental evaluation conducted on a publicly available dataset demonstrates the effectiveness of the proposed approach, achieving superior performance compared to state-of-the-art methods.
... Support vectors are data points located on class-bounding hyperplanes that are parallel to the optimal separating hyperplanes (Yu et al., 2012). SVM, which aims to create a decision boundary within the feature space, is more suitable for classifications where classes can be separated by a linear boundary in the data set (Ben-Hur et al., 2008; Noble, 2006). Real-world scenarios frequently involve curved decision boundaries, and linear functions are unable to classify nonlinearly separable data sets (Kavzoglu and Colkesen, 2009; Tahmasebi et al., 2020). ...
... To ensure a comprehensive evaluation and meaningful comparison of our proposed model, we conducted experiments using classification models induced by conventional machine learning algorithms: Support Vector Machines (SVMs) and Extreme Gradient Boosting (XGBoost) [68] using only the handcrafted features. We used only handcrafted features for these algorithms, chosen for their robustness when dealing with biological data, which is known to be high-dimensional [69,70], and XGBoost being known to outperform deep models on tabular data [71]. ...
Article
Full-text available
The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework integrating convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. This framework employs a combination of k-mer one-hot, k-mer dictionary, and feature extraction techniques for input representation. Extracted features, when embedded into the deep network, enable optimal utilization of spatial and sequential nuances of ncRNA sequences. Using benchmark datasets and real-world RNA samples from bacterial organisms, we evaluated the performance of BioDeepFuse. Results exhibited high accuracy in ncRNA classification, underscoring the robustness of our tool in addressing complex ncRNA sequence data challenges. The effective melding of CNN or BiLSTM with external features heralds promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into ncRNAs in cellular processes and disease manifestations. In addition to its original application in the context of bacterial organisms, the methodologies and techniques integrated into our framework can potentially render BioDeepFuse effective in various and broader domains.
... On the other hand, similarity and topology descriptors make little sense for enzymes due to their large size. For example, the Tanimoto similarity between two enzymes is a less relevant descriptor for machine learning, given the diversity of structures and functions they exhibit [5,18,19,[44][45][46][47][48]. The similarity and topology descriptors are more commonly used in substrates, as they allow us to obtain information about their molecular shape, connectivity, and similarity to other compounds. ...
... In the case of substrates, sequence and structure descriptors are less relevant because these are small molecules, so it is more convenient to consider the SMILES [18,20,[49][50][51]. Physicochemical descriptors are employed for both enzymes and substrates, as they offer valuable information about properties such as electric charges, solubility, and other characteristics relevant to machine learning [5,[18][19][20][44][45][46][47][48][49][50][51]. Table 2 presents an overview of the frequently used categories of descriptors for enzymes and substrates, showcasing specific examples of approaches within each category. ...
Article
Full-text available
Enzyme–substrate interactions play a fundamental role in elucidating synthesis pathways and synthetic biology, as they allow for the understanding of important aspects of a reaction. Establishing the interaction experimentally is a slow and costly process, which is why this problem has been addressed using computational methods such as molecular dynamics, molecular docking, and Monte Carlo simulations. Nevertheless, this type of method tends to be computationally slow when dealing with a large search space. Therefore, in recent years, methods based on artificial intelligence, such as support vector machines, neural networks, or decision trees, have been implemented. These methods significantly reduce the computation time and cover broad search spaces, rapidly reducing the number of interacting candidates, as they allow repetitive processes to be automated and patterns to be extracted, are adaptable, and have the capacity to handle large amounts of data. This article analyzes these artificial intelligence-based approaches, presenting their common structure, advantages, disadvantages, limitations, challenges, and future perspectives.
... Epsilon (ε) and the regularization parameter (C) were tuned to find the best support vector regression model. The kernel scale, or kernel coefficient parameter (Nguyen et al., 2021), was also tuned, as this is another important hyperparameter for a Gaussian kernel that controls the width of the Gaussian and the flexibility of the model, which can affect the predictive performance of support vector regression (Ben-Hur et al., 2008). Ranges determined in MATLAB were used for hyperparameter tuning, as listed in Table A3. ...
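The role of the kernel scale described in this excerpt is easy to see numerically. In the common parameterization k(x, z) = exp(−γ·‖x − z‖²), the coefficient γ is the inverse of the (squared) kernel scale: a small γ gives a wide Gaussian that treats distant points as similar, while a large γ makes similarity decay sharply. A self-contained sketch, with illustrative points and γ values:

```python
import numpy as np

def rbf_kernel(x, z, gamma):
    # Gaussian (RBF) kernel: exp(-gamma * ||x - z||^2).
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])  # squared distance ||x - z||^2 = 2

# Wide kernel: distant points remain fairly similar.
print(rbf_kernel(x, z, gamma=0.1))   # ~0.82
# Narrow kernel: similarity collapses to nearly zero.
print(rbf_kernel(x, z, gamma=10.0))  # ~2e-9
```

This is why the kernel scale strongly affects model flexibility: a very narrow kernel lets the model fit each training point individually (risking overfitting), while a very wide one approaches a nearly constant similarity (underfitting).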
Article
Proteolysis is a complex biochemical event during cheese storage that affects both functionality and quality, yet there are few tools that can accurately predict proteolysis for Mozzarella and Cheddar cheese across a range of parameters and storage conditions. Machine learning models were developed with input features from the literature. A gradient boosting method outperformed random forest and support vector regression methods in predicting proteolysis for both Mozzarella (R2 = 92%) and Cheddar (R2 = 97%) cheese. Storage time was the most important input feature for both cheese types, followed by coagulating enzyme concentration and calcium content for Mozzarella cheese and fat or moisture content for Cheddar cheese. The ability to predict proteolysis could be useful for manufacturers, assisting in inventory management to ensure optimum Mozzarella functionality and Cheddar with a desired taste, flavor and texture; this approach may also be extended to other types of cheese.
... Now, when predicting an outcome, the low-dimensional representations Z ′ and Z ′ test become training and testing data, respectively. For example, to predict a binary or multiclass outcome, we train a support vector machine (SVM) [18] classifier with the training data Z ′ and the outcome data y , and we use the learned SVM model and the testing data Z ′ test to obtain the predicted class membership, y test . We compare y test with y test and we estimate the classification accuracy. ...
Article
Full-text available
Background Technological advances have enabled the generation of unique and complementary types of data or views (e.g. genomics, proteomics, metabolomics) and opened up a new era in multiview learning research with the potential to lead to new biomedical discoveries. Results We propose iDeepViewLearn (Interpretable Deep Learning Method for Multiview Learning) to learn nonlinear relationships in data from multiple views while achieving feature selection. iDeepViewLearn combines deep learning flexibility with the statistical benefits of data and knowledge-driven feature selection, giving interpretable results. Deep neural networks are used to learn view-independent low-dimensional embedding through an optimization problem that minimizes the difference between observed and reconstructed data, while imposing a regularization penalty on the reconstructed data. The normalized Laplacian of a graph is used to model bilateral relationships between variables in each view, therefore, encouraging selection of related variables. iDeepViewLearn is tested on simulated and three real-world data for classification, clustering, and reconstruction tasks. For the classification tasks, iDeepViewLearn had competitive classification results with state-of-the-art methods in various settings. For the clustering task, we detected molecular clusters that differed in their 10-year survival rates for breast cancer. For the reconstruction task, we were able to reconstruct handwritten images using a few pixels while achieving competitive classification accuracy. The results of our real data application and simulations with small to moderate sample sizes suggest that iDeepViewLearn may be a useful method for small-sample-size problems compared to other deep learning methods for multiview learning. Conclusion iDeepViewLearn is an innovative deep learning model capable of capturing nonlinear relationships between data from multiple views while achieving feature selection. 
It is fully open source and is freely available at https://github.com/lasandrall/iDeepViewLearn.
... Several machine learning methods were used to create the three classification models in our study. Random forest (RF), decision tree (DT), Gaussian naive Bayes (GNB), logistic regression (LR), support vector classifier (SVC), k-nearest neighbour (kNN), and extra tree (ET) are among these methods [50][51][52][53][54][55][56]. To build these classifiers, we used the Scikit-learn package, a prominent Python library for machine learning [57]. ...
Article
Full-text available
Most of the existing methods for predicting antibacterial peptides (ABPs) are designed to target either gram-positive or gram-negative bacteria. In this study, we describe a method that allows us to predict ABPs against gram-positive, gram-negative, and gram-variable bacteria. Firstly, we developed an alignment-based approach using BLAST to identify ABPs and achieved poor sensitivity. Secondly, we employed a motif-based approach to predict ABPs and obtained high precision with low sensitivity. To address the issue of poor sensitivity, we developed alignment-free methods for predicting ABPs using machine/deep learning techniques. In the case of alignment-free methods, we utilized a wide range of peptide features that include different types of composition, binary profiles of terminal residues, and fastText word embedding. In this study, a five-fold cross-validation technique has been used to build machine/deep learning models on training datasets. These models were evaluated on an independent dataset with no common peptide between training and independent datasets. Our machine learning-based model developed using the amino acid binary profile of terminal residues achieved maximum AUC 0.93, 0.98, and 0.94 for gram-positive, gram-negative, and gram-variable bacteria, respectively, on an independent dataset. Our method performs better than existing methods when compared with existing approaches on an independent dataset. A user-friendly web server, standalone package and pip package have been developed to facilitate peptide-based therapeutics.
... where τ is a positive tolerance parameter [24]. The kernel function used with SVM in this research is the linear kernel, which can be mathematically represented as follows [25]: ...
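The linear kernel referenced in this excerpt is simply the inner product in the original input space, k(x, z) = xᵀz. A minimal sketch (function name illustrative):

```python
import numpy as np

def linear_kernel(x, z):
    # Linear kernel: the plain dot product, no implicit feature map.
    return np.dot(x, z)

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])
print(linear_kernel(x, z))  # 0.5 - 2.0 + 6.0 = 4.5
```

Because no nonlinear mapping is involved, an SVM with this kernel produces a linear decision boundary, as in panel A of the figure above.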
Article
Full-text available
Background Previous studies have developed the Migraine Aura Complexity Score (MACS) system. MACS shows great potential in studying the complexity of migraine with aura (MwA) pathophysiology especially when implemented in neuroimaging studies. The use of sophisticated machine learning (ML) algorithms, together with deep profiling of MwA, could bring new knowledge in this field. We aimed to test several ML algorithms to study the potential of structural cortical features for predicting the MACS and therefore gain a better insight into MwA pathophysiology. Methods The data set used in this research consists of 340 MRI features collected from 40 MwA patients. Average MACS score was obtained for each subject. Feature selection for ML models was performed using several approaches, including a correlation test and a wrapper feature selection methodology. Regression was performed with the Support Vector Machine (SVM), Linear Regression, and Radial Basis Function network. Results SVM achieved a 0.89 coefficient of determination score with a wrapper feature selection. The results suggest a set of cortical features, located mostly in the parietal and temporal lobes, that show changes in MwA patients depending on aura complexity. Conclusions The SVM algorithm demonstrated the best potential in average MACS prediction when using a wrapper feature selection methodology. The proposed method achieved promising results in determining MwA complexity, which can provide a basis for future MwA studies and the development of MwA diagnosis and treatment.
... Support vector machines (SVM) are used to classify data by locating a hyperplane in N-dimensional feature space. Along with linear classification, SVMs can perform efficient non-linear classification using a technique called the kernel trick (Ben-Hur et al., 2008), which involves implicitly mapping their inputs into highdimensional feature spaces. SVMs with linear, polynomial, and radial basis function kernels (RBF) were applied on the present data. ...
Article
Full-text available
The present work investigates whether and how decisions in real-world online shopping scenarios can be predicted based on brain activation. Potential customers were asked to search through product pages on e-commerce platforms and decide which products to buy, while their EEG signal was recorded. Machine learning algorithms were then trained to distinguish between EEG activation when viewing products that are later bought or put into the shopping cart as opposed to products that are later discarded. We find that Hjorth parameters extracted from the raw EEG can be used to predict purchase choices to a high level of accuracy. Above-chance predictions based on Hjorth parameters are achieved via different standard machine learning methods, with random forest models showing the best performance of above 80% prediction accuracy in both 2-class (bought or put into cart vs. not bought) and 3-class (bought vs. put into cart vs. not bought) classification. While conventional EEG signal analysis commonly employs frequency domain features such as alpha or theta power and phase, Hjorth parameters use time domain signals, which can be calculated rapidly with little computational cost. Given the presented evidence that Hjorth parameters are suitable for the prediction of complex behaviors, their potential and remaining challenges for implementation in real-time applications are discussed.
... Feature scaling is also an important preprocessing step for many machine learning algorithms, especially for distance-based learning algorithms such as support vector machine (SVM) [42]. We normalized the range of the features using one of the following three methods. ...
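The scaling step described in this excerpt is commonly implemented by standardizing features before fitting the SVM. A minimal sketch, assuming scikit-learn; the dataset is only an example, and bundling the scaler and classifier in a Pipeline ensures the scaling statistics are learned from the training split alone:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# StandardScaler fits its mean/std on the training fold only, so the
# test fold never leaks into the preprocessing.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
print(round(model.score(X_test, y_test), 3))
```

Without scaling, features with large numeric ranges dominate the distance computations inside the kernel, which typically degrades SVM performance.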
Article
Full-text available
Objective Sleep apnea is a common sleep disorder affecting a significant portion of the population, but many apnea patients remain undiagnosed because existing clinical tests are invasive and expensive. This study aimed to develop a method for easy sleep apnea screening. Methods Three supervised machine learning algorithms, including logistic regression, support vector machine, and light gradient boosting machine, were applied to develop apnea screening models at two apnea–hypopnea index cutoff thresholds: ≥ 5 and ≥ 30 events/hours. The SpO2 recordings of the Sleep Heart Health Study database (N = 5786) were used for model training, validation, and test. Multiscale entropy analysis was performed to derive a set of multiscale attention entropy features from the SpO2 recordings. Demographic features including age, sex, body mass index, and blood pressure were also used. The dependency among the multiscale attention entropy features were handled with the independent component analysis. Results For cutoff ≥ 5/hours, logistic regression model achieved the highest Matthew’s correlation coefficient (0.402) and area under the curve (0.747), and reasonably good sensitivity (75.38%), specificity (74.02%), and positive predictive value (92.94%). For cutoff ≥ 30/hours, support vector machine model achieved the highest Matthew’s correlation coefficient (0.545) and area under the curve (0.823), and good sensitivity (82.00%), specificity (82.69%), and negative predictive value (95.53%). Conclusions Our models achieved better performance than existing methods and have the potential to be integrated with home-use pulse oximeters.