Article

Support vector machine classification and validation of cancer tissue sample using microarray expression data

... SVM has been utilized with success in various applications: pattern recognition (including face and speaker recognition), bioinformatics, and DNA sequence recognition. Because of its accuracy in the field of bioinformatics, as well as its ability to manage big data [6][7][8][9][10], we chose the SVM as the classification technique for our prediction system for particular genomic sequences: the helitrons. ...
... Then, to study the impact of the coding technique on the accuracy rates, we will repeat the classification for higher FCGS orders. We will implement this method for FCGS3, FCGS4 and FCGS6. Table 5 presents the best results (ranked from highest to lowest) of helitron classification and demonstrates that high accuracy rates are obtained with the FCGS2 coding technique. ...
... Three helitron families' classification rates correspond to chromosome I and are based on FCGS2, FCGS3, FCGS4 and FCGS6. ...
Article
Full-text available
Helitrons, eukaryotic transposable elements (TEs), were discovered 18 years ago in various genomes. In the Caenorhabditis elegans (C. elegans) genome, helitron sequences are highly variable in size, ranging from 11 to 8965 base pairs (bp) from one sequence to another. These TEs are not uniformly dispersed sequences, and they have the ability to mobilize within a genome by a rolling-circle mechanism. This ability to move and reproduce in genomes enables these elements to play a major role in genomic evolution. In order to follow this evolution, we predicted helitron families (10 classes) in the C. elegans genome using the combination of features extracted from signals corresponding to DNA sequences and the Support Vector Machine (SVM) classifier. In our classification system, the features extracted from the signals were shown to be efficient for automatically predicting helitronic sequences. As a result, the Gaussian radial kernel over 100-fold cross-validation gave the best accuracy rates, ranging from 68% to 97%, with an overall mean score of 83.7%, and we successfully identified the Helitron Y1A class for specific values of C and gamma, reaching an accuracy rate of 100%. In addition, other notable helitrons (NDNAX2, NDNAX3 and Helitron_Y2) were predicted with interesting accuracy rates.
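As a rough illustration of the tuning procedure described in this abstract (Gaussian radial kernel with cross-validation over C and gamma), the following Python sketch uses scikit-learn on synthetic stand-in features; the data, grid values and 10-fold split are assumptions, not the authors' setup.

# Minimal sketch: RBF-kernel SVM tuned over C and gamma by k-fold cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.datasets import make_classification

# Stand-in for feature vectors extracted from DNA-derived signals
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=2, random_state=0)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)

print("best (C, gamma):", search.best_params_)
print("mean CV accuracy: %.3f" % search.best_score_)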
... Dudoit [4], Furey [5], Guyon [6]. ...
... Furey [5], Pavlidis [7]. ...
... Matlab 6.5, SVMlight [13]. ...
... We sought to use random forest in our analysis of phenotypic aging, as it is capable of handling sizable data sets, considers the interactions between variables, and provides importance measures for predictors. On the other hand, SVM is a type of supervised learning that not only supports high-dimensional data but is also robust against noise and sparsity in the data (Furey et al., 2000). SVM functions by taking a set of input features or data and defining an optimal decision boundary or hyperplane that most accurately separates the input space based on assigned binary classifiers. ...
... Both random forest and SVM were applied to the variant data to determine the best genetic predictors of late aging. The ability of both the random forest algorithm and SVM to outperform other non-parametric classification methods led to our use of these predictive modeling approaches in this study (Furey et al., 2000;Lunetta et al., 2004). As depicted in Figure 1C, the training cohorts were divided into early and late agers for random forest model training, and top performing models according to the ROC-AUC were then tested for prediction of aging status in the validation cohort. ...
Article
Full-text available
Background: Recent studies investigating longevity have revealed very few convincing genetic associations with increased lifespan. This is, in part, due to the complexity of biological aging, as well as the limited power of genome-wide association studies, which assay common single nucleotide polymorphisms (SNPs) and require several thousand subjects to achieve statistical significance. To overcome such barriers, we performed comprehensive DNA sequencing of a panel of 20 genes previously associated with phenotypic aging in a cohort of 200 individuals, half of whom were clinically defined by an “early aging” phenotype, and half of whom were clinically defined by a “late aging” phenotype based on age (65–75 years) and the ability to walk up a flight of stairs or walk for 15 min without resting. A validation cohort of 511 late agers was used to verify our results. Results: We found early agers were not enriched for more total variants in these 20 aging-related genes than late agers. Using machine learning methods, we identified the most predictive model of aging status, both in our discovery and validation cohorts, to be a random forest model incorporating damaging exon variants [Combined Annotation-Dependent Depletion (CADD) > 15]. The most heavily weighted variants in the model were within poly(ADP-ribose) polymerase 1 (PARP1) and excision repair cross complementation group 5 (ERCC5), both of which are involved in a canonical aging pathway, DNA damage repair. Conclusion: Overall, this study implemented a framework to apply machine learning to identify sequencing variants associated with complex phenotypes such as aging. While the small sample size making up our cohort inhibits our ability to make definitive conclusions about the ability of these genes to accurately predict aging, this study offers a unique method for exploring polygenic associations with complex phenotypes.
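The model-selection step described above (training random forest and SVM on a training cohort and ranking candidates by ROC-AUC before testing on a validation cohort) can be sketched as follows; the synthetic data and hyperparameters are illustrative assumptions rather than the study's pipeline.

# Sketch: compare a random forest and an SVM by ROC-AUC on a held-out validation split.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=400, n_features=60, n_informative=12,
                           random_state=1)          # stand-in for variant data
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, stratify=y,
                                            random_state=1)

models = {
    "random_forest": RandomForestClassifier(n_estimators=500, random_state=1),
    "svm_rbf": SVC(kernel="rbf", probability=True, random_state=1),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    print(f"{name}: validation ROC-AUC = {auc:.3f}")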
... In various fields, such as forecasting, economic modelling and medical applications, the neural network has widespread uses [43]. Several types of research relevant to cancer classification [32] and other bioinformatics fields [25] use ANN. Moreover, [44,45] discussed the additional utilities of ANN models. ...
Article
Full-text available
Machine Learning (ML)-based prediction and classification systems employ data and learning algorithms to forecast target values. However, improving predictive accuracy is a crucial step for informed decision-making. In the healthcare domain, data are available in the form of genetic profiles and clinical characteristics to build prediction models for complex tasks like cancer detection or diagnosis. Among ML algorithms, Artificial Neural Networks (ANNs) are considered the most suitable framework for many classification tasks. The network weights and the activation functions are the two crucial elements in the learning process of an ANN. These weights affect the prediction ability and the convergence efficiency of the network. In traditional settings, ANNs assign random weights to the inputs. This research aims to develop a learning system for reliable cancer prediction by initializing more realistic weights computed using a supervised setting instead of random weights. The proposed learning system uses hybrid and traditional machine learning techniques such as Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Random Forest (RF), k-Nearest Neighbour (kNN), and ANN to achieve better accuracy in colon and breast cancer classification. This system computes the confusion matrix-based metrics for traditional and proposed frameworks. The proposed framework attains the highest accuracy of 89.24 percent using the colon cancer dataset and 72.20 percent using the breast cancer dataset, which outperforms the other models. The results show that the proposed learning system has higher predictive accuracies than conventional classifiers for each dataset, overcoming previous research limitations. Moreover, the proposed framework is of use to predict and classify cancer patients accurately. Consequently, this will facilitate the effective management of cancer patients.
... To achieve better classification performance, the SVM maps non-linearly separable input data into a high-dimensional space in which it becomes linearly separable. The SVM maximizes the marginal distance between the various classes [5]. Various kernels are used to separate the classes. ...
Article
Full-text available
Image processing is the technique which can present information stored in the form of pixels. Plant disease detection is the technique which can detect disease from the leaf. Plant disease detection algorithms have various steps such as pre-processing, feature extraction, segmentation and classification. The KNN classifier technique is applied, which can classify input data into certain classes. The performance of the KNN classifier is compared with existing techniques, and the analysis shows that the KNN classifier has higher accuracy and lower fault detection compared to other techniques.
... If linear separation is not possible, it can be combined with a 'kernel' trick that implements a non-linear mapping to a feature space in which the linear separating hyperplane is identified [52]. The kernel technique enables higher dimensional, non-linear models to be developed [52], and is computationally efficient for datasets with high dimensionality through the use of a kernel function, K(x_i, x_j) = 〈Φ(x_i) • Φ(x_j)〉 [53], which computes the separating hyperplane without carrying out a mapping to feature space [54]. Commonly used kernels include the radial basis function and polynomial kernels. ...
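The kernel function cited above can be checked numerically: the small sketch below (an illustration, not code from the cited work) shows that a degree-2 polynomial kernel reproduces the inner product of an explicit feature map without ever constructing that map.

# Sketch: K(x, y) = <phi(x), phi(y)> computed without building phi explicitly.
import numpy as np

def poly2_kernel(x, y):
    return np.dot(x, y) ** 2                      # K(x, y) = (x . y)^2

def phi(x):
    # explicit degree-2 feature map for 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, y = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(poly2_kernel(x, y))          # 1.0
print(np.dot(phi(x), phi(y)))      # same value (up to floating-point rounding)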
Article
Full-text available
Irradiation of the tumour site during treatment for cancer with external-beam ionising radiation results in a complex and dynamic series of effects in both the tumour itself and the normal tissue which surrounds it. The development of a spectral model of the effect of each exposure and interaction mode between these tissues would enable label free assessment of the effect of radiotherapeutic treatment in practice. In this study Fourier-transform Infrared microspectroscopic imaging was employed to analyse an in-vitro model of radiotherapeutic treatment for prostate cancer, in which a normal cell line (PNT1A) was exposed to low-dose X-ray radiation from the scattered treatment beam, and also to irradiated cell culture medium (ICCM) from a cancer cell line exposed to a treatment relevant dose (2Gy). Various exposure modes were studied and reference was made to previously acquired data on cellular survival and DNA double strand break damage. Spectral analysis with manifold methods, linear spectral fitting, non-linear classification and non-linear regression approaches were found to accurately segregate spectra on irradiation type and provide a comprehensive set of spectral markers which differentiate on irradiation mode and cell fate. The study demonstrates that high dose irradiation, low-dose scatter irradiation and radiation-induced bystander exposure (RIBE) signalling each produce differential effects on the cell which are observable through spectroscopic analysis.
... Supervised machine learning (SML) has been used in research for the detection, classification and prognostication of cancer diseases for more than two decades. [14][15][16][17][18] We have previously shown that this method improves the diagnostic accuracy of patients with small intestinal neuroendocrine tumors (SI-NET) at the time of diagnosis, especially in patients with normal CgA levels. 19 These techniques can uncover and recognize patterns and correlations in complex collections of data from biomarkers. ...
Article
Full-text available
There is an unmet need for novel biomarkers to diagnose and monitor patients with neuroendocrine neoplasms. The EXPLAIN study explores a multi-plasma-protein and supervised machine learning (SML) strategy to improve the diagnosis of pancreatic neuroendocrine tumours (PanNET) and differentiate them from small intestinal neuroendocrine tumours (SI‐NET). At the time of diagnosis, blood samples were collected and analysed from 39 patients with PanNET, 135 with SI‐NET (WHO Grade 1–2) and 144 controls. Exclusion criteria were other malignant diseases, chronic inflammatory diseases, and reduced kidney or liver function. Proseek Oncology‐II (OLink) was used to measure 92 cancer-related plasma proteins. Chromogranin A (CgA) was analysed separately. Median age in all groups was 65–67 years, with a similar gender distribution (female: PanNET 51%, SI‐NET 42%, controls 42%). Tumour grade (G1/G2): PanNET 39/61%, SI‐NET 46/54%. Patients with liver metastases: PanNET 78%, SI‐NET 63%. The classification model of PanNET versus controls provided a sensitivity (SEN) of 0.84, specificity (SPE) of 0.98, positive predictive value (PPV) of 0.92, negative predictive value (NPV) of 0.95, and area under the ROC curve (AUROC) of 0.99; the model for the discrimination of PanNET versus SI‐NET provided a SEN of 0.61, SPE of 0.96, PPV of 0.83, NPV of 0.90 and AUROC of 0.98. These results suggest that a multi-plasma-protein strategy can significantly improve the diagnostic accuracy of PanNET and SI‐NET.
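The reported metrics (SEN, SPE, PPV, NPV, AUROC) follow directly from a confusion matrix and class scores; the sketch below shows the calculation on made-up labels and scores, purely to illustrate the definitions rather than to reproduce the study's data.

# Sketch: confusion-matrix metrics and AUROC for a binary classifier (toy labels/scores).
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 0])        # 1 = case, 0 = control (made up)
y_pred  = np.array([1, 1, 0, 0, 0, 0, 1, 0, 1, 1])
y_score = np.array([.9, .8, .4, .2, .1, .3, .7, .2, .85, .6])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sen = tp / (tp + fn)        # sensitivity (recall)
spe = tn / (tn + fp)        # specificity
ppv = tp / (tp + fp)        # positive predictive value
npv = tn / (tn + fn)        # negative predictive value
auroc = roc_auc_score(y_true, y_score)
print(f"SEN={sen:.2f} SPE={spe:.2f} PPV={ppv:.2f} NPV={npv:.2f} AUROC={auroc:.2f}")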
... The molecular descriptor is utilized to represent each compound. In order to reflect the effectiveness of CapsNet, SVM [47], RF [48], gcForest [49] and forgeNet [50] are also used to screen the effective compounds in traditional Chinese medicine prescriptions for treating diseases. ROC curve and AUC value are utilized to evaluate the performance of the classifiers. ...
Article
Pneumonia, especially coronavirus disease 2019 (COVID-19), can lead to serious acute lung injury, acute respiratory distress syndrome, multiple organ failure and even death. Thus it is an urgent task to develop high-efficiency, low-toxicity and targeted drugs according to the pathogenesis of the coronavirus. In this paper, a novel disease-related compound identification model based on a capsule network (CapsNet) is proposed. According to pneumonia-related keywords, the prescriptions and active components related to the pharmacological mechanism of the disease are collected and extracted in order to construct the training set. The features of each component are extracted as the input layer of the capsule network. CapsNet is trained and utilized to identify the pneumonia-related compounds in Qingre Jiedu injection. The experimental results show that CapsNet can identify disease-related compounds more accurately than SVM, RF, gcForest and forgeNet.
... Molecular descriptors and molecular fingerprints of each ligand could be obtained, which contain 374 features. In order to better reflect the effectiveness of forgeNet, three classical classifiers (SVM [42], RF [43] and gcForest [44]) are utilized to identify the compounds associated with diseases. Five evaluation criteria of classifier performance are utilized, which are SN, SP, ACC, MCC and F1, respectively. ...
Preprint
Full-text available
Background: Acute lung injury (ALI) is a serious respiratory disease, which can lead to acute respiratory failure or death. It is closely related to the pathogenesis of New Coronavirus pneumonia (COVID-19). Many studies have shown that traditional Chinese medicine (TCM) had a good effect on its intervention, and network pharmacology could play a very important role. Results: In order to construct the "disease-gene-target-drug" interaction network more accurately, a deep learning algorithm is utilized in this paper. Two ALI-related target genes (REAL and SATA3) are considered, and the active and inactive compounds of the two corresponding target genes are collected as training data, respectively. Molecular descriptors and molecular fingerprints are utilized to characterize each compound. A forest graph embedded deep feed forward network (forgeNet) is proposed to train and identify 19 compounds in Erhuang decoction (EhD) and Dexamethasone (DXMS). Conclusions: The experimental results show that forgeNet performs better than support vector machines (SVM), random forest (RF) and gcForest.
... The classification process was implemented based on the radiological characteristics of different hepatic tumors in [46]. SVM is a binary classifier that divides the input points into two classes by constructing an N-dimensional separating hyperplane [47]. The input data points must be transformed from their original dimension into a higher dimension, since the input points may not be linearly separable in their own space. ...
Article
Full-text available
One of the leading causes of mortality worldwide is liver cancer. The earlier the detection of hepatic tumors, the lower the mortality rate. This paper introduces a computer-aided diagnosis system to extract hepatic tumors from computed tomography scans and classify them as malignant or benign. Segmenting hepatic tumors from computed tomography scans is considered a challenging task due to the fuzziness in the liver pixel range, the overlap of intensity values between the liver and neighboring organs, high noise from the computed tomography scanner, and the large variance in tumor shapes. The proposed method consists of three main stages: liver segmentation using Fast Generalized Fuzzy C-Means, tumor segmentation using dynamic thresholding, and tumor classification into malignant/benign using a support vector machine classifier. The performance of the proposed system was evaluated using three liver benchmark datasets, which are MICCAI-Sliver07, LiTS17, and 3Dircadb. The proposed computer-aided diagnosis system achieved an average accuracy of 96.75%, sensitivity of 96.38%, specificity of 95.20% and a Dice similarity coefficient of 95.13%.
... SVMs are one of the most recent ML techniques and have shown applicability to a variety of real-world problems. Since being proposed by Vapnik and Lerner in 1963 [11] and formalised into the method known today in 1995 (by Vapnik and Cortes) [10], SVMs have been applied to text categorisation [12], tissue classification [13], gene function prediction [14], handwritten digit recognition [15] and facial recognition [16]. SVMs continue to be applied in biomedicine and healthcare, with researchers utilising them for cancer classification, biomarker discovery and drug discovery [17]. ...
Article
Full-text available
Biomarkers are known to be the key driver behind targeted cancer therapies by either stratifying the patients into risk categories or identifying patient subgroups most likely to benefit. However, the ability of a biomarker to stratify patients relies heavily on the type of clinical endpoint data being collected. Of particular interest is the scenario when the biomarker involved is a continuous one where the challenge is often to identify cut-offs or thresholds that would stratify the population according to the level of clinical outcome or treatment benefit. On the other hand, there are well-established Machine Learning (ML) methods such as the Support Vector Machines (SVM) that classify data, both linear as well as non-linear, into subgroups in an optimal way. SVMs have proven to be immensely useful in data-centric engineering and recently researchers have also sought its applications in healthcare. Despite their wide applicability, SVMs are not yet in the mainstream of toolkits to be utilised in observational clinical studies or in clinical trials. This research investigates the very role of SVMs in stratifying the patient population based on a continuous biomarker across a variety of datasets. Based on the mathematical framework underlying SVMs, we formulate and fit algorithms in the context of biomarker stratified cancer datasets to evaluate their merits. The analysis reveals their superior performance for certain data-types when compared to other ML methods suggesting that SVMs may have the potential to provide a robust yet simplistic solution to stratify real cancer patients based on continuous biomarkers, and hence accelerate the identification of subgroups for improved clinical outcomes or guide targeted cancer therapies.
... The structures of regression and prediction are shown in Fig. 1(c) and 2(a) [31], where K(x_i, x) represents the kernel function. In recent years, SVM-based regression prediction has been used in the medical field [30,32], and its application in medical diagnosis is gradually increasing [33]. Based on the libsvm toolbox [34], the ¹³¹I therapeutic dose model was established, and the input parameters of the SVM were determined by the cross-validation method. ...
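A hedged sketch of SVM-based regression with cross-validated hyperparameters, in the spirit of the libsvm-based dose model mentioned above, is given below; the features, targets and parameter grid are assumed for illustration (scikit-learn's SVR wraps libsvm internally).

# Sketch: epsilon-SVR with hyperparameters chosen by cross-validation on stand-in data.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))                                    # stand-in patient covariates
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=120)  # stand-in dose values

pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
grid = {"svr__C": [1, 10, 100], "svr__gamma": ["scale", 0.1, 0.01],
        "svr__epsilon": [0.01, 0.1, 0.5]}
search = GridSearchCV(pipe, grid, cv=5, scoring="neg_mean_absolute_error")
search.fit(X, y)
print("selected parameters:", search.best_params_)
print("CV mean absolute error: %.3f" % -search.best_score_)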
Preprint
Full-text available
Objective: Multiple machine learning models were used to predict the therapeutic dose of ¹³¹I radionuclide in patients with hyperthyroidism, and the results of each prediction model were compared to obtain the optimal model for dose prediction. Meanwhile, a classification model was used to classify the prognosis of existing clinical hyperthyroidism case data in order to evaluate the administration results and provide a reference for the dose given by clinicians. Methods: Based on the data of hyperthyroidism patients treated with ¹³¹I in the nuclear medicine departments of several hospitals, a prediction model was established in MATLAB. First, the prediction results of a BP neural network, a radial basis function (RBF) neural network and a support vector machine (SVM) were compared on small-sample data, and the optimal model was then selected to predict the drug dose. BP-AdaBoost, SVM and random forest were used to classify the patients after recovery and evaluate whether the dose was accurate. Results: The average errors of the BP neural network, RBF neural network and SVM models trained with small samples were 6.58%, 17.25% and 14.09%, respectively. After comparison, the BP neural network was selected to establish the prediction model. The data of 30 cases were randomly selected to verify the BP neural network, and the average error of the prediction results was 11.99%. Using the SVM, BP-AdaBoost and random forest models, 100 groups of case data were selected as the training set and 10 groups as the test set. The classification accuracies were 80%, 90% and 100%, respectively. The random forest model with the highest accuracy was selected for the large-sample prediction. When 318 groups of cases were used for training and 35 groups for testing, the classification accuracy was 97.14%. Conclusion: This study compared the prediction effects of various models on the ¹³¹I therapeutic dose in patients with hyperthyroidism and the accuracy of prognosis classification. The BP neural network and random forest achieved the best results, respectively. The two models provide a reference for clinicians when giving the dose, which has clinical practical significance.
... The SVM uses vectors (separators) to divide the training data into areas (categories) that are as far apart as possible [49]. It assigns test data to a certain category according to the area they fall into relative to a particular vector, and uses a subset of the training points when making decisions in the classification process. ...
Article
Full-text available
Because of continuous competition in the corporate industrial sector, numerous companies are always looking for strategies to ensure timely product delivery to survive against their competitors. For this reason, logistics play a significant role in the warehousing, shipments, and transportation of the products. Therefore, the high utilization of resources can improve the profit margins and reduce unnecessary storage or shipping costs. One significant issue in shipments is the Pallet Loading Problem (PLP) which can generally be solved by seeking to maximize the total number of boxes to be loaded on a pallet. In many previous studies, various solutions for the PLP have been suggested in the context of logistics and shipment delivery systems. In this paper, a novel two-phase approach is presented by utilizing a number of Machine Learning (ML) models to tackle the PLP. The dataset utilized in this study was obtained from the DHL supply chain system. According to the training and testing of various ML models, our results show that a very high (>85%) Pallet Utilization Volume (PUV) was obtained, and an accuracy of >89% was determined to predict an accurate loading arrangement of boxes on a suitable pallet. Furthermore, a comprehensive analysis of all the results on the basis of a comparison of several ML models is provided in order to show the efficacy of the proposed methodology.
... Methods include linear discriminant analysis, decision trees, random forest (RF), 63 artificial neural networks (ANNs), [64][65][66][67] and kernel-based methods like the support vector machine (SVM) classifier. [68][69][70][71] Other relevant algorithms are linear and partial regression methods, such as PLS and LDA, which have problems generalizing to larger patient datasets. 12,[60][61][62] In general, these algorithms face high dimensionality, a limited number of samples, and inter-patient spectral variability when applied to medical HSI data. ...
Article
Full-text available
New developments in instrumentation and data analysis have further improved the perspectives of hyperspectral imaging in clinical use. Thus, hyperspectral imaging can be considered as "Next Generation Imaging" for future clinical research. As a contactless, non-invasive method with short process times of just a few seconds, it quantifies predefined substance classes. Results of hyperspectral imaging may support the detection of carcinomas and the classification of different tissue structures as well as the assessment of tissue blood flow. Taken together, this method combines the principle of spectroscopy with imaging using conventional visual cameras. Compared to other optical imaging methods, hyperspectral imaging also analyses deeper layers of tissue.
... SVM is a computational algorithm that learns from experience and examples to assign labels to targets. Its basic function is to separate binary-labeled data with a boundary that maximizes the distance between the labeled classes [25]. SVM has good accuracy with limited samples [26]. ...
Article
Full-text available
Objectives: Previous researches have demonstrated that abnormal functional connectivity (FC) is associated with the pathophysiology of bipolar disorder (BD). However, inconsistent results were obtained due to different selections of regions of interest in previous researches. This study is aimed at examining voxel-wise brain-wide functional connectivity (FC) alterations in the first-episode, drug-naive patient with BD in an unbiased way. Methods: A total of 35 patients with BD and 37 age-, sex-, and education-matched healthy controls underwent resting-state functional magnetic resonance imaging (rs-fMRI). Global-brain FC (GFC) was applied to analyze the image data. Support vector machine (SVM) was adopted to probe whether GFC abnormalities could be used to identify the patients from the controls. Results: Patients with BD exhibited increased GFC in the left inferior frontal gyrus (LIFG), pars triangularis and left precuneus (PCu)/superior occipital gyrus (SOG). The left PCu belongs to the default mode network (DMN). Furthermore, increased GFC in the LIFG, pars triangularis was positively correlated with the triglycerides (TG) and low-density lipoprotein cholesterol (LDL-C) and negatively correlated with the scores of the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) coding test and Stroop color. Increased GFC values in the left PCu/SOG can be applied to discriminate patients from controls with preferable sensitivity (80.00%), specificity (75.68%), and accuracy (77.78%). Conclusions: This study found increased GFC in the brain regions of DMN; LIFG, pars triangularis; and LSOG, which was associated with dyslipidemia and cognitive impairment in patients with BD. Moreover, increased GFC values in the left PCu/SOG may be utilized as a potential biomarker to differentiate patients with BD from controls.
... Classification is a part of machine learning, in which classification is a learning process of a target function f that maps each attribute set x to one of the predefined dependent class labels [7]. The stages in classification include training and testing. ...
Article
Full-text available
Universities routinely conduct a tracer study every year, which serves to meet accreditation data requirements and to improve teaching and curriculum development so that graduate quality can be increased. Graduate quality can be seen from how readily graduates obtain employment after leaving the university. The more readily graduates find work, the better the graduate quality is considered to be; conversely, the less readily they do, the lower the quality is considered. This study aims to classify job waiting time in order to determine how readily alumni obtain employment, using the Support Vector Machines (SVM) and Backpropagation Neural Network (BPNN) classification methods. Both classification methods, BPNN and SVM with the ANOVA kernel function, can classify the tracer-study data according to how readily alumni obtain employment (readily or not readily) with nearly equal accuracy, namely 83.33% for BPNN and 83.00% for SVM. It is hoped that by knowing the factors that classify how readily graduates obtain employment, the university can adopt relevant policies so that graduate quality will continue to improve.
... For more than a decade, ML has been a powerful tool applied to all scientific fields, including speech recognition [14], translation between languages [17], emotion recognition [18], autonomous navigation of vehicles [19], product recommendations [20] and image processing [7,21]. Notably, there is a fast-growing trend of using ML algorithms in the health care industry [22][23][24][25][26]. It has emerged as an effective way of using medical imaging and clinical data to increase the accuracy of detecting a wide range of medical diseases. ...
Article
Full-text available
Machine Learning (ML) is a subfield of Artificial Intelligence (AI) that is increasingly applied to several medical diagnosis tasks, including a wide range of diseases. Importantly, various ML models were developed to address the complexity of Parkinson’s Disease (PD) diagnosis. PD is a neurodegenerative disease characterized by motor and non-motor disorders whose symptoms affect the daily lives of patients. Several Computer Aided Diagnosis and Detection (CADD) systems based on hand-crafted ML algorithms achieved promising results in distinguishing PD patients from Healthy Control (HC) subjects and other Parkinsonian syndrome categories using clinical data (e.g., speech and gait impairments) and medical imaging [e.g., Positron Emission Tomography (PET) and Single Photon Emission Computed Tomography (SPECT)]. Despite the good performance of hand-crafted ML algorithms, there is still a problem linked to feature extraction and selection. In fact, Deep Learning (DL) has provided an ultimate solution for the feature extraction and selection issue. A considerable number of studies on the diagnosis of PD using DL algorithms have been developed recently. This study provides an overview of the application of hand-crafted ML algorithms and DL techniques for PD diagnosis. It also introduces key concepts for understanding the application of ML methods to diagnose PD.
... Support vector machines can be used for DNA splice-site prediction [14], DNA methylation prediction [26], protein structure prediction [27], cancer classification [28], and so on. ... [11,31] and protein-protein interactions [32]. ...
Article
Full-text available
The goal of machine learning is to design algorithms that continuously improve their performance based on prior knowledge and observed data. Such algorithms can help machines extract knowledge from large amounts of data and thereby improve their performance on specific tasks. As a data-driven approach, machine learning can make effective use of the large volumes of biological data generated by high-throughput experimental technologies to achieve functional prediction and intelligent design of synthetic organisms, changing the research paradigm of synthetic biology. This article first introduces several machine learning models and methods widely used in synthetic biology, such as support vector machines, neural networks, generative adversarial networks, and deep reinforcement learning. It then introduces typical applications of machine learning methods in synthetic biology, such as promoter prediction, enzyme catalysis design, metabolic pathway construction, and gene circuit design. This review surveys machine learning methods and applications for synthetic biology, and aims to inspire readers on how to select and design machine learning methods for synthetic biology research.
... Traditional machine learning methods have great limitations when dealing with unprocessed data [21]. Machine learning researchers need considerable professional domain knowledge: they must perform complex preprocessing of the task data, design a corresponding feature extractor, convert the original image information into feature vectors, and then feed the resulting feature vectors into the corresponding classifier to output the target category. ...
Article
Full-text available
Gesture control, as a new type of interactive method, has the characteristics of rich expressiveness, convenient control, and speed. It has huge application prospects in entertainment, home furnishing, and industry. Gesture recognition is the basis of gesture control. Gesture recognition technology based on visual detection acquires gesture information in a non-contact manner, which gives the operator a better operating experience and is favored by scholars at home and abroad. In order to fully understand the existing research methods of visual gesture recognition, the basic process of visual gesture recognition is first explained. According to the principle of the recognition method, it is divided into gesture recognition based on traditional methods and gesture recognition based on deep learning, and the specific methods are analyzed and summarized in detail. Finally, the technical difficulties of visual gesture recognition are analyzed and discussed, and the development trend of vision-based gesture recognition is outlined.
... Clustering algorithms typically group genes (or samples) in clusters of similar expression profiles to identify possible functional relationships between them. Of particular importance are graphical representations of the clusters and their automatic annotation from available genome databases (Eisen et al., 1998;Furey et al., 2000;Golub et al., 1999;Pe'er et al., 2002;Wu et al., 2000;Zhou et al., 2002). Similar problems are found in the analysis of large networks, where you try to extract subnets that ...
Chapter
The possible applications of modeling and simulation in the field of bioinformatics are very extensive, ranging from understanding basic metabolic paths to exploring genetic variability. Experimental results carried out with DNA microarrays allow researchers to measure expression levels for thousands of genes simultaneously, across different conditions and over time. A key step in the analysis of gene expression data is the detection of groups of genes that manifest similar expression patterns. In this chapter, the authors examine various methods for analyzing gene expression data, addressing the important topics of (1) selecting the most differentially expressed genes, (2) grouping them by means of their relationships, and (3) classifying samples based on gene expressions.
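One common way to detect groups of genes with similar expression patterns, as discussed in this chapter, is hierarchical clustering with a correlation-based distance; the following sketch on a synthetic expression matrix is illustrative only.

# Sketch: average-linkage hierarchical clustering of gene expression profiles.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)
pattern_a = np.array([1., 1., 1., 1., -1., -1., -1., -1.])   # up early, down late
pattern_b = -pattern_a                                        # the opposite trend
# rows = genes, columns = conditions; two synthetic co-expressed groups of 20 genes
expr = np.vstack([pattern_a + rng.normal(scale=0.2, size=(20, 8)),
                  pattern_b + rng.normal(scale=0.2, size=(20, 8))])

# correlation distance is a common choice for expression profiles
tree = linkage(expr, method="average", metric="correlation")
labels = fcluster(tree, t=2, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])   # expect two groups of 20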
... In classification, the classes, two or more, (e.g., healthy individuals vs. diseased), are predefined and a classifier is built to discriminate between the classes in future applications [17,18], most notably screening and diagnosis [19]. A wide variety of supervised methods have been designed for classification, including Neural Networks [20], Support Vector Machines [21], Graphical Models [22], genetic algorithms [23], nearest neighbour classifiers and many other statistical methods such as shrunken centroids [24] and Partial Least Squares and Discriminant analysis [25]. Due to the large number of features given as input to the various classifiers, a subsequent problem is to select the subset of features that can be used efficiently by the classifier. ...
Article
Full-text available
Meta-analysis is a valuable tool for the synthesis of evidence across a wide range of study types including high-throughput experiments such as genome-wide association studies (GWAS) and gene expression studies. There are situations, though, in which we have multiple outcomes or multiple treatments, where the multivariate meta-analysis framework, which performs joint modeling of the different quantities of interest, may offer important advantages, such as increasing statistical power and allowing global tests to be performed. In this work we adapted the multivariate meta-analysis method and applied it to gene expression data. With this method we can test for pleiotropic effects, that is, for genes that influence both outcomes, or discover genes that have a change in expression not detectable with the univariate method. We tested this method on data regarding inflammatory bowel disease (IBD), with its two main forms, Crohn’s disease (CD) and Ulcerative colitis (UC), sharing many clinical manifestations, but differing in the location and extent of inflammation and in complications. The Stata code is given in the Appendix and it is available at: www.compgen.org/tools/multivariate-microarrays. • Multivariate meta-analysis method for gene expression data. • Discover genes with pleiotropic effects. • Differentially Expressed Genes (DEGs) identification in complex traits. Method name: Multivariate meta-analysis, Keywords: Multiple outcome, Pleiotropic effects, Microarrays, Meta-analysis
... SVMs have been utilized extensively in oncology for diagnosis and disease staging from radiological and tissue data (99)(100)(101)(102)(103)(104)(105)(106)(107). They have also been utilized for tumor typing from tissue microarray gene expression data, which, because of their high dimensionality, can be problematic for traditional statistical models (108)(109)(110)(111). Outside of oncology, SVMs have shown promise for neuroimaging diagnostics, including for dementia (112) and autism spectrum disorder (113)(114)(115). ...
Article
Machine learning is a branch of computer science that has the potential to transform epidemiological sciences. Amid a growing focus on "Big Data," it offers epidemiologists new tools to tackle problems for which classical methods are not well-suited. In order to critically evaluate the value of integrating machine learning algorithms and existing methods, however, it is essential to address language and technical barriers between the two fields that can make it difficult for epidemiologists to read and assess machine learning studies. Here, we provide an overview of the concepts and terminology used in machine learning literature, which encompasses a diverse set of tools with goals ranging from prediction, to classification, to clustering. We provide a brief introduction to five common machine learning algorithms and four ensemble-based approaches. We then summarize epidemiological applications of machine learning techniques in the published literature. We recommend approaches to incorporate machine learning in epidemiological research and discuss opportunities and challenges for integrating machine learning and existing epidemiological research methods.
... Therefore, the feature selection process selects the best subset of features independently before placing it in the learning algorithm to classify the dataset. Usually, the selection process evaluates each feature individually [58,59]. Thus, the relevance between features is not considered at all. ...
Article
Full-text available
In online social networks, spam profiles represent one of the most serious security threats over the Internet; if they do not stop producing bad advertisements, they can be exploited by criminals for various purposes. This article addresses the nature and the characteristics of spam profiles in a social network like Twitter to improve spam detection, based on a number of publicly available language-independent features. In order to investigate the effectiveness of these features in spam detection, four datasets are extracted for four different language contexts (i.e. Arabic, English, Korean and Spanish), and a fifth is formed by combining them all. We conduct our experiments using a set of five well-known classification algorithms in spam detection field, k-Nearest Neighbours ( k-NN), Random Forest (RF), Naive Bayes (NB), Decision Tree (DT) (J48) and Multilayer Perceptron (MLP) classifiers, along with five filter-based feature selection methods, namely, Information Gain, Chi-square, ReliefF, Correlation and Significance. The results show oscillating performance of each classifier across all datasets, but improved classification results with feature selection. In addition, detailed analysis and comparisons are carried out on two different levels: in the first level, we compare the selected features’ importance among the feature selection methods, whereas in the second level, we observe the relations and the importance of the selected features across all datasets. The findings of this article lead to a better understanding of social spam and improving detection methods by considering the various important features resulting from the different lingual contexts.
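A filter-based selection step of the kind discussed above scores each feature individually before classification; the sketch below (toy data, chi-squared scores, a naive Bayes classifier) is one possible arrangement, not the article's exact pipeline.

# Sketch: univariate (filter) feature selection followed by a classifier, evaluated by CV.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           random_state=0)
X = np.abs(X)                      # chi2 requires non-negative feature values

pipe = make_pipeline(SelectKBest(chi2, k=20), MultinomialNB())
scores = cross_val_score(pipe, X, y, cv=5)
print("accuracy with top-20 chi2 features: %.3f" % scores.mean())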
... Moreover, the model also has poor portability. Statistical machine learning methods mainly include: Hidden Markov Model (HMM) [19], Maximum Entropy Markov model (MEMM) [20], Support Vector Machine (SVM) [21], Conditional Random Fields (CRF) [22], etc. These methods mainly analyze the language information and mine the language features from the training corpus. ...
Article
Full-text available
The accumulation and explosive growth of electronic medical records (EMRs) make named entity recognition (NER) technologies critical for the meaningful use of EMR data and, in turn, the practice of evidence-based medicine. The dominant NER approaches use distributed representations of words and characters to build deep learning-based NER models. However, for the task of biomedical named entity recognition, there is a large number of complicated medical terminologies composed of multiple words. Splitting these terminologies to learn the word and character embeddings might cause semantic ambiguities. In this paper, we treat each medical terminology as a concept and propose a concept-enhanced named entity recognition model (CNER), where features from three different granularities (i.e., concept, word, and character) are combined for bio-NER. Extensive experiments are conducted on two real-world corpora: a fully labeled corpus and a partially labeled corpus. CNER achieves the highest F1 score (fully labeled corpus: precision = 88.23, recall = 88.29, and F1 = 88.26; partially labeled corpus: precision = 87.03, recall = 88.19, and F1 = 87.61), outperforming the baseline CW-BLSTM-CRF approach by 0.58% and 1.15% respectively, which demonstrates the effectiveness of the proposed approach.
... Aside from reduced performance levels, another limitation of the k-nearest neighbor algorithm is that it is computationally expensive in terms of processing time and storage requirements, as no model is actually trained and distances must be calculated for every class. Support vector classifiers are robust and have been used for cancer (Furey et al., 2000; Guyon et al., 2002), image (Chapelle et al., 1999) and audio (Guo and Li, 2003) classification, and for identifying smokers compared to non-smokers (Pariyadath et al., 2014). In general, because of their ability to operate in high-dimensional spaces, support vector classifiers have few drawbacks, with the exception of high processing times and memory consumption during the training and classification stages (Khan et al., 2010). ...
Article
Full-text available
Neuroimaging research is growing rapidly, providing expansive resources for synthesizing data. However, navigating these dense resources is complicated by the volume of research articles and variety of experimental designs implemented across studies. The advent of machine learning algorithms and text-mining techniques has advanced automated labeling of published articles in biomedical research to alleviate such obstacles. As of yet, a comprehensive examination of document features and classifier techniques for annotating neuroimaging articles has yet to be undertaken. Here, we evaluated which combination of corpus (abstract-only or full-article text), features (bag-of-words or Cognitive Atlas terms), and classifier (Bernoulli naïve Bayes, k-nearest neighbors, logistic regression, or support vector classifier) resulted in the highest predictive performance in annotating a selection of 2,633 manually annotated neuroimaging articles. We found that, when utilizing full article text, data-driven features derived from the text performed the best, whereas if article abstracts were used for annotation, features derived from the Cognitive Atlas performed better. Additionally, we observed that when features were derived from article text, anatomical terms appeared to be the most frequently utilized for classification purposes and that cognitive concepts can be identified based on similar representations of these anatomical terms. Optimizing parameters for the automated classification of neuroimaging articles may result in a larger proportion of the neuroimaging literature being annotated with labels supporting the meta-analysis of psychological constructs.
... Machine learning is widely used as a method for classification and prediction, with a growing number of applications in human health [1]. The use of machine learning in biological fields [2,3], and more specifically the microbiome research field [4][5][6][7], has grown exponentially owing to the robustness of these algorithms to high-dimensional data. However, challenges exist for large-scale meta-analysis because they often require manual curation of metadata and standardized processing of raw sequence data, resulting in variation in the results derived from chosen datasets across studies [8,9]. ...
Article
Full-text available
The use of machine learning in high-dimensional biological applications, such as the human microbiome, has grown exponentially in recent years, but algorithm developers often lack the domain expertise required for interpretation and curation of the heterogeneous microbiome datasets. We present Microbiome Learning Repo (ML Repo, available at https://knights-lab.github.io/MLRepo/), a public, web-based repository of 33 curated classification and regression tasks from 15 published human microbiome datasets. We highlight the use of ML Repo in several use cases to demonstrate its wide application, and we expect it to be an important resource for algorithm developers.
... Admittedly, SVM has some limitations: for instance, it is not statistically effective for problems with low variable dimensions and small sample sizes; it is time-consuming, requiring a substantial amount of computation to identify the optimal model; it does not perform well if the data set has considerable noise (Dosenbach et al. 2010); and the visualization of SVM's computational process may not be as clear as that of other traditional statistical methods, which is why it is sometimes described as a 'black box' (Wei and Li 2010). Although these concerns may limit its suitable applications, SVM has gained great popularity and has exhibited impressive achievements in several research areas, including biomedicine (Furey et al., 2000), education (Huang et al. 2007), management (Tay and Cao 2001), neuroscience (Amari and Wu 1999), and recently, the application of deep learning (Kim et al. 2015). The SVM approach has not been used as widely in the humanities and social sciences as it has in science, which is likely due to its lack of an intuitive probabilistic model for data interpretation and inference. ...
Article
Full-text available
Science excellence is associated not only with a student’s inherent aptitude but also a range of contextual factors. The objective of this paper was to identify the most important contextual characteristics of top performers in scientific literacy, by simultaneously considering factors at the PISA questionnaire-based student, family, and school levels. The data were based on the science scores of 380,771 PISA 2015 secondary students from 58 countries/economies, of whom 25,181 were top performers at proficiency level 5 or 6, as well as the responses of students and school principals to PISA questionnaires. Overall, 141 contextual variables (derived from the questionnaire responses) were ranked according to their relevance to top performers through a machine learning algorithm—specifically, support vector machine recursive feature elimination (SVM-RFE). An optimal set of 20 features (factors/variables) was then selected from the ranked list due to the high accuracy of these features in classifying and predicting top performers compared to non-top performers based on the support vector machine (SVM) classifier. The research findings indicate that the quality of teachers’ instructional practices, parents’ educational/occupational status, disciplinary climate, time spent on and involvement in learning, schools’ mass media facilities/equipment, the quantity of teachers in the school, and students’ self-efficacy played the most predictive roles in the target students’ superior performance in science. The features identified in this study may provide important information for the future studies on students’ performance in science literacy.
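SVM-RFE, the ranking method used in this study, repeatedly fits a linear SVM and discards the lowest-weighted features; a minimal sketch on synthetic data (not the PISA variables) is given below.

# Sketch: SVM-RFE feature ranking with a linear SVM on stand-in data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=50, n_informative=8,
                           random_state=0)
svm = SVC(kernel="linear", C=1.0)                 # RFE needs a linear kernel (weights)
rfe = RFE(estimator=svm, n_features_to_select=20, step=1)
rfe.fit(X, y)

ranked = np.argsort(rfe.ranking_)                 # best-ranked features first
print("top 20 feature indices:", ranked[:20])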
Article
Full-text available
A key mechanistic hypothesis for the evolution of division of labour in social insects is that a shared set of genes co-opted from a common solitary ancestral ground plan (a genetic toolkit for sociality) regulates caste differentiation across levels of social complexity. Using brain transcriptome data from nine species of vespid wasps, we test for overlap in differentially expressed caste genes and use machine learning models to predict castes using different gene sets. We find evidence of a shared genetic toolkit across species representing different levels of social complexity. We also find evidence of additional fine-scale differences in predictive gene sets, functional enrichment and rates of gene evolution that are related to level of social complexity, lineage and mode of colony founding. These results suggest that the concept of a shared genetic toolkit for sociality may be too simplistic to fully describe the process of the major transition to sociality.
Article
Full-text available
Image processing is the technique which can present information stored in the form of pixels. Plant disease detection is the technique which can detect disease from the leaf. Plant disease detection algorithms have various steps such as pre-processing, feature extraction, segmentation and classification. The KNN classifier technique is applied, which can classify input data into certain classes. The performance of the KNN classifier is analyzed, showing that it has higher accuracy and lower fault detection compared to other techniques.
Article
Full-text available
Cancer cell lines have been widely used for decades to study biological processes driving cancer development, and to identify biomarkers of response to therapeutic agents. Advances in genomic sequencing have made possible large-scale genomic characterizations of collections of cancer cell lines and primary tumors, such as the Cancer Cell Line Encyclopedia (CCLE) and The Cancer Genome Atlas (TCGA). These studies allow for the first time a comprehensive evaluation of the comparability of cancer cell lines and primary tumors on the genomic and proteomic level. Here we employ bulk mRNA and micro-RNA sequencing data from thousands of samples in CCLE and TCGA, and proteomic data from partner studies in the MD Anderson Cell Line Project (MCLP) and The Cancer Proteome Atlas (TCPA), to characterize the extent to which cancer cell lines recapitulate tumors. We identify dysregulation of a long non-coding RNA and microRNA regulatory network in cancer cell lines, associated with differential expression between cell lines and primary tumors in four key cancer driver pathways: KRAS signaling, NFKB signaling, IL2/STAT5 signaling and TP53 signaling. Our results emphasize the necessity for careful interpretation of cancer cell line experiments, particularly with respect to therapeutic treatments targeting these important cancer pathways.
Article
Full-text available
Acute lung injury (ALI) is a serious respiratory disease, which can lead to acute respiratory failure or death. It is closely related to the pathogenesis of New Coronavirus pneumonia (COVID-19). Many studies have shown that traditional Chinese medicine (TCM) had a good effect on its intervention, and network pharmacology could play a very important role. In order to construct the "disease-gene-target-drug" interaction network more accurately, a deep learning algorithm is utilized in this paper. Two ALI-related target genes (REAL and SATA3) are considered, and the active and inactive compounds of the two corresponding target genes are collected as training data, respectively. Molecular descriptors and molecular fingerprints are utilized to characterize each compound. A forest graph embedded deep feed forward network (forgeNet) is proposed and trained. The experimental results show that forgeNet performs better than support vector machines (SVM), random forest (RF), logistic regression (LR), Naive Bayes (NB), XGBoost, LightGBM and gcForest. forgeNet could identify 19 compounds in Erhuang decoction (EhD) and Dexamethasone (DXMS) more accurately.
Chapter
Full-text available
The history of artificial intelligence in medicine (AIM) is intimately tied to the history of AI itself, since some of the earliest work in applied AI dealt with biomedicine. This chapter first provides a brief overview of the early history of AI, but then focuses on AI in medicine (and in human biology) and provides a summary of how the field has evolved since the earliest recognition of the potential role of computers in the modeling of medical reasoning and in the support of clinical decision making. The growth of medical AI has been influenced not only by the evolution of AI itself, but also by the remarkable changes in computing and communication technologies. Accordingly, this chapter anticipates many of the topics that are covered in subsequent chapters, providing a concise overview that lays out the concepts and progression that are reflected in the rest of this volume. Keywords: Artificial intelligence history; AIM history; AI winter; Roles of knowledge and data in AIM; Modeling expertise; Expert systems; Data science; Machine learning; AIM and clinical decision support
Article
Artificial Intelligence (AI) is a branch of computer science that includes research in robotics, language recognition, image recognition, natural language processing, and expert systems. AI is poised to change medical practice, and oncology is not an exception to this trend. As a matter of fact, lung cancer has the highest morbidity and mortality worldwide. The leading cause is the difficulty of associating early pulmonary nodules with neoplastic changes, together with the numerous factors that complicate treatment choice and lead to poor prognosis. AI can effectively enhance the diagnostic efficiency of lung cancer while providing optimal treatment and evaluating prognosis, thereby reducing mortality. This review seeks to provide an overview of AI relevant to all the fields of lung cancer. We define the core concepts of AI and cover the basics of the functioning of natural language processing, image recognition, human-computer interaction and machine learning. We also discuss the most recent breakthroughs in AI technologies and their clinical application regarding diagnosis, treatment, and prognosis in lung cancer. Finally, we highlight the future challenges of AI in lung cancer and its impact on medical practice.
Chapter
Nature-inspired computing (NIC) is a fascinating computing paradigm that applies the methodology and approaches of nature that addresses various realtime complex problems ranging from how an organism finds its prey to genetic evolution. One of the unique features is that it has been provisioned with a decentralized control of computational activities naturally. In this chapter, to have a better insight of such NIC-based algorithms, the problem of identifying the breast cancer is used to exhibit their performances. Also, this chapter briefs about the application of stand-alone and hybridized approaches to identify the disease. Finally, it concludes with the experimental results and other statistical measures. The purpose of the chapter is to guide the new researcher in the area to get inspired from the conventional works and to bring out a new advanced approach that can perform further better. The three swarm algorithms are ant colony optimization, firefly, and particle swarm optimization algorithm. These swarm algorithms were used to optimize the support vector machine (SVM) which was trained to classify the malignant and benign images from the Wisconsin breast cancer dataset. With respect to the experimental results, it has been found that naïve algorithm with PSO optimization demonstrates better discriminating property of the underlying conventional classifiers.
Article
Full-text available
Tinnitus is an auditory phantom perception in the absence of an external sound stimulus. People with tinnitus often report severe constraints in their daily life. Interestingly, there are indications of differences between women and men both in the symptom profile and in the response to specific tinnitus treatments. In this paper, data from the TrackYourTinnitus (TYT) platform were analyzed to investigate whether the gender of users can be predicted. The TYT mobile health crowdsensing platform was developed to demystify the daily and momentary variations of tinnitus symptoms over time. The goal of the presented investigation is a better understanding of gender-related differences in the symptom profiles of TYT users. Based on two TYT questionnaires, four machine learning based classifiers were trained and analyzed. With respect to the provided daily answers, the gender of TYT users can be predicted with an accuracy of 81.7%. In this context, worries, difficulties in concentration, and irritability towards the family are the three most important characteristics for predicting gender. Note that, in contrast to existing studies on TYT, the daily answers to the worst-symptom question were investigated in detail for the first time. It was found that results of this question contribute significantly to the prediction of the gender of TYT users. Overall, our findings indicate gender-related differences in tinnitus and tinnitus-related symptoms. Based on evidence that gender influences the development of tinnitus, the gathered insights can be considered relevant and justify further investigations in this direction.
Article
Full-text available
Much research has been done on financial market time series over the last two decades using linear and non-linear correlations of stock returns. In this paper, we design a method of network reconstruction for the financial market using insights from machine learning tools. To do so, we analyze the time series of financial indices of the S&P 500 around several financial crises from 1998 to 2012, using a feature ranking approach in which the returns of stocks on a given day are used to predict the feature ranks of the next day. We use two different feature ranking approaches, Random Forest and Gradient Boosting, to rank the importance of each node for predicting the returns of every other node, which produces the feature ranking matrix. To construct the threshold network, we assign a threshold equal to the mean of the feature ranking matrix. The dynamics of network topology in threshold networks constructed by the new approach can identify the financial crises covered by the monitored time series. We observe that the most influential companies during the global financial crisis were in the energy and financial services sectors, while during the European debt crisis they were in communication services. The Shannon entropy calculated from the feature ranking is seen to increase over time before a market crash. The rise of entropy implies that the influences of stocks on each other are becoming equal, and it can therefore be used as a precursor of a market crash. The feature ranking technique can be an alternative way to infer a more accurate network structure for the financial market than existing methods and can be used for its further development.
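The feature-ranking reconstruction described above can be sketched roughly as follows, assuming scikit-learn and a synthetic return matrix in place of the S&P 500 data; the threshold rule (matrix mean) follows the paper's description, while the forest size and data dimensions are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
returns = rng.normal(size=(250, 8))          # synthetic daily returns: 250 days x 8 stocks

# Feature ranking matrix: row i holds the importance of every other stock
# for predicting stock i's next-day return from today's returns.
n = returns.shape[1]
ranking = np.zeros((n, n))
for i in range(n):
    X = np.delete(returns[:-1], i, axis=1)   # other stocks, day t
    y = returns[1:, i]                       # stock i, day t+1
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    ranking[i, np.arange(n) != i] = rf.feature_importances_

# Threshold network: keep links whose importance exceeds the matrix mean.
adjacency = ranking > ranking.mean()

# Shannon entropy of the normalized ranking matrix; rising entropy means
# influences are becoming more uniform across stocks.
p = ranking / ranking.sum()
entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
print(adjacency.sum(), "edges; entropy =", round(entropy, 3))
```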
Article
Full-text available
This paper addresses the development of predictive models for distinguishing pre-symptomatic infections from uninfected individuals. Our machine learning experiments are conducted on publicly available challenge studies that collected whole-blood transcriptomics data from individuals infected with HRV, RSV, H1N1, and H3N2. We address the problem of identifying discriminatory biomarkers between controls and eventual shedders in the first 32 h post-infection. Our exploratory analysis shows that the most discriminatory biomarkers exhibit a strong dependence on time over the course of the human response to infection. We visualize the feature sets to provide evidence of the rapid evolution of the gene expression profiles. To quantify this observation, we partition the data in the first 32 h into four equal time windows of 8 h each and identify all discriminatory biomarkers using sparsity-promoting classifiers and Iterated Feature Removal. We then perform a comparative machine learning classification analysis using linear support vector machines, artificial neural networks and Centroid-Encoder. We present a range of experiments on different groupings of the diseases to demonstrate the robustness of the resulting models.
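A minimal sketch of the iterated-removal idea, assuming an L1-penalized linear SVM as the sparsity-promoting classifier and synthetic expression data; it is not the authors' Iterated Feature Removal code.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))               # 60 samples x 500 genes (synthetic)
y = rng.integers(0, 2, size=60)              # control vs. eventual shedder labels
X[y == 1, :5] += 1.5                         # plant a small discriminatory signal

remaining = np.arange(X.shape[1])
selected_sets = []
for _ in range(3):                           # a few removal passes
    clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=5000)
    clf.fit(X[:, remaining], y)
    nonzero = remaining[np.flatnonzero(clf.coef_[0])]
    if nonzero.size == 0:
        break
    selected_sets.append(nonzero)            # one discriminatory biomarker set
    remaining = np.setdiff1d(remaining, nonzero)   # remove those features and refit

print([s.size for s in selected_sets], "features per pass")
```

Each pass keeps only the features the sparse classifier actually uses, removes them, and refits, so successive passes expose additional, weaker discriminatory sets.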
Article
Full-text available
Cancer diagnosis using machine learning algorithms is one of the main topics of research in computer-based medical science. Prostate cancer is one of the leading causes of cancer deaths worldwide. Data analysis of gene expression from microarrays using machine learning and soft computing algorithms is a useful tool for detecting prostate cancer in medical diagnosis. Even though traditional machine learning methods have been successfully applied for detecting prostate cancer, the large number of attributes combined with the small sample size of microarray data is still a challenge that limits their effectiveness for medical diagnosis. Selecting a subset of relevant features and choosing an appropriate machine learning method can exploit the information of microarray data to improve the accuracy of detection. In this paper, we propose to use a correlation feature selection (CFS) method with random committee (RC) ensemble learning to detect prostate cancer from microarray gene expression data. A set of experiments is conducted on a public benchmark dataset using a 10-fold cross-validation technique to evaluate the proposed approach. The experimental results revealed that the proposed approach attains 95.098% accuracy, which is higher than related methods on the same dataset.
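Weka's CFS and RandomCommittee implementations are not reproduced here, but the pipeline can be approximated in scikit-learn with a greedy correlation-based merit search followed by a randomized tree ensemble under 10-fold cross-validation; the synthetic data, feature count, and ensemble choice below are assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(102, 2000))              # synthetic stand-in for the microarray data
y = rng.integers(0, 2, size=102)

# Feature-class correlations used by the CFS-style merit function.
r_cf = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

def merit(subset):
    """CFS-style merit: k*mean(r_cf) / sqrt(k + k*(k-1)*mean(r_ff))."""
    k = len(subset)
    rcf = r_cf[subset].mean()
    if k == 1:
        return rcf
    corr = np.abs(np.corrcoef(X[:, subset], rowvar=False))
    rff = (corr.sum() - k) / (k * (k - 1))    # mean off-diagonal feature-feature correlation
    return k * rcf / np.sqrt(k + k * (k - 1) * rff)

# Greedy forward selection of 20 features by merit.
selected = [int(r_cf.argmax())]
candidates = set(range(X.shape[1])) - set(selected)
for _ in range(19):
    best = max(candidates, key=lambda j: merit(selected + [j]))
    selected.append(best)
    candidates.remove(best)

# Randomized tree committee evaluated with 10-fold cross-validation.
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X[:, selected], y, cv=10).mean())
```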
Article
Full-text available
Biomarker selection and cancer classification play an important role in knowledge discovery using genomic data. Successful identification of gene biomarkers and biological pathways can significantly improve the accuracy of diagnosis and help machine learning models perform better on classification of different types of cancer. In this paper, we propose a LogSum + L2 penalized logistic regression model and use a coordinate descent algorithm to solve it. The results of simulations and real experiments indicate that the proposed method is highly competitive among several state-of-the-art methods. Our proposed model achieves excellent performance on group feature selection and classification problems.
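In outline (the relative weighting of the two penalties and the smoothing constant ε are assumptions, not taken from the paper), the objective being minimized combines the logistic negative log-likelihood with a LogSum term and a ridge term:

\min_{\beta_0,\boldsymbol{\beta}} \; -\frac{1}{n}\sum_{i=1}^{n}\Big[ y_i\big(\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}\big) - \log\big(1+e^{\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}}\big) \Big] \;+\; \lambda_1 \sum_{j=1}^{p} \log\big(|\beta_j|+\varepsilon\big) \;+\; \lambda_2 \lVert\boldsymbol{\beta}\rVert_2^2 .

Coordinate descent then cycles through the coefficients, updating one β_j at a time while holding the others fixed.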
Article
Full-text available
Deep learning analysis of images and text unfolds new horizons in medicine. However, analysis of transcriptomic data, the cause of biological and pathological changes, is hampered by structural complexity distinctive from images and text. Here we conduct unsupervised training on more than 20,000 human normal and tumor transcriptomic data and show that the resulting Deep-Autoencoder, DeepT2Vec, has successfully extracted informative features and embedded transcriptomes into 30-dimensional Transcriptomic Feature Vectors (TFVs). We demonstrate that the TFVs could recapitulate expression patterns and be used to track tissue origins. Trained on these extracted features only, a supervised classifier, DeepC, can effectively distinguish tumors from normal samples with an accuracy of 90% for Pan-Cancer and reach an average 94% for specific cancers. Training on a connected network, the accuracy is further increased to 96% for Pan-Cancer. Together, our study shows that deep learning with autoencoder is suitable for transcriptomic analysis, and DeepT2Vec could be successfully applied to distinguish cancers, normal tissues, and other potential traits with limited samples.
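A minimal sketch of the autoencoder-then-classifier pattern described above, assuming TensorFlow/Keras and synthetic data; the layer sizes, training settings, and variable names are assumptions, and this is not the DeepT2Vec architecture itself.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

n_samples, n_genes = 200, 5000
X = np.random.rand(n_samples, n_genes).astype("float32")   # stand-in expression matrix
y = np.random.randint(0, 2, size=n_samples)                 # tumor vs. normal labels

# Unsupervised stage: compress each transcriptome into a 30-dimensional feature vector.
inputs = keras.Input(shape=(n_genes,))
h = layers.Dense(512, activation="relu")(inputs)
code = layers.Dense(30, name="tfv")(h)                      # 30-d "transcriptomic feature vector"
h = layers.Dense(512, activation="relu")(code)
outputs = layers.Dense(n_genes)(h)
autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)

# Supervised stage: train a separate classifier on the extracted 30-d features only.
encoder = keras.Model(inputs, code)
features = encoder.predict(X, verbose=0)
clf = keras.Sequential([keras.Input(shape=(30,)),
                        layers.Dense(16, activation="relu"),
                        layers.Dense(1, activation="sigmoid")])
clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
clf.fit(features, y, epochs=10, batch_size=32, verbose=0)
```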
Article
Full-text available
The main goal was to apply machine learning (ML) methods to integrated multi-transcriptomic data to identify endometrial genes capable of predicting uterine receptivity from their expression patterns in the cow. Public data from five studies were re-analyzed. In all of them, endometrial samples were obtained at day 6-7 of the estrous cycle, from cows or heifers of four different European breeds, classified as pregnant (n = 26) or not (n = 26). First, gene selection was performed through supervised and unsupervised ML algorithms. Then, the predictive ability of potential key genes was evaluated with a support vector machine classifier, using the expression levels of the samples from all the breeds but one to train the model, and the samples from that one breed to test it. Finally, the biological meaning of the key genes was explored. Fifty genes were identified, and they could predict uterine receptivity with an overall 96.1% accuracy, regardless of the animal's breed and category. Genes with higher expression in the pregnant cows were related to circadian rhythm, the Wnt receptor signaling pathway, and embryonic development. This novel and robust combination of computational tools allowed the identification of a group of biologically relevant endometrial genes that could support pregnancy in the cattle.
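The leave-one-breed-out evaluation described above maps naturally onto scikit-learn's LeaveOneGroupOut splitter; the sketch below uses synthetic data and an assumed RBF-kernel SVM with feature scaling.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(52, 50))          # 52 samples x 50 selected genes (synthetic)
y = rng.integers(0, 2, size=52)        # pregnant vs. not pregnant
breeds = rng.integers(0, 4, size=52)   # four breeds used as groups

# Train on three breeds, test on the held-out breed, for every breed in turn.
logo = LeaveOneGroupOut()
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, groups=breeds, cv=logo)
print("per-breed accuracy:", scores, "mean:", scores.mean())
```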
Article
Full-text available
Non-small cell lung cancer (NSCLC) is one of the most common lung cancers worldwide. Accurate prognostic stratification of NSCLC can become an important clinical reference when designing therapeutic strategies for cancer patients. With this clinical application in mind, we developed a deep neural network (DNN) combining heterogeneous data sources of gene expression and clinical data to accurately predict the overall survival of NSCLC patients. Based on microarray data from a cohort set (614 patients), seven well-known NSCLC biomarkers were used to group patients into biomarker- and biomarker+ subgroups. Then, by using a systems biology approach, prognosis relevance values (PRV) were then calculated to select eight additional novel prognostic gene biomarkers. Finally, the combined 15 biomarkers along with clinical data were then used to develop an integrative DNN via bimodal learning to predict the 5-year survival status of NSCLC patients with tremendously high accuracy (AUC: 0.8163, accuracy: 75.44%). Using the capability of deep learning, we believe that our prediction can be a promising index that helps oncologists and physicians develop personalized therapy and build the foundation of precision medicine in the future.
Article
Full-text available
Current treatments for Alzheimer’s disease are only symptomatic and limited to reducing the rate of mental deterioration. Mild Cognitive Impairment, a transitional stage in which the patient is not cognitively normal but does not meet the criteria for a specific dementia, is associated with a high risk of developing Alzheimer’s disease. Thus, non-invasive techniques to predict an individual’s risk of developing Alzheimer’s disease can be very helpful, considering the possibility of early treatment. Diffusion Tensor Imaging, as an indicator of cerebral white matter integrity, may detect and track early evidence of white matter abnormalities in patients developing Alzheimer’s disease. Here we performed a voxel-based analysis of fractional anisotropy in three classes of subjects: Alzheimer’s disease patients, Mild Cognitive Impairment patients, and healthy controls. We performed Support Vector Machine classification between the three groups, using Fisher Score feature selection and leave-one-out cross-validation. The bilateral intersection of the hippocampal cingulum and the parahippocampal gyrus (referred to as the parahippocampal cingulum) is the region that best discriminates Alzheimer’s disease fractional anisotropy values, resulting in an accuracy of 93% for discriminating between Alzheimer’s disease and controls, and 90% between Alzheimer’s disease and Mild Cognitive Impairment. These results suggest that pattern classification of Diffusion Tensor Imaging can help in the diagnosis of Alzheimer’s disease, especially when focusing on the parahippocampal cingulum.
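A minimal sketch of Fisher-score feature selection followed by leave-one-out SVM classification, assuming scikit-learn and synthetic fractional anisotropy features; for an unbiased estimate the scores would be recomputed inside each cross-validation fold, which this sketch omits for brevity.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 300))      # 40 subjects x 300 voxel-wise FA features (synthetic)
y = rng.integers(0, 2, size=40)     # e.g. Alzheimer's disease vs. controls

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter over within-class scatter."""
    overall = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / (den + 1e-12)

top = np.argsort(fisher_score(X, y))[::-1][:20]   # keep the 20 highest-scoring features
acc = cross_val_score(SVC(kernel="linear"), X[:, top], y, cv=LeaveOneOut()).mean()
print("leave-one-out accuracy:", acc)
```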
Chapter
Transcription profiling enables researchers to understand the activity of the genes in various experimental conditions; in human genomics, abnormal gene expression is typically correlated with clinical conditions. An important application is the detection of genes which are most involved in the development of tumors, by contrasting normal and tumor cells of the same patient. Several statistical and machine learning techniques have been applied to cancer detection; more recently, deep learning methods have been attempted, but they have typically failed to match the performance of classical algorithms. In this paper, we design a set of deep learning methods that can achieve similar performance to the best machine learning methods thanks to the use of external information or of data augmentation; we demonstrate this result by comparing the performance of the new methods against several baselines.
Chapter
Image processing techniques extract information stored in the form of pixels, and plant disease detection applies them to identify disease from leaf images. A typical plant disease detection pipeline involves preprocessing, feature extraction, segmentation, and classification. Here the k-nearest neighbor (KNN) classifier is applied to assign input data to classes; its performance is compared with existing techniques, and it is found to offer higher accuracy and fewer misdetections. This paper presents methods that use digital image processing to detect, quantify, and classify plant diseases from digital images in the visible spectrum. In plant leaf classification, a leaf is classified based on its morphological features. Classification techniques used include neural networks, genetic algorithms, support vector machines, and principal component analysis. In this paper, results are compared between the KNN classifier and the SVM classifier.
Article
Full-text available
We present the coronary artery disease (CAD) database, a comprehensive resource comprising 126 papers and 68 datasets relevant to CAD diagnosis, extracted from the scientific literature published between 1992 and 2018. These data were collected to help advance research on CAD-related machine learning and data mining algorithms, and hopefully to ultimately advance clinical diagnosis and early treatment. To aid users, we have also built a web application that presents the database through various reports.
Article
Full-text available
Background Microbiome profiles in the human body and environment niches have become publicly available due to recent advances in high-throughput sequencing technologies. Indeed, recent studies have already identified different microbiome profiles in healthy and sick individuals for a variety of diseases; this suggests that the microbiome profile can be used as a diagnostic tool in identifying the disease states of an individual. However, the high-dimensional nature of metagenomic data poses a significant challenge to existing machine learning models. Consequently, to enable personalized treatments, an efficient framework that can accurately and robustly differentiate between healthy and sick microbiome profiles is needed. Results In this paper, we propose MetaNN (i.e., classification of host phenotypes from Metagenomic data using Neural Networks), a neural network framework which utilizes a new data augmentation technique to mitigate the effects of data over-fitting. Conclusions We show that MetaNN outperforms existing state-of-the-art models in terms of classification accuracy for both synthetic and real metagenomic data. These results pave the way towards developing personalized treatments for microbiome related diseases.
Article
Full-text available
It is difficult to accurately assess axillary lymph node (ALN) metastasis preoperatively, and the diagnosis of axillary lymph node status in patients with breast cancer is invasive and has low sensitivity. This study aims to develop a mammography-based radiomics nomogram for the preoperative prediction of ALN metastasis in patients with breast cancer. The study enrolled 147 patients with clinicopathologically confirmed breast cancer and preoperative mammography. Features were extracted from each patient’s mammography images. The least absolute shrinkage and selection operator (LASSO) regression method was used to select features and build a signature in the primary cohort. The performance of the signature was assessed using support vector machines. We developed a nomogram by incorporating the signature with the clinicopathologic risk factors. The nomogram performance was estimated by its calibration ability in the primary and validation cohorts. The signature consisted of 10 selected ALN-status-related features. The AUC of the signature was 0.895 (95% CI, 0.887–0.909) in the primary cohort and 0.875 (95% CI, 0.698–0.891) in the validation cohort. The C-index of the nomogram was 0.779 (95% CI, 0.752–0.793) in the primary cohort and 0.809 (95% CI, 0.794–0.833) in the validation cohort. Our nomogram is a reliable and non-invasive tool for preoperative prediction of ALN status and can be used to optimize the current treatment strategy for breast cancer patients.
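A rough sketch of the select-with-LASSO, score-with-SVM pattern, assuming scikit-learn and a synthetic radiomic feature matrix; the regularization strength and fold count are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(147, 400))     # 147 patients x 400 mammographic radiomic features (synthetic)
y = rng.integers(0, 2, size=147)    # ALN metastasis yes/no

# L1-penalized (LASSO-style) selection of a sparse radiomic signature,
# then an SVM scored by area under the ROC curve.
pipe = make_pipeline(
    StandardScaler(),
    SelectFromModel(LogisticRegression(penalty="l1", solver="liblinear", C=0.1)),
    SVC(kernel="rbf"),
)
auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
print("cross-validated AUC:", auc)
```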
Article
Full-text available
Hearing loss (HL) is the most common neurodegenerative disease worldwide. Despite its prevalence, clinical testing does not yield a cell or molecular based identification of the underlying etiology of hearing loss making development of pharmacological or molecular treatments challenging. A key to improving the diagnosis of inner ear disorders is the development of reliable biomarkers for different inner ear diseases. Analysis of microRNAs (miRNA) in tissue and body fluid samples has gained significant momentum as a diagnostic tool for a wide variety of diseases. In previous work, we have shown that miRNA profiling in inner ear perilymph is feasible and may demonstrate distinctive miRNA expression profiles unique to different diseases. A first step in developing miRNAs as biomarkers for inner ear disease is linking patterns of miRNA expression in perilymph to clinically available metrics. Using machine learning (ML), we demonstrate we can build disease specific algorithms that predict the presence of sensorineural hearing loss using only miRNA expression profiles. This methodology not only affords the opportunity to understand what is occurring on a molecular level, but may offer an approach to diagnosing patients with active inner ear disease.
Article
Full-text available
To answer the questions of how information about the physical world is sensed, in what form is information remembered, and how does information retained in memory influence recognition and behavior, a theory is developed for a hypothetical nervous system called a perceptron. The theory serves as a bridge between biophysics and psychology. It is possible to predict learning curves from neurological variables and vice versa. The quantitative statistical approach is fruitful in the understanding of the organization of cognitive systems. 18 references.
Article
Full-text available
Constantly improving gene expression profiling technologies are expected to provide understanding and insight into cancer-related cellular processes. Gene expression data is also expected to significantly aid in the development of efficient cancer diagnosis and classification platforms. In this work we examine three sets of gene expression data measured across sets of tumor and normal clinical samples. (A preliminary version of this work appeared in Proceedings of the Fourth Annual International Conference on Computational Molecular Biology, 2000.)
Article
Full-text available
We introduce a method of functionally classifying genes by using gene expression data from DNA microarray hybridization experiments. The method is based on the theory of support vector machines (SVMs). SVMs are considered a supervised computer learning method because they exploit prior knowledge of gene function to identify unknown genes of similar function from expression data. SVMs avoid several problems associated with unsupervised clustering methods, such as hierarchical clustering and self-organizing maps. SVMs have many mathematical features that make them attractive for gene expression analysis, including their flexibility in choosing a similarity function, sparseness of solution when dealing with large data sets, the ability to handle large feature spaces, and the ability to identify outliers. We test several SVMs that use different similarity metrics, as well as some other supervised learning methods, and find that the SVMs best identify sets of genes with a common function using expression data. Finally, we use SVMs to predict functional roles for uncharacterized yeast ORFs based on their expression data.
Article
Full-text available
Mechanistic insights to viral replication and pathogenesis generally have come from the analysis of viral gene products, either by studying their biochemical activities and interactions individually or by creating mutant viruses and analyzing their phenotype. Now it is possible to identify and catalog the host cell genes whose mRNA levels change in response to a pathogen. We have used DNA array technology to monitor the level of ≈6,600 human mRNAs in uninfected as compared with human cytomegalovirus-infected cells. The level of 258 mRNAs changed by a factor of 4 or more before the onset of viral DNA replication. Several of these mRNAs encode gene products that might play key roles in virus-induced pathogenesis, identifying them as intriguing targets for further study.
Article
Full-text available
The development and progression of cancer and the experimental reversal of tumorigenicity are accompanied by complex changes in patterns of gene expression. Microarrays of cDNA provide a powerful tool for studying these complex phenomena. The tumorigenic properties of a human melanoma cell line, UACC-903, can be suppressed by introduction of a normal human chromosome 6, resulting in a reduction of growth rate, restoration of contact inhibition, and suppression of both soft agar clonogenicity and tumorigenicity in nude mice. We used a high density microarray of 1,161 DNA elements to search for differences in gene expression associated with tumour suppression in this system. Fluorescent probes for hybridization were derived from two sources of cellular mRNA [UACC-903 and UACC-903(+6)] which were labelled with different fluors to provide a direct and internally controlled comparison of the mRNA levels corresponding to each arrayed gene. The fluorescence signals representing hybridization to each arrayed gene were analysed to determine the relative abundance in the two samples of mRNAs corresponding to each gene. Previously unrecognized alterations in the expression of specific genes provide leads for further investigation of the genetic basis of the tumorigenic phenotype of these cells.
Article
Full-text available
As a step toward understanding the complex differences between normal and cancer cells in humans, gene expression patterns were examined in gastrointestinal tumors. More than 300,000 transcripts derived from at least 45,000 different genes were analyzed. Although extensive similarity was noted between the expression profiles, more than 500 transcripts that were expressed at significantly different levels in normal and neoplastic cells were identified. These data provide insight into the extent of expression differences underlying malignancy and reveal genes that may prove useful as diagnostic or prognostic markers.
Article
Full-text available
We used reverse transcription-coupled PCR to produce a high-resolution temporal map of fluctuations in mRNA expression of 112 genes during rat central nervous system development, focusing on the cervical spinal cord. The data provide a temporal gene expression "fingerprint" of spinal cord development based on major families of inter- and intracellular signaling genes. By using distance matrices for the pair-wise comparison of these 112 temporal gene expression patterns as the basis for a cluster analysis, we found five basic "waves" of expression that characterize distinct phases of development. The results suggest functional relationships among the genes fluctuating in parallel. We found that genes belonging to distinct functional classes and gene families clearly map to particular expression profiles. The concepts and data analysis discussed herein may be useful in objectively identifying coherent patterns and sequences of events in the complex genetic signaling network of development. Functional genomics approaches such as this may have applications in the elucidation of complex developmental and degenerative disorders.
Article
Full-text available
Diploid cells of budding yeast produce haploid cells through the developmental program of sporulation, which consists of meiosis and spore morphogenesis. DNA microarrays containing nearly every yeast gene were used to assay changes in gene expression during sporulation. At least seven distinct temporal patterns of induction were observed. The transcription factor Ndt80 appeared to be important for induction of a large group of genes at the end of meiotic prophase. Consensus sequences known or proposed to be responsible for temporal regulation could be identified solely from analysis of sequences of coordinately expressed genes. The temporal expression pattern provided clues to potential functions of hundreds of previously uncharacterized genes, some of which have vertebrate homologs that may function during gametogenesis.
Article
Full-text available
We sought to create a comprehensive catalog of yeast genes whose transcript levels vary periodically within the cell cycle. To this end, we used DNA microarrays and samples from yeast cultures synchronized by three independent methods: alpha factor arrest, elutriation, and arrest of a cdc15 temperature-sensitive mutant. Using periodicity and correlation algorithms, we identified 800 genes that meet an objective minimum criterion for cell cycle regulation. In separate experiments, designed to examine the effects of inducing either the G1 cyclin Cln3p or the B-type cyclin Clb2p, we found that the mRNA levels of more than half of these 800 genes respond to one or both of these cyclins. Furthermore, we analyzed our set of cell cycle-regulated genes for known and new promoter elements and show that several known elements (or variations thereof) contain information predictive of cell cycle regulation. A full description and complete data sets are available at http://cellcycle-www.stanford.edu
Article
Full-text available
cDNA microarrays and a clustering algorithm were used to identify patterns of gene expression in human mammary epithelial cells growing in culture and in primary human breast tumors. Clusters of coexpressed genes identified through manipulations of mammary epithelial cells in vitro also showed consistent patterns of variation in expression among breast tumor samples. By using immunohistochemistry with antibodies against proteins encoded by a particular gene in a cluster, the identity of the cell type within the tumor specimen that contributed the observed gene expression pattern could be determined. Clusters of genes with coherent expression patterns in cultured cells and in the breast tumor samples could be related to specific features of biological variation among the samples. Two such clusters were found to have patterns that correlated with variation in cell proliferation rates and with activation of the IFN-regulated signal transduction pathway, respectively. Clusters of genes expressed by stromal cells and lymphocytes in the breast tumors also were identified in this analysis. These results support the feasibility and usefulness of this systematic approach to studying variation in gene expression patterns in human cancers as a means to dissect and classify solid tumors.
Article
Full-text available
Although cancer classification has improved over the past 30 years, there has been no general approach for identifying new cancer classes (class discovery) or for assigning tumors to known classes (class prediction). Here, a generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case. A class discovery procedure automatically discovered the distinction between acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) without previous knowledge of these classes. An automatically derived class predictor was able to determine the class of new leukemia cases. The results demonstrate the feasibility of cancer classification based solely on gene expression monitoring and suggest a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Article
Full-text available
A new method, called the Fisher kernel method, for detecting remote protein homologies is introduced and shown to perform well in classifying protein domains by SCOP superfamily. The method is a variant of support vector machines using a new kernel function. The kernel function is derived from a hidden Markov model. The general approach of combining generative models like HMMs with discriminative methods such as support vector machines may have applications in other areas of biosequence analysis as well.
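In outline (notation assumed rather than quoted from the paper), each sequence x is mapped to the gradient of the log-likelihood of the generative model with parameters θ, and two sequences are compared through those gradients:

U_x = \nabla_{\theta}\log P(x\mid\theta), \qquad K(x,x') = U_x^{\top} F^{-1} U_{x'}, \qquad F = \mathbb{E}_x\big[U_x U_x^{\top}\big].

In practice the Fisher information matrix F is often approximated by the identity, and the resulting kernel is used inside a standard SVM.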
Article
Full-text available
Motivation: In order to extract protein sequences from nucleotide sequences, it is an important step to recognize points at which regions start that code for proteins. These points are called translation initiation sites (TIS). Results: The task of finding TIS can be modeled as a classification problem. We demonstrate the applicability of support vector machines for this task, and show how to incorporate prior biological knowledge by engineering an appropriate kernel function. With the described techniques the recognition performance can be improved by 26% over leading existing approaches. We provide evidence that existing related methods (e.g. ESTScan) could profit from advanced TIS recognition.
Article
Full-text available
Classification of patient samples is a crucial aspect of cancer diagnosis and treatment. We present a method for classifying samples by computational analysis of gene expression data. We consider the classification problem in two parts: class discovery and class prediction. Class discovery refers to the process of dividing samples into reproducible classes that have similar behavior or properties, while class prediction places new samples into already known classes. We describe a method for performing class prediction and illustrate its strength by correctly classifying bone marrow and blood samples from acute leukemia patients. We also describe how to use our predictor to validate newly discovered classes, and we demonstrate how this technique could have discovered the key distinctions among leukemias if they were not already known. This proof-of-concept experiment paves the way for a wealth of future work on the molecular classification and understanding of disease.
Article
Full-text available
We introduce and analyze a new algorithm for linear classification which combines Rosenblatt's perceptron algorithm with Helmbold and Warmuth's leave-one-out method. Like Vapnik's maximal-margin classifier, our algorithm takes advantage of data that are linearly separable with large margins. Compared to Vapnik's algorithm, however, ours is much simpler to implement, and much more efficient in terms of computation time. We also show that our algorithm can be efficiently used in very high dimensional spaces using kernel functions. We performed some experiments using our algorithm, and some variants of it, for classifying images of handwritten digits. The performance of our algorithm is close to, but not as good as, the performance of maximal-margin classifiers on the same problem, while saving significantly on computation time and programming effort.
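A minimal NumPy sketch of the voted-perceptron idea, in which every intermediate weight vector is kept along with a count of how long it survived and all of them vote at prediction time; the toy data and epoch count are assumptions.

```python
import numpy as np

def train_voted_perceptron(X, y, epochs=10):
    """y must be +1/-1. Returns a list of (weight_vector, survival_count) pairs."""
    w = np.zeros(X.shape[1])
    c = 0
    history = []
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:          # mistake: store the old vector, start a new one
                history.append((w.copy(), c))
                w = w + yi * xi
                c = 1
            else:
                c += 1
    history.append((w.copy(), c))
    return history

def predict_voted(history, x):
    """Each stored vector casts c votes with its own sign prediction."""
    s = sum(c * np.sign(w @ x) for w, c in history)
    return 1 if s >= 0 else -1

# Tiny linearly separable example.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.5, -1.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
model = train_voted_perceptron(X, y)
print([predict_voted(model, xi) for xi in X])
```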
Article
Full-text available
An effective approach to cancer classification based upon gene expression monitoring using DNA microarrays was introduced by Golub et al. [3]. The main problem they faced was accurately assigning leukemia samples the class labels acute myeloid leukemia (AML) or acute lymphoblastic leukemia (ALL). We used a Support Vector Machine (SVM) classifier to assign these labels. The motivation for the use of a SVM is that DNA microarray problems can be very high dimensional and have very few training data. This type of situation is particularly well suited for an SVM approach. We achieve slightly better performance on this (simple) classification task than Golub et al.
Article
Full-text available
A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.
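In modern notation (a standard formulation, assumed rather than quoted from the paper), the algorithm solves the maximal-margin problem

\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} \quad \text{subject to} \quad y_i\big(\mathbf{w}\cdot\phi(\mathbf{x}_i)+b\big) \ge 1, \quad i=1,\dots,\ell,

whose solution can be written as \mathbf{w}=\sum_i \alpha_i y_i \phi(\mathbf{x}_i), a linear combination of the supporting patterns, i.e. the training points with \alpha_i>0 that lie closest to the decision boundary.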
Article
The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including both Perceptron and Winnow, but also many new algorithms. Our proof hinges on analyzing a generic measure of progress construction that gives insight as to when and how such algorithms converge. Our measure of progress construction also permits us to obtain good mistake bounds for individual algorithms. We apply our unified analysis to new algorithms as well as existing algorithms. When applied to known algorithms, our method “automatically” produces close variants of existing proofs (recovering similar bounds)—thus showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. However, we also demonstrate that the unifying principles are more broadly applicable, and analyze a new class of algorithms that smoothly interpolate between the additive-update behavior of Perceptron and the multiplicative-update behavior of Winnow.
Article
This book is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. The book also introduces Bayesian analysis of learning and relates SVMs to Gaussian Processes and other kernel based learning methods. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequences analysis, etc. Their first introduction in the early 1990s led to a recent explosion of applications and deepening theoretical analysis, which has now established Support Vector Machines, along with neural networks, as standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and application of these techniques. The concepts are introduced gradually in accessible and self-contained stages, though in each stage the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally the book will equip the practitioner to apply the techniques and an associated web site will provide pointers to updated literature, new applications, and on-line software.
Book
This book provides the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts of pattern recognition, the book describes techniques for modelling probability density functions, and discusses the properties and relative merits of the multi-layer perceptron and radial basis function network models. It also motivates the use of various forms of error functions, and reviews the principal algorithms for error function minimization. As well as providing a detailed discussion of learning and generalization in neural networks, the book also covers the important topics of data processing, feature extraction, and prior knowledge. The book concludes with an extensive treatment of Bayesian techniques and their applications to neural networks.
Article
Array technologies have made it straightforward to monitor simultaneously the expression pattern of thousands of genes. The challenge now is to interpret such massive data sets. The first step is to extract the fundamental patterns of gene expression inherent in the data. This paper describes the application of self-organizing maps, a type of mathematical cluster analysis that is particularly well suited for recognizing and classifying features in complex, multidimensional data. The method has been implemented in a publicly available computer package, GENECLUSTER, that performs the analytical calculations and provides easy data visualization. To illustrate the value of such analysis, the approach is applied to hematopoietic differentiation in four well studied models (HL-60, U937, Jurkat, and NB4 cells). Expression patterns of some 6,000 human genes were assayed, and an online database was created. GENECLUSTER was used to organize the genes into biologically relevant clusters that suggest novel hypotheses about hematopoietic differentiation, for example highlighting certain genes and pathways involved in "differentiation therapy" used in the treatment of acute promyelocytic leukemia.
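A minimal sketch of self-organizing-map clustering of expression profiles, assuming the third-party MiniSom package as a stand-in (this does not reproduce GENECLUSTER) and synthetic data with roughly the dimensions described above.

```python
import numpy as np
from minisom import MiniSom   # pip install minisom

rng = np.random.default_rng(0)
profiles = rng.normal(size=(6000, 12))   # ~6,000 genes x 12 expression conditions (synthetic)

# Fit a small 4x3 map: each map node becomes one cluster of co-expressed genes.
som = MiniSom(4, 3, input_len=12, sigma=1.0, learning_rate=0.5, random_seed=0)
som.random_weights_init(profiles)
som.train_random(profiles, num_iteration=10000)

# Assign every gene to its best-matching node (its cluster).
clusters = [som.winner(p) for p in profiles]
print("genes in node (0, 0):", sum(c == (0, 0) for c in clusters))
```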
Article
The development of cancer is the result of a series of molecular changes occurring in the cell. These events lead to changes in the expression level of numerous genes that result in different phenotypic characteristics of tumors. In this report we describe the assembly and utilization of a 5766 member cDNA microarray to study the differences in gene expression between normal and neoplastic human ovarian tissues. Several genes that may have biological relevance in the process of ovarian carcinogenesis have been identified through this approach. Analyzing the results of microarray hybridizations may provide new leads for tumor diagnosis and intervention.
Article
Comparative hybridization of cDNA arrays is a powerful tool for the measurement of differences in gene expression between two or more tissues. We optimized this technique and employed it to discover genes with potential for the diagnosis of ovarian cancer. This cancer is rarely identified in time for a good prognosis after diagnosis. An array of 21,500 unknown ovarian cDNAs was hybridized with labeled first-strand cDNA from 10 ovarian tumors and six normal tissues. One hundred and thirty-four clones are overexpressed in at least five of the 10 tumors. These cDNAs were sequenced and compared to public sequence databases. One of these, the gene HE4, was found to be expressed primarily in some ovarian cancers, and is thus a potential marker of ovarian carcinoma.
Article
Genome-wide transcript profiling was used to monitor signal transduction during yeast pheromone response. Genetic manipulations allowed analysis of changes in gene expression underlying pheromone signaling, cell cycle control, and polarized morphogenesis. A two-dimensional hierarchical clustered matrix, covering 383 of the most highly regulated genes, was constructed from 46 diverse experimental conditions. Diagnostic subsets of coexpressed genes reflected signaling activity, cross talk, and overlap of multiple mitogen-activated protein kinase (MAPK) pathways. Analysis of the profiles specified by two different MAPKs-Fus3p and Kss1p-revealed functional overlap of the filamentous growth and mating responses. Global transcript analysis reflects biological responses associated with the activation and perturbation of signal transduction pathways.
Article
A number of results have bounded generalization of a classifier in terms of its margin on the training points. There has been some debate about whether the minimum margin is the best measure of the distribution of training set margin values with which to estimate the generalization. Freund and Schapire [7] have shown how a different function of the margin distribution can be used to bound the number of mistakes of an on-line learning algorithm for a perceptron, as well as an expected error bound. Shawe-Taylor and Cristianini [13] showed that a slight generalization of their construction can be used to give a PAC-style bound on the tail of the distribution of the generalization errors that arise from a given sample size. We show that in the linear case the approach can be viewed as a change of kernel and that the algorithms arising from the approach are exactly those originally proposed by Cortes and Vapnik [4]. We generalise the basic result to function classes with bounded f...
Lockhart,D., Dong,H., Byrne,M., Follettie,M., Gallo,M., Chee,M., Mittmann,M., Wang,C., Kobayashi,M., Horton,H. and Brown,E. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nature Biotechnol., 14, 1675-1680.
DeRisi,J., Iyer,V. and Brown,P. (1997) Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278, 680-686.
Hastie,T., Tibshirani,R., Eisen,M., Brown,P., Ross,D., Scherf,U., Weinstein,J., Alizadeh,A., Staudt,L. and Botstein,D. (2000) Gene Shaving: a new class of clustering methods for expression arrays. Stanford University Technical Report.
Zhu,H., Cong,J., Mamtora,G., Gingeras,T. and Schenk,T. (1998) Cellular gene expression altered by human cytomegalovirus: global monitoring with oligonucleotide arrays. Proc. Natl. Acad. Sci. USA, 95, 14470-14475.