Gaussian kernel density plots of the optimal number of features over 50 repetitions. The six datasets are microarray only (o.m), mean imputation (m.i), normal random imputation (nr.i), uniform imputation (u.i), bootstrap imputation (b.i), and image only (o.i). The median performance (AUC) obtained for each dataset is given in parentheses.
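As a rough illustration of how a density curve like the ones in this figure can be computed, the sketch below builds a Gaussian kernel density estimate over a list of feature counts. The data are randomly generated stand-ins for the 50 repetitions, and the bandwidth rule (normal-reference rule of thumb) is an assumption; the caption does not say which bandwidth the authors used.

```python
import math
import random
import statistics

def gaussian_kde(samples, bandwidth=None):
    """Return a function that estimates the density of `samples` with Gaussian kernels."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, not the paper's setting.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Hypothetical data: the optimal number of features found in each of 50 repetitions.
rng = random.Random(0)
feature_counts = [rng.randint(5, 40) for _ in range(50)]

kde = gaussian_kde(feature_counts)
step = 0.5
grid = [x * step for x in range(-60, 201)]  # evaluation points from -30 to 100
curve = [kde(x) for x in grid]

# Trapezoidal integration: a valid density should integrate to roughly 1.
area = sum(0.5 * (curve[i] + curve[i + 1]) * step for i in range(len(grid) - 1))
# Grid point where the estimated density peaks.
mode = grid[curve.index(max(curve))]
print(f"area under curve: {area:.4f}, most likely feature count: {mode}")
```

Plotting `curve` against `grid` for each of the six datasets would give one density line per dataset, comparable in spirit to the figure.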

Source publication
Article
Full-text available
In this work the effects of simple imputations are studied, regarding the integration of multimodal data originating from different patients. Two separate datasets of cutaneous melanoma are used, an image analysis (dermoscopy) dataset together with a transcriptomic one, specifically DNA microarrays. Each modality is related to a different set of pa...
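The four simple imputations named in the figure caption (mean, normal random, uniform, bootstrap) could be sketched as follows for a single feature column. The abstract above is truncated, so the exact definitions here are assumptions inferred from the method names: `observed` stands for the values measured on patients who have a modality, and `n_missing` for the number of patients from the other modality whose values must be filled in. The function name `impute` and the toy values are hypothetical.

```python
import random
import statistics

METHODS = ("mean", "normal", "uniform", "bootstrap")

def impute(observed, n_missing, method, rng):
    """Generate `n_missing` replacement values for one feature (assumed definitions)."""
    mu = statistics.mean(observed)
    if method == "mean":
        # Every missing entry receives the observed mean.
        return [mu] * n_missing
    if method == "normal":
        # Draw from a normal distribution fitted to the observed values.
        sd = statistics.stdev(observed)
        return [rng.gauss(mu, sd) for _ in range(n_missing)]
    if method == "uniform":
        # Draw uniformly between the observed minimum and maximum.
        lo, hi = min(observed), max(observed)
        return [rng.uniform(lo, hi) for _ in range(n_missing)]
    if method == "bootstrap":
        # Resample the observed values with replacement.
        return [rng.choice(observed) for _ in range(n_missing)]
    raise ValueError(f"unknown imputation method: {method}")

# Toy example: one feature observed on four patients, missing for three others.
rng = random.Random(42)
observed = [2.0, 4.0, 6.0, 8.0]
filled = {m: impute(observed, 3, m, rng) for m in METHODS}
for name, values in filled.items():
    print(name, [round(v, 2) for v in values])
```

Mean imputation adds no variance to the imputed block, while the three random schemes inject variability drawn from the observed distribution or its range.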

Citations

... Multimodal Data Fusion (MDF) is appropriate for integrating heterogeneous data, such as data derived from clinical and bio-molecular sources, including image and microarray data. In [93], MDF has been used for the diagnosis of melanoma with two different types of feature extraction: Combination of Data (COD) and Combination of Interpretations (COI). COD is applied prior to classification and aggregates the features extracted from each source into a single feature vector, while in COI independent classifications are performed on the individual feature subsets, and the outcomes are then aggregated using a proper voting mechanism or an algebraic combiner [93]. ...
... In [93], MDF has been used for the diagnosis of melanoma with two different types of feature extraction: Combination of Data (COD) and Combination of Interpretations (COI). COD is applied prior to classification and aggregates the features extracted from each source into a single feature vector, while in COI independent classifications are performed on the individual feature subsets, and the outcomes are then aggregated using a proper voting mechanism or an algebraic combiner [93]. A study on prostate cancer [121] reported that the COD method was more effective than COI. ...
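The difference between COD and COI can be made concrete with a small sketch. It uses a nearest-centroid classifier and a mean-of-scores rule as the algebraic combiner; both are stand-ins chosen for brevity, not the methods used in [93]. The toy features, labels and function names are hypothetical, and the sketch assumes both modalities are available for every training sample.

```python
import math

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(X, y):
    """Nearest-centroid model: one centroid per class label."""
    return {label: centroid([x for x, t in zip(X, y) if t == label]) for label in set(y)}

def scores(model, x):
    """Negative Euclidean distance to each class centroid (higher is better)."""
    return {label: -math.dist(x, c) for label, c in model.items()}

def predict_cod(X_a, X_b, y, x_a, x_b):
    """COD: concatenate both modalities, then train and apply a single classifier."""
    model = fit([a + b for a, b in zip(X_a, X_b)], y)
    s = scores(model, x_a + x_b)
    return max(s, key=s.get)

def predict_coi(X_a, X_b, y, x_a, x_b):
    """COI: one classifier per modality, outputs combined by averaging the scores."""
    s_a = scores(fit(X_a, y), x_a)
    s_b = scores(fit(X_b, y), x_b)
    combined = {label: (s_a[label] + s_b[label]) / 2 for label in s_a}
    return max(combined, key=combined.get)

# Toy data: two image features and one expression feature per sample.
X_img = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]]
X_gene = [[1.0], [1.2], [3.0], [3.1]]
y = ["benign", "benign", "melanoma", "melanoma"]

cod_label = predict_cod(X_img, X_gene, y, [0.85, 0.85], [2.9])
coi_label = predict_coi(X_img, X_gene, y, [0.85, 0.85], [2.9])
print("COD:", cod_label, "| COI:", coi_label)
```

In COD the modalities interact inside one model, so feature scaling across sources matters; in COI each model sees only its own modality and the fusion happens at the decision level.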
Article
Full-text available
Ensemble methods try to improve performance by integrating different kinds of input data, features or learning algorithms. In addition to other areas, they are finding applications in cancer prognosis and diagnosis. However, in this area, the research community is lagging behind the technology. A systematic review, along with a taxonomy of ensemble methods used in cancer prognosis and diagnosis, can pave the way for the research community to keep pace with the technology and even lead the trend. In this paper, we first present an overview of existing relevant surveys and highlight their shortcomings, which raise the need for a new survey focusing on Ensemble Classifiers (ECs) used for the diagnosis and prognosis of different cancer types. Then we exhaustively review the existing methods, including the traditional ones as well as those based on deep learning. The review leads to a taxonomy as well as the identification of the best-studied cancer types, the best ensemble methods used for the related purposes, the prevailing input data types, the most common decision-making strategies, and the common evaluation methodologies. Moreover, we establish future directions for researchers interested in following existing research trends or working on less-studied aspects of the area.
... Our previous work [61,62] has led to the discovery of 32 critical genes, whose expression offers key information on melanoma manifestation. Here, we intend to extend this knowledge to mutational data. ...
Article
Full-text available
Electronic health record (EHR) systems improve health care services by allowing the combination of health data with clinical decision support features and clinical image analyses. This study presents a modular and distributed platform that is able to integrate and accommodate heterogeneous, multidimensional (omics, histological images and clinical) data for the multi-angled portrayal and management of skin cancer patients. The proposed design offers a layered analytical framework as an expansion of current EHR systems, which can integrate high-volume molecular -omics data, imaging data, as well as relevant clinical observations. We present a case study in the field of dermatology, where we attempt to combine the multilayered information for the early detection and characterization of melanoma. The specific architecture aspires to lower the barrier for the introduction of personalized therapeutic approaches, towards precision medicine. The paper describes the technical issues of implementation, along with an initial evaluation of the system and discussion.
... Two approaches, combination of data (COD) and combination of interpretations (COI), were exploited for feature integration. COD is applied before classification and aggregates the features from each source into a single feature vector, whereas in COI independent classifications are performed on the individual feature subsets and their outputs are aggregated using a proper voting mechanism [34] or algebraic combiners as the decision-making strategy. Another study, on prostate cancer, reported that COD methods are more effective [35]. ...
... Results demonstrated that a random forest approach used to classify bootstrapped samples achieved a high AUC score. In contrast, the performance obtained with linear methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) was not high [34]. ...
Preprint
Full-text available
Background: Ensemble methods are supervised learning approaches that integrate different types of data or multiple individual classifiers. It has been shown that these methods can improve professional performance. Methods: This study is an attempt to provide an in-depth review of the 45 most relevant articles and aims to introduce 42 ensemble classifier (EC) machine learning methods used for the detection of 18 different types of cancer. Compared to other types of cancer, breast cancer, along with the 22 ensemble methods introduced for its identification, is extensively investigated. The purpose of this study is to identify, map, and analyze the current academic discourse on EC machine learning methods in order to: 1. identify overarching themes emerging from empirical studies regarding EC methods, 2. determine their input data and decision-making strategies, and 3. evaluate relevant statistical procedures. Results: By comparing various approaches, we can introduce a Relevance Vector Machine (RVM)-based ensemble learning method that can provide optimal solutions for problems such as the curse of dimensionality and the high dimensionality of the feature space without missing data values. Conclusions: To obtain robust performance and achieve better results, it is suggested to use multi-omics data integration, which has been shown to identify cancers and their subtypes more efficiently.
... Two approaches, combination of data (COD) and combination of interpretations (COI), were exploited for feature integration. COD is applied before classification and aggregates the features from each source into a single feature vector, whereas in COI independent classifications are performed on the individual feature subsets and their outputs are aggregated using a proper voting mechanism [34] or algebraic combiners as the decision-making strategy. Another study, on prostate cancer, reported that COD methods are more effective [35]. ...
Preprint
Full-text available
Background Ensemble methods are supervised learning approaches that integrate different types of data or multiple individual classifiers. It has been shown that these methods can improve professional performance. Methods This study is an attempt to provide an in-depth review of the 45 most relevant articles and aims to introduce 42 ensemble classifier (EC) machine learning methods used for the detection of 18 different types of cancer. Compared to other types of cancer, breast cancer, along with the 22 ensemble methods introduced for its identification, is extensively investigated. The purpose of this study was to identify, map, and analyze the current academic discourse on EC machine learning methods in order to: 1. identify overarching themes emerging from empirical studies regarding EC methods, 2. determine their input data and decision-making strategies, and 3. evaluate relevant statistical procedures. Results By comparing various approaches, we can introduce a Relevance Vector Machine (RVM)-based ensemble learning method that can provide optimal solutions for problems such as the curse of dimensionality and the high dimensionality of the feature space without missing data values. Conclusions To obtain robust performance and achieve better results, it is suggested to use multi-omics data integration, which has been shown to identify cancers and their subtypes more efficiently.
... Separate reporting document templates and systems have been developed at standalone workstations, hindering sharing and mining, not to mention knowledge discovery [4,5]. Besides, since projects like i2b2 have been launched to integrate genomic data into the electronic health record (EHR) in order to predict individual outcomes in a personalized and precise manner [6,7], there is an increasing demand to integrate radiomic data into this kind of data repository [8][9][10]. In this study, we proposed a CAD-structured report (CAD-SR) template specifically for fundus images and tried to integrate such CAD-SRs with reference images into ePR systems in a standard-based approach. ...
Conference Paper
Diabetic retinopathy (DR) is a serious complication of diabetes that can lead to blindness. A digital fundus camera is often used to detect retinal changes, but the diagnosis relies too much on the ophthalmologist's experience. Based on our previously developed algorithms for quantifying retinal vessels and lesions, we developed a computer-aided diagnosis structured report (CAD-SR) and implemented it in a picture archiving and communication system (PACS). Furthermore, we mapped our DICOM-SR into HL7 CDA to integrate CAD findings into the diabetes patient electronic patient record (ePR) system. Such integration could provide more objective data from fundus images in addition to the text data in the ePR, and is thus valuable for further data mining.
... Transcriptomic analyses among different groups allow the exploration and identification of alterations in gene expression profiles between them. The data used in this section were previously analyzed in [16]. Briefly, the microarray dataset was taken from the Gene Expression Omnibus (GEO) [17,18], with accession number GDS1375. ...
... The transcriptomic analysis from [16] revealed 1425 unique differentially expressed genes. Enrichment analysis showed 36 statistically significant biological processes (p-value < 0.05), which are presented in Table 4. ...
Conference Paper
Melanoma is the most lethal type of skin cancer. In this study for the first time we analyze a Greek cohort of primary cutaneous melanoma biopsies, subjected to whole exome sequencing, in order to derive their mutational profile landscape. Moreover, in the context of big data analytical methodologies, we integrated the results of the exome sequencing analysis with transcriptomic data of cutaneous melanoma from GEO, in an attempt to perform a multi-layered analysis and infer a tentative disease network for primary melanoma pathogenesis. The purpose of this research is to incorporate different levels of molecular data, so as to expand our understanding of cutaneous melanoma and the broader molecular network implicated with this type of cancer. Overall, we showed that the results of the integrative analysis offer deeper insight in the underlying mechanisms affected by melanoma and could potentially contribute to the valuable effective epidemiological characterization of this disease.
... A representative unsupervised dimensionality reduction method is Principal Component Analysis (PCA) (Jolliffe, 1986; Zahedi and Sorkhi, 2013), which aims at identifying a lower-dimensional space maximizing the variance among data (Yazdani et al., 2012). PCA is a very effective approach for extracting features (Guz, 2011; Moutselos et al., 2014). ...
... The approach mainly consists of three primary processes, namely the distinction process, binary session and pattern generation (Shen et al., 2011). All these flavours make PCA (Moutselos et al., 2014) more suitable for application to medical datasets, which typically have these characteristics. The variance coverage factor plays a significant role in deciding the important features, and hence this parameter is tuned so as to obtain the classifier model with the best results. ...
Article
Full-text available
Selection of optimal features is an important area of research in medical data mining systems. In this paper we introduce an efficient four-stage procedure – feature extraction, feature subset selection, feature ranking and classification – called Multi-Filtration Feature Selection (MFFS), to investigate improvements in detection accuracy and optimal feature subset selection. The proposed method adjusts a parameter named “variance coverage” and builds the model with the value at which maximum classification accuracy is obtained. This facilitates the selection of a compact set of superior features at a very low cost. An extensive experimental comparison of the proposed method and other methods, using four different classifiers (naive Bayes (NB), support vector machine (SVM), multi layer perceptron (MLP) and J48 decision tree) and 22 different medical data sets, confirms that the proposed MFFS strategy yields promising results in feature selection and classification accuracy for medical data mining.
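The abstract does not define "variance coverage" precisely. Under the common reading, in which it is the fraction of total variance that the retained principal components must explain, the mapping from a coverage value to a number of components can be sketched as below. The eigenvalues are made-up numbers and `components_for_coverage` is a hypothetical helper, not part of MFFS.

```python
def components_for_coverage(eigenvalues, coverage):
    """Smallest number of leading components whose variance share reaches `coverage`."""
    total = sum(eigenvalues)
    cumulative = 0.0
    # Components are taken in order of decreasing variance.
    for k, value in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += value
        if cumulative / total >= coverage:
            return k
    return len(eigenvalues)

# Hypothetical PCA eigenvalues (variance explained by each component).
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]

# Sweeping the coverage value changes how many components are kept.
kept = {c: components_for_coverage(eigenvalues, c) for c in (0.5, 0.75, 0.95)}
print(kept)
```

A tuning loop in the spirit of the abstract would evaluate a classifier at each candidate coverage value and keep the one with the highest accuracy; that loop is omitted here because the classifier and data are outside the scope of this sketch.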
Chapter
Cancer is a complex and intricate disease, and the scientific community has been struggling for decades to identify any feebleness or rudimentary characteristics to discover effective treatments. Melanoma continues to be a rare form of skin cancer but causes the majority of skin cancer-related deaths. The most common technique for the detection of melanoma is dermoscopy (or dermatoscopy or epiluminescence microscopy ELM), which performs the examination through an optical system (magnifying glass) with a light source (polarized light), allowing an in-depth visualization of features used for the diagnosis. Over the past decades, efforts have been made to create computer-based systems able to analyze such dermoscopy images, assisting the early detection of skin cancer, while also allowing repeatability of results. One major issue of image dermoscopy is the inability to detect early melanoma or cases that lack optical features. To deal with that issue researchers have focused lately on molecular techniques. The aim of this chapter is to present the state-of-the-art concerning the detection methods of malignant melanoma and describe the contributions made in this area of research.
Article
Full-text available
Background With a wide array of multi-modal, multi-protocol, and multi-scale biomedical data being routinely acquired for disease characterization, there is a pressing need for quantitative tools to combine these varied channels of information. The goal of these integrated predictors is to combine these varied sources of information, while improving on the predictive ability of any individual modality. A number of application-specific data fusion methods have been previously proposed in the literature which have attempted to reconcile the differences in dimensionalities and length scales across different modalities. Our objective in this paper was to help identify methodological choices that need to be made in order to build a data fusion technique, as it is not always clear which strategy is optimal for a particular problem. As a comprehensive review of all possible data fusion methods was outside the scope of this paper, we have focused on fusion approaches that employ dimensionality reduction (DR). Methods In this work, we quantitatively evaluate 4 non-overlapping existing instantiations of DR-based data fusion, within 3 different biomedical applications comprising over 100 studies. These instantiations utilized different knowledge representation and knowledge fusion methods, allowing us to examine the interplay of these modules in the context of data fusion. The use cases considered in this work involve the integration of (a) radiomics features from T2w MRI with peak area features from MR spectroscopy for identification of prostate cancer in vivo, (b) histomorphometric features (quantitative features extracted from histopathology) with protein mass spectrometry features for predicting 5-year biochemical recurrence in prostate cancer patients, and (c) volumetric measurements on T1w MRI with protein expression features to discriminate between patients with and without Alzheimer's disease.
Results and conclusions Our preliminary results in these specific use cases indicated that the use of kernel representations in conjunction with DR-based fusion may be most effective, as a weighted multi-kernel-based DR approach resulted in the highest area under the ROC curve of over 0.8. By contrast non-optimized DR-based representation and fusion methods yielded the worst predictive performance across all 3 applications. Our results suggest that when the individual modalities demonstrate relatively poor discriminability, many of the data fusion methods may not yield accurate, discriminatory representations either. In summary, to outperform the predictive ability of individual modalities, methodological choices for data fusion must explicitly account for the sparsity of and noise in the feature space.