Article

A New Ensemble Learning Framework for 3D Biomedical Image Segmentation


Abstract

3D image segmentation plays an important role in biomedical image analysis. Many 2D and 3D deep learning models have achieved state-of-the-art segmentation performance on 3D biomedical image datasets. Yet, 2D and 3D models have their own strengths and weaknesses, and by unifying them one may be able to achieve more accurate results. In this paper, we propose a new ensemble learning framework for 3D biomedical image segmentation that combines the merits of 2D and 3D models. First, we develop a fully convolutional network-based meta-learner that learns how to improve the results of the 2D and 3D models (base-learners). Then, to minimize over-fitting of our sophisticated meta-learner, we devise a new training method that uses the results of the base-learners as multiple versions of “ground truths”. Furthermore, since our new meta-learner training scheme does not depend on manual annotation, it can utilize abundant unlabeled 3D image data to further improve the model. Extensive experiments on two public datasets (the HVSMR 2016 Challenge dataset and the mouse piriform cortex dataset) show that our approach is effective under fully-supervised, semi-supervised, and transductive settings, and attains superior performance over state-of-the-art image segmentation methods.
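To make the “ground truths from base-learners” training scheme concrete, here is a minimal PyTorch-style sketch; it is not the authors' code, and `meta_net`, the tensor shapes, and the soft-target cross-entropy loss are illustrative assumptions:

```python
# Hypothetical sketch: each base-learner's prediction serves as one version of
# "ground truth" for the meta-learner, so no manual annotation is needed.
import torch
import torch.nn.functional as F

def meta_training_step(meta_net, optimizer, volume, base_predictions):
    """volume: (B, 1, D, H, W); base_predictions: list of (B, C, D, H, W)
    probability maps produced by the frozen 2D/3D base-learners."""
    optimizer.zero_grad()
    # The meta-learner sees the image plus the stacked base-learner outputs.
    meta_in = torch.cat([volume] + base_predictions, dim=1)
    meta_out = meta_net(meta_in)                      # (B, C, D, H, W) logits
    # Treat every base-learner output as a (soft) pseudo ground truth.
    loss = sum(F.binary_cross_entropy_with_logits(meta_out, p)
               for p in base_predictions) / len(base_predictions)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the targets come from the frozen base-learners rather than from manual annotation, the same step can be run on unlabeled volumes, which is what enables the semi-supervised and transductive settings described above.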


... Ensemble learning has been applied in many areas, such as computer vision [14] and bioinformatics [15]. In recent years, the medical image analysis community has also applied ensemble learning to improve the results of deep learning models [14,[16][17][18]. ...
... There have been many applications of ensembles of DNNs to medical image segmentation. For example, [16] used an ensemble of 2D and 3D segmentation models with a meta-learner to segment 3D cardiac MRI data, while [42] used an ensemble of 5 CNNs for brain MRI lesion segmentation. In [14], the authors proposed a bagging ensemble of deep segmentation models that trains multiple U-Nets to segment dense nuclei in pathological images. ...
... The second dataset is CVC-EndoSceneStill [62], a four-class dataset of 912 images obtained from 44 video sequences acquired from 36 patients for endoluminal scene object segmentation. This dataset contains elements such as the lumen and specular highlights, which are essential ...
Article
Full-text available
One of the most important areas in medical image analysis is segmentation, in which raw image data is partitioned into structured and meaningful regions to gain further insights. By using Deep Neural Networks (DNNs), AI-based automated segmentation algorithms can potentially assist physicians with more effective imaging-based diagnoses. However, since it is difficult to acquire high-quality ground truths for medical images and DNN hyperparameters require significant manual tuning, the results of DNN-based medical models may be limited. A potential solution is to combine multiple DNN models using ensemble learning. We propose a two-layer ensemble of deep learning models in which the prediction of each training-image pixel made by each model in the first layer is used as augmented data of the training image for the second layer of the ensemble. The predictions of the second layer are then combined using a weight-based scheme found by solving linear regression problems. To the best of our knowledge, our paper is the first work to propose a two-layer ensemble of deep learning models with an augmented-data technique for medical image segmentation. Experiments conducted on five different medical image datasets for diverse segmentation tasks show that the proposed method achieves better results in terms of several performance metrics compared to well-known benchmark algorithms. Our proposed two-layer ensemble of deep learning models for segmentation of medical images shows effectiveness compared to several benchmark algorithms. The research can be expanded in several directions, such as image classification.
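As a rough illustration of the second-layer weighting step, the sketch below finds per-model combination weights by least squares against the training ground truth; the function names, shapes, and the 0.5 threshold are assumptions, not the paper's implementation:

```python
# Illustrative sketch: per-model pixel probabilities are combined with weights
# obtained by solving a least-squares (linear regression) problem.
import numpy as np

def fit_combination_weights(model_probs, ground_truth):
    """model_probs: (n_models, n_pixels) second-layer outputs;
    ground_truth: (n_pixels,) binary labels."""
    A = model_probs.T                                  # (n_pixels, n_models)
    w, *_ = np.linalg.lstsq(A, ground_truth.astype(float), rcond=None)
    return w

def combine(model_probs, w):
    fused = np.clip(model_probs.T @ w, 0.0, 1.0)       # weighted fusion
    return (fused > 0.5).astype(np.uint8)              # final mask
```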
... A fully convolutional network-based meta-learner trained using the outputs of 3D and 2D base-learners was proposed. 47 The weighting scheme was adopted for the ensemble methods proposed in this study. ...
... The ESEN method involved the meta-learner that was trained using prediction masks from individual modes as an input. Unlike the previous studies in which the meta-learner's outcome was determined as the final output, 47 the ESEN method used it as a pseudo-ground truth for quality control. In the ESEN WO method, the performance of the prediction selected based on the pseudo-ground truth was better than the pseudo-ground truth itself. ...
Article
Full-text available
Background Invasive coronary angiography (ICA) is a primary imaging modality that visualizes the lumen area of coronary arteries for diagnosis and interventional guidance. In the current practice of quantitative coronary analysis (QCA), semi‐automatic segmentation tools require labor‐intensive and time‐consuming manual correction, limiting their application in the catheterization room. Purpose This study aims to propose rank‐based selective ensemble methods that improve the segmentation performance and reduce morphological errors that limit fully automated quantification of the coronary artery using deep‐learning segmentation of ICA. Methods Two selective ensemble methods proposed in this work integrated the weighted ensemble approach with per‐image quality estimation. The segmentation outcomes from five base models with different loss functions were ranked either by mask morphology or estimated Dice similarity coefficient (DSC). The final output was determined by imposing different weights according to the ranks. The ranking criteria based on mask morphology were formulated from empirical insight to avoid frequent types of segmentation errors (MSEN), while the estimation of DSCs was performed by comparing with the pseudo‐ground truth generated from a meta‐learner (ESEN). Five‐fold cross‐validation was performed with the internal dataset of 7426 coronary angiograms from 2924 patients, and the prediction model was externally validated with 556 images of 226 patients. Results The selective ensemble methods improved the segmentation performance with DSCs up to 93.07% and provided a better delineation of coronary lesions with local DSCs of up to 93.93%, outperforming all individual models. The proposed methods also minimized the chances of mask disconnection in the most narrowed regions to 2.10%. The robustness of the proposed methods was also evident in the external validation. Inference time for major vessel segmentation was approximately one‐sixth of a second. Conclusion The proposed methods successfully reduced morphological errors in the predicted masks and were able to enhance the robustness of the automatic segmentation. The results suggest better applicability of real‐time QCA‐based diagnostic methods in routine clinical settings.
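A simplified sketch of the ESEN-style selective ensemble follows: base masks are ranked by Dice agreement with a meta-learner's pseudo-ground truth and fused with rank-dependent weights. The reciprocal-rank weighting here is an assumption standing in for the paper's scheme:

```python
# Simplified sketch of rank-based selective ensembling against a pseudo-GT.
import numpy as np

def dice(a, b, eps=1e-7):
    inter = np.logical_and(a, b).sum()
    return (2.0 * inter + eps) / (a.sum() + b.sum() + eps)

def rank_selective_ensemble(base_masks, pseudo_gt):
    """base_masks: list of binary arrays from the base models;
    pseudo_gt: binary array produced by the meta-learner."""
    scores = np.array([dice(m, pseudo_gt) for m in base_masks])
    order = np.argsort(-scores)                        # best mask first
    weights = np.zeros(len(base_masks))
    weights[order] = 1.0 / (np.arange(len(base_masks)) + 1)  # 1, 1/2, 1/3, ...
    weights /= weights.sum()
    fused = sum(w * m.astype(float) for w, m in zip(weights, base_masks))
    return (fused > 0.5).astype(np.uint8)
```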
... It gives a strategy to embed specific designs for the challenges of tasks, such as error correction, class-imbalanced data, etc. (Polikar, 2012). Ensemble segmentation models (Xia et al., 2018; Wang et al., 2015; Zheng et al., 2019) fuse the strengths of multiple learners, improving the integrated segmentation quality. Xia et al. (2018) proposed the Volumetric Fusion Net (VFN), which trains three 2D segmentation networks along the X, Y, and Z axes and uses a 3D network to fuse the results. ...
... Xia et al. (2018) proposed the Volumetric Fusion Net (VFN), which trains three 2D segmentation networks along the X, Y, and Z axes and uses a 3D network to fuse the results. Zheng et al. (2019) further added a learner in the 3D version and proposed a meta-learner for the ensemble process. ...
Article
Full-text available
Three-dimensional (3D) integrated renal structures (IRS) segmentation targets segmenting the kidneys, renal tumors, arteries, and veins in one inference. Clinicians will benefit from the 3D IRS visual model for accurate preoperative planning and intraoperative guidance of laparoscopic partial nephrectomy (LPN). However, no success has been reported in 3D IRS segmentation due to the inherent challenges in grayscale distribution: low contrast caused by the narrow task-dependent distribution range of regions of interest (ROIs), and the networks' representation preferences caused by the distribution variation across images. In this paper, we propose the Meta Greyscale Adaptive Network (MGANet), the first deep learning framework to simultaneously segment the kidney, renal tumors, arteries and veins on CTA images in one inference. It makes innovations in two collaborative aspects: 1) The Grayscale Interest Search (GIS) adaptively focuses segmentation networks on task-dependent grayscale distributions by scaling the window width and center with two cross-correlated coefficients for the first time, thus learning fine-grained representations for fine segmentation. 2) The Meta Grayscale Adaptive (MGA) learning employs an image-level meta-learning strategy. It represents diverse robust features from multiple distributions, perceives the distribution characteristics, and generates the model parameters to fuse features dynamically according to each image's distribution, thus adapting to the grayscale distribution variation. This study enrolls 123 patients, and the average Dice coefficients of the renal structures reach up to 87.9%. Fine selection of the task-dependent grayscale distribution ranges and personalized fusion of multiple representations over different distributions lead to better 3D IRS segmentation quality. Extensive experiments with promising results on renal structures reveal powerful segmentation accuracy and great clinical significance in renal cancer treatment.
... Ensemble deep learning segmentation models have been applied to various medical imaging applications. For example, (Zheng et al., 2019) combined 2D and 3D segmentation models with a meta-learner to segment 3D cardiac MRI data. (Kang and Gwak, 2019) combined two ResNet-based models for polyp segmentation in colonoscopy images. ...
... In summary, the majority of the current automated segmentation algorithms in CMR (Bernard et al., 2018; Bai et al., 2018; Petitjean and Dacher, 2011; Peng et al., 2016; Irving et al., 2017; Zheng et al., 2019; Kang and Gwak, 2019; Winzeck et al., 2019; Huang et al., 2018; Fahmy et al., 2018) do not come with segmentation quality control mechanisms suitable for automatic processing pipelines in real-life clinical applications. Moreover, quality prediction algorithms have not progressed to utilize the predicted scores to further improve segmentation accuracy. ...
Article
Full-text available
Recent developments in artificial intelligence have generated increasing interest in deploying automated image analysis for diagnostic imaging and large-scale clinical applications. However, inaccuracy from automated methods could lead to incorrect conclusions, diagnoses or even harm to patients. Manual inspection for potential inaccuracies is labor-intensive and time-consuming, hampering progress towards fast and accurate clinical reporting in high volumes. To promote reliable fully-automated image analysis, we propose a quality control-driven (QCD) segmentation framework. It is an ensemble of neural networks that integrates image analysis and quality control. The novelty of this framework is the selection of the optimal segmentation based on predicted segmentation accuracy, on-the-fly. Additionally, this framework visualizes segmentation agreement to provide traceability of the quality control process. In this work, we demonstrated the utility of the framework in cardiovascular magnetic resonance T1-mapping - a quantitative technique for myocardial tissue characterization. The framework achieved near-perfect agreement with expert image analysts in estimating myocardial T1 value (r=0.987, p<0.0005; mean absolute error (MAE)=11.3 ms), with accurate segmentation quality prediction (Dice coefficient prediction MAE=0.0339) and classification (accuracy=0.99), and a fast average processing time of 0.39 seconds/image. In summary, the QCD framework can generate high-throughput automated image analysis with speed and accuracy that is highly desirable for large-scale clinical applications.
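The on-the-fly selection rule can be sketched in a few lines; `quality_net` (a model that predicts a Dice score for an image/segmentation pair) and its calling convention are hypothetical placeholders for the framework's quality-prediction component:

```python
# Minimal sketch of quality-driven selection: each candidate segmentation is
# paired with a predicted Dice score, and the best one is kept on-the-fly.
import numpy as np

def select_best_segmentation(image, candidate_masks, quality_net):
    predicted_dice = [quality_net(image, m) for m in candidate_masks]
    best = int(np.argmax(predicted_dice))
    return candidate_masks[best], predicted_dice[best]
```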
... By exploring different ensemble methods such as bagging, boosting, and stacking, researchers can effectively address the challenges of architectural diversity and decision aggregation in ensemble deep learning IDS models, leading to more reliable and efficient intrusion detection systems [64]. The integration of meta-learning in ensemble models not only enhances model interpretability and adaptability but also contributes to the advancement of various domains, including healthcare, finance, and environmental science, by providing robust and accurate predictive models [65][66][67]. ...
Article
Full-text available
This paper investigates the application of ensemble learning techniques, specifically meta-learning, in intrusion detection systems (IDS) for the Internet of Medical Things (IoMT). It underscores the existing challenges posed by the heterogeneous and dynamic nature of IoMT environments, which necessitate adaptive, robust security solutions. By harnessing meta-learning alongside various ensemble strategies such as stacking and bagging, the paper aims to refine IDS mechanisms to effectively counter evolving cyber threats. The study proposes a performance-driven weighted meta-learning technique for dynamic assignment of voting weights to classifiers based on accuracy, loss, and confidence levels. This approach significantly enhances the intrusion detection capabilities for the IoMT by dynamically optimizing ensemble IDS models. Extensive experiments demonstrate the proposed model’s superior performance in terms of accuracy, detection rate, F1 score, and false positive rate compared to existing models, particularly when analyzing various sizes of input features. The findings highlight the potential of integrating meta-learning in ensemble-based IDS to enhance the security and integrity of IoMT networks, suggesting avenues for future research to further advance IDS performance in protecting sensitive medical data and IoT infrastructures.
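A hedged sketch of such performance-driven weighted voting is given below; the exact rule combining accuracy, loss, and confidence into a weight is an illustrative assumption, since the paper's formula is not reproduced here:

```python
# Sketch: each classifier's vote is weighted by a score built from its
# accuracy, loss, and mean confidence (the combination rule is assumed).
import numpy as np

def voting_weights(accs, losses, confs):
    score = np.asarray(accs) * np.asarray(confs) / (1.0 + np.asarray(losses))
    return score / score.sum()

def weighted_vote(class_probs, weights):
    """class_probs: (n_models, n_samples, n_classes) predicted probabilities."""
    fused = np.tensordot(weights, class_probs, axes=1)  # (n_samples, n_classes)
    return fused.argmax(axis=1)
```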
... Ensemble methods use a two-step process: primary classification of training data and a combination of selective members' classifiers for better classification [17]. Examples of ensemble methods can be found in [18][19][20][21][22][23][24][25][26]. Weighted methods assign weights to the classifiers based on specific criteria and then take a vote of the classifiers based on their predictions. ...
Article
Full-text available
This paper proposes two novel approaches based on feature weighting and model selection for building more accurate kNN ensembles. The first approach identifies the nearest observations using a feature weighting scheme with respect to the response variable via support vectors. A randomly selected subset of features is used for the feature weighting and model construction. After building a sufficiently large number of base models on bootstrap samples, a subset of the models is selected based on out-of-bag prediction error for the final ensemble. The second approach builds base learners on random subsamples instead of bootstrap samples, with a random subset of features, and uses feature weighting while building the models. The remaining observations from each sample are used to assess the corresponding base learner and select a subset of the models for the final ensemble. The suggested ensemble methods are assessed on 12 benchmark datasets against other classical methods, including kNN-based models. The analyses reveal that the proposed methods are often better than the others.
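The first approach might be sketched as follows, under stated assumptions: the paper's support-vector-based feature weighting is omitted for brevity, and `n_keep`, `n_neighbors`, and the feature-subset size are illustrative choices:

```python
# Rough sketch: kNN base models on bootstrap samples with random feature
# subsets, scored out-of-bag; only the best-scoring models join the ensemble.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def build_knn_ensemble(X, y, n_models=50, n_keep=15, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scored = []
    for _ in range(n_models):
        boot = rng.integers(0, n, size=n)              # bootstrap sample
        oob = np.setdiff1d(np.arange(n), boot)         # out-of-bag rows
        feats = rng.choice(p, size=max(1, p // 2), replace=False)
        knn = KNeighborsClassifier(n_neighbors=5).fit(X[boot][:, feats], y[boot])
        acc = knn.score(X[oob][:, feats], y[oob]) if len(oob) else 0.0
        scored.append((acc, knn, feats))
    scored.sort(key=lambda t: -t[0])                   # lowest OOB error first
    return [(knn, feats) for _, knn, feats in scored[:n_keep]]

def ensemble_predict(models, X):
    # Majority vote; assumes integer class labels.
    votes = np.stack([knn.predict(X[:, feats]) for knn, feats in models]).astype(int)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```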
... The innovation brought by our project is the development of an AI classifier based on a set of classifying neural networks whose outputs are directed to an ensemble layer [48]. In particular, the networks are self-normalizing neural networks (SNN) [49]. ...
Article
Full-text available
Computer recognition of human activity is an important area of research in computer vision. Human activity recognition (HAR) involves identifying human activities in real-life contexts and plays an important role in interpersonal interaction. Artificial intelligence usually identifies activities by analyzing data collected using different sources. These can be wearable sensors, MEMS devices embedded in smartphones, cameras, or CCTV systems. As part of HAR, computer vision technology can be applied to the recognition of the emotional state through facial expressions using facial positions such as the nose, eyes, and lips. Human facial expressions change with different health states. Our application is oriented toward the detection of the emotional health of subjects using a self-normalizing neural network (SNN) in cascade with an ensemble layer. We identify the subjects’ emotional states through which the medical staff can derive useful indications of the patient’s state of health.
... In recent years, there has been great interest in ensembles of deep learning models. [31] used an ensemble of 2D and 3D segmentation models with a meta-learner for 3D cardiac MRI segmentation. In [32], the authors used a number of CNN models to extract histology image features at different scales, then the optimal subset of CNN models was selected to create the ensemble. ...
... The idea of model ensembles is to train multiple models and select or combine the best of them [16]. The optimization goal is typically to improve prediction accuracy [8,17,20]. Combining the outputs of several models has proven preferable to single-model systems. By combining the results of several models, model ensembles can, for example, reduce the risk of choosing a model that performs poorly, which reduces the overall risk of a bad decision, or capture complex decision boundaries that lie outside the functional space of any single chosen model. ...
Article
Full-text available
To obtain accurate predictions of classifiers, model ensembles comprising multiple trained machine learning models are nowadays used. In particular, dynamic model ensembles pick the most accurate model for each query object, by applying the model that performed best on similar data. Dynamic model ensembles may however suffer, similarly to single machine learning models, from bias, which can eventually lead to unfair treatment of certain groups of a general population. To mitigate unfair classification, recent work has thus proposed fair model ensembles, which instead of focusing (solely) on accuracy also optimize global fairness. While such ensembles globally minimize bias, imbalances may persist in different regions of the data, e.g., caused by some local bias maxima, leading to local unfairness. Therefore, we extend our previous work with a framework that bridges the gap between dynamic model ensembles and fair model ensembles. More precisely, we investigate the problem of devising locally fair and accurate dynamic model ensembles, which ultimately optimize for equal opportunity of similar subjects. We propose a general framework to perform this task and present several algorithms implementing the framework components. In this paper we also present a runtime-efficient framework adaptation that keeps the quality of the results at a similar level. Furthermore, new fairness metrics are presented, as well as detailed information about the necessary data preparation. Our evaluation of the framework implementations and metrics shows that our approach outperforms the state-of-the-art for different types and degrees of bias present in training data, in terms of both local and global fairness, while reaching comparable accuracy.
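The dynamic-ensemble selection rule mentioned above ("the model that performed best on similar data") can be sketched as a local-accuracy lookup; the distance metric, `k`, and the model interface are assumptions:

```python
# Sketch of per-query dynamic model selection via local validation accuracy.
import numpy as np

def dynamic_predict(query, models, X_val, y_val, k=25):
    d = np.linalg.norm(X_val - query, axis=1)
    nearest = np.argsort(d)[:k]                        # most similar data
    local_acc = [np.mean(m.predict(X_val[nearest]) == y_val[nearest])
                 for m in models]
    best = models[int(np.argmax(local_acc))]           # locally best model
    return best.predict(query.reshape(1, -1))[0]
```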
... The application of 2D and 3D networks for medical image segmentation has been previously explored in several research works. Both Baldeon [15] and Zheng [16] used the ensemble method to construct hybrid networks for the segmentation of the prostate, the myocardium and the great vessels, respectively, and achieved advanced segmentation performance. Both of their subnetworks use an FCN structure, which lacks sufficient channels in the upsampling process compared to the U-Net or V-Net models, making feature propagation to higher resolution layers more difficult. ...
Article
Full-text available
Objective: A stable and accurate automatic tumor delineation method has been developed to facilitate the intelligent design of the lung cancer radiotherapy process. The purpose of this paper is to introduce an automatic tumor segmentation network for lung cancer on CT images based on deep learning. Methods: In this paper, a hybrid convolutional neural network (CNN) combining 2D CNN and 3D CNN was implemented for automatic lung tumor delineation using CT images. The 3D CNN used the V-Net model for the extraction of tumor context information from CT sequence images. The 2D CNN used an encoder-decoder structure based on a dense connection scheme, which could expand information flow and promote feature propagation. Next, 2D features and 3D features were fused through a hybrid module. Meanwhile, the hybrid CNN was compared with the individual 3D CNN and 2D CNN, and three evaluation metrics, Dice, Jaccard and Hausdorff distance (HD), were used for quantitative evaluation. The relationship between the segmentation performance of the hybrid network and the GTV volume size was also explored. Results: The newly introduced hybrid CNN was trained and tested on a dataset of 260 cases, and could achieve a median value of 0.73, with mean and standard deviation of 0.72 ± 0.10 for the Dice metric, and 0.58 ± 0.13 and 21.73 ± 13.30 mm for the Jaccard and HD metrics, respectively. The hybrid network significantly outperformed the individual 3D CNN and 2D CNN in the three examined evaluation metrics (p < 0.001). A larger GTV presents a higher value for the Dice metric, but its delineation at the tumor boundary is unstable. Conclusions: The implemented hybrid CNN was able to achieve good lung tumor segmentation performance on CT images. Advances in knowledge: The hybrid CNN has valuable prospects, with the ability to segment lung tumors.
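A compact PyTorch sketch of such a hybrid 2D/3D block is shown below: a 2D branch processes slices, a 3D branch processes the volume, and a fusion convolution merges the two streams. Channel sizes and layer choices are assumptions, not the paper's architecture:

```python
# Sketch of a hybrid 2D/3D feature-fusion block for volumetric segmentation.
import torch
import torch.nn as nn

class Hybrid2D3DBlock(nn.Module):
    def __init__(self, in_ch=1, feat=16):
        super().__init__()
        self.branch2d = nn.Sequential(nn.Conv2d(in_ch, feat, 3, padding=1), nn.ReLU())
        self.branch3d = nn.Sequential(nn.Conv3d(in_ch, feat, 3, padding=1), nn.ReLU())
        self.fuse = nn.Conv3d(2 * feat, 1, 1)          # hybrid fusion module

    def forward(self, x):                              # x: (B, C, D, H, W)
        b, c, d, h, w = x.shape
        # 2D branch: treat each slice independently.
        f2d = self.branch2d(x.permute(0, 2, 1, 3, 4).reshape(b * d, c, h, w))
        f2d = f2d.reshape(b, d, -1, h, w).permute(0, 2, 1, 3, 4)  # back to 3D
        f3d = self.branch3d(x)                         # 3D context branch
        return self.fuse(torch.cat([f2d, f3d], dim=1)) # fused logits
```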
... For tumor segmentation, we use the Dice score (Dice) and 95% Hausdorff distance (Hausdorff) to evaluate the segmentation performance over different areas of the brain tumor, i.e., the whole tumor (WT), the tumor core (TC) and the enhancing tumor (ET). For convenience, we also use an averaged score S = Σ_class (Dice/200 − Hausdorff/60), similar to [9], to compare the overall performance. For nuclei segmentation, we follow the same procedure as in [7] to split the dataset into two categories, where the model is trained only on the first group. The semantic segmentation performance is evaluated by the average Dice coefficient in each group. ...
... Recently, Zheng et al. [114] developed an ensemble learning framework for segmenting 3D images, using one 3D CNN and three 2D CNNs (one for each of the three orthogonal 2D slice directions, namely xy, xz and yz). These four CNNs are used jointly as base-learners for training the meta-learner. ...
Article
In recent years enterprise imaging (EI) solutions have become a core component of healthcare initiatives, while a simultaneous rise in big data has opened up a number of possibilities in how we can analyze and derive insights from large amounts of medical data. Together they afford us a range of opportunities that can transform healthcare in many fields. This paper provides a review of recent developments in EI and big data in the context of medical physics. It summarizes the key aspects of EI and big data in practice, with discussion and consideration of the steps necessary to implement an EI strategy. It examines the benefits that a healthcare service can achieve through the implementation of an EI solution by looking at it through the lenses of: compliance, improving patient care, maximizing revenue, optimizing workflows, and applications of artificial intelligence that support enterprise imaging. It also addresses some of the key challenges in enterprise imaging, with discussion and examples presented for those in systems integration, governance, and data security and privacy.
... Additionally, recent results in different biomedical image segmentation challenges have shown the effectiveness of DL ensemble models, such as in [12], where an ensemble consisting of 4 UNet-like models and one Deeplabv3+ network obtained second place in the 2019 SIIM-ACR pneumothorax challenge, and in [13], where an ensemble that analyzed single-slice data and 3D volumetric data separately obtained top performance on the HVSMR 3D Cardiovascular MRI in Congenital Heart Disease 2016 challenge dataset. Inspired by both paradigms, our research hypothesis is that ensembles which use both single-frame and consecutive-frames information can achieve better generalization than models which use only one of them. ...
Article
Full-text available
Purpose: Ureteroscopy is an efficient endoscopic minimally invasive technique for the diagnosis and treatment of upper tract urothelial carcinoma. During ureteroscopy, the automatic segmentation of the hollow lumen is of primary importance, since it indicates the path that the endoscope should follow. In order to obtain an accurate segmentation of the hollow lumen, this paper presents an automatic method based on convolutional neural networks (CNNs). Methods: The proposed method is based on an ensemble of 4 parallel CNNs to simultaneously process single- and multi-frame information. Of these, two architectures are taken as core models, namely a U-Net based on residual blocks and Mask-RCNN, which are fed with single still-frames I(t). The other two models are modifications of the former ones, consisting of the addition of a stage which makes use of 3D convolutions to process temporal information; these are fed with triplets of frames (I(t−1), I(t), I(t+1)) to produce the segmentation for I(t). Results: The proposed method was evaluated using a custom dataset of 11 videos (2673 frames) which were collected and manually annotated from 6 patients. We obtain a Dice similarity coefficient of 0.80, outperforming previous state-of-the-art methods. Conclusion: The obtained results show that spatial-temporal information can be effectively exploited by the ensemble model to improve hollow lumen segmentation in ureteroscopic images. The method is effective also in the presence of poor visibility, occasional bleeding, or specular reflections.
... In recent years, the stacking method [24] and boosting method have become popular in ensemble learning. Liu et al. [25] proposed the isolation forest algorithm to establish an anomaly index based on the path length from leaf node to root node. ...
Article
Full-text available
This paper focuses on an important research problem of cyberspace security. As an active defense technology, intrusion detection plays an important role in the field of network security. Traditional intrusion detection technologies have problems such as low accuracy, low detection efficiency, and high time consumption. Shallow machine learning structures have been unable to respond in time. To solve these problems, deep learning-based methods have been studied to improve intrusion detection. The advantage of deep learning is that it has a strong ability to learn features and can handle very complex data. Therefore, we propose a deep random forest-based network intrusion detection model. The first stage uses a sliding window to segment original features into many small pieces and then trains a random forest to generate the concatenated class vector as a re-representation. This vector is then used to train the multilevel cascade parallel random forest in the second stage. Finally, the classification of the original data is determined by a voting strategy after the last layer of the cascade. Meanwhile, the model is deployed in a Spark environment, and the cache replacement strategy of RDDs is optimized by efficiency sorting and partition integrity checks. The experimental results indicate that the proposed method can effectively detect anomalous network behaviors, with high F1-measure scores and high accuracy. The results also show that it can cut down the average execution time on clusters of different scales.
... The attention supervision was also trained using manual myocardial segmentation. Automated segmentation algorithms in CMR have been developed and validated, with continual improvements in accuracy and robustness [41][42][43][44]. For future work, we aim to integrate the trained motion detection neural networks with robust LV segmentation models for a fully automated pipeline for CMR T1-mapping quality control and processing. ...
Article
Full-text available
Cardiac magnetic resonance quantitative T1-mapping is increasingly used for advanced myocardial tissue characterisation. However, cardiac or respiratory motion can significantly affect the diagnostic utility of T1-maps, and thus motion artefact detection is critical for quality control and clinically-robust T1 measurements. Manual quality control of T1-maps may provide reassurance, but is laborious and prone to error. We present a deep learning approach with attention supervision for automated motion artefact detection in quality control of cardiac T1-mapping. Firstly, we customised a multi-stream Convolutional Neural Network (CNN) image classifier to streamline the process of automatic motion artefact detection. Secondly, we imposed attention supervision to guide the CNN to focus on targeted myocardial segments. Thirdly, when there was disagreement between the human operator and machine, a second human validator reviewed and rescored the cases for adjudication and to identify the source of disagreement. The multi-stream neural networks demonstrated 89.8% agreement and 87.4% ROC-AUC for motion artefact detection against the human operator on the 2568 T1 maps. Trained with additional supervision on attention, agreement and AUC significantly improved to 91.5% and 89.1%, respectively (p < 0.001). Rescoring of the disagreed cases by the second human validator revealed that human operator error was the primary cause of disagreement. Deep learning with attention supervision provides quick and high-quality assurance of clinical images, and outperforms human operators.
... Therefore, to overcome the memory constraint of a 3D CNN and the information loss of 2D CNNs, we utilized an ensemble of 2D CNNs to generate a 3D segmentation (Figure 2a). Since deep neural network models generally have high prediction variance, ensemble learning with deep neural networks can reduce the variance and thus better generalize to unseen data [27,28]. We sliced the 3D image volumes along the axial, sagittal or coronal axis to obtain corresponding 2D image datasets. ...
Article
Full-text available
Computational fluid dynamics (CFD) modeling of left ventricle (LV) flow combined with patient medical imaging data has shown great potential in obtaining patient-specific hemodynamics information for functional assessment of the heart. A typical model construction pipeline usually starts with segmentation of the LV by manual delineation, followed by mesh generation and registration techniques using separate software tools. However, such approaches usually require significant time and human effort in the model generation process, limiting large-scale analysis. In this study, we propose an approach towards fully automating the model generation process for CFD simulation of LV flow to significantly reduce LV CFD model generation time. Our modeling framework leverages a novel combination of techniques including deep-learning based segmentation, geometry processing, and image registration to reliably reconstruct CFD-suitable LV models with little-to-no user intervention. We utilized an ensemble of 2D convolutional neural networks (CNNs) for automatic segmentation of cardiac structures from 3D patient images, and our segmentation approach outperformed recent state-of-the-art segmentation techniques when evaluated on benchmark data containing both MR and CT cardiac scans. We demonstrate that through a combination of segmentation and geometry processing, we were able to robustly create CFD-suitable LV meshes from segmentations for 78 out of 80 test cases. Although the focus of this study is on image-to-mesh generation, we demonstrate the feasibility of this framework in supporting LV hemodynamics modeling by performing CFD simulations from two representative time-resolved patient-specific image data sets.
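The axis-ensemble trick can be sketched in a few lines of NumPy; `predict_slices`, which maps a stack of 2D slices to per-pixel probabilities, is an assumed helper standing in for the trained 2D CNNs:

```python
# Sketch: slice the volume along each anatomical axis, segment slices with a
# 2D model, and average the three re-assembled probability volumes.
import numpy as np

def ensemble_2d_segmentation(volume, predict_slices):
    """volume: (D, H, W) image; returns an averaged (D, H, W) probability map."""
    probs = np.zeros_like(volume, dtype=float)
    for axis in range(3):                              # axial, coronal, sagittal
        slices = np.moveaxis(volume, axis, 0)          # (N, ., .) slice stack
        pred = predict_slices(slices)                  # (N, ., .) probabilities
        probs += np.moveaxis(pred, 0, axis)            # back to (D, H, W)
    return probs / 3.0
```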
... Following the practice in (Zheng et al. 2019c), our meta-model is designed as a Y-shaped DenseVoxNet (Yu et al. 2017) (see Fig. 4), which takes two pieces of input, X_i and A(Ŷ_i). A(·) is the averaging function that forms a compact representation of Ŷ_i of the PLs. ...
Article
Image segmentation is critical to many medical applications. While deep learning (DL) methods continue to improve performance for many medical image segmentation tasks, data annotation is a big bottleneck to DL-based segmentation because (1) DL models tend to need a large amount of labeled data to train, and (2) it is highly time-consuming and labor-intensive to voxel-wise label 3D medical images. Significantly reducing annotation effort while attaining good performance of DL segmentation models remains a major challenge. In our preliminary experiments, we observe that, using partially labeled datasets, there is indeed a large performance gap with respect to using fully annotated training datasets. In this paper, we propose a new DL framework for reducing annotation effort and bridging the gap between full annotation and sparse annotation in 3D medical image segmentation. We achieve this by (i) selecting representative slices in 3D images that minimize data redundancy and save annotation effort, and (ii) self-training with pseudo-labels automatically generated from the base-models trained using the selected annotated slices. Extensive experiments using two public datasets (the HVSMR 2016 Challenge dataset and mouse piriform cortex dataset) show that our framework yields competitive segmentation results compared with state-of-the-art DL methods while using less than ∼20% of the annotated data.
... In such a framework, 2D models along the axial, coronal and sagittal views are trained before fusing their outputs to enforce 3D consistency. A variant consists in fusing 2D models and 3D models [19], [40]. It is also possible to fuse predictions of models working at different scales [41]. ...
Article
Full-text available
Whole brain segmentation of fine-grained structures using deep learning (DL) is a very challenging task, since the number of anatomical labels is very high compared to the number of available training images. To address this problem, previous DL methods proposed to use a single convolutional neural network (CNN) or a few independent CNNs. In this paper, we present a novel ensemble method based on a large number of CNNs processing different overlapping brain areas. Inspired by parliamentary decision-making systems, we propose a framework called AssemblyNet, made of two “assemblies” of U-Nets. Such a parliamentary system is capable of dealing with complex decisions and unseen problems and of reaching a relevant consensus. AssemblyNet introduces sharing of knowledge among neighboring U-Nets, an “amendment” procedure made by the second assembly at higher resolution to refine the decision taken by the first one, and a final decision obtained by majority voting. During our validation, AssemblyNet showed competitive performance compared to state-of-the-art methods such as U-Net, Joint label fusion and SLANT. Moreover, we investigated the scan-rescan consistency and the robustness to disease effects of our method. These experiments demonstrated the reliability of AssemblyNet. Finally, we showed the benefit of using semi-supervised learning to improve the performance of our method.
Preprint
In this paper, we propose a novel approach to enhance medical image segmentation during test time. Instead of employing hand-crafted transforms or functions on the input test image to create multiple views for test-time augmentation, we advocate for the utilization of an advanced domain-fine-tuned generative model (GM), e.g., stable diffusion (SD), for test-time augmentation. Given that the GM has been trained to comprehend and encapsulate comprehensive domain data knowledge, it is superior to segmentation models in terms of representing the data characteristics and distribution. Hence, by integrating the GM into test-time augmentation, we can effectively generate multiple views of a given test sample, aligning with the content and appearance characteristics of the sample and the related local data distribution. This approach renders the augmentation process more adaptable and resilient compared to conventional handcrafted transforms. Comprehensive experiments conducted across three medical image segmentation tasks (nine datasets) demonstrate the efficacy and versatility of the proposed TTGA in enhancing segmentation outcomes. Moreover, TTGA significantly improves pixel-wise error estimation, thereby facilitating the deployment of a more reliable segmentation system. Code will be released at: https://github.com/maxiao0234/TTGA.
Chapter
International transport security policy requires baggage screening. In airports and railway stations, these operations are performed manually with the help of X-ray or computed tomography machines. There is a need for an automatic system that could reduce the time of the screening process and possibly increase the accuracy of the detections. Moreover, there is a demand for developing and evaluating methodologies for learning on 3D image-like data, which has been addressed only recently, mostly in the field of medical imaging. The main objective of this research is to develop a framework for object detection in 3D computed tomography scans for high-throughput security applications. In this paper, a literature review on the topic of 3D image recognition is presented, and a transfer learning approach is evaluated on the security risk detection task in X-ray images.
Article
Full-text available
Recent advancements in two-photon calcium imaging have enabled scientists to record the activity of thousands of neurons with cellular resolution. This scope of data collection is crucial to understanding the next generation of neuroscience questions, but analyzing these large recordings requires automated methods for neuron segmentation. Supervised methods for neuron segmentation achieve state-of-the-art accuracy and speed but currently require large amounts of manually generated ground truth training labels. We reduced the required number of training labels by designing a semi-supervised pipeline. Our pipeline used neural network ensembling to generate pseudolabels to train a single shallow U-Net. We tested our method on three publicly available datasets and compared our performance to three widely used segmentation methods. Our method outperformed other methods when trained on a small number of ground truth labels and could achieve state-of-the-art accuracy after training on approximately a quarter as many ground truth labels as supervised methods. When trained on many ground truth labels, our pipeline attained higher accuracy than that of state-of-the-art methods. Overall, our work will help researchers accurately process large neural recordings while minimizing the time and effort needed to generate manual labels.
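A minimal sketch of this pseudolabeling pipeline is given below, under stated assumptions: `ensemble` is a list of trained binary-segmentation networks, `shallow_unet` is the single student network, and the 0.5 threshold is illustrative:

```python
# Sketch: an ensemble's averaged probabilities become pseudolabels, which
# then supervise a single shallow network.
import torch
import torch.nn.functional as F

def make_pseudolabels(ensemble, images, threshold=0.5):
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(m(images)) for m in ensemble]).mean(0)
    return (probs > threshold).float()                 # ensemble pseudolabels

def train_step(shallow_unet, optimizer, images, pseudolabels):
    optimizer.zero_grad()
    loss = F.binary_cross_entropy_with_logits(shallow_unet(images), pseudolabels)
    loss.backward()
    optimizer.step()
    return loss.item()
```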
Article
Acquiring pixel-level annotations is often difficult in applications such as histology studies that require domain expertise. Various semi-supervised learning approaches have been developed to work with limited ground truth annotations, such as the popular teacher-student models. However, hierarchical prediction uncertainty within the student model (intra-uncertainty) and image prediction uncertainty (inter-uncertainty) have not been fully utilized by existing methods. To address these issues, we first propose a novel inter- and intra-uncertainty regularization method to measure and constrain both inter- and intra-inconsistencies in the teacher-student architecture. We also propose a new two-stage network with pseudo-mask guided feature aggregation (PG-FANet) as the segmentation model. The two-stage structure complements the uncertainty regularization strategy to avoid introducing extra modules in solving uncertainties, and the aggregation mechanisms enable multi-scale and multi-stage feature integration. Comprehensive experimental results over the MoNuSeg and CRAG datasets show that our PG-FANet outperforms other state-of-the-art methods and that our semi-supervised learning framework yields competitive performance with a limited amount of labeled data.
Chapter
Convolutional neural networks (CNNs) have shown promising performance in various 2D computer vision tasks due to the availability of large amounts of 2D training data. In contrast, medical imaging deals with 3D data and usually lacks the equivalent extent and diversity of data for developing AI models. Transfer learning provides the means to use models trained for one application as a starting point for another application. In this work, we leverage 2D pre-trained models as a starting point in 3D medical applications by exploring the concept of Axial-Coronal-Sagittal (ACS) convolutions. We have incorporated ACS as an alternative to native 3D convolutions in the Generally Nuanced Deep Learning Framework (GaNDLF), providing various well-established and state-of-the-art network architectures with the availability of pre-trained encoders from 2D data. Results of our experimental evaluation on 3D MRI data of brain tumor patients for i) tumor segmentation and ii) radiogenomic classification show a model size reduction of ~22% and an improvement in validation accuracy of ~33%. Our findings support the advantage of ACS convolutions in pre-trained 2D CNNs over 3D CNNs without pre-training for 3D segmentation and classification tasks, democratizing existing models trained on datasets of unprecedented size and showing promise in the field of healthcare. Keywords: Deep learning, ImageNet, Transfer learning, MRI, segmentation, classification
Chapter
In 3D medical image segmentation, small-target segmentation is crucial for diagnosis but still faces challenges. In this paper, we propose the Axis Projection Attention UNet, named APAUNet, for 3D medical image segmentation, especially for small targets. Considering the large proportion of the background in the 3D feature space, we introduce a projection strategy to project the 3D features onto three orthogonal 2D planes to capture the contextual attention from different views. In this way, we can filter out redundant feature information and mitigate the loss of critical information for small lesions in 3D scans. Then we utilize a dimension hybridization strategy to fuse the 3D features with attention from different axes and merge them by a weighted summation to adaptively learn the importance of different perspectives. Finally, in the APA Decoder, we concatenate both high and low resolution features in the 2D projection process, thereby obtaining more precise multi-scale information, which is vital for small lesion segmentation. Quantitative and qualitative experimental results on two public datasets (BTCV and MSD) demonstrate that our proposed APAUNet outperforms the other methods. Concretely, our APAUNet achieves an average Dice score of 87.84 on BTCV, 84.48 on MSD-Liver and 69.13 on MSD-Pancreas, and significantly surpasses the previous SOTA methods on small targets.
Article
Full-text available
Accurate bowel segmentation is essential for diagnosis and treatment of bowel cancers. Unfortunately, segmenting the entire bowel in CT images is quite challenging due to unclear boundary, large shape, size, and appearance variations, as well as diverse filling status within the bowel. In this paper, we present a novel two-stage framework, named BowelNet, to handle the challenging task of bowel segmentation in CT images, with two stages of 1) jointly localizing all types of the bowel, and 2) finely segmenting each type of the bowel. Specifically, in the first stage, we learn a unified localization network from both partially- and fully-labeled CT images to robustly detect all types of the bowel. To better capture unclear bowel boundary and learn complex bowel shapes, in the second stage, we propose to jointly learn semantic information (i.e., bowel segmentation mask) and geometric representations (i.e., bowel boundary and bowel skeleton) for fine bowel segmentation in a multi-task learning scheme. Moreover, we further propose to learn a meta segmentation network via pseudo labels to improve segmentation accuracy. By evaluating on a large abdominal CT dataset, our proposed BowelNet method can achieve Dice scores of 0.764, 0.848, 0.835, 0.774, and 0.824 in segmenting the duodenum, jejunum-ileum, colon, sigmoid, and rectum, respectively. These results demonstrate the effectiveness of our proposed BowelNet framework in segmenting the entire bowel from CT images.
Chapter
Snapshot compressive imaging (SCI) can record a 3D datacube by a 2D measurement and algorithmically reconstruct the desired 3D information from that 2D measurement. The reconstruction algorithm thus plays a vital role in SCI. Recently, deep learning (DL) has demonstrated outstanding performance in reconstruction, leading to better results than conventional optimization-based methods. Therefore, it is desirable to improve DL reconstruction performance for SCI. Existing DL algorithms are limited by two bottlenecks: 1) a high-accuracy network is usually large and requires a long running time; 2) DL algorithms are limited by scalability, i.e., a well-trained network cannot generally be applied to new systems. To this end, this paper proposes to use ensemble learning priors in DL to achieve high reconstruction speed and accuracy in a single network. Furthermore, we develop the scalable learning approach during training to empower DL to handle data of different sizes without additional training. Extensive results on both simulation and real datasets demonstrate the superiority of our proposed algorithm. The code and model can be accessed at https://github.com/integritynoble/ELP-Unfolding/tree/master. Keywords: Deep unfolding, Ensemble, Snapshot compressive imaging, Scalable learning
Chapter
In biological image analysis, 3D instance segmentation is a crucial step towards extracting information on objects of interest from microscopy datasets. Existing instance segmentation pipelines are frequently affected by errors such as missing boundary layer cells or poorly segmented regions. In this study, we propose several ensembles as post-processing methods for improving the quality of outputs obtained from deep learning and classical 3D segmentation pipelines. These methods take as input the results from two independent 3D segmentation pipelines and combine them using different fusion algorithms. The first algorithm uses label set intersection, the second one involves adjacency graph composition, and the third one works through segmented object boundary fusion followed by 3D watershed. These 3 algorithms are tested on a dataset of 3D confocal microscopy images of floral tissues. The third fusion algorithm is found to perform best and has better global and local accuracies compared to its input segmentations. The specialty of the proposed ensemble methods is that they are model-agnostic, i.e., they can be used to combine segmentation results from deep learning as well as non-deep learning or classical pipelines. These methods could be highly beneficial in correcting segmentation errors arising from missing cells in the boundary layer or under-segmentation in the inner tissue layers, and ultimately provide robust segmentation results in the presence of variable image qualities in biological datasets. Keywords: Segmentation, Deep learning, Bio-imaging, Microscopy
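The third fusion algorithm might look roughly like the following scikit-image/SciPy sketch: boundaries from the two input segmentations are combined and a watershed is re-run from interior seeds. This is a simplified stand-in, not the authors' implementation:

```python
# Simplified sketch of boundary fusion followed by 3D watershed.
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed, find_boundaries
from skimage.measure import label

def fuse_by_boundary_watershed(seg_a, seg_b):
    """seg_a, seg_b: integer-labeled 3D instance segmentations."""
    fused_boundary = find_boundaries(seg_a) | find_boundaries(seg_b)
    foreground = (seg_a > 0) | (seg_b > 0)
    interior = foreground & ~fused_boundary
    seeds = label(interior)                            # one seed per fused object
    distance = ndi.distance_transform_edt(~fused_boundary)
    return watershed(-distance, markers=seeds, mask=foreground)
```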
Article
Knee cartilage and bone segmentation is critical for physicians to analyze and diagnose articular damage and knee osteoarthritis (OA). Deep learning (DL) methods for medical image segmentation have largely outperformed traditional methods, but they often need large amounts of annotated data for model training, which is very costly and time-consuming for medical experts, especially on 3D images. In this paper, we report a new knee cartilage and bone segmentation framework, KCB-Net, for 3D MR images based on sparse annotation. KCB-Net selects a small subset of slices from 3D images for annotation, and seeks to bridge the performance gap between sparse annotation and full annotation. Specifically, it first identifies a subset of the most effective and representative slices with an unsupervised scheme; it then trains an ensemble model using the annotated slices; next, it self-trains the model using 3D images containing pseudo-labels generated by the ensemble method and improved by a bi-directional hierarchical earth mover's distance (bi-HEMD) algorithm; finally, it fine-tunes the segmentation results using the primal–dual interior point method (IPM). Experiments on four 3D MR knee joint datasets (the SKI10 dataset, OAI ZIB dataset, Iowa dataset, and iMorphics dataset) show that our new framework outperforms state-of-the-art methods on full annotation, and yields high quality results for small annotation ratios even as low as 10%.
Article
Medical image segmentation is fundamental and essential for the analysis of medical images. Although prevalent success has been achieved by convolutional neural networks (CNNs), challenges are encountered in the domain of medical image analysis in two aspects: 1) lack of discriminative features to handle similar textures of distinct structures and 2) lack of selective features for potential blurred boundaries in medical images. In this paper, we extend the concept of contrastive learning (CL) to the segmentation task to learn more discriminative representations. Specifically, we propose a novel patch-dragsaw contrastive regularization (PDCR) to perform patch-level tugging and repulsing. In addition, a new structure, namely the uncertainty-aware feature re-weighting block (UAFR), is designed to address the potential high-uncertainty regions in the feature maps and serves as a better feature re-weighting. Our proposed method achieves state-of-the-art results across 8 public datasets from 6 domains. Besides, the method also demonstrates robustness in the limited-data scenario. The code is publicly available at https://github.com/lzh19961031/PDCR_UAFR-MIS.
Article
Accurate and automatic breast tumor segmentation based on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) plays an important role in breast cancer analysis. However, the background parenchymal enhancement and large variations in tumor size, shape or appearance make the task very challenging, and the segmentation performance is still not satisfactory, especially for non-mass enhancement (NME) and small tumors (≤2 cm). To address these challenges, we propose a novel 3D affinity learning based multi-branch ensemble network for accurate breast tumor segmentation. Specifically, two different types of subnetworks are built to form a multi-branch network. The two subnetworks are equipped with effective operation components, i.e., residual connections and channel-wise attention, or make use of dense connectivity patterns, and can process the input images in parallel. Second, we propose an end-to-end trainable 3D affinity learning based refinement module that calculates the similarities between features of voxels, which is useful in discovering more pixels belonging to breast tumors. Third, two local affinity matrices are constructed by 3D affinity learning and are used to refine the segmentation outputs of the two subnetworks, respectively. Finally, a novel ensemble module is proposed to combine the information derived from the subnetworks, which can hierarchically merge the local and global affinity matrices derived from the subnetworks. A large-scale breast DCE-MR image dataset with 420 subjects is built for evaluation, and comprehensive experiments have been conducted to demonstrate that our proposed method achieves superior performance over state-of-the-art medical image segmentation methods.
Article
Accurate 3D segmentation of calf muscle compartments in volumetric MR images is essential to diagnose as well as assess progression of muscular diseases. Recently, good segmentation performance was achieved using state-of-the-art deep learning approaches, which, however, require large amounts of annotated data for training. Considering that obtaining sufficiently large medical image annotation datasets is often difficult, time-consuming, and requires expert knowledge, minimizing the necessary sizes of expert-annotated training datasets is of great importance. This paper reports CMC-Net, a new deep learning framework for calf muscle compartment segmentation in 3D MR images that selects an effective small subset of 2D slices from the 3D images to be labelled, while also utilizing unannotated slices to facilitate proper generalization of the subsequent training steps. Our model consists of three parts: (1) an unsupervised method to select the most representative 2D slices on which expert annotation is performed; (2) ensemble model training employing these annotated as well as additional unannotated 2D slices; (3) a model-tuning method using pseudo-labels generated by the ensemble model that results in a trained deep network capable of accurate 3D segmentations. Experiments on segmentation of calf muscle compartments in 3D MR images show that our new approach achieves good performance with very small annotation ratios, and when utilizing full annotation, it outperforms state-of-the-art full annotation segmentation methods. Additional experiments on a 3D MR thigh dataset further verify the ability of our method in segmenting leg muscle groups with sparse annotation.
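The representative-slice selection step (part 1 above) can be approximated with a clustering sketch; the histogram features and the KMeans medoid-like selection below are assumptions standing in for the paper's unsupervised scheme:

```python
# Hedged sketch: embed slices with simple features, cluster them, and send
# the slice closest to each cluster centre for expert annotation.
import numpy as np
from sklearn.cluster import KMeans

def select_representative_slices(volume, n_slices=8):
    """volume: (D, H, W); returns indices of slices to annotate."""
    feats = np.stack([np.histogram(s, bins=32, range=(volume.min(), volume.max()),
                                   density=True)[0] for s in volume])
    km = KMeans(n_clusters=n_slices, n_init=10, random_state=0).fit(feats)
    picks = []
    for c in range(n_slices):
        d = np.linalg.norm(feats - km.cluster_centers_[c], axis=1)
        picks.append(int(np.argmin(d)))                # medoid-like choice
    return sorted(set(picks))
```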
Article
Background and objective: Automatic vessel segmentation from X-ray angiography images is an important research topic for the diagnosis and treatment of cardiovascular disease. The main challenge is how to extract continuous and complete vessel structures from XRA images with poor quality and high complexity. Most existing methods predominantly focus on pixel-wise segmentation and overlook geometric features, resulting in breaks and missing segments in the results. To improve the completeness and accuracy of vessel segmentation, we propose a recursive joint learning network embedded with geometric features. Methods: The network joins the centerline- and direction-aware auxiliary tasks with the primary task of segmentation, which guides the network to explore the geometric features of vessel connectivity. Moreover, a recursive learning strategy is designed by passing the previous segmentation result into the same network iteratively to improve segmentation. To further enhance connectivity, we present a complementary-task ensemble strategy that fuses the outputs of the three tasks into the final segmentation result by majority voting. Results: To validate the effectiveness of our method, we conduct qualitative and quantitative experiments on XRA images of the coronary artery and aorta, including the aortic arch, thoracic aorta, and abdominal aorta. Our method achieves F1 scores of 85.61±3.48% for the coronary artery, 89.02±2.89% for the aortic arch, 88.22±3.33% for the thoracic aorta, and 83.12±4.61% for the abdominal aorta. Conclusions: Compared with six state-of-the-art methods, our method shows the most complete and accurate vessel segmentation results.
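The complementary-task ensemble step reduces to a per-pixel majority vote once each task's output has been converted to a vessel mask; in the sketch below that conversion is assumed to have happened already:

```python
# Minimal sketch: fuse the three task outputs by per-pixel majority voting.
import numpy as np

def majority_vote(mask_seg, mask_centerline, mask_direction):
    votes = (mask_seg.astype(int) + mask_centerline.astype(int)
             + mask_direction.astype(int))
    return (votes >= 2).astype(np.uint8)   # at least 2 of 3 tasks agree
```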
Article
Full-text available
Medical image segmentation plays a vital role in disease diagnosis and analysis. However, data-dependent difficulties such as low image contrast, noisy background, and complicated objects of interest render the segmentation problem challenging. These difficulties diminish dense prediction and make it tough for known approaches to explore data-specific attributes for robust feature extraction. In this paper, we study medical image segmentation by focusing on robust data-specific feature extraction to achieve improved dense prediction. We propose a new deep convolutional neural network (CNN), which exploits specific attributes of input datasets to utilize deep supervision for enhanced feature extraction. In particular, we strategically locate and deploy auxiliary supervision, by matching the object perceptive field (OPF) (which we define and compute) with the layer-wise effective receptive fields (LERF) of the network. This helps the model pay close attention to some distinct input data dependent features, which the network might otherwise ‘ignore’ during training. Further, to achieve better target localization and refined dense prediction, we propose the densely decoded networks (DDN), by selectively introducing additional network connections (the ‘crutch’ connections). Using five public datasets (two retinal vessel, melanoma, optic disc/cup, and spleen segmentation) and two in-house datasets (lymph node and fungus segmentation), we verify the effectiveness of our proposed approach in 2D and 3D segmentation.
Article
Automated segmentation of three-dimensional medical images is of great importance for the detection and quantification of certain diseases such as stenosis in the coronary arteries. Many 2D and 3D deep learning models, especially deep convolutional neural networks (CNNs), have achieved state-of-the-art segmentation performance on 3D medical images. Yet, there is a trade-off between the field of view and the utilization of inter-slice information when using pure 2D or 3D CNNs for 3D segmentation, which compromises the segmentation accuracy. In this paper, we propose a two-stage strategy that retains the advantages of both 2D and 3D CNNs and apply the method for the segmentation of the human aorta and coronary arteries, with stenosis, from computed tomography (CT) images. In the first stage, a 2D CNN, which can extract large-field-of-view information, is used to segment the aorta and coronary arteries simultaneously in a slice-by-slice fashion. Then, in the second stage, a 3D CNN is applied to extract the inter-slice information to refine the segmentation of the coronary arteries in certain subregions not resolved well in the first stage. We show that the 3D network of the second stage can improve the continuity between slices and reduce the missed detection rate of the 2D CNN. Compared with directly using a 3D CNN, the two-stage approach can alleviate the class imbalance problem caused by the large non-coronary artery (aorta and background) and the small coronary artery and reduce the training time because the vast majority of negative voxels are excluded in the first stage. To validate the efficacy of our method, extensive experiments are carried out to compare with other approaches based on pure 2D or 3D CNNs and those based on hybrid 2D-3D CNNs.
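A minimal sketch of the two-stage flow just described, with placeholder `model2d`/`model3d` callables (assumptions; the paper's networks are CNNs): the 2D pass produces a coarse large-field-of-view mask slice by slice, and the 3D pass revisits only subvolumes around the coarse detections, which is what excludes most negative voxels from stage two.

```python
import numpy as np

def two_stage_segment(volume, model2d, model3d, patch=64, thresh=0.5):
    """Two-stage 2D-then-3D segmentation sketch.

    Stage 1: a 2D model segments every axial slice.
    Stage 2: a 3D model refines subvolumes around stage-1 detections,
    restoring inter-slice continuity. Models return probability maps.
    """
    # Stage 1: slice-by-slice coarse segmentation.
    coarse = np.stack([model2d(s) for s in volume], axis=0) > thresh

    refined = coarse.copy()
    zs, ys, xs = np.nonzero(coarse)
    # Sparse sampling of candidate locations keeps the sketch short.
    for z, y, x in zip(zs[::1000], ys[::1000], xs[::1000]):
        z0, y0, x0 = (max(0, c - patch // 2) for c in (z, y, x))
        sub = volume[z0:z0 + patch, y0:y0 + patch, x0:x0 + patch]
        refined[z0:z0 + patch, y0:y0 + patch, x0:x0 + patch] = model3d(sub) > thresh
    return refined

# Dummy stand-ins so the sketch runs end-to-end:
vol = np.random.rand(32, 128, 128).astype(np.float32)
mask = two_stage_segment(vol, lambda s: s, lambda v: v)
```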
Chapter
Semi-supervised learning (SSL) uses unlabeled data during training to learn better models. Previous studies on SSL for medical image segmentation focused mostly on improving model generalization to unseen data. In some applications, however, our primary interest is not generalization but obtaining optimal predictions on a specific unlabeled database that is fully available during model development. Examples include population studies for extracting imaging phenotypes. This work investigates an often overlooked aspect of SSL, transduction. It focuses on the quality of predictions made on the unlabeled data of interest when they are included for optimization during training, rather than on improving generalization. We focus on the self-training framework and explore its potential for transduction. We analyze it through the lens of Information Gain and reveal that learning benefits from the use of calibrated or under-confident models. Our extensive experiments on a large MRI database for multi-class segmentation of traumatic brain lesions show promising results when comparing transductive with inductive predictions. We believe this study will inspire further research on transductive learning, a well-suited paradigm for medical image analysis.
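The chapter builds on self-training; a minimal sketch of that loop in the transductive setting follows, with `fit`/`predict_proba` as placeholder wrappers around any model and the confidence threshold an illustrative value. The key transductive point is that the unlabeled pool is itself the data of interest, so its final predictions are the output.

```python
import numpy as np

def self_train_transductive(model, fit, predict_proba, x_lab, y_lab, x_unlab,
                            rounds=3, conf=0.9):
    """Self-training sketch in a transductive setting.

    Each round pseudo-labels the confidently predicted unlabeled samples and
    retrains on labeled + pseudo-labeled data; the last round's predictions
    on the unlabeled pool are the transductive output. The threshold echoes
    the finding that calibrated / under-confident models help.
    """
    fit(model, x_lab, y_lab)
    for _ in range(rounds):
        p = predict_proba(model, x_unlab)          # (n, n_classes)
        confident = p.max(axis=1) >= conf
        pseudo = p.argmax(axis=1)
        fit(model,
            np.concatenate([x_lab, x_unlab[confident]]),
            np.concatenate([y_lab, pseudo[confident]]))
    return predict_proba(model, x_unlab).argmax(axis=1)
```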
Chapter
Active learning aims to address the paucity of labeled data by finding the most informative samples. However, when applied to semantic segmentation, existing methods ignore the varying segmentation difficulty of different semantic areas, which leads to poor performance on hard semantic areas such as tiny or slender objects. To deal with this problem, we propose a semantic Difficulty-awarE Active Learning (DEAL) network composed of two branches: a common segmentation branch and a semantic difficulty branch. For the latter, supervised by the segmentation error between the prediction and the ground truth, a pixel-wise probability attention module is introduced to learn semantic difficulty scores for different semantic areas. Finally, two acquisition functions are devised to select the most valuable samples according to semantic difficulty. Competitive results on semantic segmentation benchmarks demonstrate that DEAL achieves state-of-the-art active learning performance and in particular improves performance on the hard semantic areas.
Article
There have been considerable debates over 2D and 3D representation learning on 3D medical images. 2D approaches can benefit from large-scale 2D pretraining, whereas they are generally weak in capturing large 3D contexts. 3D approaches are natively strong in 3D contexts, yet few publicly available 3D medical datasets are large and diverse enough for universal 3D pretraining. Even for hybrid (2D + 3D) approaches, the intrinsic disadvantages of the 2D and 3D parts remain. In this study, we bridge the gap between 2D and 3D convolutions by reinventing 2D convolutions. We propose ACS (axial-coronal-sagittal) convolutions to perform natively 3D representation learning while utilizing weights pretrained on 2D datasets. In ACS convolutions, 2D convolution kernels are split by channel into three parts and convolved separately on the three views (axial, coronal, and sagittal) of 3D representations. Theoretically, any 2D CNN (ResNet, DenseNet, or DeepLab) can be converted into a 3D ACS CNN with pretrained weights of the same parameter size. Extensive experiments on several medical benchmarks (including classification, segmentation, and detection tasks) validate the consistent superiority of pretrained ACS CNNs over the 2D and 3D CNN counterparts, with or without pretraining. Even without pretraining, ACS convolution can be used as a plug-and-play replacement for standard 3D convolution, with a smaller model size and less computation.
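A minimal PyTorch sketch of the channel-split idea: a 2D kernel bank is divided along output channels into three groups, each reshaped into a degenerate 3D kernel and applied on one view. Weights are randomly initialized here; in the paper they would come from a pretrained 2D model, and the roughly equal split rule is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ACSConv(nn.Module):
    """Sketch of an axial-coronal-sagittal (ACS) convolution.

    The 2D weight tensor (C_out, C_in, k, k) is split by output channel into
    three groups, each viewed as a 3D kernel of shape (1,k,k), (k,1,k), or
    (k,k,1) and convolved on one anatomical view.
    """
    def __init__(self, c_in, c_out, k=3):
        super().__init__()
        self.weight2d = nn.Parameter(torch.randn(c_out, c_in, k, k) * 0.01)
        self.k = k
        self.splits = [c_out // 3, c_out // 3, c_out - 2 * (c_out // 3)]

    def forward(self, x):                        # x: (N, C_in, D, H, W)
        p = self.k // 2
        wa, wc, ws = torch.split(self.weight2d, self.splits, dim=0)
        out_a = F.conv3d(x, wa.unsqueeze(2), padding=(0, p, p))  # axial
        out_c = F.conv3d(x, wc.unsqueeze(3), padding=(p, 0, p))  # coronal
        out_s = F.conv3d(x, ws.unsqueeze(4), padding=(p, p, 0))  # sagittal
        return torch.cat([out_a, out_c, out_s], dim=1)

y = ACSConv(1, 6)(torch.randn(1, 1, 8, 16, 16))   # -> (1, 6, 8, 16, 16)
```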
Article
Objective: Three-dimensional (3D) blood vessel structure information is important for diagnosis and treatment in various clinical scenarios. We present a fully automatic method for the extraction and differentiation of the arterial and venous vessel trees from abdominal contrast-enhanced computed tomography (CE-CT) volumes using convolutional neural networks (CNNs). Methods: We used a novel ratio-based sampling method to train 2D and 3D versions of the U-Net, the V-Net, and the DeepVesselNet. Networks were trained with a combination of the Dice and cross-entropy losses. Performance was evaluated on 20 IRCAD subjects. The best-performing networks were combined into an ensemble, and we investigated seven different weighting schemes. Trained networks were additionally applied to 26 BTCV cases to validate generalizability. Results: Based on our experiments, the optimal configuration is an equally weighted ensemble of 2D and 3D U-Nets and V-Nets. Our method achieved Dice similarity coefficients of 0.758 ± 0.050 (veins) and 0.838 ± 0.074 (arteries) on the IRCAD dataset. Application to the BTCV dataset showed high transferability. Conclusion: Abdominal vascular structures can be segmented more accurately using ensembles than individual CNNs. 2D and 3D networks have complementary strengths and weaknesses. Our ensemble of 2D and 3D U-Nets and V-Nets, in combination with ratio-based sampling, achieves high agreement with manual annotations for both artery and vein segmentation, and our results surpass other state-of-the-art methods. Significance: Our segmentation pipeline can provide valuable information for the planning of living-donor organ transplantations.
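The fusion step itself is a per-voxel weighted average of member probability maps. A NumPy sketch follows, with equal weights as the default (the configuration the study found optimal); the randomly generated probability maps, the 0.5 threshold, and Dice-proportional weights are illustrative assumptions.

```python
import numpy as np

def weighted_ensemble(prob_maps, weights=None):
    """Fuse per-voxel foreground probabilities from several CNNs.

    weights=None gives the equally weighted ensemble; any other scheme
    just changes the weight vector (e.g. proportional to each member's
    validation Dice).
    """
    prob_maps = np.stack(prob_maps, axis=0)           # (n_models, D, H, W)
    if weights is None:
        weights = np.ones(len(prob_maps))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    fused = np.tensordot(weights, prob_maps, axes=1)  # weighted per-voxel mean
    return fused > 0.5                                # final binary mask

mask = weighted_ensemble([np.random.rand(16, 64, 64) for _ in range(4)])
```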
Chapter
There has been a debate over using 2D versus 3D convolution for volumetric medical image segmentation. The problem is that 2D convolution loses the 3D spatial relationships among image features, while 3D convolution layers are hard to train from scratch due to the limited size of medical image datasets. Employing more trainable parameters and complicated connections may improve the performance of a 3D CNN, but induces extra computational burden at the same time. It is therefore worthwhile to improve the performance of current 3D medical image processing without requiring extra inference computation or memory resources. In this paper, we propose a general solution, the Division-Fusion (DF)-CNN, for free performance improvement on any available 3D medical image segmentation approach. During the division phase, different view-based kernels are divided from a single 3D kernel to extract multi-view context information that strengthens the spatial information of feature maps. During the fusion phase, all kernels are fused into one 3D kernel to reduce the parameters of the deployed model. We extensively evaluated our DF mechanism on prostate ultrasound volume segmentation. The results demonstrate a consistent improvement over different benchmark models with a clear margin.
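The chapter's exact fusion arithmetic is not reproduced here, but one mathematically consistent reading rests on the linearity of convolution: the sum of several view-based convolutions equals a single convolution with the zero-padded kernels summed, so a multi-branch training-time model can collapse into one 3D kernel at deployment. The sketch below verifies that identity; kernel shapes and sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

k = 3
# Three view-based kernels (degenerate 3D kernels, one per orthogonal view).
axial    = torch.randn(4, 1, 1, k, k)
coronal  = torch.randn(4, 1, k, 1, k)
sagittal = torch.randn(4, 1, k, k, 1)

def pad_to(w, shape):
    """Zero-pad a degenerate kernel to a full k x k x k kernel, centered."""
    full = torch.zeros(w.shape[0], w.shape[1], *shape)
    d0 = (shape[0] - w.shape[2]) // 2
    h0 = (shape[1] - w.shape[3]) // 2
    w0 = (shape[2] - w.shape[4]) // 2
    full[:, :, d0:d0 + w.shape[2], h0:h0 + w.shape[3], w0:w0 + w.shape[4]] = w
    return full

# "Fusion": by linearity, the summed padded kernel reproduces the
# sum of the three branch outputs with a single 3D convolution.
fused = sum(pad_to(w, (k, k, k)) for w in (axial, coronal, sagittal))

x = torch.randn(1, 1, 8, 16, 16)
branches = (F.conv3d(x, axial,    padding=(0, 1, 1))
          + F.conv3d(x, coronal,  padding=(1, 0, 1))
          + F.conv3d(x, sagittal, padding=(1, 1, 0)))
single = F.conv3d(x, fused, padding=1)
print(torch.allclose(branches, single, atol=1e-5))   # True
```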
Article
Background and objective: Deep neural network models can learn complex non-linear relationships in data and have superior flexibility and adaptability. A downside of this flexibility is sensitivity to initial conditions, both in terms of the initial random weights and the statistical noise in the training dataset. A downside of this adaptability is that deep convolutional networks usually have poor robustness or generalization when trained on an extremely limited amount of labeled data, especially in the biomedical imaging informatics field. Methods: In this paper, we develop and test a stacked generalization U-shape network (SG-UNet) based on a zoom strategy for biomedical image segmentation. SG-UNet is essentially a stacked generalization architecture consisting of multiple sub-modules, which takes multi-resolution images as input and uses hybrid features to segment regions of interest and detect diseases under multi-supervision. The proposed SG-UNet applies the zoom of multi-supervision to search the global feature space without pre-training. Besides, the zoom loss function gradually concentrates training on a sparse set of hard samples. Results: We evaluated the proposed algorithm against several popular U-shape ensemble network architectures across multi-modal biomedical image segmentation tasks, segmenting malignant rectal cancers, polyps, and glands from three imaging modalities: computed tomography (CT), digital colonoscopy, and histopathology images. The proposed algorithm improves Dice coefficients by 3.116%, 2.676%, and 2.356%, and F2-scores by 3.044%, 2.420%, and 1.928%, on the three imaging modality datasets, respectively. Comparison results using different amounts of rectal cancer CT data show that the proposed algorithm has a slower tendency of diminishing marginal efficiency, and the gland segmentation results further support its feasibility of yielding performance comparable to other state-of-the-art methods. Conclusions: The proposed algorithm can be trained efficiently on small image datasets without additional techniques such as fine-tuning, and achieves higher accuracy with less computational complexity than other stacked ensemble networks for biomedical image segmentation.
Article
With the development of machine learning and artificial intelligence, many convolutional neural network (CNN) based segmentation methods have been proposed for 3D cardiac segmentation. In this paper, we propose the category attention boosting (CAB) module, which combines the deep network computation graph with the boosting method. On the one hand, we add an attention mechanism to the gradient boosting process, which enhances the information of the coarse segmentation without high computational cost. On the other hand, we introduce the CAB module into the 3D U-Net segmentation network and construct a new multi-scale boosting model, CAB U-Net, which strengthens the gradient flow in the network and makes full use of low-resolution feature information. Because end-to-end networks can adaptively adjust their internal parameters, CAB U-Net can make full use of the complementary effects among different base learners. Extensive experiments on public datasets show that our approach achieves superior performance over state-of-the-art methods.
Article
Full-text available
Recent work has shown that convolutional networks can be substantially deeper, more accurate and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper we embrace this observation and introduce the Dense Convolutional Network (DenseNet), where each layer is directly connected to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections, one between each layer and its subsequent layer (treating the input as layer 0), our network has L(L+1)/2 direct connections. For each layer, the feature maps of all preceding layers are treated as separate inputs whereas its own feature maps are passed on as inputs to all subsequent layers. Our proposed connectivity pattern has several compelling advantages: it alleviates the vanishing gradient problem and strengthens feature propagation; despite the increase in connections, it encourages feature reuse and leads to a substantial reduction of parameters; its models tend to generalize surprisingly well. We evaluate our proposed architecture on five highly competitive object recognition benchmark tasks. The DenseNet obtains significant improvements over the state-of-the-art on all five of them (e.g., yielding 3.74% test error on CIFAR-10, 19.25% on CIFAR-100 and 1.59% on SVHN).
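The connectivity pattern is easy to state in code: layer i receives the concatenation of the input and all earlier feature maps and contributes a fixed number of new channels (the growth rate). A minimal PyTorch dense block follows, using the BN-ReLU-Conv composite; channel counts are illustrative.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Minimal dense block: each layer sees the concatenation of all
    preceding feature maps and adds `growth` new channels."""
    def __init__(self, c_in, growth=12, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(c_in + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(c_in + i * growth, growth, kernel_size=3, padding=1),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)   # c_in + n_layers * growth channels

out = DenseBlock(16)(torch.randn(1, 16, 32, 32))   # -> (1, 64, 32, 32)
```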
Article
Full-text available
Automatic and accurate whole-heart and great-vessel segmentation from 3D cardiac magnetic resonance (MR) images plays an important role in the computer-assisted diagnosis and treatment of cardiovascular disease. However, this task is very challenging due to ambiguous cardiac borders and large anatomical variations among different subjects. In this paper, we propose a novel densely-connected volumetric convolutional neural network, referred to as DenseVoxNet, to automatically segment the cardiac and vascular structures from 3D cardiac MR images. The DenseVoxNet adopts the 3D fully convolutional architecture for effective volume-to-volume prediction. From the learning perspective, our DenseVoxNet has three compelling advantages. First, it preserves the maximum information flow between layers by a densely-connected mechanism and hence eases network training. Second, it avoids learning redundant feature maps by encouraging feature reuse, and hence requires fewer parameters to achieve high performance, which is essential for medical applications with limited training data. Third, we add auxiliary side paths to strengthen gradient propagation and stabilize the learning process. We demonstrate the effectiveness of DenseVoxNet by comparing it with the state-of-the-art approaches from the HVSMR 2016 challenge held in conjunction with MICCAI, and our network achieves the best Dice coefficient. We also show that our network achieves better performance than other 3D ConvNets with fewer parameters.
Article
Full-text available
Artificial neural networks have been successfully applied to a variety of machine learning tasks, including image recognition, semantic segmentation, and machine translation. However, few studies have fully investigated ensembles of artificial neural networks. In this work, we investigated multiple widely used ensemble methods, including unweighted averaging, majority voting, the Bayes Optimal Classifier, and the (discrete) Super Learner, for image recognition tasks, with deep neural networks as candidate algorithms. We designed several experiments, with the candidate algorithms being the same network structure with different model checkpoints within a single training process, networks with the same structure but trained multiple times stochastically, and networks with different structures. In addition, we further studied the over-confidence phenomenon of neural networks and its impact on the ensemble methods. Across all of our experiments, the Super Learner achieved the best performance among the ensemble methods studied.
Conference Paper
Full-text available
We present an interactive algorithm to segment the heart chambers and epicardial surfaces, including the great vessel walls, in pediatric cardiac MRI of congenital heart disease. Accurate whole-heart segmentation is necessary to create patient-specific 3D heart models for surgical planning in the presence of complex heart defects. Anatomical variability due to congenital defects precludes fully automatic atlas-based segmentation. Our interactive segmentation method exploits expert segmentations of a small set of short-axis slice regions to automatically delineate the remaining volume using patch-based segmentation. We also investigate the potential of active learning to automatically solicit user input in areas where segmentation error is likely to be high. Validation is performed on four subjects with double outlet right ventricle, a severe congenital heart defect. We show that strategies asking the user to manually segment regions of interest within short-axis slices yield higher accuracy with less user input than those querying entire short-axis slices.
Article
Full-text available
Segmentation of 3D images is a fundamental problem in biomedical image analysis. Deep learning (DL) approaches have achieved state-of-the-art segmentation performance. To exploit 3D contexts using neural networks, known DL segmentation methods, including 3D convolution, 2D convolution on planes orthogonal to the 2D slices, and LSTMs in multiple directions, all suffer incompatibility with the highly anisotropic dimensions common in 3D biomedical images. In this paper, we propose a new DL framework for 3D image segmentation based on a combination of a fully convolutional network (FCN) and a recurrent neural network (RNN), which are responsible for exploiting intra-slice and inter-slice contexts, respectively. To the best of our knowledge, this is the first DL framework for 3D image segmentation that explicitly leverages 3D image anisotropism. Evaluated on a dataset from the ISBI Neuronal Structure Segmentation Challenge and on in-house image stacks for 3D fungus segmentation, our approach achieves promising results compared to known DL-based 3D segmentation approaches.
Article
Full-text available
Automatic liver segmentation from CT volumes is a crucial prerequisite, yet a challenging task, for computer-aided hepatic disease diagnosis and treatment. In this paper, we present a novel 3D deeply supervised network (3D DSN) to address this challenging task. The proposed 3D DSN takes advantage of a fully convolutional architecture which performs efficient end-to-end learning and inference. More importantly, we introduce a deep supervision mechanism during the learning process to combat potential optimization difficulties, so the model acquires a much faster convergence rate and more powerful discrimination capability. On top of the high-quality score map produced by the 3D DSN, a conditional random field model is further employed to obtain refined segmentation results. We evaluated our framework on the public MICCAI-SLiver07 dataset. Extensive experiments demonstrated that our method achieves segmentation results competitive with state-of-the-art approaches at a much faster processing speed.
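Deep supervision amounts to adding down-weighted losses on auxiliary heads attached to hidden layers, which shortens the gradient path. A PyTorch sketch of such an objective follows; it is 2D for brevity (the paper's network is 3D), and the auxiliary weight is an illustrative assumption, not the paper's setting.

```python
import torch
import torch.nn.functional as F

def deeply_supervised_loss(main_logits, aux_logits_list, target, aux_weight=0.3):
    """Deep-supervision objective: main prediction loss plus down-weighted
    losses on auxiliary heads, upsampled to the target resolution."""
    loss = F.cross_entropy(main_logits, target)
    for aux in aux_logits_list:
        aux = F.interpolate(aux, size=target.shape[-2:], mode="bilinear",
                            align_corners=False)
        loss = loss + aux_weight * F.cross_entropy(aux, target)
    return loss

target = torch.randint(0, 2, (1, 64, 64))           # toy label map
main = torch.randn(1, 2, 64, 64)                    # main head logits
aux = [torch.randn(1, 2, 16, 16), torch.randn(1, 2, 32, 32)]
print(deeply_supervised_loss(main, aux, target))
```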
Article
Full-text available
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous "parameter server" designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with particularly strong support for training and inference on deep neural networks. Several Google services use TensorFlow in production; we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model in contrast to existing systems and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
Conference Paper
Full-text available
Segmentation of anatomical structures in medical images is often based on a voxel/pixel classification approach. Deep learning systems, such as convolutional neural networks (CNNs), can infer a hierarchical representation of images that fosters categorization. We propose a novel system for voxel classification integrating three 2D CNNs, which have a one-to-one association with the xy, yz, and zx planes of the 3D image, respectively. We applied our method to the segmentation of tibial cartilage in low-field knee MRI scans and tested it on 114 unseen scans. Although our method uses only 2D features at a single scale, it performs better than a state-of-the-art method using 3D multi-scale features. In the latter approach, the features and the classifier were carefully adapted to the problem at hand. That we were able to obtain better results with a deep learning architecture that autonomously learns features from the images is the main insight of this study.
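A minimal NumPy sketch of the tri-planar input extraction: for each voxel, one 2D patch per xy, yz, and zx plane, each feeding its own 2D CNN. Patch size and border handling are illustrative; patches near the volume boundary would need padding in practice.

```python
import numpy as np

def triplanar_patches(volume, z, y, x, half=16):
    """Extract the three orthogonal 2D patches (xy, yz, zx) centered on a
    voxel, one per plane-specific 2D CNN. Border handling omitted."""
    xy = volume[z, y - half:y + half, x - half:x + half]
    yz = volume[z - half:z + half, y - half:y + half, x]
    zx = volume[z - half:z + half, y, x - half:x + half]
    return xy, yz, zx

vol = np.random.rand(64, 64, 64)
xy, yz, zx = triplanar_patches(vol, 32, 32, 32)
print(xy.shape, yz.shape, zx.shape)   # (32, 32) each
```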
Article
Full-text available
This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, exploiting a strategy more sophisticated than cross-validation's crude winner-takes-all for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer which has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. With the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization, and how it touches on other fields like chaos theory.
Article
Full-text available
When trying to learn a model for the prediction of an outcome given a set of covariates, a statistician has many estimation procedures in their toolbox. A few examples of these candidate learners are: least squares, least angle regression, random forests, and spline regression. Previous articles (van der Laan and Dudoit (2003); van der Laan et al. (2006); Sinisi et al. (2007)) theoretically validated the use of cross-validation to select an optimal learner among many candidate learners. Motivated by this use of cross-validation, we propose a new prediction method that creates a weighted combination of many candidate learners to build the super learner. This article proposes a fast algorithm for constructing a super learner in prediction which uses V-fold cross-validation to select the weights that combine an initial set of candidate learners. In addition, this paper contains a practical demonstration of the adaptivity of this so-called super learner to various true data-generating distributions. This approach to constructing a super learner generalizes to any parameter which can be defined as the minimizer of a loss function.
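In the squared-error case, the weight-fitting step reduces to a constrained regression on out-of-fold predictions. The sketch below uses non-negative least squares plus normalization, a common simplification of the simplex-constrained fit; building the V-fold out-of-fold prediction matrix is assumed done upstream, and the toy data are purely illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def super_learner_weights(level1_preds, y):
    """Combine candidate learners in the super-learner spirit: fit
    non-negative weights on the (n_samples, n_learners) matrix of V-fold
    out-of-fold predictions, then normalize to sum to one."""
    w, _ = nnls(level1_preds, y)     # non-negative least squares
    if w.sum() > 0:
        w = w / w.sum()              # simplified projection onto the simplex
    return w

# Toy demo: learner 0 is informative, learner 1 is pure noise.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
preds = np.column_stack([y + 0.1 * rng.normal(size=200),
                         rng.normal(size=200)])
print(super_learner_weights(preds, y))   # weight ~1 on the first learner
```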
Chapter
In recent years, neural networks have proven to be a powerful framework for various image analysis problems. However, some application domains have specific limitations. Notably, digital pathology is one such field, due to tremendous image sizes and the quite limited number of training examples available. In this paper, we adopt state-of-the-art convolutional neural network (CNN) architectures for digital pathology image analysis. We propose to classify image patches to increase the effective sample size and then to apply an ensembling technique to build predictions for the original images. To validate the developed approaches, we conducted experiments with the Breast Cancer Histology Challenge dataset and obtained 90% accuracy on the 4-class tissue classification task.
Conference Paper
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
Conference Paper
A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
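The compression technique referred to above trains the single model on the ensemble's temperature-softened outputs. A PyTorch sketch of the standard distillation objective follows; the temperature T and mixing weight alpha are illustrative settings, not values from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, target, T=4.0, alpha=0.7):
    """Knowledge distillation: KL divergence between temperature-softened
    teacher and student distributions, blended with the ordinary
    cross-entropy on hard labels."""
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # T^2 rescales the soft-target gradients to match the hard-label term.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, target)
    return alpha * kd + (1 - alpha) * ce

loss = distillation_loss(torch.randn(8, 10), torch.randn(8, 10),
                         torch.randint(0, 10, (8,)))
```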
Conference Paper
Image segmentation is a fundamental problem in biomedical image analysis. Recent advances in deep learning have achieved promising results on many biomedical image segmentation benchmarks. However, due to large variations in biomedical images (different modalities, image settings, objects, noise, etc.), to utilize deep learning on a new application, it usually needs a new set of training data. This can incur a great deal of annotation effort and cost, because only biomedical experts can annotate effectively, and often there are too many instances in images (e.g., cells) to annotate. In this paper, we aim to address the following question: With limited effort (e.g., time) for annotation, what instances should be annotated in order to attain the best performance? We present a deep active learning framework that combines fully convolutional network (FCN) and active learning to significantly reduce annotation effort by making judicious suggestions on the most effective annotation areas. We utilize uncertainty and similarity information provided by FCN and formulate a generalized version of the maximum set cover problem to determine the most representative and uncertain areas for annotation. Extensive experiments using the 2015 MICCAI Gland Challenge dataset and a lymph node ultrasound image segmentation dataset show that, using annotation suggestions by our method, state-of-the-art segmentation performance can be achieved by using only 50% of training data.
Article
Segmentation of key brain tissues from 3D medical images is of great significance for brain disease diagnosis, progression assessment and monitoring of neurologic conditions. While manual segmentation is time-consuming, laborious, and subjective, automated segmentation is quite challenging due to the complicated anatomical environment of brain and the large variations of brain tissues. We propose a novel voxelwise residual network (VoxResNet) with a set of effective training schemes to cope with this challenging problem. The main merit of residual learning is that it can alleviate the degradation problem when training a deep network so that the performance gains achieved by increasing the network depth can be fully leveraged. With this technique, our VoxResNet is built with 25 layers, and hence can generate more representative features to deal with the large variations of brain tissues than its rivals using hand-crafted features or shallower networks. In order to effectively train such a deep network with limited training data for brain segmentation, we seamlessly integrate multi-modality and multi-level contextual information into our network, so that the complementary information of different modalities can be harnessed and features of different scales can be exploited. Furthermore, an auto-context version of the VoxResNet is proposed by combining the low-level image appearance features, implicit shape information, and high-level context together for further improving the segmentation performance. Extensive experiments on the well-known benchmark (i.e., MRBrainS) of brain segmentation from 3D magnetic resonance (MR) images corroborated the efficacy of the proposed VoxResNet. Our method achieved the first place in the challenge out of 37 competitors including several state-of-the-art brain segmentation methods. Our method is inherently general and can be readily applied as a powerful tool to many brain-related studies, where accurate segmentation of brain structures is critical.
Article
In the field of connectomics, neuroscientists seek to identify cortical connectivity comprehensively. Neuronal boundary detection from electron microscopy (EM) images is often performed to assist the automatic reconstruction of neuronal circuits. But the segmentation of EM images is a challenging problem, as it requires the detector to detect both filament-like thin and blob-like thick membranes while suppressing ambiguous intracellular structures. In this paper, we propose multi-stage multi-recursive-input fully convolutional networks to address this problem. The multiple recursive inputs for one stage, i.e., the multiple side outputs with different receptive field sizes learned from the lower stage, provide multi-scale contextual boundary information for the consecutive learning. This design is biologically plausible: like a human visual system, it compares different possible segmentation solutions to resolve ambiguous boundaries. Our multi-stage networks are trained end-to-end and achieve promising results on a publicly available mouse piriform cortex dataset, significantly outperforming other competitors.
Conference Paper
We propose an automatic method using dilated convolutional neural networks (CNNs) for segmentation of the myocardium and blood pool in cardiovascular MR (CMR) of patients with congenital heart disease (CHD). Ten training and ten test CMR scans cropped to an ROI around the heart were provided in the MICCAI 2016 HVSMR challenge. A dilated CNN with a receptive field of 131 × 131 voxels was trained for myocardium and blood pool segmentation in axial, sagittal, and coronal image slices. Performance was evaluated within the HVSMR challenge. Automatic segmentation of the test scans resulted in Dice indices of 0.80 ± 0.06 and 0.93 ± 0.02, average distances to boundaries of 0.96 ± 0.31 and 0.89 ± 0.24 mm, and Hausdorff distances of 6.13 ± 3.76 and 7.07 ± 3.01 mm for the myocardium and blood pool, respectively. Segmentation took 41.5 ± 14.7 s per scan. In conclusion, dilated CNNs trained on a small set of CMR images of CHD patients showing large anatomical variability provide accurate myocardium and blood pool segmentations.
Conference Paper
Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
Conference Paper
There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
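A one-level PyTorch sketch of the U-Net idea: a contracting path, an expanding path, and the skip connection that concatenates encoder features for precise localization. The original uses several resolution levels and unpadded convolutions, so this is a structural illustration only, with illustrative channel counts.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    """One-level U-Net sketch: encoder, bottleneck, decoder with skip."""
    def __init__(self, c_in=1, c_base=16, n_classes=2):
        super().__init__()
        self.enc = block(c_in, c_base)
        self.down = nn.MaxPool2d(2)
        self.bottom = block(c_base, 2 * c_base)
        self.up = nn.ConvTranspose2d(2 * c_base, c_base, 2, stride=2)
        self.dec = block(2 * c_base, c_base)   # skip doubles the channels
        self.head = nn.Conv2d(c_base, n_classes, 1)

    def forward(self, x):
        e = self.enc(x)
        b = self.bottom(self.down(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))  # skip connection
        return self.head(d)

print(TinyUNet()(torch.randn(1, 1, 64, 64)).shape)   # (1, 2, 64, 64)
```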
Article
Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
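The reformulation is local: a block outputs F(x) + x, so its layers fit a residual rather than an unreferenced mapping, and the identity becomes the easy default. A minimal PyTorch basic block follows (channel-preserving case, so the shortcut is the identity).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: output is F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + x)   # residual shortcut

y = ResidualBlock(16)(torch.randn(1, 16, 32, 32))
```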
Article
Efforts to automate the reconstruction of neural circuits from 3D electron microscopic (EM) brain images are critical for the field of connectomics. An important computation for reconstruction is the detection of neuronal boundaries. Images acquired by serial section EM, a leading 3D EM technique, are highly anisotropic, with inferior quality along the third dimension. For such images, the 2D max-pooling convolutional network has set the standard for performance at boundary detection. Here we achieve a substantial gain in accuracy through three innovations. Following the trend towards deeper networks for object recognition, we use a much deeper network than previously employed for boundary detection. Second, we incorporate 3D as well as 2D filters, to enable computations that use 3D context. Finally, we adopt a recursively trained architecture in which a first network generates a preliminary boundary map that is provided as input along with the original image to a second network that generates a final boundary map. Backpropagation training is accelerated by ZNN, a new implementation of 3D convolutional networks that uses multicore CPU parallelism for speed. Our hybrid 2D-3D architecture could be more generally applicable to other types of anisotropic 3D images, including video, and our recursive framework for any image labeling problem.
Article
We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions. The method is straightforward to implement and is based on adaptive estimates of lower-order moments of the gradients. The method is computationally efficient, has little memory requirement, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The method exhibits invariance to diagonal rescaling of the gradients by adapting to the geometry of the objective function. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, by which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. We demonstrate that Adam works well in practice when experimentally compared to other stochastic optimization methods.
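A NumPy sketch of a single Adam update, following the moment estimates and bias corrections described above; the demo minimizes f(x) = x² and the learning rate is an illustrative choice.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: exponential moving averages of the gradient (m)
    and its square (v), bias-corrected, then a per-parameter scaled step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 from x = 5 (gradient is 2x).
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 501):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
print(x)   # close to 0
```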
Article
We address a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy (EM) images. This is necessary to efficiently map 3D brain structure and connectivity. To segment biological neuron membranes, we use a special type of deep artificial neural network as a pixel classifier. The label of each pixel (membrane or non-membrane) is predicted from raw pixel values in a square window centered on it. The input layer maps each window pixel to a neuron. It is followed by a succession of convolutional and max-pooling layers which preserve 2D information and extract features with increasing levels of abstraction. The output layer produces a calibrated probability for each class. The classifier is trained by plain gradient descent on a 512 × 512 × 30 stack with known ground truth, and tested on a stack of the same size (ground truth unknown to the authors) by the organizers of the ISBI 2012 EM Segmentation Challenge. Even without problem-specific post-processing, our approach outperforms competing techniques by a large margin in all three considered metrics, i.e., rand error, warping error, and pixel error. For pixel error, our approach is the only one outperforming a second human observer.
Lee, D.-H. 2013. Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks. In Workshop on Challenges in Representation Learning, ICML, volume 3, 2.
Ioffe, S., and Szegedy, C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167.
Xia, Y.; Xie, L.; Liu, F.; Zhu, Z.; Fishman, E. K.; and Yuille, A. L. 2018. Bridging the gap between 2D and 3D organ segmentation. arXiv preprint arXiv:1804.00392.