Whole brain segmentation and labeling from CT
using synthetic MR images
Can Zhao1, Aaron Carass1, Junghoon Lee2, Yufan He1, and Jerry L. Prince1
1Dept. of Electrical and Computer Engineering,
The Johns Hopkins University, Baltimore, MD 21218
2Dept. of Radiation Oncology,
The Johns Hopkins School of Medicine, Baltimore, MD 21287
czhao20@jhu.edu
Abstract.
To achieve whole-brain segmentation—i.e., classifying tissues
within and immediately around the brain as gray matter (GM), white
matter (WM), and cerebrospinal fluid—magnetic resonance (MR) imaging
is nearly always used. However, there are many clinical scenarios where
computed tomography (CT) is the only modality that is acquired and
yet whole brain segmentation (and labeling) is desired. This is a very
challenging task, primarily because CT has poor soft tissue contrast;
very few segmentation methods have been reported to date and there
are no reports on automatic labeling. This paper presents a whole brain
segmentation and labeling method for non-contrast CT images that first
uses a fully convolutional network (FCN) to synthesize an MR image
from a CT image and then uses the synthetic MR image in a standard
pipeline for whole brain segmentation and labeling. The FCN was trained
on image patches derived from ten co-registered MR and CT images
and the segmentation and labeling method was tested on sixteen CT
scans in which co-registered MR images are available for performance
evaluation. Results show excellent MR image synthesis from CT images
and improved soft tissue segmentation and labeling over a multi-atlas
segmentation approach.
Keywords:
synthesis, MR, CT, deep learning, CNN, FCN, U-net, segmentation
1 Introduction
Computed tomography (CT) imaging of the head has many clinical and scientific
uses including visualization and assessment of head injuries, intracranial bleeding,
aneurysms, tumors, headaches, and dizziness as well as for use in surgical planning.
Yet due to the poor soft tissue contrast in CT images, magnetic resonance imag-
ing (MRI) is almost exclusively used for localizing, characterizing, and labeling
gray matter (GM) and white matter (WM) structures in the brain. Unfortunately,
there are many scenarios in which only CT images are available—e.g., emergency
situations, lack of an MR scanner, patient implants or claustrophobia, and cost
of obtaining an MR scan—and there is no approach to provide whole brain
segmentation and labeling from these data.
There has been very limited work on GM/WM segmentation from CT images.
A whole brain segmentation method for 4D contrast-enhanced CT based on a
nonlinear support vector machine was recently published [12]. The authors point
out that a key part of their method is the formation of a 3D image derived
from all of the temporal acquisitions. The segmentation result is impressive, but
it is not clear that their method will work on conventional 3D CT data. As
well, their method only provides classification of GM, WM, and CSF and does
not label the sub-cortical GM or cortical gyri. The authors of [12] provide an
excellent summary of much of the previous work on GM/WM segmentation from
non-contrast CT (cf. [8, 15, 6, 10]), and also point out the limitations of past
approaches. It is clearly an area of investigation that deserves more research. In
contrast to the situation in CT, GM/WM segmentation and labeling from MRI
has been well studied and several excellent approaches exist (cf. [18, 9, 5, 14]).
Thus, it is natural to wonder whether images that are synthesized from CT to
look like MR images could be used for automatic segmentation and labeling; this
is precisely what we propose.
Image synthesis methods provide intensity transformations between two image
contrasts or modalities (cf. [17, 1, 3, 11, 2]). Previously reported image synthesis
work has synthesized CT from MRI [1],
T2
-weighted (
T2
-w) from
T1
-weighted (
T1
-
w) [3], and positron emission tomography (PET) from MRI [11]. In very recent
work, Cao et al. [2] synthesized pelvic
T1
-w images from CT using a random
forest and showed improvement in cross-modal registration. Some researchers
have applied convolutional neural networks (CNNs) to synthesis (cf. [11]) yet
Cao et al. [2] claimed that robust and accurate synthesis of MR from CT using
a CNN is not feasible. We believe that because CNNs are resilient to intensity
variations [4] and they can model highly nonlinear mappings, they are ideal for
CT-to-MR synthesis. In fact, we demonstrate in this paper that such synthesis
is indeed possible and that whole brain segmentation and labeling from these
synthetic images is very effective.
2 Methods
Training and testing data.
Twenty-six patients had T1-w MR images acquired using a Siemens Magnetom Espree 1.5 T scanner (Siemens Medical
Solutions, Malvern, PA) with geometric distortions corrected within the Siemens
Syngo console workstation. The MR images were processed with N4 to remove
any bias field and subsequently had their intensity scales adjusted to align their
WM peaks. Contemporaneous CT images were obtained on a Philips Brilliance
Big Bore scanner (Philips Medical Systems, Netherlands) under a routine clinical
protocol for brain cancer patients treated with either stereotactic-body radiation
therapy (SBRT) or radiosurgery (SRS). The CT images were resampled to have
the same digital resolution as the MR images, which is 0.7×0.7×1 mm. Then
the MR images were rigidly registered to the CT images.
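As a rough illustration of this preprocessing pipeline, the following SimpleITK sketch performs N4 correction, resamples the CT onto the MR grid, and rigidly registers the two images. The file names, optimizer settings, and sampling choices are assumptions rather than the authors' settings, and the WM-peak intensity normalization is omitted.

```python
# Sketch of the preprocessing steps described above (assumed parameters, not the authors' code).
import SimpleITK as sitk

mr = sitk.ReadImage("mr_t1.nii.gz", sitk.sitkFloat32)
ct = sitk.ReadImage("ct.nii.gz", sitk.sitkFloat32)

# N4 bias field correction of the MR image, using an Otsu foreground mask.
mr_n4 = sitk.N4BiasFieldCorrection(mr, sitk.OtsuThreshold(mr, 0, 1))

# Resample the CT onto the MR voxel grid (0.7 x 0.7 x 1 mm in this study); -1000 HU for padding.
ct_resampled = sitk.Resample(ct, mr_n4, sitk.Transform(), sitk.sitkLinear, -1000.0)

# Rigidly register the MR to the resampled CT using mutual information.
reg = sitk.ImageRegistrationMethod()
reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
reg.SetOptimizerAsRegularStepGradientDescent(learningRate=1.0, minStep=1e-4,
                                             numberOfIterations=200)
reg.SetInitialTransform(sitk.CenteredTransformInitializer(
    ct_resampled, mr_n4, sitk.Euler3DTransform(),
    sitk.CenteredTransformInitializerFilter.GEOMETRY))
reg.SetInterpolator(sitk.sitkLinear)
rigid = reg.Execute(ct_resampled, mr_n4)
mr_registered = sitk.Resample(mr_n4, ct_resampled, rigid, sitk.sitkLinear, 0.0)
```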
We use ten patient image pairs as training data for our CNN (see below).
For each axial slice in the image domain, twenty-five 128 ×128 paired (CT and
MR) image patches are extracted. The 128×128 patches can be thought of as
subdividing the slice into a 5×5 grid with overlap between the image patches.
These patch pairs are used to train an FCN based on a modified U-net [16] that
will synthesize MR patches from CT patches. The synthetic MR patches are then
used to construct an axial slice of the synthetic MR image. Our FCN, with 128×128 CT patches as input and 128×128 synthetic MR patches as output, is shown in Fig. 1.

Fig. 1. Our modified U-net with four levels of contraction and expansion.
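As a minimal sketch of the patch extraction step, the following numpy snippet (an assumption about the exact grid placement, not the authors' code) crops twenty-five overlapping 128×128 patches per axial slice on a 5×5 grid; paired CT and MR patches would be cropped at identical locations.

```python
# Minimal sketch of extracting a 5x5 grid of overlapping 128x128 patches from one axial slice.
import numpy as np

def grid_patches(slice_2d, patch=128, grid=5):
    """Return patch x patch crops whose top-left corners are evenly spaced so the
    grid covers the whole slice with overlap between neighboring patches."""
    rows, cols = slice_2d.shape
    r_starts = np.linspace(0, rows - patch, grid).astype(int)
    c_starts = np.linspace(0, cols - patch, grid).astype(int)
    return [slice_2d[r:r + patch, c:c + patch] for r in r_starts for c in c_starts]

# Paired training patches would be cropped at identical locations:
# ct_patches, mr_patches = grid_patches(ct_slice), grid_patches(mr_slice)
```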
FCN algorithm for CT-to-MR synthesis
The mapping between CT and
MR is too nonlinear to be modeled accurately by the shallow features used in a
random forest, which is why we explore a CNN based approach. As the mapping
between CT and MR is dependent on anatomical structures, it makes intuitive
sense that any CNN synthesis model should incorporate the ideas of semantic
segmentation, for which fully convolutional networks (FCNs) were designed.
Additionally, having already sacrificed some resolution in bringing the CT into
alignment with MR, we want to be careful to not further degrade the image
quality. Thus, we have selected as the basis of our FCN the U-net [16], which
can achieve state-of-the-art performance for semantic segmentation and preserve
high resolution information throughout the contraction-expansion layers of the
network.
The encoder follows the typical architecture of a CNN. Each step contains two 3×3 convolutional layers, each activated by a rectified linear unit (ReLU), and a 2×2 max pooling operation for downsampling. In the decoder, each step contains a 2×2 upsampling layer followed by a 5×5 convolutional layer and a 3×3 convolutional layer, both activated by ReLU. The final layer is a 1×1 convolutional layer.
This FCN has four differences from the standard U-net. Modification 1: the U-net decoder has two 3×3 layers, whereas we use one 5×5 layer and one 3×3 layer. We do this because the upsampling layer is simply repeating values in a 2×2 window. Thus, a 3×3 layer in the encoder can involve its eight connected neighbors, whereas a 3×3 layer after an upsampling layer only includes three connected neighbors. By replacing this with a 5×5 layer, we can still involve all eight connected neighbors. There is a slight increase in the number of parameters to estimate, but the result has better accuracy.
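The counting argument behind this modification can be checked numerically. The short numpy sketch below (purely illustrative, not from the paper) upsamples a uniquely labeled grid by 2×2 value repetition and counts how many distinct original pixels fall inside a 3×3 versus a 5×5 window.

```python
# After 2x2 upsampling by repetition, a 3x3 window sees only 4 distinct original pixels
# (center plus three neighbors), while a 5x5 window sees all 9 (center plus 8-connected neighbors).
import numpy as np

orig = np.arange(36).reshape(6, 6)             # label each original pixel uniquely
up = orig.repeat(2, axis=0).repeat(2, axis=1)  # 2x2 upsampling by value repetition

def distinct_sources(image, row, col, k):
    """Number of distinct original pixels inside a k x k window centered at (row, col)."""
    half = k // 2
    window = image[row - half:row + half + 1, col - half:col + half + 1]
    return np.unique(window).size

center = (6, 6)  # an interior location of the upsampled image
print(distinct_sources(up, *center, k=3))  # 4
print(distinct_sources(up, *center, k=5))  # 9
```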
Modification 2: CNN vision tasks benefit from increasing model depth; how-
ever, deeper models can have vanishing or exploding gradients [7]. In the original
U-net, the decoder contains an upsampling layer, a convolutional layer, a layer
merging it with high resolution representations, and another convolutional layer.
Thus, the upsampled layer is convolved twice while the high resolution repre-
sentation is convolved only once. We therefore exchange the order of the first
convolutional layer and the merging layer so that both are convolved twice. With
this change, we retain the same number of layers but our FCN can model greater
non-linearity without introducing additional obstacles for back-propagation.
Modification 3: Every convolution loses border pixels; thus, the border of the
predicted patch may not be as reliable as the center. The standard U-net crops
each patch after each convolutional layer so that the predicted patch is smaller
than the input patch. Our FCN keeps the boundary pixels instead of cropping
them. However, when reconstructing a slice we use only the central 90×90 region
of the image patches (except at the boundaries of the image, where we retain the
side of the patch that touches the boundary).
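A rough sketch of this reconstruction rule, with assumed array layouts and helper names, is given below: each predicted 128×128 patch contributes only its central 90×90 region, except along the image boundary, where the border of the patch is kept.

```python
# Sketch of reassembling a slice from predicted patches, keeping the central 90x90 region
# of each patch except where the patch touches the image boundary (assumed helper, not the authors' code).
import numpy as np

def reconstruct(pred_patches, starts, out_shape, patch=128, keep=90):
    """pred_patches: list of 2D arrays; starts: list of (row, col) top-left corners."""
    margin = (patch - keep) // 2                      # 19 unreliable border pixels per side
    out = np.zeros(out_shape, dtype=np.float32)
    for p, (r0, c0) in zip(pred_patches, starts):
        # Trim the border, but retain it where the patch touches the image edge.
        top = 0 if r0 == 0 else margin
        left = 0 if c0 == 0 else margin
        bottom = patch if r0 + patch == out_shape[0] else patch - margin
        right = patch if c0 + patch == out_shape[1] else patch - margin
        out[r0 + top:r0 + bottom, c0 + left:c0 + right] = p[top:bottom, left:right]
    return out
```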
Modification 4: U-net was used for solving segmentation, while synthesis is
a regression task. That is, U-net only needed labels to distinguish edges, while
we need to predict intensity values. Thus the batch normalization layers used throughout U-net are a concern: they do not affect image contrast, but absolute intensity values are lost, and CT numbers have a physical meaning. In
order to include this information, we merge the original CT patches before the
last convolutional layer. Also, U-net used softmax to activate the last layer for
segmentation, while we use ReLU for regression.
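To make these modifications concrete, the following PyTorch sketch shows one decoder step and the output head; channel counts, module names, and other hyperparameters are assumptions, since the paper does not specify them, and this is not the authors' implementation.

```python
# Illustrative sketch of one decoder step and the regression output head (assumed details).
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    """Upsample, merge the skip connection first (Modification 2), then apply a 5x5 and
    a 3x3 convolution (Modification 1); padding keeps the patch size (Modification 3)."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")  # repeats values in a 2x2 window
        self.conv5 = nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=5, padding=2)
        self.conv3 = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, skip):
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)   # merge before the first convolution
        x = self.relu(self.conv5(x))
        return self.relu(self.conv3(x))

class OutputHead(nn.Module):
    """Modification 4: concatenate the original CT patch before the final 1x1 convolution
    and use ReLU (not softmax) for the regression output."""
    def __init__(self, in_ch):
        super().__init__()
        self.final = nn.Conv2d(in_ch + 1, 1, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, features, ct_patch):
        return self.relu(self.final(torch.cat([features, ct_patch], dim=1)))
```

A full model would stack four encoder steps (two 3×3 convolutions plus 2×2 max pooling each), four such decoder steps, and the output head.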
Automatic Whole-brain Segmentation and Labeling
We use MALP-EM [9]
to provide whole-brain segmentation and labeling from the synthetic MR images.
Since the synthetic MR images are naturally registered with the CT images,
the result is a segmentation and labeling of the CT images. MALP-EM uses an
atlas cohort of 30 subjects having both MR images and labels from the OASIS
database [13]. These atlases are deformably registered to the target and the
labels are combined using joint label fusion (JLF) [18]. Finally, these labels are
adjusted using an intensity based EM method to provide additional robustness
to pathology, especially traumatic brain injury. We used the code that has been
made freely available by the original authors of the method.
Fig. 2. For one subject, we show the (a) input CT image, the (b) output synthetic T1-w, and the (c) ground truth T1-w image. (d) is the dynamic range of (a). Shown in (e) and (f) are the MALP-EM segmentations of the synthetic and ground truth T1-w images, respectively.
3 Experiments and Results
Image Synthesis
Our FCN was trained on 45,575 128×128 image patch pairs derived from ten of the co-registered MR and CT images. It took two days to train and 1 min to synthesize one MR image from the input CT on an NVIDIA GTX1070SC GPU. Figs. 2(a)–(c) show an example input CT image,
the resulting synthetic T1-w, and the ground truth T1-w, respectively.
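A hypothetical training-loop sketch for this synthesis task is shown below; the optimizer, learning rate, loss, batch size, and epoch count are assumptions, as the paper does not report these details.

```python
# Assumed training loop for patch-wise CT-to-MR regression (not the authors' settings).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def train(model, ct_patches, mr_patches, epochs=50, batch_size=32, lr=1e-3):
    """ct_patches, mr_patches: float tensors of shape (N, 1, 128, 128)."""
    loader = DataLoader(TensorDataset(ct_patches, mr_patches),
                        batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()  # intensity regression rather than classification
    for _ in range(epochs):
        for ct, mr in loader:
            opt.zero_grad()
            loss = loss_fn(model(ct), mr)
            loss.backward()
            opt.step()
    return model
```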
Experiment 1: MALP-EM segmentation
We applied MALP-EM on both synthetic and ground truth T1-w images. Fig. 2(e) shows the segmentation result from the synthetic T1-w in Fig. 2(b), while Fig. 2(f) shows the result from the ground truth T1-w in Fig. 2(c). There are differences between the two results, but this is the first result showing such a detailed labeling of CT brain images. We compute Dice coefficients between segmentation results obtained using the synthetic T1-w and those obtained using the true T1-w. The mean Dice coefficients for a few brain structures are as follows: for the hippocampus, 0.62 (right) and 0.59 (left); for the precentral gyrus, 0.52 (right) and 0.55 (left); for the postcentral gyrus, 0.51 (right) and 0.52 (left); and for the caudate, 0.70 (right) and 0.73 (left). After merging the labels, box plots of the Dice coefficients for four labels (non-cortical GM, cortical GM, ventricles, and WM) are shown in Fig. 3 (yellow).

Fig. 3. With MALP-EM processing of the ground truth T1-w as the reference, we compute the Dice coefficient between multi-atlas segmentations using either the subject CT images with MV label fusion (red), or the synthetic T1-w with MV (green) or JLF (blue) as the label fusion, and MALP-EM (yellow). Note that MALP-EM uses the OASIS atlas with manually delineated labels, while the other three use the remaining 15 images, with MALP-EM-computed labels from the true T1-w images, as atlases.
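For reference, the Dice coefficient used throughout the evaluation can be computed per label as in the following sketch; the label IDs and variable names are placeholders, not from the paper's code.

```python
# Per-label Dice overlap between two label images of identical shape.
import numpy as np

def dice(seg_a, seg_b, label):
    a = (seg_a == label)
    b = (seg_b == label)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom > 0 else np.nan

# e.g. dice(labels_from_synthetic_t1, labels_from_true_t1, label=HIPPOCAMPUS_ID)
```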
Experiment 2: Comparison to direct multi-atlas segmentation.
To demon-
strate the benefits of our approach, we carried out a set of algorithm comparisons.
Ideally, we would like to evaluate how well our CT images could be labeled di-
rectly from the OASIS atlases; but there are no CT data associated with OASIS.
Instead, we used the 16 subjects (which do not overlap the 10 subjects used to
train our FCN) in a set of leave-one-out experiments and let the MALP-EM labels
act as the “ground truth”. For each of the 16 subjects, we used the remaining 15
(having T1-w and MALP-EM labels) as atlases. To mimic the desired experiment,
we first carried out multi-modal registration from each of the 15 T1-w atlases
to the target CT using mutual information (MI) as the registration cost metric.
Because this is a multi-modal registration task, JLF is not available to combine
labels, so we used majority voting (MV) instead. We next computed a synthetic T1-w image from the target CT image and registered each atlas to this target
using mean squared error (MSE) as the registration metric. To provide a richer
comparison, we combined these 15 labels using both MV and JLF.
Given these three leave-one-out results, we computed Dice coefficient on four
labels: non-cortical GM, cortical GM, ventricles, and WM. The results are shown
in Fig. 3 (using the red, green, and blue graphics). We can see that using the synthetic T1-w is significantly better than using the original CT images, whether labels are combined with MV or JLF. JLF seems to provide somewhat better performance.
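A minimal sketch of majority-vote label fusion over the registered atlas label images, under assumed array shapes and names, is shown below; JLF additionally weights each atlas vote by local intensity similarity [18] and is not reproduced here.

```python
# Majority-vote label fusion over atlas label images already warped into the target space.
import numpy as np

def majority_vote(atlas_labels):
    """atlas_labels: integer array of shape (n_atlases, X, Y, Z); returns the per-voxel modal label."""
    stack = np.asarray(atlas_labels)
    n_labels = int(stack.max()) + 1
    # Count the votes for every label, then take the most frequent label at each voxel.
    votes = np.stack([(stack == k).sum(axis=0) for k in range(n_labels)], axis=0)
    return votes.argmax(axis=0)
```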
4 Discussion and Conclusion
The synthetic images that we achieve with FCN are quite good visually as
demonstrated by the single (typical) example shown here (Fig. 2(b)), visually
much better than those shown in Cao et al. [2] (their Fig. 7). This speaks very well
to the potential of the FCN architecture for estimating synthetic cross-modality
images. Besides whole-brain segmentation and labeling, there are a host of other
potential applications for these images.
A limitation of our evaluation is our lack of manual brain labels in a CT
dataset, as it would be interesting to compare our approach with a top multi-atlas
segmentation algorithm that would use only CT data. The fact that our method
appears to perform worse than the straight multi-atlas results in Fig. 3 is because
the MALP-EM result is using manually delineated OASIS labels to estimate
automatically generated MALP-EM labels, whereas the other two approaches
are estimating MALP-EM labels from MALP-EM atlases. In the future, a more
thorough evaluation including a quantitative comparison with Cao et al. [2] is
warranted.
Recent research on contrast-enhanced 4D CT brain segmentation achieves
slightly higher mean Dice than ours: 0.81 and 0.79 for WM and GM [12],
compared to our 0.77 and 0.76. However, because their data were 4D CT,
its combined 3D image probably has lower noise and also enables them to use
temporal features which we do not have. Furthermore, theirs was a contrast CT
study while ours is a non-contrast study.
In summary, we have used a modified U-net to synthesize T1-w images from CT, and then directly segmented the synthetic T1-w using either MALP-EM or
a multi-atlas label fusion scheme. Our results show that using synthetic MR can
significantly improve the segmentation over using the CT image directly. This
is the first paper to provide GM anatomical labels on a CT neuroimage. Also,
despite previous assertions that CT-to-MR synthesis is not feasible with CNNs,
we show that it is not only possible but can be done with sufficient quality to
open up new clinical and scientific opportunities in neuroimaging.
Acknowledgments.
This work was supported by NIH/NIBIB under grant R01
EB017743.
References
1.
Burgos, N., Cardoso, M.J., Thielemans, K., Modat, M., Pedemonte, S., Dickson, J.,
Barnes, A., Ahmed, R., Mahoney, C.J., Schott, J.M., et al.: Attenuation correction
synthesis for hybrid PET-MR scanners: application to brain studies. IEEE Trans.
Med. Imag. 33(12), 2332–2341 (2014)
2.
Cao, X., Yang, J., Gao, Y., Guo, Y., Wu, G., Shen, D.: Dual-core steered non-rigid
registration for multi-modal images via bi-directional image synthesis. Medical
Image Analysis (2017), in press
3.
Chen, M., Carass, A., Jog, A., Lee, J., Roy, S., Prince, J.L.: Cross contrast multi-channel image registration using image synthesis for MR brain images. Medical Image Analysis 36, 2–14 (2017)
4.
Dodge, S., Karam, L.: Understanding how image quality affects deep neural net-
works. In: Quality of Multimedia Experience (QoMEX), 2016 Eighth International
Conference on. pp. 1–6. IEEE (2016)
5. Fischl, B.: FreeSurfer. NeuroImage 62(2), 774–781 (2012)
6.
Gupta, V., Ambrosius, W., Qian, G., Blazejewska, A., Kazmierski, R., Urbanik,
A., Nowinski, W.L.: Automatic segmentation of cerebrospinal fluid, white and gray
matter in unenhanced computed tomography images. Academic Radiology 17(11),
1350–1358 (2010)
7.
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
pp. 770–778 (2016)
8.
Hu, Q., Qian, G., Aziz, A., Nowinski, W.L.: Segmentation of brain from computed
tomography head images. In: Engineering in Medicine and Biology Society, 2005.
IEEE-EMBS 2005. 27th Annual International Conference of the. pp. 3375–3378.
IEEE (2006)
9.
Kamnitsas, K., Ledig, C., Newcombe, V.F., Simpson, J.P., Kane, A.D., Menon,
D.K., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected
CRF for accurate brain lesion segmentation. Medical Image Analysis 36, 61–78
(2017)
10.
Kemmling, A., Wersching, H., Berger, K., Knecht, S., Groden, C., Nölte, I.: Decomposing the Hounsfield unit. Clinical Neuroradiology 22(1), 79–91 (2012)
11.
Li, R., Zhang, W., Suk, H.I., Wang, L., Li, J., Shen, D., Ji, S.: Deep learning based
imaging data completion for improved brain disease diagnosis. In: International
Conference on Medical Image Computing and Computer-Assisted Intervention. pp.
305–312. Springer (2014)
12.
Manniesing, R., Oei, M.T., Oostveen, L.J., Melendez, J., Smit, E.J., Platel, B., Sánchez, C.I., Meijer, F.J., Prokop, M., van Ginneken, B.: White matter and gray matter segmentation in 4D computed tomography. Scientific Reports 7 (2017)
13.
Marcus, D.S., Wang, T.H., Parker, J., Csernansky, J.G., Morris, J.C., Buckner,
R.L.: Open Access Series of Imaging Studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. Journal of Cognitive Neuroscience 19(9), 1498–1507 (2007)
14.
Moeskops, P., Viergever, M.A., Mendrik, A.M., de Vries, L.S., Benders, M.J., Išgum,
I.: Automatic segmentation of MR brain images with a convolutional neural network.
IEEE Trans. Med. Imag. 35(5), 1252–1261 (2016)
15.
Ng, C.R., Than, J.C.M., Noor, N.M., Rijal, O.M.: Preliminary brain region segmentation using FCM and graph cut for CT scan images. In: BioSignal Analysis,
Processing and Systems (ICBAPS), 2015 International Conference on. pp. 52–56.
IEEE (2015)
16.
Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomedical
image segmentation. In: International Conference on Medical Image Computing
and Computer-Assisted Intervention. pp. 234–241. Springer (2015)
17.
Roy, S., Wang, W.T., Carass, A., Prince, J.L., Butman, J.A., Pham, D.L.: PET
attenuation correction using synthetic CT from ultrashort echo-time MR imaging.
Journal of Nuclear Medicine 55(12), 2071–2077 (2014)
18.
Wang, H., Suh, J.W., Das, S.R., Pluta, J.B., Craige, C., Yushkevich, P.A.: Multi-
atlas segmentation with joint label fusion. IEEE Trans. Patt. Anal. Mach. Intell.
35(3), 611–623 (2013)