ROBUSTNESS OF PROBABILISTIC U-NET FOR AUTOMATED SEGMENTATION OF WHITE MATTER HYPERINTENSITIES IN DIFFERENT DATASETS OF BRAIN MRI
Rizal Maulana
Faculty of Computer Science
Universitas Indonesia
Depok, Indonesia
rizal.maulana01@ui.ac.id
Muhammad Febrian Rachmadi
Brain Image Analysis Unit
RIKEN Center for Brain Science
Wako, Japan
Laksmita Rahadianti
Faculty of Computer Science
Universitas Indonesia
Depok, Indonesia
November 18, 2021
ABSTRACT
White Matter Hyperintensities (WMHs) are neuroradiological features often seen in T2-FLAIR brain MRI as white regions (i.e., hyperintensities) and characteristic of small vessel disease (SVD). Detailed measurements of WMHs (e.g., their volumes, locations, distributions) are vital for clinical research, but segmenting WMHs is challenging due to their ill-posed boundaries. In this study, we investigate the robustness of Probabilistic U-Net and other deterministic deep learning models (i.e., U-Net and its variations) for automatic segmentation of WMHs. In particular, we are interested in the robustness of U-Net based deep learning models, especially the Probabilistic U-Net, for segmenting WMHs in brain MRI from different datasets. Thus, we performed two different experiments: a k-fold cross validation experiment (i.e., training and testing using the same dataset) and a cross dataset experiment (i.e., testing on a different dataset). Based on our experiments, Probabilistic U-Net outperformed the other tested models in the k-fold cross validation experiment. On the other hand, we found that Probabilistic U-Net captured different types of uncertainty when tested on a different dataset.
Keywords White Matter Hyperintensities (WMHs) · segmentation of WMHs · probabilistic model · U-Net · Probabilistic U-Net · uncertainty · robustness
1 Introduction
White Matter Hyperintensities (WMHs) are neuroradiological features often seen in T2-FLAIR brain MRI and characteristic of small vessel disease (SVD) [1]. In T2-FLAIR brain MRI, WMHs appear as white regions (i.e., hyperintensities), which makes them easier to discern and differentiate from normal tissues of the brain, which usually appear in darker (i.e., grey) colours [2, 3]. Clinically, WMHs have been associated with neurodegenerative diseases such as Alzheimer's disease, stroke, dementia, and mood disorder [4].
Detailed measurements of WMHs (e.g., their volumes, locations, distributions) are required for finding the best treatment in clinical research [4], but manually segmenting and assessing WMHs for each patient is very expensive. Furthermore, segmentation of WMHs is challenging due to WMHs' ill-posed boundaries. Instead of a clear boundary between WMHs and non-WMHs regions, WMHs have gradual changes of intensity along their borders, commonly referred to as the "penumbra" of WMHs [5]. The penumbra of WMHs has been the subject of many studies which debate the criteria to correctly identify WMHs borders [6, 7]. In some cases, the penumbra of WMHs might appear very similar to MRI artefacts and non-WMHs regions [1]. Thus, manual assessment of WMHs is known to have low inter-rater reliability.
There have been many studies that propose automatic segmentation models for biomedical images using deep learning [8, 9, 10, 11, 12]. However, these studies mostly evaluated their models using test sets that come from the same dataset as the training set [4]. This treatment does not answer the question of how effective their proposed models
A PREPRINT - NOVEMBER 18, 2021
(a) U-Net [8] (b) Attention U-Net [10] (c) U-Net++ [13] (d) Attention U-Net++ [11] (e) Attention gate [10] (f) Illustration details
Figure 1: Illustrations of deterministic deep learning U-Net based models used in this study.
are on a different dataset. Robustness of automatic segmentation models across different datasets is of utmost importance in medical image analysis because test images/data can come from different hospitals, health care centers, or equipment manufacturers.
In this study, we investigate the robustness of U-Net based deep learning models for WMHs segmentation in two different datasets. Furthermore, we also investigate the robustness of two deep learning approaches, namely deterministic and probabilistic deep learning models, for WMHs segmentation. All code and trained models are available on our GitHub page (https://github.com/rizalmaulanaa/Robustness_of_Prob_U_Net).
2 Related Works
U-Net [8] has been widely used in many segmentation tasks, especially biomedical image segmentation, because it can work efficiently with limited training data [9]. There are three components in the original U-Net: encoder, decoder, and skip connections (see Fig. 1a). Skip connections have an important role in U-Net; they combine coarse-grained feature maps from each decoder with fine-grained feature maps from each encoder [13]. Many studies have tried to improve U-Net by adding new modules or redesigning the network itself [9, 10, 13, 14].
Attention U-Net was proposed in [10] (see Fig. 1b) by adding an attention gate to the U-Net. The purpose of the attention gate is to make the U-Net focus on places that have high relevance to the target labels [10]. The attention gate (see Fig. 1e) was first introduced in Natural Language Processing (NLP) and is now commonly used in Computer Vision [10, 11]. Fig. 1e shows the flow of the attention gate, which has two inputs that come from the skip connection and the up-sampling signal. The up-sampling signal is used to enrich information from the lower level, while the skip connection is used to retain information from the encoder. Lastly, the output of the attention gate is obtained by performing element-wise multiplication between the attention coefficient (α) and the skip connection [10].
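As an illustration, the attention-gate computation described above can be sketched in a few lines of NumPy. This is our own minimal sketch, not the authors' code; the names (`attention_gate`, `w_x`, `w_g`, `psi`) are hypothetical, and the 1×1 convolutions are reduced to plain matrix multiplications over the channel axis.

```python
import numpy as np

def attention_gate(skip, gate, w_x, w_g, psi):
    """Additive attention gate (simplified sketch of Fig. 1e).

    skip : encoder feature map from the skip connection, shape (H, W, C)
    gate : up-sampled gating signal from the decoder,    shape (H, W, C)
    w_x, w_g : weights of the 1x1 convolutions, here plain (C, C_int) matrices
    psi  : (C_int, 1) weights producing the attention coefficient alpha
    """
    # additive attention: ReLU(W_x x + W_g g)
    q = np.maximum(skip @ w_x + gate @ w_g, 0.0)   # (H, W, C_int)
    # sigmoid squashes the result into an attention coefficient in (0, 1)
    alpha = 1.0 / (1.0 + np.exp(-(q @ psi)))       # (H, W, 1)
    # element-wise multiplication of alpha and the skip connection
    return alpha * skip                            # (H, W, C)
```

Because alpha lies in (0, 1), the gate can only attenuate skip-connection features, never amplify them, which is how irrelevant regions get suppressed.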
U-Net++ was proposed in [13] by redesigning U-Net's skip connections. Instead of sending the semantic information from encoder to decoder directly, U-Net++ uses another set of convolutional blocks for extracting more features (see Fig. 1c). Thus, U-Net++ is basically a nested U-Net at different levels of semantic information. U-Net++ also employs Deep Supervision (DS) (shown as yellow lines in Fig. 1c), which averages all segmentation results from different branches of semantic information [13]. However, DS is optional, and the output segmentation is produced by the last decoder if DS is not used. U-Net++ can also be combined with the attention gate to become Attention U-Net++ [11] (see Fig. 1d).
Probabilistic U-Net [14] was first proposed for semantic segmentation of ambiguous images, such as in medical imaging. For example, different experts can produce different manual labels for one lung nodule in a CT scan [15]. Probabilistic U-Net employs a conditional variational autoencoder (CVAE) for obtaining a complex prior/posterior distribution to capture and model uncertainties from images. During inference, a random sample from the learned distribution is used to produce variations of the semantic segmentation.
The main difference between Probabilistic U-Net (probabilistic model) and the U-Net (deterministic model) is that Probabilistic U-Net has an additional process to learn a useful embedding in latent space for capturing variations of semantic segmentation (see Fig. 2). During training, Probabilistic U-Net's Posterior Net will learn to produce a latent space that
(a) Training process of Probabilistic U-Net [14] (b) Sampling process of Probabilistic U-Net [14]
Figure 2: Illustrations of (a) training process and (b) sampling process (i.e., inference after training process is finished)
of Probabilistic U-Net [14].
(a) Histogram of WMHs volumes in different datasets. (b) Histogram of WMHs volumes in different institutions. (c) Histogram of WMHs intensities in different datasets. (d) Histogram of WMHs intensities in different institutions.
Figure 3: Distributions of volumes of WMHs clusters in every slice (i.e., (a) and (b)) and distributions of WMHs' intensities (i.e., (c) and (d)) for each dataset. In (b) and (d), the Challenge dataset is divided into the institutions that make it up: Singapore, GE3T, and Utrecht.
can capture variations of segmentation from the ground truths and the medical image. On the other hand, Probabilistic U-Net's Prior Net will try to produce the same latent space using only the medical image. The Kullback-Leibler divergence is used to minimize differences between the posterior distribution (from the Posterior Net) and the prior distribution (from the Prior Net). In the sampling process (i.e., inference after training), the Prior Net is used to sample multiple latent vectors z, where each of them generates a variation of the semantic segmentation. Each sample z is broadcast to the same height and width as the U-Net's last feature maps, and then concatenated with those feature maps before being fed forward to the segmentation layer.
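The two operations described above, i.e., the KL term that aligns the Prior and Posterior Nets and the broadcast-and-concatenate step that injects a sample z into the U-Net, can be sketched as follows. This is a hedged NumPy sketch under our own naming (`kl_diag_gaussians`, `combine_with_latent`); it assumes the CVAE uses axis-aligned (diagonal) Gaussians, as in the original Probabilistic U-Net paper [14].

```python
import numpy as np

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """Closed-form KL(q || p) between two diagonal Gaussians: the quantity
    minimised between the Posterior Net (q) and the Prior Net (p)."""
    var_q, var_p = np.exp(logvar_q), np.exp(logvar_p)
    return 0.5 * np.sum(logvar_p - logvar_q
                        + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def combine_with_latent(feature_maps, z):
    """Broadcast a latent sample z over the U-Net's last feature maps and
    concatenate along the channel axis, ready for the 1x1 segmentation layer.

    feature_maps : (H, W, C) last feature maps of the U-Net
    z            : (D,) latent sample drawn from the Prior Net
    """
    h, w, _ = feature_maps.shape
    z_map = np.broadcast_to(z, (h, w, z.shape[0]))         # tile z to H x W
    return np.concatenate([feature_maps, z_map], axis=-1)  # (H, W, C + D)
```

Drawing several z from the Prior Net and passing each through `combine_with_latent` is what yields the multiple segmentation variants used at inference time.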
Most previous studies produced their best performances when deep learning models were trained and tested using train and test sets from the same dataset [4]. To tackle the problem of robustness across different datasets, recent studies utilised ensembles [16] or multiple-branch [17] deep learning models. It has been indicated that the performance of deep learning models can be affected by the different intensity distributions of images obtained by different MRI scanners [17].
3 Methodology
3.1 Datasets and Pre-processing Methods
For testing the robustness of U-Net based models in different datasets, two different datasets from the Alzheimer's Disease Neuroimaging Initiative (ADNI)^1 [18] and the WMH Segmentation Challenge [19] were chosen. Note that each dataset has unique characteristics, e.g., different resolutions, numbers of slices, etc. In this study, we only used T2-FLAIR brain MRI scans from both datasets for training.
For the ADNI dataset, we used a subset of the ADNI dataset that has been used in many previous studies [20, 21, 22, 12, 7], which contains data from 20 patients where each patient has 3 MRI scans from different time points (the total is
^1 Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu/). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf
60 MRI scans). All T2-FLAIR MRI scans have the same dimension of 256 × 256 × 35 pixels, where each voxel is 3.69 mm³. For more details on this dataset (e.g., data acquisition protocol parameters, ground truth creation, etc.), please see [21] and the data-share page^2.
For the WMH Segmentation Challenge^3 (hereinafter referred to as the Challenge dataset), it contains data from three different institutions (i.e., Singapore, GE3T, and Utrecht), where each institution has 20 patients (the total is 60 MRI scans). MRI scans from Singapore, GE3T, and Utrecht have dimensions of 232 × 256 × 48, 132 × 256 × 83, and 240 × 240 × 48 pixels, where each voxel is 3 mm³, 3.51 mm³, and 2.75 mm³, respectively. The Challenge dataset is used for testing the robustness of the tested models when segmenting WMHs in brain MRI from a different dataset. Data acquisition protocol parameters for the Challenge dataset can be found in [19].
To further highlight the differences between the datasets, we created histograms showing the distributions of volumes of WMHs clusters in every slice and the distributions of WMHs' intensities for each dataset in Fig. 3. We can see that the Challenge dataset has WMHs with higher intensities and larger WMHs clusters in every slice than the ADNI dataset (see Fig. 3c and Fig. 3a, respectively). Furthermore, if the Challenge dataset is divided into the institutions that make it up, we can see that each institution has its own distributions of WMHs intensity and volumes of WMHs clusters in every slice (see Fig. 3d and Fig. 3b, respectively). These kinds of differences have been found to affect the performance of deep learning models [21, 22].
All T2-FLAIR brain MRI scans were pre-processed and augmented before being used in the training process of the U-Net based models. Firstly, Bias Field Correction (BFC) [23] was used to correct low-frequency intensity inhomogeneity in the MRI scans. Secondly, a skull-stripping method, the Brain Extraction Tool (BET) [24], was used to extract brain tissues from the skull. Lastly, each slice from all MRI scans was normalized using Z-score normalization. MRI scans in the Challenge dataset from different institutions have different dimensions, so zero padding was applied to the edges of every slice before training. The final dimension is 240 × 256. For data augmentation, horizontal flip and rotation were used with probabilities of 0.5 and 0.8 respectively, using Albumentations [25].
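The last two pre-processing steps, i.e., Z-score normalization per slice and zero padding to the common 240 × 256 dimension, can be sketched as below. This is a hypothetical helper (`preprocess_slice` is our name, not from the paper's code); BFC and BET are external tools and are not reproduced here.

```python
import numpy as np

def preprocess_slice(mri_slice, target_hw=(240, 256)):
    """Z-score normalise one T2-FLAIR slice, then zero-pad it to target_hw."""
    # z-score normalization per slice (epsilon guards against empty slices)
    normed = (mri_slice - mri_slice.mean()) / (mri_slice.std() + 1e-8)
    # symmetric zero padding on the edges up to the target dimension
    pad_h = target_hw[0] - normed.shape[0]
    pad_w = target_hw[1] - normed.shape[1]
    return np.pad(normed, ((pad_h // 2, pad_h - pad_h // 2),
                           (pad_w // 2, pad_w - pad_w // 2)))
```

For a 232 × 256 slice from the Singapore sub-dataset, for instance, this pads 4 zero rows on each side to reach 240 × 256.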
Table 1: Evaluation on k-fold cross validation using the ADNI dataset (left) and the Challenge dataset (right) in Dice similarity coefficient (DSC), mean square error (MSE), and Bland-Altman plot criteria (i.e., mean volume error (MVE) and lower and upper limits of agreement (LoA)). A higher DSC is better, a lower MSE is better, and values closer to 0 are better for MVE and lower/upper LoA. The best result for each column is shown in bold and the second best is underlined.

Model | ADNI Dataset: DSC (std), MSE (std) [ml], MVE [ml], Lower / Upper LoA [ml] | Challenge Dataset: DSC (std), MSE (std) [ml], MVE [ml], Lower / Upper LoA [ml]
U-Net [8] 0.5332 (0.118) 11.6746 (14.465) -0.8423 -7.3873 / 5.7028 0.6150 (0.200) 80.9358 (142.906) -4.1343 -19.9273 / 11.6586
Attention U-Net [10] 0.5040 (0.120) 21.6698 (27.763) -1.6193 -10.2456 / 7.0070 0.6341 (0.183) 59.0076 (97.101) -4.0066 -16.9608 / 8.9476
U-Net++ [13] 0.5469 (0.123) 10.8643 (14.960) -0.4957 -6.9365 / 5.9451 0.6414 (0.179) 57.0659 (121.059) -2.6014 -16.6193 / 11.4166
Attention U-Net++ 0.5333 (0.143) 13.3631 (29.532) 0.6475 -6.4637 / 7.7586 0.6370 (0.181) 59.6467 (132.204) -2.5966 -16.9730 / 11.7799
U-Net++ w/ DS [13] 0.4479 (0.138) 30.2769 (36.275) -3.2570 -12.0229 / 5.5088 0.6302 (0.168) 65.4240 (131.189) -3.0973 -17.8660 / 11.6714
Attention U-Net++ w/ DS 0.4338 (0.167) 13.2083 (12.342) -2.1904 -7.9225 / 3.5417 0.5993 (0.207) 66.7030 (102.184) -4.3073 -18.0226 / 9.4080
Probabilistic U-Net [14] 0.5597 (0.115) 5.8978 (13.682) 0.0466 -4.7526 / 4.8458 0.6831 (0.164) 48.0402 (142.490) 0.9380 -12.6356 / 14.5116
3.2 Experimental Setup
All methods previously introduced in Section 2, namely U-Net (baseline model), Attention U-Net, U-Net++, Attention U-Net++, and Probabilistic U-Net, were used in this study. However, the Attention U-Net++ in this study is not the same as the Attention U-Net++ proposed in [11]: the down-sampling signals from the ensemble block to the decoder block (red arrows in Fig. 1d) are not used in this study's Attention U-Net++. In other words, this study's Attention U-Net++ is constructed by adding the attention gate only to the U-Net++.
For all experiments, we used the Adam optimizer with learning rate 0.001, a batch size of 16, Focal Loss (FL) [26] as the cost function, and 50 epochs for training each model. FL was introduced as a solution for extremely imbalanced data between foreground and background (e.g., 1:1000). Due to the nature of WMHs, FL is more effective than Cross Entropy (CE) for segmenting small clusters of WMHs. Note that, in this study, FL was chosen over CE based on preliminary experiments where FL outperformed CE. FL is defined as:

p_t = p if y = 1, and p_t = 1 − p otherwise   (1)

FL(p_t) = −α_t (1 − p_t)^γ log(p_t)   (2)
^2 https://datashare.ed.ac.uk/handle/10283/2214
^3 https://wmh.isi.uu.nl/
where p is the predicted result and y is the ground truth. Parameters α and γ have values from 0 to 1 and 0 to 5, respectively. If γ = 0, then FL is equivalent to CE, whereas α is an array of weights used for balancing the data. Based on our preliminary experiments, we found that FL with γ = 1.0 and α = 0.5 performed best for the deterministic models, while γ = 0.25 and α = 0.5 performed best for the probabilistic model. For the Probabilistic U-Net, an additional Adam optimizer was used for optimizing the Prior Net and Posterior Net.
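For concreteness, the binary focal loss of Eqs. (1)–(2) can be sketched in NumPy as follows. This is our own sketch, not the training code; the function name and the eps guard are ours.

```python
import numpy as np

def focal_loss(p, y, alpha=0.5, gamma=1.0):
    """Binary focal loss, following Eq. (1) and Eq. (2).

    p     : predicted foreground probabilities (any shape)
    y     : binary ground truth (1 = WMH, 0 = background), same shape as p
    alpha : class-balancing weight
    gamma : focusing parameter; gamma = 0 reduces FL to cross entropy
    """
    eps = 1e-7                                  # numerical safety for log(0)
    p_t = np.where(y == 1, p, 1.0 - p)          # Eq. (1)
    fl = -alpha * (1.0 - p_t) ** gamma * np.log(p_t + eps)  # Eq. (2)
    return fl.mean()
```

With gamma = 0 and alpha = 1 this reduces to the usual binary cross entropy, which is one way to sanity-check an implementation; gamma > 0 down-weights easy (high p_t) pixels so that the small WMHs clusters dominate the loss.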
In this study, we performed two different experiments, namely the k-fold cross validation and cross dataset experiments. The k-fold cross validation experiment was performed to evaluate the performances of all tested models where the training and testing sets are from the same dataset. On the other hand, the cross dataset experiment was performed to evaluate the performances of all tested models on segmenting WMHs in T2-FLAIR brain MRI scans from a different dataset. In the k-fold cross validation experiment, we performed patient-level cross validation with k = 2 (i.e., 10 patients each are used for training and testing in each fold of the ADNI dataset, and 30 patients each for the Challenge dataset). In the cross dataset experiment, all T2-FLAIR brain MRI scans of one dataset are used for training while all T2-FLAIR brain MRI scans from the other dataset are used for testing.
3.3 Evaluation Measurements

To evaluate the performance of the models tested in this study, the Dice Similarity Coefficient (DSC), Mean Square Error (MSE), and Bland-Altman [27] criteria were used. DSC measures the spatial similarity between the ground truth and the predicted segmentation. MSE calculates the error between the true volume of WMHs and the predicted volume of WMHs. The Bland-Altman criteria and plot are used to evaluate the agreement/reliability in predicting the volumes of WMHs and are commonly used in clinical settings. The Bland-Altman criteria are the mean volumetric difference between the true volume of WMHs and the predicted volume of WMHs (hereinafter referred to as the mean volume error (MVE)) and the lower and upper limits of agreement (LoA). The lower/upper LoA can be calculated as MVE ± 1.96 × standard deviation (std) of the volume errors. For the Probabilistic U-Net, the final predicted segmentation of WMHs is an average of 30 variations of predicted segmentation (i.e., by sampling 30 different z from the Prior Net).
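The volumetric measures above can be computed directly from per-patient WMH volumes. Below is a minimal sketch (the helper name `volume_metrics` is ours) of MSE, MVE, and the lower/upper LoA following the MVE ± 1.96 × std formula:

```python
import numpy as np

def volume_metrics(true_vols, pred_vols):
    """MSE, MVE, and lower/upper limits of agreement for WMH volumes (ml).

    true_vols, pred_vols : per-patient true and predicted WMH volumes in ml
    """
    true_vols, pred_vols = np.asarray(true_vols), np.asarray(pred_vols)
    errors = pred_vols - true_vols               # per-patient volume error
    mse = np.mean(errors ** 2)                   # mean square error
    mve = errors.mean()                          # mean volume error
    spread = 1.96 * errors.std()                 # Bland-Altman half-width
    return mse, mve, mve - spread, mve + spread  # ..., lower LoA, upper LoA
```

If the errors are roughly Gaussian, about 95% of the per-patient volume errors are expected to fall between the lower and upper LoA, which is why points outside that band are flagged as outliers.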
Figure 4: Visualisations of WMHs ground truth (GT) and predicted WMHs segmentation by the tested models from the k-fold cross validation experiment (above) and the cross dataset experiment (below) after binarisation. Red regions are true/predicted WMHs. The volume of WMHs in the particular slice and the Dice Similarity Coefficient (DSC) are written at the bottom left.
4 Results and Discussions
4.1 K-Fold Cross Validation Experiment
Table 1 shows the results of the k-fold cross validation experiment for both datasets used in this study (i.e., the ADNI dataset (left) and the Challenge dataset (right)). From the table, it is clear that Probabilistic U-Net produced the best results in the DSC, MSE, and Bland-Altman criteria in both datasets. However, the Probabilistic U-Net did not perform best on the upper LoA measurement.

Also based on Table 1, adding an attention gate did not improve the performances of U-Net, U-Net++, and U-Net++ with DS in general. However, it is worth mentioning that the attention gate improved the performance of U-Net in the DSC and MSE measurements on the Challenge dataset. In most other cases, adding the attention gate does not yield a higher DSC than the baseline model (e.g., U-Net++ to
Figure 5: Bland-Altman plots produced by U-Net, U-Net++, and Probabilistic U-Net in the k-fold cross validation experiment tested on the ADNI dataset (left) and the Challenge dataset (right). Yellow lines are the upper limit of agreement (LoA), green lines are the mean volume error (MVE), and red lines are the lower LoA. The closer the lines are to the value 0 on the y-axis, the better.
Attention U-Net++). Furthermore, the DS module did not improve the performances of the U-Net++ and Attention U-Net++ models in almost all evaluation measurements. These findings indicate that the attention gate and DS module are not very effective for automatic WMHs segmentation.

Qualitative/visual assessment can be seen in Fig. 4 (the upper side is for the k-fold cross validation experiment while the lower side is for the cross dataset experiment), where the DSC measurements produced by the tested models are also shown. The Bland-Altman criteria for U-Net, U-Net++, and Probabilistic U-Net in both datasets listed in Table 1 are plotted in Fig. 5. The green line is the MVE, the red line is the lower LoA, and the yellow line is the upper LoA; each dot represents one patient. The closer the lines are to the value 0 on the y-axis, the better. Thus, we can see that the Probabilistic U-Net is better than the U-Net and the U-Net++ at estimating the volumes of WMHs in the k-fold cross validation experiment in both datasets. However, it is worth mentioning that there are still some outliers in the estimation (i.e., outside the interval of the lower and upper LoA).
Table 2: Evaluation on the cross dataset experiment using the ADNI dataset for training and the Challenge dataset for testing in Dice similarity coefficient (DSC), mean square error (MSE), and the Bland-Altman plot criterion mean volume error (MVE). A higher DSC is better, a lower MSE is better, and values closer to 0 are better for MVE. The best result for each column is shown in bold and the second best is underlined.

Model | Singapore DSC (std) | GE3T DSC (std) | Utrecht DSC (std) | Average DSC (std) | MSE (std) [ml] | MVE (std) [ml]
U-Net [8] 0.6459 (0.194) 0.6368 (0.128) 0.5964 (0.208) 0.6264 (0.178) 57.6554 (40.698) 0.1454 (7.140)
Attention U-Net [10] 0.6567 (0.185) 0.6283 (0.123) 0.5798 (0.210) 0.6216 (0.176) 64.6727 (43.113) -1.2613 (7.065)
U-Net++ [13] 0.6584 (0.183) 0.6159 (0.120) 0.5876 (0.208) 0.6206 (0.174) 74.8881 (44.480) -1.7248 (7.202)
Attention U-Net++ 0.6653 (0.178) 0.6551 (0.092) 0.5592 (0.207) 0.6265 (0.170) 85.5528 (52.614) -2.8858 (7.607)
U-Net++ w/ DS [13] 0.6520 (0.171) 0.6508 (0.101) 0.5637 (0.224) 0.6222 (0.175) 79.7682 (65.307) -0.4982 (7.636)
Attention U-Net++ w/ DS 0.6273 (0.199) 0.5596 (0.136) 0.5467 (0.224) 0.5779 (0.190) 82.9248 (39.542) -2.9455 (7.479)
Probabilistic U-Net [14] 0.6430 (0.170) 0.6934 (0.086) 0.5641 (0.200) 0.6335 (0.166) 114.5996 (107.905) 4.2042 (8.486)
Table 3: Evaluation on the cross dataset experiment using the Challenge dataset for training and the ADNI dataset for testing in Dice similarity coefficient (DSC), mean square error (MSE), and Bland-Altman plot criteria (i.e., mean volume error (MVE) and lower/upper limits of agreement (LoA)). A higher DSC is better, a lower MSE is better, and values closer to 0 are better for MVE and lower/upper LoA. The best result for each column is shown in bold and the second best is underlined.

Model | DSC (std) | MSE (std) [ml] | MVE [ml] | Lower / Upper LoA [ml]
U-Net [8] 0.5346 (0.164) 16.9374 (34.075) 2.1448 -4.7976 / 9.0873
Attention U-Net [10] 0.4999 (0.156) 24.4821 (56.104) 0.7700 -8.8906 / 10.4307
U-Net++ [13] 0.5285 (0.158) 17.1473 (30.529) 1.2444 -6.5620 / 9.0508
Attention U-Net++ 0.5021 (0.164) 20.3446 (39.623) 1.7858 -6.4009 / 9.9725
U-Net++ w/ DS [13] 0.4616 (0.179) 21.9043 (37.378) 1.7875 -6.7618 / 10.3369
Attention U-Net++ w/ DS 0.4885 (0.186) 18.7165 (35.286) 2.1043 -5.3670 / 9.5756
Probabilistic U-Net [14] 0.4809 (0.187) 25.1539 (44.206) 2.7703 -5.4933 / 11.0339
4.2 Cross Dataset Experiment
Table 2 shows the results of the cross dataset experiment where the ADNI dataset was used for training and the Challenge dataset for testing, whereas Table 3 shows the results where the Challenge dataset was used for training and the ADNI dataset for testing.
Figure 6: Ambiguity maps of the same slice from the k-fold cross validation experiment (middle) and the cross dataset experiment (right), calculated by using the Cross Entropy (CE) between the mean sample and all samples (γ(s) = E[CE(s̄, s)]).
From Table 2, we can see that the Probabilistic U-Net performed best in the average DSC and the DSC for the GE3T sub-dataset. However, the Probabilistic U-Net failed to produce the best results for the other evaluation measurements. The best results for each evaluation measurement were produced by different models; specifically, the original U-Net produced the best results for most of them, namely the DSC for the Utrecht sub-dataset, MSE, and MVE.

From Table 3, we can see that the Probabilistic U-Net failed to produce the best result in any evaluation measurement. Instead, the original U-Net produced the best results in almost all evaluation measurements except the MSE. It is also worth mentioning that U-Net++ is the second best performer in almost all evaluation measurements.
4.3 The Performance of Probabilistic U-Net
From the cross dataset experiment, we found that the Probabilistic U-Net performed worse than the other models, especially when data from different institutions are put together (i.e., the Challenge dataset) and used for training. We hypothesise that the Probabilistic U-Net captures different uncertainties/ambiguities when trained in the k-fold cross validation and cross dataset experiments. To verify this, we created ambiguity maps for each experiment (Fig. 6). An ambiguity map can be created by generating variations of the predicted segmentation using the Probabilistic U-Net and then calculating the CE between the mean predicted segmentation and all variations of the predicted segmentation (see Fig. 6).

From the second column of Fig. 6, we can see that the ambiguity map produced by the Probabilistic U-Net in the k-fold cross validation experiment has high uncertainties in normal tissues of the brain that appear like, or have similar textures to, the WMHs. In contrast, from the third column of Fig. 6, we can see that the ambiguity map produced by the Probabilistic U-Net in the cross dataset experiment has high uncertainties around the borders of WMHs. This shows that the Probabilistic U-Net captured the ambiguity of WMHs' manual labels produced by different raters, as in the original study of Probabilistic U-Net [14], in the cross dataset experiment. We believe this happened due to the different distributions of WMHs intensity and volumes of WMHs clusters in every slice between the ADNI dataset and the Challenge dataset, as seen in Fig. 3. Also, note that each institution in the Challenge dataset has different distributions as well, which creates a lot of uncertainty in the dataset.
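The ambiguity map γ(s) = E[CE(s̄, s)] described above can be sketched as follows. This is our own NumPy sketch (the function name and the eps guard are ours): it averages the binary cross entropy between the mean predicted segmentation and each sampled segmentation.

```python
import numpy as np

def ambiguity_map(samples, eps=1e-7):
    """Per-pixel ambiguity gamma(s) = E[CE(s_bar, s)] over N samples.

    samples : sampled WMH probability maps, shape (N, H, W)
    returns : ambiguity map, shape (H, W)
    """
    s_bar = samples.mean(axis=0)                # mean predicted segmentation
    # binary cross entropy between the mean map and every sample
    ce = -(s_bar * np.log(samples + eps)
           + (1.0 - s_bar) * np.log(1.0 - samples + eps))
    return ce.mean(axis=0)                      # expectation over samples
```

Pixels where the sampled segmentations disagree get high CE against their own mean, which is exactly where the maps in Fig. 6 light up, while pixels that all samples confidently agree on stay near zero.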
5 Conclusion and Future Work
In this study, we investigated the robustness of different U-Net based deep learning models for automatic segmentation of WMHs in different datasets of brain MRI. We also investigated the robustness of Probabilistic U-Net (i.e., a probabilistic model) and compared its performance to the original U-Net, Attention U-Net, U-Net++, Attention U-Net++, and their variants (i.e., deterministic models). It is worth mentioning that all models were tested using their best hyper-parameters found in the preliminary experiments.

Based on the k-fold cross validation experiment, Probabilistic U-Net outperformed all other tested models in all evaluation measurements (i.e., DSC, MSE, and Bland-Altman criteria/plot). However, we found that Probabilistic U-Net was outperformed by the original U-Net in the cross dataset experiment in some evaluation measurements, especially when the Challenge dataset was used for training.

Based on the ambiguity maps produced by the Probabilistic U-Net, we found that it captures different types of uncertainty in different experiments. In the k-fold cross validation experiment, uncertainties between WMHs and non-WMHs regions were captured by the Probabilistic U-Net. On the other hand, uncertainties concentrated around the borders of WMHs were captured in the cross dataset experiment. Thus, in the future, we would like to find a way to improve the robustness of Probabilistic U-Net across different datasets.
Acknowledgment
We gratefully acknowledge the support from Tokopedia-UI AI Center, Faculty of Computer Science, University of
Indonesia, for the NVIDIA DGX-1 that we used for running the experiments. MFR is with the Special Postdoctoral
Researchers Program, RIKEN.
Data collection and sharing for this project was partially funded by the Alzheimer’s Disease Neuroimaging Initiative
(ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number
W81XWH-12-2-0012). The grantee organization is the Northern California Institute for Research and Education, and
the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California.
ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
References
[1]
Joanna M. Wardlaw, Francesca M. Chappell, Maria del Carmen Valdés Hernández, Stephen D.J. Makin, Julie
Staals, Kirsten Shuler, Michael J. Thrippleton, Paul A. Armitage, Susana Muñoz-Maniega, Anna K. Heye, Eleni
Sakka, and Martin S. Dennis. White matter hyperintensity reduction and outcomes after minor stroke. Neurology,
89(10):1003–1010, 2017.
[2]
Muhammad Febrian Rachmadi, Maria del C. Valdés-Hernández, Stephen Makin, Joanna Wardlaw, and Taku
Komura. Automatic spatial estimation of white matter hyperintensities evolution in brain mri using disease
evolution predictor deep neural networks. Medical Image Analysis, 63:101712, 2020.
[3]
Karen Misquitta, Mahsa Dadar, D. Louis Collins, and Maria Carmela Tartaglia. White matter hyperintensities
and neuropsychiatric symptoms in mild cognitive impairment and alzheimer’s disease. NeuroImage: Clinical,
28:102367, 2020.
[4]
Ramya Balakrishnan, Maria del C. Valdés Hernández, and Andrew J. Farrall. Automatic segmentation of white
matter hyperintensities from brain magnetic resonance images in the era of deep learning and big data a
systematic review. Computerized Medical Imaging and Graphics, 88:101867, 2021.
[5]
Pauline Maillard, Evan Fletcher, Danielle Harvey, Owen Carmichael, Bruce Reed, Dan Mungas, and Charles
DeCarli. White matter hyperintensity penumbra. Stroke, 42(7):1917–1922, 2011.
[6]
Maria del C Valdés Hernández, Karen J Ferguson, Francesca M Chappell, and Joanna M Wardlaw. New
multispectral mri data fusion technique for white matter lesion segmentation: method and comparison with
thresholding in flair images. European radiology, 20(7):1684–1691, 2010.
[7]
Muhammad Febrian Rachmadi, Maria del C Valdés-Hernández, Hongwei Li, Ricardo Guerrero, Rozanna Mei-
jboom, Stewart Wiseman, Adam Waldman, Jianguo Zhang, Daniel Rueckert, Joanna Wardlaw, et al. Limited
one-time sampling irregularity map (lots-im) for automatic unsupervised assessment of white matter hyperintensi-
ties and multiple sclerosis lesions in structural brain magnetic resonance images. Computerized Medical Imaging
and Graphics, 79:101685, 2020.
[8]
Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image
segmentation. CoRR, abs/1505.04597, 2015.
[9]
Jiong Wu, Yue Zhang, Kai Wang, and Xiaoying Tang. Skip connection u-net for white matter hyperintensities
segmentation from mri. IEEE Access, 7:155194–155202, 2019.
[10]
Ozan Oktay, Jo Schlemper, Loïc Le Folgoc, Matthew C. H. Lee, Mattias P. Heinrich, Kazunari Misawa, Kensaku
Mori, Steven G. McDonagh, Nils Y. Hammerla, Bernhard Kainz, Ben Glocker, and Daniel Rueckert. Attention
u-net: Learning where to look for the pancreas. CoRR, abs/1804.03999, 2018.
[11] Chen Li, Yusong Tan, Wei Chen, Xin Luo, Yuanming Gao, Xiaogang Jia, and Zhiying Wang. Attention UNet++: A nested attention-aware U-Net for liver CT image segmentation. In 2020 IEEE International Conference on Image Processing (ICIP), pages 345–349, 2020.
[12] Yunhee Jeong, Muhammad Febrian Rachmadi, Maria del C. Valdés-Hernández, and Taku Komura. Dilated Saliency U-Net for white matter hyperintensities segmentation using irregularity age map. Frontiers in Aging Neuroscience, 11:150, 2019.
[13] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang. UNet++: Redesigning skip connections to exploit multiscale features in image segmentation. IEEE Transactions on Medical Imaging, 2019.
[14] Simon A. A. Kohl, Bernardino Romera-Paredes, Clemens Meyer, Jeffrey De Fauw, Joseph R. Ledsam, Klaus H. Maier-Hein, S. M. Eslami, Danilo Jimenez Rezende, and Olaf Ronneberger. A probabilistic U-Net for segmentation of ambiguous images. arXiv preprint arXiv:1806.05034, 2018.
[15] Samuel G. Armato III, Geoffrey McLennan, Luc Bidaut, Michael F. McNitt-Gray, Charles R. Meyer, Anthony P. Reeves, Binsheng Zhao, Denise R. Aberle, Claudia I. Henschke, Eric A. Hoffman, et al. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Medical Physics, 38(2):915–931, 2011.
[16] Konstantinos Kamnitsas, Wenjia Bai, Enzo Ferrante, Steven G. McDonagh, Matthew Sinclair, Nick Pawlowski, Martin Rajchl, Matthew C. H. Lee, Bernhard Kainz, Daniel Rueckert, and Ben Glocker. Ensembles of multiple models and architectures for robust brain tumour segmentation. CoRR, abs/1711.01468, 2017.
[17] Naoya Furuhashi, Shiho Okuhata, and Tetsuo Kobayashi. A robust and accurate deep-learning-based method for the segmentation of subcortical brain: Cross-dataset evaluation of generalization performance. Magnetic Resonance in Medical Sciences, 20(2):166–174, 2021.
[18] Susanne G. Mueller, Michael W. Weiner, Leon J. Thal, Ronald C. Petersen, Clifford Jack, William Jagust, John Q. Trojanowski, Arthur W. Toga, and Laurel Beckett. The Alzheimer's Disease Neuroimaging Initiative. Neuroimaging Clinics of North America, 15(4):869, 2005.
[19] Hugo J. Kuijf, J. Matthijs Biesbroek, Jeroen De Bresser, Rutger Heinen, Simon Andermatt, Mariana Bento, Matt Berseth, Mikhail Belyaev, M. Jorge Cardoso, Adrià Casamitjana, D. Louis Collins, Mahsa Dadar, Achilleas Georgiou, Mohsen Ghafoorian, Dakai Jin, April Khademi, Jesse Knight, Hongwei Li, Xavier Lladó, Miguel Luna, Qaiser Mahmood, Richard McKinley, Alireza Mehrtash, Sébastien Ourselin, Bo-Yong Park, Hyunjin Park, Sang Hyun Park, Simon Pezold, Elodie Puybareau, Leticia Rittner, Carole H. Sudre, Sergi Valverde, Verónica Vilaplana, Roland Wiest, Yongchao Xu, Ziyue Xu, Guodong Zeng, Jianguo Zhang, Guoyan Zheng, Christopher Chen, Wiesje van der Flier, Frederik Barkhof, Max A. Viergever, and Geert Jan Biessels. Standardized assessment of automatic segmentation of white matter hyperintensities and results of the WMH Segmentation Challenge. IEEE Transactions on Medical Imaging, 38(11):2556–2568, 2019.
[20] Maria del C. Valdés-Hernández. Reference segmentations of white matter hyperintensities from a subset of 20 subjects scanned three consecutive years, Dec 2016.
[21] Muhammad Febrian Rachmadi, Maria del C. Valdés-Hernández, Maria Leonora Fatimah Agan, and Taku Komura. Deep learning vs. conventional machine learning: Pilot study of WMH segmentation in brain MRI with absence or mild vascular pathology. Journal of Imaging, 3(4):66, 2017.
[22] Muhammad Febrian Rachmadi, Maria del C. Valdés-Hernández, Maria Leonora Fatimah Agan, Carol Di Perri, Taku Komura, Alzheimer's Disease Neuroimaging Initiative, et al. Segmentation of white matter hyperintensities using convolutional neural networks with global spatial information in routine clinical brain MRI with none or mild vascular pathology. Computerized Medical Imaging and Graphics, 66:28–43, 2018.
[23] J. G. Sled, A. P. Zijdenbos, and A. C. Evans. A nonparametric method for automatic correction of intensity nonuniformity in MRI data. IEEE Transactions on Medical Imaging, 17(1):87–97, 1998.
[24] Mark Jenkinson, Mickael Pechaud, Stephen Smith, et al. BET2: MR-based estimation of brain, skull and scalp surfaces. In Eleventh Annual Meeting of the Organization for Human Brain Mapping, volume 17, page 167, Toronto, 2005.
[25] Alexander Buslaev, Vladimir I. Iglovikov, Eugene Khvedchenya, Alex Parinov, Mikhail Druzhinin, and Alexandr A. Kalinin. Albumentations: Fast and flexible image augmentations. Information, 11(2), 2020.
[26] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal loss for dense object detection. arXiv preprint arXiv:1708.02002, 2018.
[27] J. Martin Bland and Douglas G. Altman. Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476):307–310, 1986. Originally published as Volume 1, Issue 8476.