Auto-segmentation of organs at risk for head and neck radiotherapy planning: From atlas-based to deep learning methods

Tomaž Vrtovec (a) and Domen Močnik
Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, Ljubljana SI-1000, Slovenia

Primož Strojan
Institute of Oncology Ljubljana, Zaloška cesta 2, Ljubljana SI-1000, Slovenia

Franjo Pernuš
Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, Ljubljana SI-1000, Slovenia

Bulat Ibragimov
Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, Ljubljana SI-1000, Slovenia
Department of Computer Science, University of Copenhagen, Universitetsparken 1, Copenhagen DK-2100, Denmark

(Received 26 October 2019; revised 27 May 2020; accepted for publication 29 May 2020; published 28 July 2020)
Radiotherapy (RT) is one of the basic treatment modalities for cancer of the head and neck (H&N), which requires a precise spatial description of the target volumes and organs at risk (OARs) to deliver a highly conformal radiation dose to the tumor cells while sparing the healthy tissues. For this purpose, target volumes and OARs have to be delineated and segmented from medical images. As manual delineation is a tedious and time-consuming task subjected to intra/interobserver variability, computerized auto-segmentation has been developed as an alternative. The field of medical imaging and RT planning has experienced an increased interest in the past decade, with new emerging trends that shifted the field of H&N OAR auto-segmentation from atlas-based to deep learning-based approaches. In this review, we systematically analyzed 78 relevant publications on auto-segmentation of OARs in the H&N region from 2008 to date, and provided critical discussions and recommendations from various perspectives: image modality – both computed tomography and magnetic resonance image modalities are being exploited, but the potential of the latter should be explored more in the future; OAR – the spinal cord, brainstem, and major salivary glands are the most studied OARs, but additional experiments should be conducted for several less studied soft tissue structures; image database – several image databases with the corresponding ground truth are currently available for methodology evaluation, but should be augmented with data from multiple observers and multiple institutions; methodology – current methods have shifted from atlas-based to deep learning auto-segmentation, which is expected to become even more sophisticated; ground truth – delineation guidelines should be followed and participation of multiple experts from multiple institutions is recommended; performance metrics – the Dice coefficient as the standard volumetric overlap metric should be accompanied by at least one distance metric, and combined with clinical acceptability scores and risk assessments; segmentation performance – the best performing methods achieve clinically acceptable auto-segmentation for several OARs, however, the dosimetric impact should also be studied to provide clinically relevant endpoints for RT planning. © 2020 American Association of Physicists in Medicine [https://doi.org/10.1002/mp.14320]
Key words: auto-segmentation, deep learning, head and neck, organs at risk, radiotherapy planning
1. INTRODUCTION
Cancer in the region of the head and neck (H&N), comprising malignancies of the lips, oral cavity, pharynx, larynx, nasal cavity and paranasal sinuses, salivary glands, and thyroid, has a yearly incidence of approximately 1.5 million worldwide,1 making it one of the most prominent cancers. In addition to surgery and chemotherapy, radiotherapy (RT) is an important treatment modality for H&N cancer, with an optimal utilization rate of around 80% in patients presenting with this malignancy.2 The aim of RT is to deliver a high radiation dose to the targeted cancerous cells to ensure the clinically required tumor control probability and, at the same time, spare the nearby healthy tissues to prevent acute radiation toxicity and serious late complications for the treated patient. The optimal radiation dose distribution is calculated in an optimization process using the inverse planning approach, which requires a precise spatial description of the target volumes as well as of the organs at risk (OARs). This knowledge is commonly obtained by trained radiation oncologists and, in some instances, also other experts from the field performing manual delineation, or segmentation, of the target volumes and OARs from the acquired three-dimensional (3D) images of the patient.

Medical image segmentation, the process of partitioning an image into multiple anatomical structures, is in general a challenging task that is hampered by the high variability of medical images. The source of variability is commonly represented by different imaging modalities revealing different characteristics of the human anatomy, for example, conventional radiography (x rays), computed tomography (CT), and magnetic resonance (MR) imaging; various imaging artifacts causing weak or missing boundaries, for example, noise, intensity inhomogeneity, partial volume effect, and motion; and variable image appearance of anatomical structures under segmentation, for example, due to pathological changes or the natural biological variability of the human anatomy. Nevertheless, image segmentation is important from the perspective of analyzing the properties of the obtained structures, and while manual delineation may still be the approach of choice, it is a time-consuming and tedious task subjected to intra/interobserver variability.3 Alternatively, computerized techniques based on medical image processing and analysis have been developed that replace manual with automated segmentation, or auto-segmentation,4,5 which eliminates the subjective bias of the observer, accelerates the whole process and, as a result, reduces the total workload in terms of human resources.
In the past decade, the field of computerized medical imaging has experienced an increased interest, with new emerging trends that are largely focused on deep learning (DL)6 as a subset of machine learning that mimics the data processing of the human brain for the purpose of decision-making. In comparison to traditional approaches based on conventional atlases, shape models, and feature classification, DL has shown superior image segmentation performance that was conveyed by several milestone auto-segmentation frameworks,7 for example, the U-Net,8 3D U-Net,9 V-Net,10 SegNet,11 DeepMedic,12 DeepLab,13 VoxResNet,14 and Mask R-CNN.15 Several ideas have been adopted for RT,16,17 including for image segmentation and detection, image phenotyping, radiomic signature discovery, clinical outcome prediction, image dose quantification, dose-response modeling, radiation adaptation, and image generation,18 and have therefore also impacted the area of auto-segmentation of OARs in the H&N region19–21 so as to provide a qualitative support for guiding critical treatment planning and delivery decisions. In this review, we provide a detailed overview of the existing studies for auto-segmentation of OARs in the H&N region by systematically outlining, analyzing, and categorizing the relevant publications in the field from 2008 to date.
2. METHODOLOGY
In May 2020, a search was conducted on the Web of Science (https://apps.webofknowledge.com) and PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) on-line citation indexing services, with the topic keyword (auto OR automatic) AND (segmentation OR contouring OR delineation) AND (head AND neck) and a time span from 2008 to date. Studies not concerned with OAR auto-segmentation in the H&N region, as well as longitudinal studies and dosimetric studies without geometric validations, were excluded. The obtained relevant publications were further supplemented with selected publications found in their lists of references. A detailed analysis of the resulting publications was then conducted from the perspective of image modality, OAR, image database, methodology, ground truth, performance metrics, and segmentation performance.
3. RESULTS
In the field of OAR auto-segmentation for RT planning in the H&N region, the search on the Web of Science and PubMed yielded, respectively, 281 and 257 results. After reviewing their abstracts, 49 were considered to be relevant and were further supplemented with selected publications from their lists of references. In total, we collected 75 publications22–96 focused on RT planning and three studies focused on hyperthermia therapy planning97–99 from 2008 to date (Fig. 1), along with three review papers related to auto-segmentation in the H&N region.19–21 The results of analyzing these publications from different perspectives are presented in the following subsections.

FIG. 1. The chronological distribution of the 78 reviewed publications in the field of organ at risk auto-segmentation in the head and neck region.
3.A. Image modality
RT planning is primarily performed using CT imaging information because the data on electron density, required for the calculation of the radiation beam energy absorption and dose distribution, are derived directly from the CT image intensities.100,101 As a result, segmentation of the target volumes and OARs has to be generated from the planning CT images, therefore making CT the prevailing image modality also for auto-segmentation approaches (Table I). While CT images provide a good visibility of the bony anatomy, the contrast differences between various soft tissues are relatively low, and can be to a certain degree improved by using an intravenous contrast enhancement agent.68,84,95,98,99

On the other hand, MR imaging gained a broad adoption because of its superior soft tissue contrast resolution compared to CT images and various imaging setups. In the recent consensus for CT-based manual delineation guidelines for OARs in the H&N region,102 it is strongly recommended to use, besides CT, also MR images to facilitate the delineation of several soft tissue OARs. Auto-segmentation of OARs from MR images can also be performed independently,58,63,68,74,94,97 and the resulting segmentation masks are then propagated to the planning CT images by applying the geometric transformations of the corresponding MR-to-CT image registration. Alternatively, image registration can be performed first, and auto-segmentation is then performed simultaneously on both image modalities.57,88,89 While the obtained results combine the information of the CT and MR image modality, both approaches rely on an accurate intrapatient multimodal image registration.103–105
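The propagation step itself amounts to applying the recovered geometric transformation to the MR-based mask, with nearest-neighbor interpolation so that the binary labels are preserved on the CT grid. The following is a minimal sketch of this idea using SimpleITK; the rigid mutual-information registration setup and the file names are illustrative assumptions rather than the protocol of any reviewed study:

```python
# Hypothetical sketch: propagate an MR-based OAR mask to the planning CT
# via rigid intrapatient MR-to-CT registration (SimpleITK).
import SimpleITK as sitk

def propagate_mask(ct_image, mr_image, mr_mask):
    """Register MR to CT (rigid, mutual information) and warp the MR mask."""
    # Initialize with a geometry-centered rigid transform.
    initial = sitk.CenteredTransformInitializer(
        ct_image, mr_image, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetMetricSamplingStrategy(reg.RANDOM)
    reg.SetMetricSamplingPercentage(0.1)
    reg.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0, minStep=1e-4, numberOfIterations=200)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(initial, inPlace=False)
    transform = reg.Execute(sitk.Cast(ct_image, sitk.sitkFloat32),
                            sitk.Cast(mr_image, sitk.sitkFloat32))

    # Nearest-neighbor interpolation preserves the binary labels.
    return sitk.Resample(mr_mask, ct_image, transform,
                         sitk.sitkNearestNeighbor, 0, mr_mask.GetPixelID())

ct = sitk.ReadImage("planning_ct.nii.gz")        # hypothetical file names
mr = sitk.ReadImage("t1_mr.nii.gz")
mask = sitk.ReadImage("parotid_mask_mr.nii.gz")
sitk.WriteImage(propagate_mask(ct, mr, mask), "parotid_mask_ct.nii.gz")
```

In practice, a deformable registration stage is often added after the rigid initialization, and the same propagate-and-resample pattern applies to the CBCT-to-CT scenario described below.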
Similar challenges are present in the case of adaptive RT, when cone beam CT (CBCT) images are often obtained between sessions for verifying the patient setup or adjusting the treatment plan to anatomical changes, as they can be acquired faster and at lower radiation doses in comparison to classical CT images. As a pretreatment planning CT image is always acquired and segmented to plan the dose distribution, auto-segmentation of CBCT images can be obtained by CBCT-to-CT registration followed by propagation of presegmented OARs back to the CBCT images.106,107
Other image modalities can be optionally provided to obtain complementary information; for example, positron emission tomography (PET) images can be acquired simultaneously with CT or MR images, however, they are not used for OAR but rather for target volume auto-segmentation.68 On the other hand, specific OARs (e.g., the carotid artery) can be successfully auto-segmented only from ultrasound (US) images,25 while the feasibility of using dual-energy CT (DECT) has been recently explored from the perspective of selecting the optimal energy level for generating the virtual monoenergetic image,108 in which different H&N OARs can be segmented.29
3.B. Organ at risk
Auto-segmentation is commonly performed for OARs whose RT-induced damage proved to be linked to late complications that may endanger the life of the patient or considerably reduce its quality (Table II).109–111 Major salivary glands, that is, the parotid and submandibular glands, are among the most frequently delineated OARs because of their importance for a sufficient secretion and proper composition of saliva, and therefore for the prevention of xerostomia and associated problems with swallowing, speech, and oral health. The eyeballs, vitreous humor, optic chiasm, optic nerves, lens, sclera, cornea, and lacrimal glands have to be spared to prevent optic neuropathy leading to an impaired vision or even blindness, while the commonly delineated nervous tissues are the spinal cord and brain, including the brainstem, cerebrum, cerebellum, and pituitary gland. In particular, segmentation of the former is of critical importance due to potentially devastating consequences (i.e., tetraplegia) of its over-irradiation. The pharyngeal constrictor muscles and cervical esophagus with the cricopharyngeal inlet have to be spared to prevent swallowing dysfunction.

Other relevant OARs include the thyroid, larynx, trachea, cochlea, chewing muscles, oral cavity, mastoids, temporo-mandibular joints, mandible, and brachial plexus, as their malfunction is connected with a variety of problems (e.g., hypothyroidism, swallowing problems including aspiration with resulting pulmonary morbidity, hearing decrease, osteoradionecrosis, brachial plexopathy). Although the lips and carotid arteries are commonly delineated for the purpose of RT planning, reports on auto-segmentation of these OARs are very limited.25
3.C. Image database
Auto-segmentation methods are validated on a wide range of image databases (Table III). Several methods utilize a subset of all available samples as an atlas or as a training set, while the remaining samples then constitute the test set, which serves to evaluate the auto-segmentation performance and accuracy. When the set of all available samples is relatively small, cross-validation (k-fold or, when k equals the number of samples, leave-one-out) is commonly employed to enable all available samples to be used for testing.
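For illustration, the two validation schemes can be set up as follows (a sketch with hypothetical patient identifiers; scikit-learn is one common choice for generating the splits):

```python
# Illustrative k-fold and leave-one-out splits over a small image database
# (hypothetical case identifiers), as used when no held-out test set exists.
from sklearn.model_selection import KFold, LeaveOneOut

cases = [f"patient_{i:02d}" for i in range(20)]  # 20 available samples

# 5-fold cross-validation: each case is tested exactly once.
for fold, (train_idx, test_idx) in enumerate(
        KFold(n_splits=5, shuffle=True, random_state=0).split(cases)):
    train = [cases[i] for i in train_idx]
    test = [cases[i] for i in test_idx]
    # ... fit the auto-segmentation model on `train`, evaluate on `test` ...
    print(f"fold {fold}: {len(train)} train / {len(test)} test")

# Leave-one-out: k equals the number of samples.
for train_idx, test_idx in LeaveOneOut().split(cases):
    pass  # one held-out case per iteration
```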
Among the reviewed publications, one database36 stands out, as it was devised from CT images of 3495 patients, resulting in 825–1702 training set samples for each studied OAR. On the other hand, there are several databases of H&N images that are publicly available. The Cancer Imaging Archive (TCIA) (https://www.cancerimagingarchive.net/), an open-access resource platform of medical images for cancer research,112,113 currently contains 12 databases of the H&N region, for example, the Head-Neck Cetuximab (https://doi.org/10.7937/K9/TCIA.2015.7AKGJUPZ),22,30,46,60,66 Head-Neck-PET-CT (https://doi.org/10.7937/K9/TCIA.2017.8oje5q00),22,30,46,114 TCGA-HNSC (https://doi.org/10.7937/K9/TCIA.2016.LXKQ47MS),22,60 and Data from Head and Neck Cancer CT Atlas (https://doi.org/10.7937/K9/TCIA.2017.umz8dv6s)22,115 CT image databases, the RT-MAC (https://doi.org/10.7937/tcia.2019.bcfjqfqb)116 MR image database, or the QIN-HEADNECK (https://doi.org/10.7937/K9/TCIA.2015.K0F5CGLI)117,118 PET-CT image database.
TABLE I. Image modalities used for auto-segmentation of organs at risk in the head and neck region for the purpose of radiotherapy planning, and the corresponding references.

Computed tomography (CT)
  Conventional CT: 22–24,26–28,30–42,44–57,59–62,64–70,72,73,76–82,84–93,95,96,98,99
  Dual-energy CT (DECT): 29
Magnetic resonance (MR)
  T1-weighted MR: 38,40,43,57–59,63,68,74,88,89,94
  T2-weighted MR: 38,63,75,83,97
Ultrasound (US): 25
Although many TCIA databases include reference H&N OAR delineations, they are associated with considerable variability because of the lack of a standardized delineation protocol. As a result, some of them were augmented and/or combined into new publicly available databases, for example, the manual delineations of 28 OARs in 140 CT images from the Head-Neck Cetuximab and Head-Neck-PET-CT databases as well as in 175 CT images from an in-house database (https://github.com/uci-cbcl/UaNet#Data),30 the manual delineations of 21 OARs in 31 CT images from the Head-Neck Cetuximab and TCGA-HNSC databases forming the TCIA test & validation radiotherapy CT planning scan dataset (TCIA-RT) (https://github.com/deepmind/tcia-ct-scan-dataset) database,60 or the manual delineations of nine OARs in 48 CT images from the Head-Neck Cetuximab database forming the Public Domain Database for Computational Anatomy (PDDCA) (http://www.imagenglab.com/newsite/pddca/) database.66

Examples of publicly available databases that do not originate from TCIA include the StructSeg (https://structseg2019.grand-challenge.org/Dataset/) database, consisting of 50 CT images with 22 manually delineated OARs, and the MRI-RT (https://figshare.com/s/a5e09113f5c07b3047df) database,105 consisting of 15 CT and 15 MR images of the same patients with 23 manually delineated OARs from the H&N region.
3.D. Methodology
The most common approach for segmenting OARs from H&N images is atlas-based auto-segmentation (ABAS), which has been frequently implemented in commercial tools.5,66,119 In ABAS, the image undergoing segmentation is first registered to images with known reference segmentation masks that form the atlas, and then these reference masks are, according to the geometrical transformations obtained from the registration, propagated back and fused into the final segmentation. To improve the results of ABAS, contour and level set refinement methods were applied to enhance the boundaries of the segmented OARs. Also, models of intensity or models of shape and appearance were generated to restrain the registration, and machine learning techniques were used to improve feature classification (Table IV).
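As a sketch of this register-propagate-fuse pipeline (assuming the hypothetical propagate_mask helper from the sketch in Section 3.A, and simple majority voting as the fusion rule; actual ABAS implementations typically use deformable registration and more elaborate label fusion):

```python
# Minimal ABAS sketch: register every atlas image to the target, propagate
# its reference mask, and fuse the propagated masks by majority voting.
# `propagate_mask` is the hypothetical helper sketched in Section 3.A.
import numpy as np
import SimpleITK as sitk

def abas_segment(target_image, atlas):
    """atlas: list of (image, reference_mask) pairs."""
    votes = None
    for atlas_image, atlas_mask in atlas:
        warped = propagate_mask(target_image, atlas_image, atlas_mask)
        arr = sitk.GetArrayFromImage(warped) > 0
        votes = arr.astype(np.uint8) if votes is None else votes + arr
    # A voxel belongs to the OAR if at least half of the atlases agree.
    fused = (votes >= (len(atlas) + 1) // 2).astype(np.uint8)
    out = sitk.GetImageFromArray(fused)
    out.CopyInformation(target_image)
    return out
```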
Recently, DL techniques have been applied to various steps of the RT workflow, including auto-segmentation,17,18,120 resulting in a superior performance in comparison to other classification and regression methods. The most popular architecture for DL-based auto-segmentation of medical images is the U-Net,9 which originates from the fully convolutional neural networks (CNNs) and consists of a contracting path and an expansive path in the shape of the letter U. Through convolution, activation, and pooling, the contracting path reduces spatial while increasing feature information, and the expansive path performs up-convolutions of the feature and spatial information with lateral concatenations of low- and high-level feature maps. The architecture was released as open-source (https://lmb.informatik.uni-freiburg.de/resources/opensource/unet/) and was, with additional augmentations, extended to the 3D U-Net,10 V-Net,11 and AnatomyNet.46 On the other hand, the DeepMedic13 framework is based on 3D CNNs and consists of two parallel convolutional paths for processing the input at multiple scales to achieve a large receptive field for classification while using small convolutional kernels that are associated with relatively low computational costs. Although it was originally developed for segmenting brain lesions, it was also released as open-source (https://biomedia.doc.ic.ac.uk/software/deepmedic/) and consequently applied in many different fields, including H&N OAR auto-segmentation, as well as augmented into new architectures, such as the DeepVoxNet.15
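A minimal sketch of the U-Net idea (two resolution levels only, in PyTorch; the layer sizes are illustrative and much smaller than in the published architectures):

```python
# Minimal two-level 2D U-Net sketch (PyTorch): a contracting path that
# pools away spatial resolution while adding feature channels, and an
# expansive path with up-convolutions and lateral (skip) concatenations.
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, n_channels=1, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(n_channels, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec1 = double_conv(64, 32)       # 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)

    def forward(self, x):
        s1 = self.enc1(x)                     # high-resolution features
        s2 = self.enc2(self.pool(s1))         # bottleneck
        x = torch.cat([self.up(s2), s1], 1)   # lateral concatenation
        return self.head(self.dec1(x))

logits = TinyUNet()(torch.randn(1, 1, 128, 128))  # -> (1, 2, 128, 128)
```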
TABLE II. Organs at risk in the head and neck region involved in auto-segmentation for the purpose of radiotherapy planning, and the corresponding references.

Parotid glands: 22–24,26–32,34–37,40,42,45–58,60,63–66,68–70,72,73,75–78,80,82–84,86,87,90,91,93,95
Submandibular glands: 22–24,26,30–32,34,35,40,42,46,50,51,53,55,60,65,66,69,70,77,78,80,82,86,87,95
Brainstem: 22–24,26,27,29–32,35,36,38,40,42,43,46–50,52–56,59,60,66,68,69,73,76,80,82,84,86,87,89,90,92–95,97–99
Brain, cerebrum and cerebellum: 23,36,60,82,94,97–99
Temporal lobes: 27,30
Hippocampus: 38
Pituitary gland: 30,33,94
Spinal cord and spinal canal: 22,23,26–28,30,32,34–36,42,47,48,51–53,58,60,63,65,68,73,80,82,87,90,95,97–99
Cerebrospinal fluid: 97
Eyeballs and vitreous humor: 22,29,30,33,36,38,43,47,48,59,60,62,65,68,73,79,82,89,94,96–99
Optic chiasm: 22,24,27,30,31,36,38,40,43,46,49,54,55,59,65,66,70,73,80,88,89,94
Optic nerves: 22,24,27,29–31,33,34,36–38,40,43,46,47,49,54,55,59,60,62,65,66,69,74,79,80,88,89,94,96,98,99
Lens: 29,30,33,36,47,59,60,96–99
Sclera: 97–99
Cornea: 99
Lacrimal glands: 60
Extraocular muscle: 62
Mandible: 23,24,26,28,30–32,34–36,39–42,44,46–49,51–56,58,60,65,66,69,78,80,82,86,90,92,93,95
Oral cavity: 23,26,28,30,32,35,42,47,50,52,53,80
Temporo-mandibular joints: 30,42,47
Mastoids: 47
Chewing muscles: 87,95
Pharyngeal constrictor muscles: 23,26,28,32,40,50–53,65,77,80,87
Cervical esophagus and cricopharyngeal inlet: 23,26,28,32,36,42,50–53,61
Thyroid: 23,30,37,44,85,98,99
Larynx: 26,28,30,32,35,40,42,47,50–53,65,77,80
Trachea: 30,52,63
Cochlea: 26,32,36,53,60,77,80
Brachial plexus: 30,67,71,81
Carotid artery: 23,25
Other DL architectures adopt specific mechanisms to improve the auto-segmentation of OARs in the H&N region. For example, the self-channel-and-spatial-attention neural network (SCSA-Net)24 is equipped with attention learning, a technique for strengthening the discriminative ability of the segmentation network with minimal or no additional layers, the DenseNet40 employs adversarial learning, a technique where two CNNs compete in generating more accurate predictions, while the regional CNN (R-CNN)28 can be used for rapidly detecting the location of OARs before actual segmentation.
3.E. Ground truth
The quality of the resulting auto-segmentation is evaluated by the comparison against the corresponding reference segmentation, often referred to as the ground truth. Manual delineation (contouring) of OARs in images performed by human experts (e.g., radiation oncologists, diagnostic radiologists) is the main approach for generating the ground truth. However, it is a time-consuming (e.g., 3–6 hours per image for up to 20 OARs19,87,98), tedious, and costly task that is limited by the subjective human interpretation of organ boundaries, which is manifested through the intra- and interobserver variability in the delineation (Table V). Most studies therefore rely on a single set of ground truth per image; nevertheless, studies report also two,32,60,63,79,88,93 three,22,25,41,58,75,97,99 four,71,98 five,77 or even eight89 independently obtained sets of ground truth per image. An anatomically validated ground truth was introduced for a single OAR, that is, the brachial plexus,6,121 so that its manual delineations obtained from high-resolution MR images of up to 12 cadavers were first validated by dissection and then registered to corresponding CT images to obtain the ground truth for the purpose of RT planning.
In some cases, multiple ground truth sets were combined into a consensus by generating probability maps,89 performing (weighted) majority voting,44,69 performing intensity-based patch-based label fusion (Patch),67 applying the simultaneous truth and performance level estimation (STAPLE) expectation-maximization algorithm67,77,81,89 that estimates the correct segmentation by weighting each input by its estimated performance level, or applying the similarity and truth estimation for propagated segmentations (STEPS) algorithm58 that introduces a spatially variant image similarity term into STAPLE. Alternatively, a less labor-intensive but relatively biased approach for generating the ground truth is to manually correct the auto-segmentation boundaries73,77,80,84,85,87,93 or to merge different auto-segmentation results with, for example, the STAPLE algorithm.96
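The principle behind STAPLE can be illustrated with a compact binary expectation-maximization sketch (a simplified illustration of the algorithm's principle, not the reference implementation; spatial priors and other details of the published method are omitted):

```python
# Compact sketch of binary STAPLE expectation-maximization: each rater j
# is weighted by an estimated sensitivity p[j] and specificity q[j], and
# the consensus probability W is re-estimated in turn.
import numpy as np

def staple(decisions, n_iter=50, prior=None):
    """decisions: (n_raters, n_voxels) binary array; returns P(label=1)."""
    D = decisions.astype(float)
    W = D.mean(axis=0)                      # initialize consensus by voting
    pi = W.mean() if prior is None else prior
    eps = 1e-10
    for _ in range(n_iter):
        # M-step: per-rater performance given the current consensus.
        p = (W * D).sum(axis=1) / (W.sum() + eps)            # sensitivity
        q = ((1 - W) * (1 - D)).sum(axis=1) / ((1 - W).sum() + eps)
        # E-step: posterior of the true label at every voxel.
        log_a = np.log(pi + eps) + (
            D * np.log(p[:, None] + eps) +
            (1 - D) * np.log(1 - p[:, None] + eps)).sum(axis=0)
        log_b = np.log(1 - pi + eps) + (
            (1 - D) * np.log(q[:, None] + eps) +
            D * np.log(1 - q[:, None] + eps)).sum(axis=0)
        W = 1.0 / (1.0 + np.exp(np.clip(log_b - log_a, -50, 50)))
    return W

# Three hypothetical raters over six voxels; threshold W at 0.5 for a mask.
votes = np.array([[1, 1, 0, 0, 1, 0],
                  [1, 0, 0, 0, 1, 1],
                  [1, 1, 1, 0, 0, 0]])
print(staple(votes).round(2))
```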
To mitigate the intra- and interobserver delineation variability, well-defined guidelines have been proposed102,121–128 that help ensure the consistency and accuracy of manual delineation. The most established consensus102 encompasses a complete set of OARs in the H&N region, with the expert recommendation to always include the parotid glands, submandibular glands, spinal cord, and pharyngeal constrictor muscles in the RT plan. Other guidelines are focused on OARs involved in the nasopharyngeal carcinoma (i.e., the temporal lobe, parotid glands, spinal cord, and inner and middle ear),122 swallowing (i.e., the pharyngeal constrictor muscles, cricopharyngeal muscle, esophagus inlet muscles, cervical esophagus, base of tongue, and larynx),124 salivary functioning (i.e., the parotid glands, submandibular glands, sublingual gland, and minor salivary glands in the soft palate, lips, and cheeks),125 hearing and balance (i.e., the inner and middle ear),126 brachial plexopathy (i.e., the brachial plexus and adjacent structures, esophagus, spinal cord, and trachea),121,123,127 and optic neuropathy (i.e., the optic chiasm).128
TABLE III. Number of samples included in image databases used for auto-segmentation of organs at risk in the head and neck region for the purpose of radiotherapy planning, and the corresponding references (in square brackets).

5–10: 5:L [83]; 7 [98]; 7 [38]; 10 [77]; 10:L [95]; 10:L [78]; 10:L [87]
11–18: 11:L [97]; 12:L [58]; 12 [67]; 13 [93]; 14:L [68]; 14:L [29]; 5|10 [74]; 15:5F [28]; 8|8, 16:L [72]; 16 [90]; 18:L [76]; 18:L [64]; 18:L [99]
20–25: 20:L [85]; 20 [89]; 20 [84]; 20 [59]; 20 [27]; 21:L [61]; 14|10 [88]; 25:L N [69]; 15|10 [86]; 15|10 [40]; 10|15 [91]
30–33: 30 [62]; 15|15 [63]; 30:L [79]; 20|10 N [70]; 32 [82]; 22|10 N [40,55]; 33:L N [41]; 33:2F N [56]
39–50: 25|14 N [49]; 25|15 N [66]; 25|15 N [39]; 30|10 [52]; 40 [80]; 41 [96]; 41 [25]; 42 [75]; 44:5F [57]; 45:L, 32 N [35]; 33|15 N [31,54]; 32+6|10 N [24]; 50:5F [65]; 50:5F [41]; 40|10 [33]
70–95: 70 [38]; 74 [24]; 48+12|20 [43]; 70|17 [26]; 10|80 [81]; 70|20 [53]; 70+10|15 [32]; 100:L [73]; 100:5F [48]
>100: 52+8|49 [39]; 100|10 [44]; 100+20|20 [37]; 142|15 [50]; 185:4F [47]; 160+20|20 [42]; 246* [51]; 234|20, 15 N [45]; 261|10 N [46]; 215|100 [30]; 328|20 [22]; 389+51|46, +6|24, 15 N • [60]; 475+5|20 [34]
>500: 549+40|104 [23]
>1000: (660+165–1362+340)|(48–168) [36]

Legend: n – number of cases with a model or without a training set; m|n – m cases for training, n cases for testing; m+k|n – m cases for training (if omitted, models are used), k cases for model selection, n cases for testing; n:kF – n cases with k-fold cross-validation; n:L – n cases with leave-one-out validation; * – for 30 patients, 2 or more images are available, together 36|262; N – evaluated on the PDDCA database;66 • – evaluated on the TCIA-RT database.60
TABLE IV. Methodology applied for auto-segmentation of organs at risk in the head and neck region for the purpose of radiotherapy planning, and the corresponding references.

Atlas: 27,29,34,44,52,58,59,61,68,69,71,73,78–81,84,85,87,89–91,93,94,99
  with shape/appearance models: 38,66,76,77,82,86,92,95
  with intensity models: 97–99
  with feature classification: 35,63,72,75,83,86
  with contour refinement: 72,76,92
  with level set refinement: 91
Feature classification: 64,74
Localization model and feature classification: 51,56
Level-set statistical model: 88,89
Shape models: 25,62,96
Deep learning: 23,24,37,40,47,49,54,57,65,70
  with U-Net and its versions: 22,28–31,33,36,39,41–43,45,46,50,55,60
  with DeepMedic and its versions: 26,32,53
TABLE V. Observer variability of manual delineations of organs at risk in the head and neck region, and the corresponding references (cf. Table VI for the list of metrics; references in square brackets).

Parotid glands
  DC (%): –m,f (o=5, p=10, S) [77]; 91 (o=2, p=32) [60]; 89±3 [32]; 87±3 (o=2, p=24, •) [60]; 84±4 (o=3, p=12) [58]; –m,f [22]; 83±2 (o=8, p=16) [145]; 81 (o=2, p=13) [63]; 77±8 (o=32, p=1) [143]
  SC (%): sDC: 94.4±2.8 (s=2.85 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: 10.7±4.4 (o=3, p=12) [58]; DTAmax: –m,f (o=5, p=10, S) [77]; HD95: –m,f [22], 5.0±1.7 (o=3, p=12) [58]
  ASD (mm): ASSD: 1.8±0.2 [32]; ASDn/a: 1.4±0.5 (o=3, p=12) [58]; DTAavg: –m,f (o=5, p=10, S) [77]

Submandibular glands
  DC (%): 91 (o=2, p=64) [60]; –m,f (o=5, p=10, S) [77]; 87±5 [32]; –m,f [22]; 83±20 (o=2, p=24, •) [60]; 77±5 (o=8, p=16) [145]
  SC (%): sDC: 89±21.2 (s=2.02 mm, o=2, p=24, •) [60]
  HD (mm): DTAmax: –m,f (o=5, p=10, S) [77]; HD95: –m,f [22]
  ASD (mm): ASSD: 1.5±0.2 [32]; DTAavg: –m,f (o=5, p=10, S) [77]

Brainstem
  DC (%): –m,f (o=3, p=11) [97]; 92 (o=2, p=45) [60]; 90±2 (o=2, p=24, •) [60]; –m,f [22]; 84(82,85) (intra, o=4, p=7) [98]; 83±3 (o=8, p=16) [145]; 83±10 (o=8, p=20) [89]; –m,f (o=3, p=13) [99]; 78(73,85) (o=4, p=7) [98]; 68±12 [32]; 66±17 (o=31, p=1) [143]
  SC (%): sDC: 96.7±2.5 (s=2.5 mm, o=2, p=24, •) [60]; sPPV: –m,f (s=2 mm, o=8, p=20) [89]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]; HD95: –m,f (o=3, p=11) [97], –m,f [22]
  ASD (mm): ASSD: 2.2±0.5 [32]; ASDmax: 1.1(0.9,1.2) (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 1.7(1.1,2.4) (o=4, p=7) [98]
  SSD (mm): SDTAavg: 0.8 (o=8, p=20, P) [89]; SDTAmin: −3.9 (o=8, p=20, P) [89]; SDTAmax: 7.5 (o=8, p=20, P) [89]

Brain, cerebrum (CBR) and cerebellum (CBE)
  DC (%): 99±0.3 (o=2, p=24, •) [60]; 99 (o=2, p=75) [60]; 99 (CBR, intra, o=4, p=7) [98]; 98±1 (o=10, p=1) [143]; –m,f (CBR, o=3, p=13) [99]; –m,f (CBR, o=3, p=11) [97]; 94(93,95) (CBR, o=4, p=7) [98]; –m,f (CBE, o=3, p=11) [97]; 94(91,95) (CBE, intra, o=4, p=7) [98]; –m,f (CBE, o=3, p=13) [99]; 86(84,88) (CBE, o=4, p=7) [98]
  SC (%): sDC: 96.2±1.1 (s=1.01 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: –m,f (CBE, o=3, p=13) [99], –m,f (CBR, o=3, p=13) [99]; HD95: –m,f (CBR, o=3, p=11) [97], –m,f (CBE, o=3, p=11) [97]
  ASD (mm): ASDmax: 0.4 (CBR, intra, o=4, p=7) [98], 0.9(0.6,1.2) (CBE, intra, o=4, p=7) [98], –m,f (CBR, o=3, p=13) [99], –m,f (CBE, o=3, p=13) [99], 2.2(1.8,2.5) (CBE, o=4, p=7) [98], 2.4(2.0,2.9) (CBR, o=4, p=7) [98]

Temporal lobes
  DC (%): 82±2 (o=8, p=16) [145]

Pituitary gland
  DC (%): 65±8 (o=8, p=16) [145]

Spinal cord and spinal canal
  DC (%): 95 (canal, o=2, p=23) [60]; 94±2 (canal, o=2, p=24, •) [60]; –m,f (o=2, p=15) [63]; –m,f (o=3, p=11) [97]; 88 (o=2, p=24) [60]; 85(84,87) (intra, o=4, p=7) [98]; 84±5 (o=2, p=24, •) [60]; –m,f [22]; 80±7 (o=29, p=1) [143]; 79±7 (o=3, p=12) [58]; 79(73,84) (o=4, p=7) [98]; –m,f (o=3, p=13) [99]; 77±4 (o=8, p=16) [145]; 71±7 [32]
  SC (%): sDC: 99.8±0.4 (s=2.93 mm, o=2, p=24, •) [60], 95±2 (canal, s=1.17 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: –m,f (o=3, p=13) [99], 7.1±5.2 (o=3, p=12) [58]; HD95: –m,f (o=3, p=11) [97], –m,f [22], 4.6±3.1 (o=3, p=12) [58]
  ASD (mm): ASSD: 4.4±1.9 [32]; ASDmax: 0.6 (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 1.0(0.81,1.3) (o=4, p=7) [98]; ASDn/a: 1.6±0.8 (o=3, p=12) [58]

Cerebrospinal fluid
  DC (%): –m,f (o=3, p=11) [97]
  HD (mm): HD95: –m,f (o=3, p=11) [97]

Eyeballs and vitreous humor (VH)
  DC (%): –m,f (VH, o=3, p=11) [97]; 95 (o=2, p=19) [60]; 93±2 (o=2, p=24, •) [60]; 91(90,92) (VH, intra, o=4, p=7) [98]; –m,f [22]; 89±1 (o=8, p=16) [145]; –m,f (VH, o=3, p=13) [99]; 86(82,89) (VH, o=4, p=7) [98]; 85±3 (+eye muscles, o=2, p=15) [79]; 83±9 (o=8, p=20) [89]
  SC (%): sDC: 96±3 (s=1.65 mm, o=2, p=24, •) [60]; sPPV: –m,f (s=2 mm, o=8, p=20) [89]
  HD (mm): HDreg: –m,f (VH, o=3, p=13) [99], 4.9±0.6 (+eye muscles, o=2, p=15) [79]; HD95: –m,f (VH, o=3, p=11) [97], –m,f [22]
  ASD (mm): ASDmax: 0.4 (VH, intra, o=4, p=7) [98], –m,f (VH, o=3, p=13) [99], 0.7(0.5,1.1) (VH, o=4, p=7) [98]; ASDn/a: 0.5±0.2 (+eye muscles, o=2, p=15) [79]
  SSD (mm): SDTAavg: 0.5 (o=8, p=20, P) [89]; SDTAmin: −2.8 (o=8, p=20, P) [89]; SDTAmax: 3.4 (o=8, p=20, P) [89]

Optic chiasm
  DC (%): –m,f (o=2, p=10) [88]; –m,f [22]; 39±23 (o=8, p=20) [89]; 38±8 (o=8, p=16) [145]
  SC (%): sPPV: –m,f (s=2 mm, o=8, p=20) [89]
  HD (mm): HDreg: –m,f (o=2, p=10) [88]; HD95: –m,f [22]
  ASD (mm): ASDn/a: –m,f (o=2, p=10) [88]
  SSD (mm): SDTAavg: 0.7 (o=8, p=20, P) [89]; SDTAmin: −2.0 (o=8, p=20, P) [89]; SDTAmax: 4.7 (o=8, p=20, P) [89]

Optic nerves
  DC (%): –m,f (o=2, p=10) [88]; 79±5 (o=2, p=24, •) [60]; 77±6 (o=2, p=17) [60]; 73±4 (o=2, p=15) [79]; 70(65,76) (intra, o=4, p=7) [98]; –m,f (o=3, p=13) [99]; 60(50,66) (o=4, p=7) [98]; –m,f [22]; 57±9 (o=8, p=16) [145]; 50±17 (o=8, p=20) [89]
  SC (%): sDC: 97±3 (s=2.5 mm, o=2, p=24, •) [60]; sPPV: –m,f (s=2 mm, o=8, p=20) [89]
  HD (mm): HDreg: –m,f (o=2, p=10) [88], 2.9±0.5 (o=2, p=15) [79], –m,f (o=3, p=13) [99]; HD95: –m,f [22]
  ASD (mm): ASDmax: 0.6(0.4,0.7) (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 0.9(0.6,1.7) (o=4, p=7) [98]; ASDn/a: –m,f (o=2, p=10) [88], 0.5±0.1 (o=2, p=15) [79]
  SSD (mm): SDTAavg: 0.3 (o=8, p=20, P) [89]; SDTAmin: −2.3 (o=8, p=20, P) [89]; SDTAmax: 4.0 (o=8, p=20, P) [89]

Lens
  DC (%): –m,f (o=3, p=11) [97]; 88±10 (o=2, p=73) [60]; 87±8 (o=2, p=24, •) [60]; 80(75,85) (intra, o=4, p=7) [98]; –m,f (o=3, p=13) [99]; 70±5 (o=8, p=16) [145]; 68(55,76) (o=4, p=7) [98]
  SC (%): sDC: 98±3 (s=0.98 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]; HD95: –m,f (o=3, p=11) [97]
  ASD (mm): ASDmax: 0.3(0.2,0.4) (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 0.7(0.4,1.2) (o=4, p=7) [98]

Sclera
  DC (%): –m,f (o=3, p=11) [97]; 63(62,67) (intra, o=4, p=7) [98]; –m,f (o=3, p=13) [99]; 48(30,56) (o=4, p=7) [98]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]; HD95: –m,f (o=3, p=11) [97]
  ASD (mm): ASDmax: 0.5 (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 0.9(0.6,1.8) (o=4, p=7) [98]

Cornea
  DC (%): –m,f (o=3, p=13) [99]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]
  ASD (mm): ASDmax: –m,f (o=3, p=13) [99]

Lacrimal glands
  DC (%): 67±10 (o=2, p=24, •) [60]; 63±13 (o=2, p=75) [60]
  SC (%): sDC: 93.9±4.7 (s=2.5 mm, o=2, p=24, •) [60]

Mandible
  DC (%): 95 (o=2, p=74) [60]; 94±2 (o=2, p=24, •) [60]; 94±3 [32]; 92 (o=3, p=50) [41]; 89±2 (o=8, p=16) [145]; 87±7 (o=18, p=1) [143]; 85±4 (o=3, p=12) [58]
  SC (%): sDC: 98±2 (s=1.01 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: 8.9±3.2 (o=3, p=12) [58]; HD95: 3.9±1.6 (o=3, p=12) [58]
  ASD (mm): ASSD: 1.2±0.2 [32]; ASDn/a: 0.9±0.5 (o=3, p=12) [58]

Oral cavity
  DC (%): 94±5 [32]; 81±4 (o=8, p=16) [145]
  ASD (mm): ASSD: 2.9±0.6 [32]

Temporo-mandibular joints
  DC (%): 50±18 (o=8, p=16) [145]

Pharyngeal constrictor muscles
  DC (%): 76±8 (inferior) [32]; –m,f (o=5, p=10, S) [77]; 72±7 (middle) [32]; 54±8 (inferior) [32]; 50±8 (middle, o=8, p=16) [145]; 50±9 (inferior, o=8, p=16) [145]; 44±7 (superior, o=8, p=16) [145]
  HD (mm): DTAmax: –m,f (o=5, p=10, S) [77]
  ASD (mm): ASSD: 1.5±0.2 (middle) [32], 1.7±0.3 (inferior) [32], 2.1±0.3 (superior) [32]; DTAavg: –m,f (o=5, p=10, S) [77]

Cervical esophagus
  DC (%): 64±15 [32]
  ASD (mm): ASSD: 2.0±0.6 [32]

Thyroid
  DC (%): –m,f (o=3, p=13) [99]; 84(71,92) (intra, o=4, p=7) [98]; 82±3 (o=8, p=16) [145]; 76(53,89) (o=4, p=7) [98]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]
  ASD (mm): ASDmax: 0.8(0.4,1.8) (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 1.9(0.5,4.7) (o=4, p=7) [98]

Larynx
  DC (%): 86±11 (supraglottic) [32]; –m,f (o=5, p=10, S) [77]; 73±18 (glottic) [32]; 60±5 (supraglottic, o=8, p=16) [145]; 49±9 (glottic, o=8, p=16) [145]
  HD (mm): DTAmax: –m,f (o=5, p=10, S) [77]
  ASD (mm): ASSD: 1.4±0.4 [32], 1.8±0.4 (supraglottic) [32]; DTAavg: –m,f (o=5, p=10, S) [77]

Trachea
  DC (%): –m,f (o=2, p=12) [63]

Cochlea
  DC (%): 78±8 (o=2, p=24, •) [60]; 76±9 (o=2, p=8) [60]; –m,f (o=5, p=10, S) [77]; 50±13 [32]; 37±10 (o=8, p=16) [145]
  SC (%): sDC: 96±4 (s=1.25 mm, o=2, p=24, •) [60]
  HD (mm): DTAmax: –m,f (o=5, p=10, S) [77]
  ASD (mm): ASSD: 1.1±0.4 [32]; DTAavg: –m,f (o=5, p=10, S) [77]

Brachial plexus
  DC (%): 26 (o=5, p=1, S*) [71]
  VC (%): TPR: 36 (o=5, p=1, S*) [71]
  HD (mm): HD95: 22.2 (o=5, p=1, S*) [71]

Legend: –m,f – value available only as a median (m – median, average not reported) and/or estimated from a figure (f – exact value not reported); o – number of observers; p – number of patients; intra – intraobserver variability; S – compared against the STAPLE consensus among other physicians; S* – comparison of trainee contours against the STAPLE consensus among four other expert physicians; P – compared against the probability map consensus among other physicians; • – evaluated on the TCIA-RT database;60 +eye muscles – the eyes and eye muscles were segmented as one organ; s – size of the volumetric neighborhood.
3.F. Performance metrics
The agreement between the ground truth and the resulting auto-segmentation is quantitatively evaluated by various overlap and distance metrics,129 computed over the corresponding binary segmentation masks (Table VI). The overlap metrics originate from the statistical measures of the performance of a binary classification test, and the Dice coefficient is the standard and widely accepted metric for volumetric mask overlap that measures the harmonic average of the classification precision and recall (i.e., the F1 score). Variations of the volumetric coefficient include the sensitivity and positive predictive value (often referred to as the inclusion), which measure the ratio of correctly segmented voxels, while the specificity measures the ratio of correctly nonsegmented voxels and the false discovery rate measures the ratio of incorrectly segmented voxels. On the other hand, surface coefficients measure the overlap of the corresponding mask surfaces.
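As a worked identity behind this equivalence (standard definitions, with TP = |A∩B|, FP = |B∖A|, FN = |A∖B| for ground truth A and auto-segmentation B):

```latex
% Dice coefficient as the F1 score: with precision PPV = TP/(TP+FP)
% and recall TPR = TP/(TP+FN),
\mathrm{DC}(A,B)
  = \frac{2\,|A \cap B|}{|A| + |B|}
  = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
  = \frac{2\,\mathrm{PPV}\cdot\mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}}
  = F_1 .
```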
Contrary to the overlap metrics, the distance metrics evaluate the mutual proximity of the segmentation mask surfaces. Within this group, the most established are the Hausdorff distance and its variations, which measure the maximal distance between any voxel on the mask surface to the other mask surface, as well as variations of the average surface distance, which measure the distance between voxels on the mask surface to the closest voxels on the other mask surface.
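A sketch of how these metrics are typically computed over binary masks, using Euclidean distance transforms (assuming a and b are boolean NumPy arrays on the same grid; the percentile variant yields HD95):

```python
# Sketch of standard volumetric and distance metrics over binary masks
# (ground truth a, auto-segmentation b), using Euclidean distance
# transforms; `spacing` is the voxel size in millimeters.
import numpy as np
from scipy import ndimage

def dice(a, b):
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def _surface(mask):
    # Surface voxels: the mask minus its erosion.
    return mask & ~ndimage.binary_erosion(mask)

def surface_distances(a, b, spacing):
    """Directed distances from surface voxels of `a` to the surface of `b`."""
    dt_b = ndimage.distance_transform_edt(~_surface(b), sampling=spacing)
    return dt_b[_surface(a)]

def hausdorff(a, b, spacing, percentile=100):
    d_ab = surface_distances(a, b, spacing)
    d_ba = surface_distances(b, a, spacing)
    if percentile == 100:                    # regular Hausdorff distance
        return max(d_ab.max(), d_ba.max())
    return np.percentile(np.hstack([d_ab, d_ba]), percentile)  # e.g., HD95

def assd(a, b, spacing):
    d_ab = surface_distances(a, b, spacing)
    d_ba = surface_distances(b, a, spacing)
    return (d_ab.sum() + d_ba.sum()) / (d_ab.size + d_ba.size)
```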
3.G. Segmentation performance
The performance of different auto-segmentation methods from the perspective of different metrics and OARs is presented in Table VII, which summarizes the comparisons of auto-segmentation results to the corresponding ground truth obtained by manual delineation (Table VII does not report comparisons to ground truth obtained by manually corrected or merged auto-segmentation results;32,80,96 in the case the results were reported separately for multiple versions of a method, only the results for the best performing version are given). A systematic and relatively unbiased evaluation of different methods can be obtained through computational challenges, which have in the past decade gained increased popularity and become the standard for validation of methods in the field of biomedical image analysis.130 In such a competition-oriented setting, the challenge organizers first release images with the ground truth that are used by the participating teams for method development, and then the methods are evaluated on images for which the ground truth is known to organizers only.

To date, five H&N auto-segmentation challenges have been organized. In 2009, at the Head and Neck Auto-segmentation Challenge (part of the workshop 3D Segmentation in the Clinic: A Grand Challenge during the conference on Medical Image Computing and Computer Assisted Interventions, MICCAI 2009), five different teams attempted to segment the mandible and brainstem from 25 CT images (10 for training, 15 for testing).92 The second challenge was organized by the same group in 2010 (the Head and Neck Auto-segmentation Challenge: Segmentation of the Parotid Glands, part of the workshop Medical Image Analysis in the Clinic: A Grand Challenge during MICCAI 2010), when the same image database was used but six different teams attempted to segment the parotid glands instead.91 In 2015, six different teams participated in a challenge (the Head and Neck Auto-Segmentation Challenge 2015, held as a standalone satellite event during MICCAI 2015) to segment the brainstem, mandible, optic chiasm, optic nerves, parotid glands, and
submandibular glands from 40 CT images (25 for training, 15 for testing).66 In July 2019, at the AAPM RT-MAC challenge (part of the 2019 American Association of Physicists in Medicine (AAPM) Annual Meeting; https://www.aapm.org/GrandChallenge/RT-MAC/; http://aapmchallenges.cloudapp.net/competitions/34), 10 teams attempted to segment the parotid glands, submandibular glands, and lymph nodes from 55 MR images (31 for training, 24 for testing),131 however, detailed results of this challenge have not yet been published and are not publicly available. The last auto-segmentation challenge was carried out in October 2019 (StructSeg2019: Automatic Structure Segmentation for Radiotherapy Planning Challenge, held as a standalone satellite event during MICCAI 2019; https://structseg2019.grand-challenge.org; http://www.structseg-challenge.org), where 12 teams attempted to segment 13 OARs (i.e., the eyes, lens, optic nerves, optic chiasm, pituitary gland, brainstem, temporal lobes, spinal cord, parotid glands, inner ear, middle ear, temporo-mandibular joints, and mandible) as well as the
TABLE VI. Performance metrics applied for measuring the performance of auto-segmentation of organs at risk in the head and neck region for the purpose of radiotherapy planning, and the corresponding references and mathematical definitions.

Overlap metrics, reported in percents (%)

Standard volumetric coefficient
  DC – Dice coefficient (F1 score):22–63,92–95,97–99 DC = 2|A∩B| / (|A| + |B|)

Variations of the volumetric coefficient (VC)
  TPR – sensitivity:24,31,40,41,50,55,56,59,67,68,71,90,93,94,96 TPR = |A∩B| / |A|
  TNR – specificity:41,93,94,96 TNR = |(A∪B)^C| / |A^C|
  PPV – positive predictive value (inclusion):24,31,40,55,56,68 PPV = |A∩B| / |B|
  FDR – false discovery rate (segmented volume):50,59 FDR = |B∖A| / |B|

Variations of the surface coefficient (SC)
  sDC – surface overlap:60 sDC = (|∂A∩∂_sB| + |∂B∩∂_sA|) / (|∂A| + |∂B|)
  sPPV – surface positive predictive value (inclusion):78,89 sPPV = |∂B∩∂_sA| / |∂B|

Distance metrics, reported in millimeters (mm)

Variations of the Hausdorff distance (HD)
  HDreg – Hausdorff distance, regular:25,36,41,43,44,48,52,53,58,66,70,73,76,79,84,88,99 HDreg = max{max_{a∈∂A} d(a,∂B), max_{b∈∂B} d(b,∂A)}
  DTAmax – maximum distance to agreement:27,77 DTAmax = max_{b∈∂B} d(b,∂A)
  HD95 – 95-percentile Hausdorff distance:22,23,29–31,35,37–40,46,49,55,58,66,69,71,97 HD95 = K95_{a∈∂A, b∈∂B}{d(a,∂B), d(b,∂A)}, where K95 denotes the 95th percentile
  HD95mid – 95-percentile Hausdorff distance, mid-value:24,54,62 HD95mid = ½[K95_{a∈∂A} d(a,∂B) + K95_{b∈∂B} d(b,∂A)]
  HDsw – slice-wise Hausdorff distance:81,82,85,86,91,92 HDreg aggregated over two dimensions

Variations of the average surface distance (ASD)
  ASSD – average symmetric surface distance:26,53,57 ASSD = [Σ_{a∈∂A} d(a,∂B) + Σ_{b∈∂B} d(b,∂A)] / (|∂A| + |∂B|)
  ASDmax – average surface distance, maximum:35,64,66,72,75,76,98,99 ASDmax = max{Σ_{a∈∂A} d(a,∂B)/|∂A|, Σ_{b∈∂B} d(b,∂A)/|∂B|}
  ASDmid – average surface distance, mid-value:24,32,40,55,56,61,81 ASDmid = ½[Σ_{a∈∂A} d(a,∂B)/|∂A| + Σ_{b∈∂B} d(b,∂A)/|∂B|]
  ASDn/a – average surface distance, unspecified:39,58,75,79,88
  DTAavg – average distance to agreement:27,42,68,77,84,87 DTAavg = Σ_{b∈∂B} d(b,∂A) / |∂B|

Variations of the signed surface distance (SSD)
  SSDavg – signed surface distance, average:45 SSDavg = [Σ_{a∈∂A} d_s(a,∂B) + Σ_{b∈∂B} d_s(b,∂A)] / (|∂A| + |∂B|)
  SDTAavg – signed distance to agreement, average:89 SDTAavg = Σ_{b∈∂B} d_s(b,∂A) / |∂B|
  SDTAmin – signed distance to agreement, minimum:89 SDTAmin = min_{b∈∂B} d_s(b,∂A)
  SDTAmax – signed distance to agreement, maximum:89 SDTAmax = max_{b∈∂B} d_s(b,∂A)

Legend: |A| and |B| are the numbers of voxels in volumetric masks A (e.g., ground truth) and B (e.g., auto-segmentation), respectively, and |∂A| and |∂B| are the numbers of voxels in the corresponding subsets of surface voxels ∂A and ∂B, respectively. The Euclidean distances of voxels a and b to surfaces ∂B and ∂A are defined as d(a,∂B) = min_{b∈∂B} ‖a−b‖ and d(b,∂A) = min_{a∈∂A} ‖b−a‖, respectively. The signed Euclidean distance d_s(a,∂B) is defined as d(a,∂B) if a∈B^C and as −d(a,∂B) if a∈B, and d_s(b,∂A) is defined as d(b,∂A) if b∈A^C and as −d(b,∂A) if b∈A. The volumetric neighborhoods within distance s from surfaces ∂A and ∂B are defined as ∂_sA = {x∈ℝ³ : ∃a∈∂A, ‖x−a‖ ≤ s} and ∂_sB = {x∈ℝ³ : ∃b∈∂B, ‖x−b‖ ≤ s}, respectively.
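For completeness, the surface overlap (sDC) at tolerance s from Table VI admits an equally compact sketch under the same assumptions as the metric code above (boolean NumPy masks on a shared grid):

```python
# Sketch of the surface overlap (sDC) at tolerance s, following the
# Table VI definition: the fraction of surface voxels of each mask lying
# within distance s of the other mask's surface.
import numpy as np
from scipy import ndimage

def surface_dice(a, b, spacing, s=2.0):
    surf_a = a & ~ndimage.binary_erosion(a)
    surf_b = b & ~ndimage.binary_erosion(b)
    dt_a = ndimage.distance_transform_edt(~surf_a, sampling=spacing)
    dt_b = ndimage.distance_transform_edt(~surf_b, sampling=spacing)
    overlap = (dt_b[surf_a] <= s).sum() + (dt_a[surf_b] <= s).sum()
    return overlap / (surf_a.sum() + surf_b.sum())
```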
TABLE VII. Performance of auto-segmentation for the purpose of radiotherapy planning, and the corresponding references (cf. Table VI for the list of metrics; references in square brackets; N – evaluated on the PDDCA database;66 • – evaluated on the TCIA-RT database;60 –m,f – cf. the legend of Table V).

Parotid glands
  DC (%): 92±4 [37]; 91±2 [75]; 88±2 [46]; –m,f (N) [45]; 88 [53]; 87±3 (N) [60]; 87±4 (N) [24]; 87 [64]; 86±2 (N) [40]; 86±3 [48]; 86±4 [24]; 86±5 (N) [31]; 86±5 [42]; 86±5 [40]; 86±7 [93]; –m,f [72]; 85±2 [83]; 85±3 [26]; 85±4 [91]; 85±4 [30]; 85±5 [47]; –m,f (DL) [29]; 85 [60]; 84±3 [34]; 84±3 (N) [55]; 84±4 (•) [60]; 84±7 (N, IM) [66]; 84 [76]; –m,f [22]; –m,f [23]; 83±2 [50]; 83±3 [58]; 83±5 (•) [36]; 83±5 [86]; 83±6 [36]; 83±6 (N) [56]; –m,f [95]; –m,f [45]; 81±4 (N) [70]; 81±5 [28]; 81±8 (N) [49]; 81±8 [27]; 81 (N) [54]; –m,f [52]; –m,f (ABAS) [29]; 79 (MR) [68]; –m,f [77]; 79 [87]; 79 [57]; 77±6 [65]; –m,f (N) [69]; –m,f (N) [35]; 76±6 [63]; 76 (CT) [68]; –m,f [35]; 75 [51]; 72±10 [90]; 72±12 [82]; –m,f [84]; –m,f [78]; –m,f [73]
  VC (%): TPR: 97±4 [40], 91±9 [93], 88±5 (N) [40], 86±7 [24], 85±5 (N) [24], 85±7 (N) [31], 85±7 [50], 84 (MR) [68], 83±10 (N) [56], 82±5 (N) [55], 72±9 [90], 71 (CT) [68]; TNR: 91±7 [93]; PPV: 88±5 (N) [24], 87±3 (N) [40], 87±6 (N) [31], 86±2 (N) [55], 84±7 (N) [56], 83±7 [40], 83 (CT) [68], 80±6 [24], 77 (MR) [68]; FDR: 18±6 [50]
  SC (%): sDC: 95±3 (s=2.85 mm, N) [60], 90±6 (s=2.85 mm, •) [60]
  HD (mm): HDreg: 1.4±0.6 [36], 1.7±0.7 (•) [36], –m,f [73], 5.1±1.1 [48], –m,f [76], 10.7 [53], –m,f [84], 12.1±3.9 [58], –m,f [52], –m,f (N, IM) [66], 14.2±6.6 (N) [70]; DTAmax: 6.8±2.5 [27], –m,f (N) [35], –m,f [35], –m,f [77]; HD95: –m,f (N) [69], 2.6±1.4 [40], 2.7±1.1 (N) [31], 3.2±0.6 [37], 3.8±1.1 (N) [40], 4.0±2.2 (N) [55], –m,f [22], 4.6±1.2 [58], 5.0±2.4 (N, IM) [66], –m,f [23], 5.2±1.8 (N) [49], –m,f (DL) [29], 6.6±3.3 [30], –m,f [35], –m,f (N) [35], –m,f (ABAS) [29], 9.3±3.3 [46]; HD95mid: 3.3±1.0 (N) [24], 3.9±2.0 [24], 3.9 (N) [54]; HDsw: 5.0±1.0 [91], 5.8±1.6 [86], –m,f [82]
  ASD (mm): ASSD: 0.9±0.3 [26], 1.2 [53], 1.6 [57]; ASDmax: –m,f [76], –m,f [64], –m,f [72], –m,f (N, IM) [66], 3.6±1.4 [75]; ASDmid: 1.0±0.3 (N) [55], 1.0±0.4 [40], 1.2±0.3 (N) [24], 1.3±0.4 [24], 1.4±0.4 (N) [40], 1.8±0.6 (N) [56]; ASDn/a: 0.3±0.1 [75], 1.4±0.4 [58]; DTAavg: –m,f [77], 1.6±0.6 [27], 1.7±1.1 [42], –m,f [84], 2.5±2.8 [87], 4.8 (MR) [68], 6.2 (CT) [68]
  SSD (mm): SSDavg: –m,f [45], –m,f (N) [45]

Submandibular glands
  DC (%): –m,f [22]; 85±10 [42]; 85 [60]; 84±6 [24]; 83 [53]; 82±5 [86]; 82±5 (N) [40]; 82±7 [30]; 82±7 [50]; 81±4 [46]; 81±6 (N) [55]; 80±7 (N) [24]; 80±7 [26]; 80±8 (•) [60]; –m,f [77]; –m,f [23]; 78±7 (N) [60]; 78±8 (N, IM) [66]; 77±6 [34]; 75±13 (N) [31]; 73 [51]; 71±12 [65]; –m,f [95]; 70±12 [82]; 70 [87]; 65±8 (N) [70]; –m,f (N) [69]; –m,f (N) [35]; –m,f [35]; –m,f [78]
  VC (%): TPR: 87±5 [24], 85±6 (N) [55], 80±11 [50], 79±8 (N) [24], 79±9 (N) [40], 72±16 (N) [31]; PPV: 85±9 (N) [40], 83±11 [24], 82±9 (N) [24], 82±11 (N) [31], 80±8 (N) [55]; FDR: 14±8 [50]
  SC (%): sDC: 84±10 (s=2 mm, •) [60], 82±10 (s=2 mm, N) [60]
  HD (mm): HDreg: 6.6 [53], –m,f (N, IM) [66], 9.7±4.8 (N) [70]; DTAmax: –m,f [77], –m,f (N) [35], –m,f [35]; HD95: –m,f [22], 3.2±1.6 (N) [31], 4.0±2.7 (N) [40], –m,f [23], 4.8±1.8 (N, IM) [66], 4.8±1.7 (N) [55], –m,f (N) [69], –m,f (N) [35], 6.0±1.8 [46], 6.2±4.3 [30], –m,f [35]; HD95mid: 3.2±2.3 [24], 3.9±1.2 (N) [24]; HDsw: 3.8±1.0 [86], –m,f [82]
  ASD (mm): ASSD: 1.2 [53], 1.3±1.2 [26]; ASDmax: –m,f (N, IM) [66]; ASDmid: 0.9±0.5 (N) [55], 1.2±0.7 [24], 1.4±1.0 (N) [40], 2.0±1.9 (N) [24]; DTAavg: –m,f [77], 1.2±1.3 [42], 1.9±1.4 [87]

Brainstem
  DC (%): 93±1 [97]; 93±3 [27]; 92±3 [40]; 92 [53]; 91±1 [86]; 91±3 [43]; 90±1 [26]; 90±2 [48]; 90±2 [24]; 90±3 [47]; 90±4 (N) [56]; 89±3 [42]; 88±2 (N) [24]; 88±2 (N) [31]; 88±3 [92]; 88 [60]; 88±3 (•) [36]; 87±3 (N) [55]; 87±3 (N) [40]; 87±4 (N, IM) [66]; –m,f [22]; –m,f (DL) [29]; 86±4 [30]; 86±8 [36]; 86 [76]; 85(80,88) [94]; –m,f [38]; –m,f [95]; 84 (N) [54]; –m,f [23]; 83±6 [89]; –m,f [73]; –m,f [52]; 82±4 (N) [49]; –m,f (ABAS) [29]; 80±8 (N) [60]; 79±6 [59]; 79±10 (•) [60]; –m,f (N) [35]; 78 [99]; –m,f (N) [69]; 77±7 [93]; 77±8 [90]; –m,f [35]; 76(68,81) [98]; –m,f [84]; 75±12 [82]; 73 (MR) [68]; 69 (CT) [68]; 67±2 [46]; 64±16 [50]
  VC (%): TPR: 95±3 [40], 91±4 [24], 90±4 (N) [56], 90±4 [40], 89±3 (N) [24], 88±3 (N) [55], 88±6 (N) [40], 86±14 [50], 87±5 (N) [31], 79±9 [59], 75±14 [90], 69 (CT) [68], 64 (MR) [68], 63±10 [93]; TNR: 98±2 [93]; PPV: 91±4 (N) [56], 89±4 [24], 89±6 (N) [31], 89 (MR) [68], 88±4 (N) [40], 87±5 (N) [24], 85±2 (N) [55], 74 (CT) [68]; FDR: 15±8 [59], 42±23 [50]
  SC (%): sDC: 83±13 (s=2.5 mm, N) [60], 83±14 (s=2.5 mm, •) [60]; sPPV: –m,f (s=2 mm) [89]
  HD (mm): HDreg: 0.6±0.1 (•) [36], 0.9±2.0 [36], 2.7±0.9 [43], 2.9±0.3 [48], –m,f [73], –m,f [52], 6.5 [53], –m,f (N, IM) [66], –m,f [84], 8.7 [99], –m,f [76]; DTAmax: 3.5±1.2 [27], –m,f (N) [35], –m,f [35]; HD95: 1.3±0.5 [40], 2.0±0.3 (N) [31], –m,f [97], 3.6±0.8 (N) [40], –m,f [22], –m,f [23], 4.0±0.9 (N) [55], 4.0±2.0 (N, IM) [66], –m,f (ABAS) [29], 4.8±1.6 [30], –m,f [38], –m,f (N) [69], –m,f (DL) [29], –m,f [35], –m,f (N) [35], 6.4±2.4 [46], 12.4±26.3 (N) [49]; HD95mid: 2.6±0.8 [24], 2.9 (N) [54], 3.0±0.6 (N) [24]; HDsw: 2.8±0.5 [92], 2.8±0.5 [86], –m,f [82]
  ASD (mm): ASSD: 0.6±0.1 [26], 0.8 [53]; ASDmax: –m,f [76], –m,f (N, IM) [66], 2.1 [99], 2.2(1.7,3.1) [98]; ASDmid: 0.7±0.3 [40], 0.9±0.3 (N) [56], 1.0±0.2 [24], 1.2±0.6 (N) [55], 1.2±0.2 (N) [24], 1.4±0.3 (N) [40]; DTAavg: 0.9±0.4 [27], 1.0±0.5 [42], –m,f [84], 3.2 (MR) [68], 4.3 (CT) [68]
  SSD (mm): SDTAavg: 0.2 [89]; SDTAmin: −4.3 [89]; SDTAmax: 5.4 [89]

Brain, cerebrum (CBR) and cerebellum (CBE)
  DC (%): 99±0.2 (•) [60]; 99 [60]; 98±0.3 [36]; 98 (CBR) [99]; 97±0.5 (•) [36]; –m,f (CBR) [23]; 96±1 (CBR) [97]; 96±2 [82]; 94±1 (CBE) [97]; 94(93,95) (CBR) [98]; –m,f (CBE) [23]; 92 (CBE) [99]; 87(80,91) (CBE) [98]; 84(79,86) (CBE) [94]
  SC (%): sDC: 95±2 (s=1 mm, •) [60]
  HD (mm): HDreg: 1.2±1.5 [36], 3.6±0.2 (•) [36], 10.8 (CBE) [99], 18.4 (CBR) [99]; HD95: –m,f (CBR) [97], –m,f (CBE) [97], –m,f (CBR) [23], –m,f (CBE) [23]; HDsw: –m,f [82]
  ASD (mm): ASDmax: 0.8 (CBR) [99], 1.2 (CBE) [99], 1.9(1.3,3.4) (CBE) [98], 2.9(2.5,3.2) (CBR) [98]

Temporal lobes
  DC (%): 93±4 [27]; 84±3 [30]
  HD (mm): DTAmax: 4.7±2.2 [27]; HD95: 12.5±4.1 [30]
  ASD (mm): DTAavg: 1.1±0.6 [27]
Hippocampus
  DC (%): –m,f [38]
  HD (mm): HD95: –m,f [38]

Pituitary gland
  DC (%): 90 [33]; 64±9 [30]; 30(0,72) [94]
  HD (mm): HD95: 3.2±0.8 [30]

Spinal cord and spinal canal
  DC (%): 96 [53]; 95 (canal) [60]; 92±2 (canal, •) [60]; 91±1 [48]; 88±2 [27]; 88±7 [47]; 88 [60]; –m,f [23]; 87±3 [65]; 87±3 [42]; 86±6 [30]; 86±9 [97]; 85±2 [28]; 85 [99]; –m,f [35]; –m,f [52]; 83±6 [36]; –m,f [22]; 82±5 [34]; 80±5 [58]; 80±5 [63]; 80±8 (•) [60]; 80 (CT) [68]; 79±8 (•) [36]; 78 (+brainstem) [87]; 76±8 [90]; 76(66,82) [98]; –m,f [95]; 75 [51]; 74±8 [82]; 74±8 [26]; –m,f [73]; 37 (MR) [68]
  VC (%): TPR: 80 (CT) [68], 76±12 [90], 26 (MR) [68]; PPV: 93 (MR) [68], 81 (CT) [68]
  SC (%): sDC: 99±1 (s=2.93 mm, •) [60], 93±3 (canal, s=1.17 mm, •) [60]
  HD (mm): HDreg: 0.5±0.1 (•) [36], 0.7±1.3 [36], 1.7±0.2 [48], –m,f [73], –m,f [52], 4.3 [53], 6.6 [99], 10.4±3.8 [58]; DTAmax: 3.3±0.3 [27], –m,f [35]; HD95: –m,f [22], –m,f [35], 4.3±1.4 [58], –m,f [97], –m,f [23], 6.9±22.0 [30]; HDsw: –m,f [82]
  ASD (mm): ASSD: 0.4 [53], 2.6±1.6 [26]; ASDmax: 0.8 [99], 1.5(0.8,2.4) [98]; ASDn/a: 1.2±0.4 [58]; DTAavg: 0.9±0.1 [27], 1.6±0.9 [42], 2.3±1.4 (+brainstem) [87], 3.5 (CT) [68], 17.5 (MR) [68]

Cerebrospinal fluid
  DC (%): 82±7 [97]
  HD (mm): HD95: –m,f [97]

Eyeballs and vitreous humor (VH)
  DC (%): 96±1 (VH) [97]; 95 [60]; 95±2 [43]; 94 [33]; –m,f (DL) [29]; 93±1 [48]; 93±4 [47]; 92±2 (•) [60]; 92±2 [30]; 91±2 (•) [36]; –m,f (ABAS) [29]; 91 (MR) [68]; 89±4 [36]; –m,f [22]; 88±3 [65]; 87 (CT) [68]; –m,f [38]; 85±8 [82]; 84±5 [59]; 84±7 [89]; 84(19) (+eye muscles) [79]; 81±5 [62]; 81 (VH) [99]; 81(78,85) [94]; 80(72,84) (VH) [98]; –m,f [73]
  VC (%): TPR: 93 (MR) [68], 91 (CT) [68], 83±8 [59]; PPV: 89 (MR) [68], 84 (CT) [68]; FDR: 10±8 [59]
  SC (%): sDC: 95±3 (s=1.65 mm, •) [60]; sPPV: –m,f (s=2 mm) [89]
  HD (mm): HDreg: 0.3±0.1 (•) [36], 0.4±1.0 [36], 1.3±0.3 [43], 1.7±0.3 [48], –m,f (DL) [29], –m,f (ABAS) [29], 5.0 (VH) [99], 5.3(4.7) (+eye muscles) [79], –m,f [73]; HD95: –m,f (VH) [97], –m,f [22], –m,f [38]; HD95mid: 2.4±0.5 [62], 2.4±1.0 [30]; HDsw: –m,f [82]
  ASD (mm): ASDmax: 1.0 (VH) [99], 1.2(0.9,1.8) (VH) [98]; ASDn/a: 0.6(0.8) (+eye muscles) [79]; DTAavg: 2.0 (MR) [68], 3.3 (CT) [68]
  SSD (mm): SDTAavg: 0.8 [89]; SDTAmin: −2.3 [89]; SDTAmax: 3.8 [89]

Optic chiasm
  DC (%): –m,f [88]; 71±9 [43]; 64±16 [30]; 62±17 [27]; 61±6 (N) [24]; 59±7 [40]; 59±10 (N) [40]; 59±14 [24]; 58±10 (N) [55]; 58±17 (N) [54]; 57±13 (N, UB) [66]; –m,f [73]; 53±15 [46]; 52±11 (N) [70]; –m,f [22]; 45±17 (N) [31]; 42±17 (N) [49]; –m,f [38]; 41(0,58) [94]; 41±14 [36]; 37±13 [65]; 37±18 [89]; –m,f (N) [35]; 24±15 [59]
  VC (%): TPR: 68±8 (N) [40], 64±11 (N) [24], 64±15 [24], 61±5 [40], 61±10 (N) [55], 50±25 (N) [31], 48±31 [59]; PPV: 65±8 [40], 61±12 (N) [24], 56±10 (N) [55], 56±11 (N) [40], 56±16 [24], 47±18 (N) [31]; FDR: 77±24 [59]
  SC (%): sPPV: –m,f (s=2 mm) [89]
  HD (mm): HDreg: –m,f [88], 1.0±0.4 [36], 2.5±1.0 [43], –m,f (N, UB) [66], 5.6±1.6 (N) [70], –m,f [73]; DTAmax: 3.7±1.4 [27], –m,f (N) [35]; HD95: 2.1±1.4 [40], 2.2±1.0 (N) [55], 2.6±0.8 (N, UB) [66], 2.8±1.4 (N) [31], 3.8±1.2 (N) [40], 4.4±3.0 (N) [49], 4.6±2.4 [30], –m,f [22], –m,f (N) [35], 5.8±2.5 [46], –m,f [38]; HD95mid: 2.7±0.5 (N) [24], 2.8±1.6 (N) [54], 3.9±2.2 [24]
  ASD (mm): ASDmax: –m,f (N, UB) [66]; ASDmid: 0.7±0.2 (N) [55], 0.8±0.4 [40], 0.9±0.2 (N) [24], 1.3±0.3 (N) [40], 1.5±0.7 [24]; ASDn/a: –m,f [88]; DTAavg: 1.1±0.7 [27]
  SSD (mm): SDTAavg: 0.04 [89]; SDTAmin: −2.4 [89]; SDTAmax: 3.0 [89]

Optic nerves
  DC (%): 90±4 [37]; 82±6 [43]; 81 [33]; 79±6 [62]; –m,f [88]; 78±5 (•) [60]; 77±6 [60]; 76±7 [30]; 76(73,82) [74]; 75±5 (•) [36]; 74±6 [24]; 74±8 (N) [31]; 74(41) [79]; 72±4 [40]; 72±5 (N) [24]; 72±6 [34]; 72±6 (N) [60]; 72±6 [46]; 71±8 (N) [54]; 70±4 (N) [40]; 69±5 (N) [55]; 69±9 [36]; 69±10 [47]; –m,f (ABAS) [29]; 64±7 [65]; 64±8 (N) [49]; 63±10 (N, UB) [66]; 62 [99]; 60±12 [27]; –m,f [22]; 58(49,63) [98]; –m,f [38]; 52±14 [89]; –m,f (DL) [29]; 48±11 [59]; –m,f (N) [69]; 38(0,53) [94]; –m,f [35]
  VC (%): TPR: 85±8 (N) [40], 80±8 (N) [24], 77±11 (N) [31], 74±6 (N) [55], 71±10 [24], 70±6 [40], 64±16 [59]; PPV: 80±9 [24], 76±7 [40], 72±9 (N) [31], 70±8 (N) [40], 66±8 (N) [24], 64±6 (N) [55]; FDR: 57±12 [59]
  SC (%): sDC: 98±3 (s=2.5 mm, •) [60], 92±6 (s=2.5 mm, N) [60]; sPPV: –m,f (s=2 mm) [89]
  HD (mm): HDreg: 0.5±0.3 (•) [36], 0.7±0.8 [36], –m,f [88], 1.8±0.7 [43], 3.8(6.9) [79], –m,f (N, UB) [66], 6.5 [99]; DTAmax: 3.7±1.0 [27], –m,f (N) [35]; HD95: 1.4±0.4 [40], 2.0±0.5 (N) [40], 2.1±0.3 [37], 2.3±2.4 (N) [31], 2.5±1.0 (N) [55], 2.6±0.4 (N) [49], 3.0±1.0 (N, UB) [66], –m,f (ABAS) [29], –m,f (N) [69], 3.7±1.1 [30], 4.8±4.3 [46], –m,f (DL) [29], –m,f [38], –m,f [22], –m,f [35]; HD95mid: 1.9±1.9 (N) [24], 1.9±1.3 [24], 2.2±0.9 (N) [54], 3.3±1.6 [62]
  ASD (mm): ASDmax: –m,f (N, UB) [66], 1.0(0.8,1.4) [98], 1.0 [99]; ASDmid: 0.4±0.3 [40], 0.6±0.3 [24], 0.7±0.2 (N) [24], 0.7±0.2 (N) [40], 1.1±0.8 (N) [55]; ASDn/a: –m,f [88], 0.6(2.0) [79]; DTAavg: 1.2±0.5 [27]
  SSD (mm): SDTAavg: −0.4 [89]; SDTAmin: −2.7 [89]; SDTAmax: 2.4 [89]
Medical Physics, 47 (9), September 2020
e939 Vrtovec et al.: Auto-segmentation of OARs for H&N RT planning e939
TABLE VII. Continued.
Results
Lens
DC (%) 88 5,
97
84 7,
47
84
33
,82 6
30
,81 12
60
,91
m;f(DL)
29
,80 18 ()
60
,79 11 ()
36
,72 14
36
,67
99
, 50(37,66)
98
,91
m;f(ABAS)
29
,
35 25
59
VC (%) TPR: 50 32
59
; FDR: 73 21
59
SC (%) sDC: 93 20 (s=0.98mm,)
60
HD (mm) HD91m;f: 0.2 0.1 ()
36
, 0.4 0.9
36
, 3.7
99
; HD91m;f:91
m;f
97
, 2.0 1.1
30
,91
m;f(DL)
29
,91
m;f(ABAS)
29
ASD (mm) ASD91m;f:1.0
99
, 1.6(0.7,2.9)
98
Sclera
DC (%) 69 5
97
,46
99
, 38(24,55)
98
HD (mm) HD91m;f: 5.9
99
; HD91m;f:91
m;f
97
ASD (mm) ASD91m;f:1.1
99
, 1.8(1.0,3.8)
98
Cornea
DC (%) 43
99
HD (mm) HD91m;f: 6.4
99
ASD (mm) ASD91m;f:1.7
99
Lacrimal glands
DC (%) 70 12
60
,62 13 ()
60
SC (%) sDC: 92 7(s=2.5mm,)
60
Extraocular muscle
DC (%) 76 6
62
HD (mm) HD91m;f:2.1 0.5
62
Mandible
DC (%) 96
60
,96
53
,91
m;f
23
,94 1(N)
56
,94 1(N)
55
,94 1(N)
40
,94 1(N)
24
,94 2()
60
,94 2(N)
60
,94(N)
41
,94
41
,93 1
30
,93 1
92
,
93 1
86
,93 1(N,IM)
66
,93 1(N)
39
,93 1
24
,93 2
46
,93 2(N)
31
,92 1
44
,92 2
48
,92 2
26
,92(N)
54
,91 2(N)
49
,
91 4
47
,91 9
42
,90 2()
36
,90 4
65
,91
m;f
95
,89 4
82
,88 3
28
,89
51
,88
39
,91
m;f(N)
69
,87 3
36
,85 2
34
,91
m;f
35
,91
m;f
78
,
91m;f
52
,91
m;f(N)
35
,82 4
93
,82 4
40
,80 4
58
,78 8
90
VC (%) TPR: 95 2(N)
56
,95(N)
41
,93 2(N)
24
,93
41
,92 2(N)
55
,92 3
24
,92 3(N)
31
,91 3(N)
40
,87 5
40
,83 13
93
,79 11
90
;
TNR: 100 (N)
41
,100
41
,95 3
93
; PPV: 97 2(N)
40
,95 2(N)
24
,95 2(N)
31
,95 5(N)
55
,94 2(N)
56
,94 3
24
,79 4
40
SC (%) sDC: 97 2(s=1mm,)
60
,97 2(s=1mm,N)
60
HD (mm) HD91m;f:1.3 1. 0
36
,1.3 0.4 ()
36
, 2.4 0.4
48
,91
m;f(N,IM)
66
, 4.6 (N)
41
, 6.4
41
, 6.5
53
, 6.7 1.3
44
,91
m;f
52
, 10.9 2.1
58
; DTA91m;f:
91m;f
35
,91
m;f(N)
35
; HD91m;f:91
m;f
23
,1.3 0.5 (N)
31
,1.4 0.6 (N)
39
,1.5 0.3 (N)
55
,1.7 0.6 (N,IM)
66
,1.9 0.6 (N)
40
, 2.4 0.6
(N)
49
, 2.5 0.8
30
, 2.7 1.7
40
,91
m;f
35
,91
m;f(N)
35
,91
m;f(N)
69
, 4.3 1.1
58
, 6.3 2.2
46
; HD91m;f:1.30.1
24
,1.4 0.02 (N)
24
, 1.9 (N)
54
;
HD91m;f:2.1 0.1
92
, 2.6 0.6
86
,91
m;f
82
ASD (mm) ASSD: 0.2 0.1
26
, 0.6
53
; ASD91m;f:91
m;f(N,IM)
66
; ASD91m;f: 0.4 0.1 (N)
55
, 0.4 0.1 (N)
56
, 0.5 0.1 (N)
24
, 0.5 0.1
24
, 0.5 0.1
(N)
40
,1.1 0.7
40
; ASD91m;f: 0.6
39
,1.1 0.3
58
; DTA91m;f: 0.7 0.3
42
Oral cavity
DC (%) 93 3
47
,91 2
30
,89 2
26
,89 2
28
,91
m;f
35
,87 5
42
,91
m;f
52
,787
50
VC (%) TPR: 68 11
50
; FDR: 5 3
50
HD (mm) HD91m;f:91
m;f
52
; DTA91m;f:91
m;f
35
; HD91m;f:91
m;f
35
,7.4 2.1
30
ASD (mm) ASSD: 1.0 0.3
26
; DTA91m;f: 0.8 0.4
42
Temporo-mandibular joints
DC (%) 87 3
30
,87 6
42
,85 5
47
HD (mm) HD91m;f: 2.8 0.9
30
ASD (mm) DTA91m;f: 0.4 0.3
42
Mastoids
DC (%) 82 6
47
Chewing muscles
DC (%) 91m;f(pterygoid)
95
,91
m;f(masseter)
95
,71
87
ASD (mm) DTA91m;f:1.6 1. 4
87
Pharyngeal constrictor muscles (PCM), cricopharynx (CP), orohypopharynx constrictor muscle (OPCM)
DC (%) 81 4 (PCM)
28
,73 11 (CP)
50
,71 8 (PCM)
40
,69 6 (PCM)
65
,68 9 (PCM)
50
,91
m;f(PCM)
23
,91
m;f
52
, 61 (middle) & 58 (inferior) &
46 (superior)
53
, 58 (OPCM)
51
,54 26 (inferior) & 58 18 (middle) & 52 11 (superior) (PCM)
26
,91
m;f(PCM)
77
, 50 (PCM)
87
VC (%) TPR: 78 7 (PCM)
40
,70 11 (CP)
50
,66 9 (PCM)
50
; PPV: 69 8 (PCM)
40
; FDR: 20 16 (CP)
50
,29 9 (PCM)
50
HD (mm) HD91m;f: 9.6 (inferior) & 12.7 (middle) & 14.7 (superior)
53
,91
m;f
52
; DTA91m;f:91
m;f(PCM)
77
; HD91m;f: 2.8 1.3 (PCM)
40
,91
m;f(PCM)
23
tumor gross target volumes of nasopharyngeal cancer from 60 CT images (50 for training, 10 for testing). While detailed results for this challenge are yet to be published, the publicly available data indicate that the best-ranking method achieved an average Dice coefficient of 81% and a 95-percentile Hausdorff distance of 2.8 mm across all OARs. Moreover, a new edition of this challenge is scheduled for October 2020.**
4. DISCUSSION
The field of RT planning in the H&N region expands beyond auto-segmentation of OARs that was presented in this review, for example to (auto-)segmentation of target volumes (including gross target volume, clinical target volume, and planning target volume), analysis of commercial solutions for RT planning, dosimetric evaluations, and longitudinal studies. For additional information, we kindly refer the reader to specific reviews that include the topics of segmentation methodology,8,21 target volume segmentation,20 ABAS,19,132 commercial segmentation tools,5,66,119 MR-only RT planning,133 and observer variability in OAR delineation.3
In this review, we focused on auto-segmentation of OARs in the H&N region, and provided a comprehensive and systematic overview with a complete list of relevant references from 2008 to date, along with a systematic analysis from different perspectives that we consider relevant: image modality, OAR, image database, methodology, ground truth, performance metrics, and segmentation performance. In this section we discuss the advantages and limitations of the
TABLE VII. Continued.

Results

Pharyngeal constrictor muscles (continued)
ASD (mm): ASSD: 1.6±1.7 (inferior) & 1.9±1.7 (middle) & 3.7±5.2 (superior) (PCM),[26] 2.0 (middle) & 2.0 (inferior) & 2.1 (superior)[53]; ASD: 1.0±0.5 (PCM)[40]; DTA: –m,f (PCM),[77] 2.0±1.9 (PCM)[87]

Cervical esophagus with the cricopharyngeal inlet, upper esophageal sphincter (UES)
DC (%): 86±3,[42] 82±6,[28] 81±14 (UES),[50] 81±7,[36] 70±7,[61] 69±10,[26] 62,[51] 60±11,[50] –m,f,[52] –m,f,[23] 35[53]
VC (%): TPR: 80±16 (UES),[50] 50±15[50]; FDR: 15±14 (UES),[50] 21±14[50]
HD (mm): HD: 1.1±1.1,[36] –m,f,[52] 35.8[53]; HD: –m,f[23]
ASD (mm): ASSD: 1.3±0.6,[26] 7.7[53]; ASD: 1.9±0.7[61]; DTA: 1.0±0.7[42]

Thyroid
DC (%): 92±3.7,[37] 86±5,[30] –m,f,[23] 80,[85] 79±6,[44] 68,[99] 57(37,80)[98]
HD (mm): HD: 10.2±2.9,[44] 17.5[99]; HD: 2.7±0.6,[37] –m,f,[23] 3.9±2.4[30]; HD: –m,f[85]
ASD (mm): ASD: 2.5,[99] 5.1(1.1,9.3)[98]

Larynx
DC (%): 89±3,[30] 87±4,[47] 86±4,[65] 86±7,[42] 83±8,[28] 80±5,[40] 78±4,[50] –m,f,[52] 77±7,[26] 74,[51] –m,f,[35] 71,[53] –m,f[77]
VC (%): TPR: 88±6,[40] 83±8[50]; PPV: 77±6[40]; FDR: 25±10[50]
HD (mm): HD: 11.1,[53] –m,f[52]; DTA: –m,f,[35] –m,f[77]; HD: 3.2±2.7,[40] 6.2±5.8,[30] –m,f[35]
ASD (mm): ASSD: 1.0±0.4,[26] 2.2[53]; ASD: 1.7±1.6[40]; DTA: 1.3±1.0,[42] –m,f[77]

Trachea
DC (%): 84±8,[63] 81±5,[30] –m,f[52]
HD (mm): HD: –m,f[52]; HD: 20.9±9.0[30]

Cochlea
DC (%): 95±10,[60] 82±7 (•),[60] 74,[53] 66±13,[36] 65±7,[26] 41±8 (•),[36] –m,f[77]
SC (%): sDC: 99±2 (s=1.25 mm, •)[60]
HD (mm): HD: 0.5±0.4,[36] 0.7±0.1 (•),[36] 1.7[53]; DTA: –m,f[77]
ASD (mm): ASSD: 0.4,[53] 0.6±0.2[26]; DTA: –m,f[77]

Brachial plexus
DC (%): 77,[81] 56±11,[30] 53±12,[67] 32[71]
VC (%): TPR: 49,[71] 47±12[67]
HD (mm): HD: 15.4[71]; HD: –m,f[81]
ASD (mm): ASD: 1.6[81]

Carotid artery
DC (%): 91,[25] –m,f[23]
HD (mm): HD: 0.9[25]; HD: –m,f,[23] 18.3±14.5[30]

Legend: –m,f: value not reported as an exact mean (m: median, average not reported; f: value estimated from a figure, exact value not reported); o1/o2: compared against observer 1/observer 2; N: evaluated on the PDDCA database;[66] •: evaluated on the TCIA-RT database;[60] CT, MR: the results in ref. [68] are obtained from CT or MR images, respectively; IM, UB: winning teams of the 2015 computational challenge;[66] +brainstem: the spinal cord and brainstem were segmented as one organ; +eye muscles: the eyes and eye muscles were segmented as one organ; +chiasm: optic nerves and optic chiasm were segmented as one organ; s: size of the volumetric neighborhood.
** The Automatic Structure Segmentation for Radiotherapy Planning Challenge 2020 is planned as a standalone satellite event during MICCAI 2020 (https://miccai2020.org/en/MICCAI-2020-CHALLENGES.html).
reviewed methods, and provide corresponding recommenda-
tions from the relevant perspectives.
4.A. Image modality
For the purpose of RT planning, CT images are always
acquired because they contain information about the electron
density that is required to calculate the interaction of radia-
tion beams with tissues, and further used to define radiation
dose distribution maps. Although MR images proved to be
advantageous for RT planning because they can provide
anatomical information complementary to CT images, espe-
cially in the case of soft tissues, they are not commonly used
in clinical practice. Moreover, the structures in MR images
may be subjected to geometrical distortions,
134
for example,
due to the magnetic field inhomogeneities.
101
However, as
MR imaging has become more accessible in the past decade,
it can be expected that its utilization will increase toward
making MR images an integral part of RT planning, and that
auto-segmentation approaches exploring both CT and MR
image modalities simultaneously will be further developed.
The start of this trend is already indicated by the recent
increase in the number of studies that include the MR image
modality.
38,40,43,57-59
In a single study where OARs were
independently auto-segmented from CT and MR images of
the same patients, the results for MR images outperformed
those for CT images in the case of the parotid glands, eye-
balls, and brainstem.
68
Although methods for MR-only RT planning are being
developed,
135
their routine clinical implementation is still
very limited, as challenges remain of how to assign data on
electron density to MR images for the purpose of dose calcu-
lation
133
by means of synthetic CT image generation
136
or
MR-to-CT image registration.
105,137
In general, better perfor-
mance is achieved by applying deformable (i.e., nonrigid)
image registration and using rigid registration as the first
step,
103,104
however, this may not always be the case.
105
To
further improve the registration process, DL approaches have
recently started to emerge.
137
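As an illustration of this two-step strategy, the following minimal Python sketch uses the open-source SimpleITK library; the file names, metric, and optimizer settings are our own assumptions for demonstration, not a validated clinical protocol:

    import SimpleITK as sitk

    # Hypothetical inputs: the planning CT (fixed) and the MR image (moving).
    fixed = sitk.ReadImage("planning_ct.nii.gz", sitk.sitkFloat32)
    moving = sitk.ReadImage("mr.nii.gz", sitk.sitkFloat32)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)  # multimodal metric
    reg.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100)
    reg.SetInterpolator(sitk.sitkLinear)

    # Step 1: rigid alignment, used as the initialization.
    initial = sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY)
    reg.SetInitialTransform(initial, inPlace=False)
    rigid = reg.Execute(fixed, moving)

    # Step 2: deformable (B-spline) refinement on top of the rigid result.
    reg.SetMovingInitialTransform(rigid)
    reg.SetInitialTransform(
        sitk.BSplineTransformInitializer(fixed, [8, 8, 8]), inPlace=False)
    deformable = reg.Execute(fixed, moving)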
Complementary information can be obtained from PET-
CT and PET-MR scanners, which combine the CT or MR
with the PET modality and acquire coregistered images.
However, as PET images enable functional investigation
through the radiolabeling of tissues with a high metabolic
activity (i.e., cancerous cells), they are more appropriate for
target volume than for OAR segmentation.
118,138
On the other
hand, monoenergetic images generated from DECT were
shown to be adequate for H&N OAR segmentation
108
because they can exhibit superior image quality in compar-
ison to classical 120 kVp CT, especially in terms of a better
contrast-to-noise ratio, reduced influence of the beam harden-
ing phenomenon and metal artifact suppression. For several
OARs, it was shown that ABAS and DL-based auto-segmen-
tation can be successfully applied to monoenergetic images
of 40 and 70 keV.
29
However, a study on a larger DECT data-
base with a complete set of OARs and comparison to
classical CT images needs to be performed in order to objec-
tively assess and identify eventual advantages.
To conclude, both CT and MR image modalities are being
explored for H&N OAR auto-segmentation, but the potential
of the MR image modality for auto-segmentation of several
soft tissues should be explored more in the future.
4.B. Organ at risk
The relatively small area of the H&N region comprises a
large number of OARs with a relatively complex and variable
anatomy. The decision of which OAR needs to be delineated is
based on a number of factors, including the proximity of the
OAR to the tumor, its susceptibility to the radiation and impor-
tance for life functions. Auto-segmentation was therefore com-
monly performed for OARs whose RT-induced damage
proved to be linked to post-RT complications that may endan-
ger the life of the patient or notably jeopardize its quality.
109-111
Due to the potentially devastating morbidity resulting
from over-irradiation of the spinal cord and brainstem,
delineation of these two anatomical structures is a manda-
tory part of any segmentation process in the H&N
region.
102
The parotid and submandibular glands are by far
the most represented of the remaining OARs, although their
poor boundary distinction in CT images makes segmenta-
tion very challenging. On the other hand, the optic chiasm
and optic nerves are also demanding to segment because of
their small size and tubular geometry. The mandible is the
only well visible bony structure, and due to its excellent
visibility in CT images it can act as a spatial reference for
segmenting other neighboring OARs.
51,66
As the definition
of exact OAR boundaries is subjected to observer interpre-
tation, new studies should adhere to existing delineation
guidelines.
102
Nevertheless, with the introduction of addi-
tional image modalities, such as the MR, the boundaries of
OARs should become easier to interpret.
To conclude, the spinal cord, brainstem and major salivary
glands (the parotid and submandibular glands) are the most
studied OARs in the H&N region, however, more experi-
ments should be conducted in the future for auto-segmenta-
tion of the pharyngeal constrictor muscles, larynx and
cervical esophagus with the cricopharyngeal inlet that are
important for RT planning.
4.C. Image database
To account for the anatomical and disease-related vari-
ability among different patients as well as for the variability
in the image acquisition settings, auto-segmentation methods
must be validated on a preferably large number of images
and patients to ensure reliable statistical results. In general,
the current trend shows an increasing number of cases being
included in evaluation databases, which is mostly due to the
application of state-of-the-art machine learning methods,
such as DL, which require relatively large training datasets.
Image databases should include representative clinical
samples, with images from various acquisition setups and of
patients with different tumors according to their localization
and stage. However, images should retain certain common
characteristics (e.g., imaging sequence, field of view, image
noise), otherwise auto-segmentation may become too chal-
lenging. Still, objective comparison of different auto-seg-
mentation methods is often difficult, because they were
evaluated on different image databases, or on a different set
of annotations representing reference OAR delineations. As
the construction of a representative set of samples requires a
lot of effort, many such databases remain proprietary and
represent a valuable research advantage.
Besides using proprietary databases, evaluation should be
performed also on publicly available image databases to
ensure an objective comparison to existing approaches.
Among the publicly available CT image databases, PDDCA66 has already been used in several studies45,54–56,60,69,70
because it was devised for a computational challenge that set
benchmarks for auto-segmentation of OARs in the H&N
region, while TCIA-RT
60
and StructSeg have yet to gain visi-
bility. As it was shown that MR images provide valuable sup-
port to CT image auto-segmentation, or can be treated as
standalone in the case of MR-only RT planning, public MR
image databases have recently surfaced, such as the RT-
MAC
116
or MRI-RT,
105
which is augmented with CT images
of the same patients.
To conclude, several image databases with the correspond-
ing ground truth are currently publicly available and should
be used for an independent performance evaluation of OAR
auto-segmentation approaches. In the future, there is a need
for such databases to evolve, that is, to include a large number
of cases and reference delineations, preferably performed by
multiple observers from different institutions and at multiple
times, so as to enable a proper evaluation of multimodal
auto-segmentation methods.
4.D. Methodology
For OAR auto-segmentation in the H&N region, ABAS
is still the prevailing methodological approach, and has
been as such implemented in several commercial tools for
RT planning.
5,66,119
However, its segmentation performance
highly depends on the range of anatomical variations that
can be observed in the library of atlases, which can be
built up from previously treated patients or, if used, built
into the commercial software. As a result, ABAS may per-
form poorly for cases that differ from the library of
atlases,
5
therefore making the selection of the most appro-
priate atlases a challenging task. For most OARs, perfect
ABAS results cannot be reasonably expected, however, the
performance of a level corresponding to clinical quality
can be consistently expected given a large atlas database
under the assumption of perfect atlas selection.
139
It was
shown that ABAS reaches its upper performance limit with
the inclusion of 10–20 atlases,23,67,140
and that it generally
underperforms for small and/or thin OARs (e.g., swallow-
ing muscles).
87
Another drawback is its long execution
time due to atlas registration, which limits on-line clinical
applications.
Recently, the focus has shifted toward machine learning,
with DL approaches for H&N OAR auto-segmentation start-
ing to emerge as early as in 2016,
70
and have been consider-
ably increasing in number since (Fig. 1). When compared to
ABAS, DL-based auto-segmentation requires considerably
less time for on-line applications, but is associated with a
high computational burden in the off-line training phase,
where currently up to a few days or more may be required to
complete the model training. Moreover, the training set of
images has to be quite large, but the actual number depends
on image quality and representativeness, and can be reduced
by applying different training set augmentation techniques
(e.g., intensity and geometrical transformations of original
images). The underlying DL model is, in comparison to
ABAS, also more robust because it can be trained with all
available data, including patients with metal artifacts and
diverse anatomy.
7
The main advantage of DL-based auto-
segmentation is in its ability to systematically learn the most
adequate features for segmentation from a set of annotated
training images, and then automatically search for the same
features in a previously unseen image. Although this proved
to result in the best overall segmentation performance,
49
it is
not without drawbacks. For example, the most popular DL-
based medical image auto-segmentation architecture, the U-
Net,
8
can result in many false positives if the approximate
location and size of the observed OAR is not constrained
beforehand. As a result, state-of-the-art techniques from the
field of artificial intelligence (e.g., attention learning,
24
adversarial learning
40
) are constantly being explored and uti-
lized to improve its performance.
141
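As a simple illustration of such training set augmentation, consider the following minimal Python sketch; the transformation types follow the examples above, while the parameter ranges, helper names, and library choices are our own assumptions rather than settings from any reviewed study:

    import numpy as np
    from scipy.ndimage import rotate

    def augment(image, mask, rng):
        # Geometric transformation: a small random rotation applied identically
        # to the image and its reference delineation (nearest neighbor for labels).
        angle = rng.uniform(-10.0, 10.0)
        image = rotate(image, angle, reshape=False, order=1, mode="nearest")
        mask = rotate(mask, angle, reshape=False, order=0, mode="nearest")
        # Intensity transformation: global scaling and shift of the image only.
        image = image * rng.uniform(0.9, 1.1) + rng.uniform(-20.0, 20.0)
        return image, mask

    rng = np.random.default_rng(0)
    slice_hu = rng.normal(0.0, 100.0, size=(128, 128))  # stand-in for a CT slice
    labels = np.zeros((128, 128), dtype=int)
    labels[40:80, 50:90] = 1                            # stand-in OAR mask
    augmented_image, augmented_mask = augment(slice_hu, labels, rng)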
Both ABAS and DL-based auto-segmentation are based
on reference OAR delineations in the given image database,
which may, however, not represent the ground truth. If the
cases included in the image database are not representative
for the actual OAR segmentation task, or if the corresponding
manual delineations are of low quality and inconsistent, the
underlying DL model will either fail to train or produce
inconsistent segmentations. Therefore, attention needs to be
given to the choice of image database and to reduce the intra-
and interobserver variability of reference delineations, for
example, by including publicly available databases
112,113
and
adhering to OAR delineation guidelines.
102
To conclude, while ABAS was the dominating approach for
segmenting OARs in the H&N region in the past, current
approaches have shifted to DL, resulting in a superior segmen-
tation performance. Moreover, DL-based auto-segmentation is
expected to become even more sophisticated through the
inclusion of methodological advances in the field of artificial
intelligence,
142
and even more powerful from the perspective
of being trained on larger and more diverse image databases.
4.E. Ground truth
To generate the ground truth, manual delineation of OARs
by human experts is still the most common approach,
although it has been recognized as a very tedious and time-
consuming task. For the delineation of ground truth contours,
it is strongly recommended to follow the recently introduced
guidelines,
102
which have been formed as a consensus of dif-
ferent professional associations and groups,
††
and also incor-
porate guidelines that have been introduced in the
past.
124,125,127
However, even if guidelines are followed, the
delineation is still biased by subjective observer interpreta-
tion, and therefore it is strongly recommended to perform
basic observer training with joint delineation review ses-
sions,
143,144
and to include additional modalities to improve
the visibility of structure boundaries.
144
Moreover, to increase the reliability of statistical results
related to the methodology testing in the clinical context, the
ground truth should be provided from multiple experts per-
forming the delineation on multiple time occasions, therefore
enabling the evaluation of the variability among and within
the observers, that is, the inter- and intraobserver variability,
respectively. In a study where manual H&N OAR delin-
eations of eight different observers from CT and MR images
of 20 subjects were compared to ABAS, it was reported that
manual delineations and ABAS generated structures of simi-
lar volume with no statistically significant difference in vol-
ume overlap, however, the observers exhibited higher
variation with respect to tubular structures (e.g., optic chiasm,
optic nerves).
89
On the other hand, a different study evaluated
32 multi-institution delineations of six OARs from a single
CT image, and reported a significant delineation variability
among observers that consequently caused large differences
in the planned radiation doses, with the most variable organs
being the brainstem and the two parotid glands.
143
Similarly,
in a multi-institutional study where eight observers manually
delineated 20 OARs from 16 CT images, statistically signifi-
cant interobserver delineation variability as well as differ-
ences in dosimetric parameters were reported for all OARs,
however, both could be reduced for most OARs by manually
editing the results of ABAS, in particular for the brainstem,
spinal cord, cochleae, temporo-mandibular joints, larynx, and
pharyngeal constrictor muscles.
145
On the other hand, a high
agreement was reported for auto-segmentations of 13 OARs
from 125 CT images that were independently obtained at
seven different institutions with the same commercial RT
planning system but with different institution-specific set-
tings.
82
Nevertheless, the variability in manual as well as auto-seg-
mentation results cannot be completely eliminated because
each individual observer is exposed to his/her subjective bias
that is conditioned by experience (i.e., novice vs expert), and
because imaging protocols and setups as well as RT protocols
and planning systems vary greatly across institutions.
146
For a
particular OAR, the observer variability imposes the upper
limit for auto-segmentation performance, as we cannot expect
any auto-segmentation result to overcome the obtained con-
sensus among the ground truth delineations. Although man-
ual correction of auto-segmentation boundaries is a less labor
intensive approach for ground truth generation, it contains
auto-segmentation bias and is therefore not the most appro-
priate reference for performing auto-segmentation evaluation.
On the other hand, the ground truth can be relatively easily
obtained by using phantom objects, synthetic images, or
cadaver sections,
67,89,121,147
however, they represent unrealis-
tic surrogates for patient imaging and were in fact not present
in the reviewed studies.
To conclude, delineation guidelines should be followed for
the ground truth generation, and participation of multiple
experts from multiple institutions is recommended for a reli-
able reporting of the intra/interobserver variability.
4.F. Performance metrics
When reporting the geometric accuracy of auto-segmenta-
tion results, there is unfortunately no universal consensus
about the corresponding performance metrics. Moreover, var-
ious mutually incompatible definitions and different nomen-
clatures make the comparison of auto-segmentation results
relatively difficult.
129
As there is a strong need for an agreed-
upon metrics, which would allow an exact comparison of
results and eliminate the need for specifying its definition in
each new study, we would recommend the nomenclature and
definitions presented in Table VI.
For reporting the volumetric overlap of two segmentation
masks, we advise a mandatory use of the Dice coefficient.
Although the Jaccard index is an established volumetric coef-
ficient and has been reported in a few studies,
59,67,96
it is
redundant because it can be calculated from the Dice coeffi-
cient
‡‡
. Other variations of the volumetric coefficient provide
additional insight into the segmentation performance from
the perspective of binary classification, specifically the
degree of over- or under-segmentation, but their interpretation
may be ambiguous. For example, in the case of reporting the
specificity, a dilemma about the calculation of true negatives
(the set complement in its definition in Table VI) may arise.
94
On the other hand, sensitivity is the metrics of choice in the
case we want to reduce the number of voxels that are missing
from the resulting segmentation (i.e., false negatives), even if
at the expense of adding voxels (i.e., false positives).
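For concreteness, the volumetric coefficients discussed above can be computed from two binary masks as in the following minimal Python sketch (function and variable names are illustrative, not taken from a specific reviewed study):

    import numpy as np

    def overlap_metrics(auto_mask, ref_mask):
        a = auto_mask.astype(bool)
        b = ref_mask.astype(bool)
        tp = np.logical_and(a, b).sum()        # true positives, |A ∩ B|
        dc = 2.0 * tp / (a.sum() + b.sum())    # Dice coefficient, 2|A∩B| / (|A| + |B|)
        ji = dc / (2.0 - dc)                   # Jaccard index, redundant given DC
        tpr = tp / b.sum()                     # sensitivity: fraction of reference recovered
        ppv = tp / a.sum()                     # precision: fraction of result that is correct
        return {"DC": dc, "JI": ji, "TPR": tpr, "PPV": ppv}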
Although volumetric metrics may result in a high overlap,
clinically relevant differences between segmentation bound-
aries may still exist, which are important in RT planning
because they are used to compute the radiation dose distribu-
tion. The mismatches in boundary segments that encompass
†† Radiotherapy Oncology Group for Head and Neck (GORTEC), France; The Danish Head and Neck Cancer Group (DAHANCA), Denmark; Head and Neck Cancer Group of the European Organization for Research and Treatment of Cancer (EORTC), European Union; Hong Kong Nasopharyngeal Cancer Study Group (HKNPCSG), Hong Kong; National Cancer Research Institute (NCRI), UK; National Cancer Institute of Canada Clinical Trials Group (NCIC CTG), Canada; NRG Oncology Group (NRG), USA; Trans Tasman Radiation Oncology Group (TROG), Australia.
‡‡ Jaccard index: JI = |A∩B| / |A∪B|; Dice coefficient: DC = 2|A∩B| / (|A| + |B|); hence DC = 200%·JI / (100% + JI) and JI = DC / (200% − DC).
a volumetrically small but eventually important regions of
interest can be, to a certain degree, captured by surface coef-
ficients,
60
which measure the overlap of the corresponding
mask surfaces. While surface coefficients may gain a wider
adoption among the overlap metrics in the future, especially
if different values of the neighborhood distance sare
explored simultaneously, a consensus needs to be made about
their usage, with the surface Dice coefficient being the most
appropriate due to its bidirectional (i.e., symmetric) proper-
ties.
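The following minimal Python sketch illustrates how a surface Dice coefficient at tolerance s can be computed from two binary masks; isotropic voxel spacing and the helper construction below are our simplifying assumptions:

    import numpy as np
    from scipy.ndimage import binary_erosion
    from scipy.spatial import cKDTree

    def surface_points(mask, spacing_mm=1.0):
        # One-voxel-thick surface: mask voxels removed by a single erosion.
        border = mask & ~binary_erosion(mask)
        return np.argwhere(border) * spacing_mm

    def surface_dice(auto_mask, ref_mask, s_mm, spacing_mm=1.0):
        pa = surface_points(auto_mask.astype(bool), spacing_mm)
        pb = surface_points(ref_mask.astype(bool), spacing_mm)
        da = cKDTree(pb).query(pa)[0]   # distances from surface A to surface B
        db = cKDTree(pa).query(pb)[0]   # distances from surface B to surface A
        # Fraction of surface points within tolerance, counted in both directions.
        return ((da <= s_mm).sum() + (db <= s_mm).sum()) / (len(da) + len(db))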
Any overlap metrics should be accompanied with at least
one distance metrics, which provides complementary infor-
mation about the segmentation boundaries by measuring the
spatial separation between the corresponding surfaces. The
Hausdorff distance measures the maximum point-to-point
distance between two segmentation masks, and it originates
from a proper mathematical metrics to measure the distance
between two subsets in a metric space. However, because it is
very sensitive to outliers, the 95-percentile version of this
metrics may be alternatively used to robustly suppress their
influence. On the other hand, two-dimensional computation
of metrics, such as in the case of the slice-wise Hausdorff dis-
tance, is not appropriate for volumetric segmentation. In the
case of the average surface distance, we recommend to report
the average symmetric surface distance because it equally
takes into account all possible point-to-surface distances and
is bidirectional (i.e., symmetric). On the other hand, both the
maximum and mid-value versions of the average surface dis-
tance unnecessarily use two different point-to-surface weight-
ing factors, while the average distance to agreement is
unidirectional. The variations of the signed surface distance
can be used to deduce consistent over- or under-segmenta-
tion, however, they are unable to detect the overall boundary
mismatch when either over- or under-segmentation regions
are present in an approximately equal quantity, because they
cancel out. In general, distance metrics perform better when
the observed structures are small, and are especially efficient
for structures with a high surface-to-volume ratio (e.g., tubu-
lar structures such as the spinal cord, optic nerve and optic
chiasm, and the pharyngeal constrictor muscles) and cases
where otherwise acceptable small boundary variations result
in a large relative volume discrepancy (e.g., the pharyngeal
constrictor muscles). Other reported metrics, such as the vol-
ume difference
35,93,94
or distance/variation of mass cen-
ters,
29,52,94
do not represent meaningful overlap or distance
measurements, and are therefore not proper to evaluate seg-
mentation results.
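Reusing the surface_points() helper from the sketch above, the recommended distance metrics can be computed as follows (again under the simplifying assumption of isotropic voxels):

    import numpy as np
    from scipy.spatial import cKDTree

    def distance_metrics(auto_mask, ref_mask, spacing_mm=1.0):
        pa = surface_points(auto_mask.astype(bool), spacing_mm)
        pb = surface_points(ref_mask.astype(bool), spacing_mm)
        da = cKDTree(pb).query(pa)[0]   # point-to-surface distances, A to B
        db = cKDTree(pa).query(pb)[0]   # point-to-surface distances, B to A
        d = np.concatenate([da, db])
        return {
            "HD": max(da.max(), db.max()),   # maximum (Hausdorff) distance
            "HD95": np.percentile(d, 95),    # 95-percentile version, robust to outliers
            "ASSD": d.mean(),                # average symmetric surface distance
        }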
It has to be noted that, for a specific OAR, the reported per-
formance metrics only evaluate how close is the obtained seg-
mentation mask to its corresponding ground truth. Although
they represent a powerful tool for general method comparison,
they overlook the potential consequences of segmentation
errors from the clinical perspective. However, a method named
LinSEM
148
has been recently developed from the premise that
an ideal segmentation metrics should reflect the degree of clin-
ical acceptability directly from its values, and show the same
acceptability meaning with the same value for structures of
different shape, size, and form. The method combines, in a lin-
ear manner, the commonly used segmentation performance
metrics (i.e., the Dice coefficient, Jaccard index, and Haus-
dorff distance) with the clinical acceptability, which was pro-
vided by an expert observer (i.e., a subjective score from 1 to
5). By performing experiments on CT images including OARs
from the H&N region (i.e., the right parotid gland, mandible,
and cervical esophagus), it was concluded that the Jaccard
index has the most linear relationship with the acceptability
before actual linearization, while the Dice coefficient and
Hausdorff distance exhibit a significant improvement in
acceptability meaning from the perspective of an ideal met-
rics-to-acceptability relationship.
148
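The published method is described in reference 148; the sketch below only illustrates its underlying premise of linearly relating a geometric metric to expert acceptability scores, using hypothetical data and an ordinary least-squares fit of our own choosing:

    import numpy as np

    dc = np.array([0.55, 0.68, 0.74, 0.81, 0.88, 0.93])  # hypothetical Dice values
    score = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical expert scores (1-5)

    slope, intercept = np.polyfit(dc, score, deg=1)      # linear metric-to-acceptability map
    acceptability = np.clip(slope * 0.85 + intercept, 1.0, 5.0)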
To conclude, the Dice coefficient is the standard volumet-
ric coefficient for reporting the overlap of two segmentation
masks, and it should be always accompanied with at least one
distance metrics, preferably the Hausdorff distance (or its 95-
percentile version) and the average symmetric surface dis-
tance. Future research should focus on combining existing
geometrical performance metrics with clinical acceptability
scores and risk assessments into a new class of metrics for
the purpose of augmenting the quantitative evaluation of seg-
mentation performance.
4.G. Segmentation performance
Although the auto-segmentation methods do not always
provide clinically acceptable results, their performance is
constantly improving due to the application of new technolo-
gies. The auto-segmentation of OARs and subsequent manual
corrections require considerably less time than direct manual
delineation
19,119
and reduce the intra/interobserver variabil-
ity.
145
However, a direct comparison of the segmentation per-
formance among different methods is difficult, mostly
because they were, in general, not evaluated on the same
image databases. The comparison is therefore often affected
by different image acquisition setups (e.g., imaging sequence,
field of view), image properties (e.g., size, resolution, noise),
manual delineation guidelines and patient cohorts. Moreover,
the studies report different performance metrics, focus on dif-
ferent OARs or even do not provide a detailed statistical
description of the corresponding ground truth.
The results reported by state-of-the-art techniques indicate
that auto-segmentation of OARs in the H&N region is feasi-
ble to be clinically implemented into an automated RT plan-
ning system. However, from the perspective of RT, both
target volume and OAR segmentation has direct clinical
implications. Apart from the geometrical agreement with the
corresponding ground truth, auto-segmentation results have
to be evaluated also from the perspective of their dosimetric
impact, because even if the geometric differences are small,
the impact on the final dose distribution may still be clinically
relevant. As a result, the geometrical performance metrics are
not sufficient to predict the dosimetric impact of auto-seg-
mentation inaccuracies. For example, it was shown that the
interobserver variability in manual delineations of OARs
from the H&N region (e.g., the brainstem, brain, parotid
glands, mandible, and spinal cord) can lead to substantially
different dosimetric plans.
143,145,149
However, for several
OARs (e.g., the brainstem, spinal cord, cochlea, temporo-
mandibular joint, larynx and pharyngeal constrictor muscles),
the consistency in dosimetric plans can be improved by
reducing the interobserver variability, for example, by manu-
ally editing the results of ABAS,
90,145,150
which was shown to
produce clinically acceptable RT plans from the perspective
of dosimetric impact.
58
Similar conclusions were drawn in a
study that applied DL-based auto-segmentation,
50
and
reported little effect on the OAR dose despite the variation in
the Dice coefficient, indicating that imperfect geometrical
performance metrics do not necessarily result in inferior
OAR dosimetry.
50
Although the average radiation dose was,
for specific OARs (i.e., the pharyngeal constrictor muscles),
significantly higher for the DL-based than for manually
defined RT plans, these differences were not considered to be
clinically relevant.
50
On the other hand, a study evaluated RT
plans, obtained from expert manual delineations of several
H&N OARs, against those obtained by a knowledge-based
planning system, which is based on a preconfigured model
inferred from a cohort of past RT plans that were judged as
optimal.
151,152
A weak correlation between the geometric per-
formance metrics (i.e., the Dice coefficient, Hausdorff dis-
tances, volume differences, and centroid distances) and
dosimetric indices (i.e., dose to the hottest 98% of the planning
target volume and mean OAR dose) was reported, indicating
that the geometric performance metrics are not appropriate for
estimating the dosimetric impact.
152
However, besides obser-
ver variability in manual delineation, other factors may affect
the RT plan, such as the changes in the location and size of the
observed OARs due to RT effects, or the random and system-
atic patient setup errors due to multiple RT sessions. In a study
where reference manual delineations were randomly perturbed
to simulate delineation variability and combined with simu-
lated patient setup variability at random magnitudes, it was
concluded that the dosimetric impact of the delineation vari-
ability is overstated when considered in isolation from the
setup variability, and that it depends largely on the OAR dis-
tance from the target volume.
153
Nevertheless, it has to be
noted that the dosimetric impact of OAR auto-segmentation is
always compared to the dosimetric impact of manual OAR
delineation, which is inherently subjected to observer variabil-
ity. Future studies on H&N OAR auto-segmentation should
therefore report, besides multiple geometric performance met-
rics, also metrics related to the dosimetric impact to encom-
pass clinically relevant endpoints for RT planning.
Nevertheless, the analysis of the reported results indicates
that the performance of OAR auto-segmentation in the H&N
region is, if we consider as clinically acceptable the results
with the Dice coefficient above 90% and average surface dis-
tance below 1.5 mm, currently adequate for several OARs,
including the parotid glands, brainstem, brain, cerebrum and
cerebellum, temporal lobes, spinal cord, eyeballs and vitreous
humor, mandible, oral cavity, and cochlea (Table VII).
48,60,97
According to the reported interobserver variability, there may
still be room for improvements in auto-segmentation of the
salivary glands, especially if performed on MR images.
68
On
the other hand, the eyeballs can be segmented relatively accu-
rately due to their spherical geometry, while the optic nerves
and optic chiasm can come close to the ground truth in terms
of the distance but not overlap metrics.
66,88
For the pharyn-
geal constrictor muscles, larynx and cervical esophagus with
the cricopharyngeal inlet, unfortunately not enough studies
have been conducted to draw relevant conclusions. Therefore,
it is expected that these OARs will receive more focus in the
future, especially because of their importance in the process
of the H&N RT planning. On the other hand, it has to be
again pointed out that all auto-segmentation results are com-
pared to corresponding reference segmentations, and their
definition is subjected to observer variability, meaning that
the reasonably achievable performance is not ideal segmenta-
tion, for example, it is not realistic to expect that the Dice
coefficient will reach 100% or that the Hausdorff and average
surface distance will drop to zero.
To conclude, the best performing methods achieve clini-
cally acceptable auto-segmentation for several H&N OARs,
even if manual corrections may still be needed, but certainly
they reduce the overall delineation time and observer variabil-
ity. To better evaluate the segmentation performance, future
studies should focus also on the dosimetric impact to provide
clinically relevant endpoints for RT planning.
5. CONCLUSIONS
We performed a systematic review of OAR auto-segmenta-
tion for H&N RT planning from 2008 to date. Besides outlin-
ing, analyzing and categorizing the relevant publications
within this field, we have provided also a critical discussion
of the corresponding advantages and limitations. The main
conclusions that may not only assist in the introduction to the
field but also be a valuable resource for studying existing or
developing new methods and evaluation strategies are as fol-
lows: (a) Image modality: Both CT and MR image modalities are being exploited for the task, but the potential of the MR image modality for auto-segmentation of several soft tissues should be explored more in the future. (b) OAR: The spinal cord, brainstem, and major salivary glands (the parotid and submandibular glands) are the most studied OARs, however, more experiments should be conducted for auto-segmentation of the pharyngeal constrictor muscles, larynx, and cervical esophagus with the cricopharyngeal inlet that are important for RT planning. (c) Image database: Several image databases with the corresponding ground truth are currently publicly available and should be used for an independent performance evaluation of OAR auto-segmentation approaches, however, they should be augmented with data from multiple observers and multiple institutions. (d) Methodology: While ABAS was dominating in the past, current approaches have shifted to DL, which resulted in superior performance, and are expected to become even more methodologically sophisticated and trained on larger image databases. (e) Ground truth: Delineation guidelines should be followed for the ground truth generation, and participation of multiple experts from multiple institutions is recommended for a reliable reporting of the intra/interobserver variability. (f) Performance metrics: The Dice coefficient as the standard volumetric overlap metrics should be always accompanied with at least one distance metrics, preferably the Hausdorff distance (or its 95-percentile version) and the average symmetric surface distance, and future research should focus on combining them with clinical acceptability scores and risk assessments. (g) Segmentation performance: The best performing methods achieve clinically acceptable auto-segmentation for several OARs, even if manual corrections may still be needed, but certainly they reduce the overall delineation time and observer variability, however, future studies should focus also on the dosimetric impact to provide clinically relevant endpoints for RT planning.
ACKNOWLEDGMENTS
This work was supported by the Slovenian Research
Agency (ARRS) under grants J2-1732, P2-0232 and P3-0307.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
a)
Author to whom correspondence should be addressed. Electronic mail:
tomaz.vrtovec@fe.uni-lj.si.
REFERENCES
1. Bray F, Ferlay J, Soerjomataram I, Siegel R, Torre L, Jemal A. Global
cancer statistics 2018: GLOBOCAN estimates of incidence and mortal-
ity worldwide for 36 cancers in 185 countries. CA Cancer J Clin.
2018;68:394424.
2. Borras J, Barton M, Grau C, et al. The impact of cancer incidence and
stage on optimal utilization of radiotherapy: methodology of a popula-
tion based analysis by the ESTRO-HERO project. Radiother Oncol.
2015;116:4550.
3. Vinod S, Jameson M, Min M, Holloway L. Uncertainties in volume
delineation in radiation oncology: a systematic review and recommen-
dations for future studies. Radiother Oncol. 2016;121:169179.
4. Chaney E, Pizer S. Autosegmentation of images in radiation oncology.
J Am Coll Radiol. 2009;6:455458.
5. Sharp G, Fritscher K, Pekar V, et al. Vision 20/20: perspectives on
automated image segmentation for radiotherapy. Med Phys.
2014;41:050902.
6. Sahiner B, Pezeshk A, Hadjiiski L, et al. Deep learning in medical
imaging and radiation therapy. Med Phys. 2019;46:e1e36.
7. Seo H, Khuzani M, Vasudevan V, et al. Machine learning techniques for
biomedical image segmentation: an overview of technical aspects and
introduction to state-of-art applications. Med Phys. 2020;47:e148e167.
8. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional neural net-
works for biomedical image segmentation. In:Medical Image Comput-
ing and Computer-Assisted Intervention - MICCAI 2015. Volume 9351
of LNCS. Springer; 2015:234241.
9. C
ßicßek O, Abdulkadir A, Lienkamp S, Brox T, Ronneberger O. 3D U-
Net: learning dense volumetric segmentation from sparse annotation.
In: Medical Image Computing and Computer-Assisted Intervention -
MICCAI 2016, volume 9901 of LNCS. Springer; 2016:424432.
10. Milletari F, Navab N, Ahmadi S-A. V-Net: fully convolutional neural
networks for volumetric medical image segmentation. In: Fourth Inter-
national Conference on 3D Vision - 3DV 2016. IEEE; 2016:565571.
11. Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional
encoder-decoder architecture for image segmentation. IEEE Trans Pat-
tern Anal Mach Intell. 2017;39:24812495.
12. Kamnitsas K, Ledig C, Newcombe V, et al. Efficient multi-scale 3D
CNN with fully connected CRF for accurate brain lesion segmentation.
Med Image Anal. 2017 36:6178.
13. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille A. DeepLab:
semantic image segmentation with deep convolutional nets. Atrous con-
volution, and fully connected CRFs. IEEE Trans Pattern Anal Mach
Intell. 2018;40:834848.
14. Chen H, Dou Q, Yu L, Qin J, Heng P-A. VoxResNet: deep voxelwise
residual networks for brain segmentation from 3D MR images. Neu-
roimage. 2018;170:446455.
15. He K, Gkioxari G, Doll
ar P, Girshick R. Mask R-CNN. IEEE Trans
Pattern Anal Mach Intell. 2020;42:386397.
16. Meyer P, Noblet V, Mazzara C, Lallement A. Survey on deep learning
for radiotherapy. Comput Biol Med. 2018;98:126146.
17. Thompson R, Valdes G, Fuller C, et al. Artificial intelligence in
radiation oncology imaging. Int J Radiat Oncol Biol Phys.2018;
102:11591161.
18. Boldrini L, Bibault J-E, Masciocchi C, Shen Y, Bittner MI. Deep
learning: a review for the radiation oncologist. Front Oncol.2019;
9:977.
19. Lim J, Leech M. Use of auto-segmentation in the delineation of target
volumes and organs at risk in head and neck. Acta Oncol. 2016;55:
799806.
20. Kosmin M, Ledsam J, Romera-Paredes B, et al. Rapid advances in
auto-segmentation of organs at risk and target volumes in head and
neck cancer. Radiother Oncol. 2019;135:130140.
21. Cardenas C, Yang J, Anderson B, Court L, Brock K. Advances in auto-
segmentation. Semin Radiat Oncol. 2019;29:185197.
22. Wong J, Fong A, McVicar N, et al. Comparing deep learning-based
auto-segmentation of organs at risk and clinical target volumes to
expert inter-observer variability in radiotherapy planning. Radiother
Oncol. 2020;144:152158.
23. van Dijk L, Van den Bosch L, Aljabar P et al. Improving automatic
delineation for head and neck organs at risk by deep learning contour-
ing. Radiother Oncol. 2020;142:115123.
24. Gou S, Tong N, Qi S, Yang S, Chin R, Sheng K. Self-channel-and-spa-
tial-attention neural network for automated multi-organ segmentation
on head and neck CT images. Phys Med Biol. 2020.
25. de Ruijter J, van Sambeek M, van de Vosse F, Lopata R. Automated 3D
geometry segmentation of the healthy and diseased carotid artery in free-
hand, probe tracked ultrasound images. Med Phys. 2020;47:10341047.
26. Vandewinckele L, Willems S, Robben D, etal. Segmentation of head-
and-neck organs-at-risk in longitudinal CT scans combining deformable
registrations and convolutional neural networks. Comput Methods Bio-
mech Biomed Eng Imaging Vis. 2020.
27. Fung N, Hung W, Sze C, Lee M, Ng W. Automatic segmentation for
adaptive planning in nasopharyngeal carcinoma IMRT: time, geometri-
cal, and dosimetric analysis. Med Dosim. 2020;45:6065.
28. Lei Y, Harms J, Dong X, et al. Organ-at-risk (OAR) segmentation in
head and neck CT using U-RCNN. In: SPIE Medical Imaging 2020:
Computer-Aided Diagnosis. Volume 11314. SPIE; 2020:1131444.
29. van der Heyden B, Wohlfahrt P, Eekers D, et al. Dual-energy CT for
automatic organs-at-risk segmentation in brain-tumor patients using a
multi-atlas and deep-learning approach. Sci Rep. 2019;9:4126.
30. Tang H, Chen X, Liu Y, et al. Clinically applicable deep learning
framework for organs at risk delineation in CT images. Sci Rep.
2019;1:480491.
31. Wang Y, Zhao L, Song Z, Wang M. Organ at risk segmentation in head
and neck CT images by using a two-stage segmentation framework
based on 3D U-Net. IEEE Access. 2019;7:144591144602.
32. van der Veen J, Willems S, Deschuymer S, et al. Benefits of deep
learning for delineation of organs at risk in head and neck cancer.
Radiother Oncol. 2019;138:6874.
33. Sun Y, Shi H, Zhang S, Wang P, Zhao W, Zhou X, Yuan K. Accurate
and rapid CT image segmentation of the eyes and surrounding organs
for precise radiotherapy. Med Phys. 2019;46:22142222.
34. Huang C, Badiei M, Seo H, et al. Atlas based segmentations via semi-
supervised diffeomorphic registrations. arXiv 1911.10417; 2019.
Medical Physics, 47 (9), September 2020
e947 Vrtovec et al.: Auto-segmentation of OARs for H&N RT planning e947
35. Haq R, Berry S, Deasy J, Hunt M, Veeraraghavan H. Dynamic multi-
atlas selection based consensus segmentation of head and neck struc-
tures from CT images. Med Phys. 2019;46:56125622.
36. Rhee D, Cardenas C, Elhalawani H, et al. Automatic detection of con-
touring errors using convolutional neural networks. Med Phys.
2019;46:50865097.
37. Zhong T, Huang X, Tang F, Liang S, Deng X, Zhang Y. Boosting-based
cascaded convolutional neural networks for the segmentation of CT
organs-at-risk in nasopharyngeal carcinoma. Med Phys. 2019;46:5602
5611.
38. Agn M, Rosensch
old P, Puonti O, et al. A modality-adaptive method
for segmenting brain tumors and organs-at-risk in radiation therapy
planning. Med Image Anal. 2019;54:220237.
39. Qiu B, Guo J, Kraeima J. Automatic segmentation of the mandible
from computed tomography scans for 3D virtual surgical planning
using the convolutional neural network. Phys Med Biol.
2019;64:1750.
40. Tong N, Gou S, Yang S, Cao M, Sheng K. Shape constrained fully con-
volutional DenseNet with adversarial training for multiorgan segmenta-
tion on head and neck CT and low-field MR images. Med Phys.
2019;46:26692682.
41. Torosdagli N, Liberton D, Verma P, Sincan M, Lee J, Bagci U. Deep
geodesic learning for segmentation and anatomical landmarking. IEEE
Trans Med Imaging. 2019;38:919931.
42. Chan J, Kearney V, Haaf S, et al. A convolutional neural network algo-
rithm for automatic segmentation of head and neck organs-at-risk using
deep lifelong learning. Med Phys. 2019;46:22042213.
43. Chen H, Lu W, Chen M, et al. A recursive ensemble organ segmenta-
tion (REOS) framework: application in brain radiotherapy. Phys Med
Biol. 2019;64:025015.
44. Lee H, Lee E, Kim N, et al. Clinical evaluation of commercial atlas-
based auto-segmentation in the head and neck region. Front Oncol.
2019;9:239.
45. H
ansch A, Schwier M, Gass T, et al. Evaluation of deep learning meth-
ods for parotid gland segmentation from CT images. J Med Imaging.
2019;6:011005.
46. Zhu W, Huang Y, Zeng L, et al. AnatomyNet: deep learning for fast
and fully automated whole-volume segmentation of head and neck
anatomy. Med Phys. 2019;46:576589.
47. Liang S, Tang F, Huang X, et al. Deep-learning-based detection and
segmentation of organs at risk in nasopharyngeal carcinoma computed
tomographic images for radiotherapy planning. Eur Radiol. 2019;29:
1961 19 67.
48. Men K, Geng H, Cheng C, et al. Technical note: more accurate and
efficient segmentation of organs-at-risk in radiotherapy with convolu-
tional neural networks cascades. Med Phys. 2019;46:286292.
49. Tappeiner E, Pr
oll S, H
onig M, et al. Multi-organ segmentation of the
head and neck area: an efficient hierarchical neural networks approach.
Int J Comput Assist Radiol Surg. 2019;14:745754.
50. van Rooij W, Dahele M, Ribeiro Brandao Het al. Deep learning-based
delineation of head and neck organs-at-risk: geometric and dosimetric
evaluation. Int J Radiat Oncol Biol Phys. 2019;104:677684.
51. Wu X, Udupa J, Tong Y, et al. AAR-RT a system for auto-contouring
organs at risk on CT images for radiation therapy planning: principles,
design, and large-scale evaluation on head-and-neck and thoracic cancer
cases. Med Image Anal. 2019;54:4562.
52. Ayyalusamy A, Vellaiyan S, Subramanian S, et al. Auto-segmentation
of head and neck organs at risk in radiotherapy and its dependence on
anatomic similarity. Radiat Oncol J. 2019;37:134142.
53. Willems S, Crijns W, La Greca Saint-Esteven A, et al. Clinical imple-
mentation of DeepVoxNet for auto-delineation of organs at risk in head
and neck cancer patients in radiotherapy. In: Clinical Image-Based Pro-
cedures: Translational Research in Medical Imaging - CLIP 2018, vol-
ume 11041 of LNCS. Springer; 2018:223232.
54. Ren X, Xiang L, Nie D, et al. Interleaved 3D-CNNs for joint segmenta-
tion of small-volume structures in head and neck CT images. Med Phys.
2018;45:20632075.
55. Tong N, Gou S, Yang S, Ruan D, Sheng K. Fully automatic multi-organ
segmentation for head and neck cancer radiotherapy using shape repre-
sentation model constrained fully convolutional neural networks. Med
Phys. 2018;45:45584567.
56. Wang Z, Wei L, Wang L, Gao Y, Chen W, Shen D. Hierarchical vertex
regression-based segmentation of head and neck CT images for
radiotherapy planning. IEEE Trans Image Process. 2018;27:
923937.
57. Mo
cnik D, Ibragimov B, Xing L, et al. Segmentation of parotid glands
from registered CT and MR images. Phys Med. 2018;52:3341.
58. Kieselmann J, Kamerling C, Burgos N, et al. Geometric and dosimetric
evaluations of atlas-based segmentation methods of MR images in the
head and neck region. Phys Med Biol. 2018;63:145007.
59. Meillan N, Bibault J-E, Vautier J, et al. Automatic intracranial segmen-
tation: is the clinician still needed? Technol Cancer Res Treat.
2018;17:17.
60. Nikolov S, Blackwell S, Mendes R, et al. Deep learning to achieve clin-
ically applicable segmentation of head and neck anatomy for radiother-
apy. arXiv 1809.04430; 2018.
61. Yang J, Haas B, Fang R, et al. Atlas ranking and selection for automatic
segmentation of the esophagus from CT scans. Phys Med Biol.
2017;62:91409158.
62. Aghdasi N, Li Y, Berens A, Harbison R, Moe K, Hannaford B. Effi-
cient orbital structures segmentation with prior anatomical knowledge.
J Med Imaging. 2017;4:034501.
63. Urban S, Tan
acs A. Atlas-based global and local RF segmentation of
head and neck organs on multimodal MRI images. In: International
Symposium on Image Signal Processing Analysis - ISPA 2017. IEEE;
2017:99103.
64. Wachinger C, Brennan M, Sharp G, Golland P. Efficient descriptor-
based segmentation parotid glands with nonlocal means. IEEE Trans
Biomed Eng. 2017;64:14921502.
65. Ibragimov B, Xing L. Segmentation of organs-at-risks in head and neck
CT images using convolutional neural networks. Med Phys.
2017;44:547557.
66. Raudaschl P, Zaffino P, Sharp GC, et al. Evaluation of segmentation
methods on head and neck CT: auto-segmentation challenge 2015. Med
Phys. 2017;44:20202036.
67. Van de Velde J, Wouters J, Vercauteren T, et al. Optimal number of
atlases and label fusion for automatic multi-atlas-based brachial plexus
contouring in radiotherapy treatment planning. Radiat Oncol.
2016;11:1.
68. Wardman K, Prestwich R, Gooding M, Speight R. The feasibility of
atlas-based automatic segmentation of MRI for H&N radiotherapy
planning. J Appl Clin Med Phys. 2016;17:146154.
69. Zaffino P, Raudaschl P, Fritscher K, Sharp G, Spadea M. Technical
note: plastimatch mabs, an open source tool for automatic image seg-
mentation. Med Phys. 2016;43:5155.
70. Fritscher K, Raudaschl P, Zaffino P, Spadea M, Sharp G. Deep neural
networks for fast segmentation of 3D medical images. In: Medical
Image Computing and Computer-Assisted Intervention - MICCAI 2016,
volume 9901 of LNCS. Springer; 2016:158165.
71. Awan M, Dyer B, Kalpathy-Cramer J, et al. Auto-segmentation of the
brachial plexus assessed with TaCTICS a software platform for rapid
multiple-metric quantitative evaluation of contours. Acta Oncol.
2015;54:562566.
72. Wachinger C, Fritscher K, Sharp G, Golland P. Contour-driven
atlas-based segmentation. IEEE Trans Med Imaging. 2015;34:2492
2505.
73. Hoang DA, Eminowicz G, Mendes R, et al. Validation of clinical
acceptability of an atlas-based segmentation algorithm for the delin-
eation of organs at risk in head and neck cancer. Med Phys. 2015;42:
50275034.
74. Dolz J, Leroy H, Reyns N, Massoptier L, Vermandel M. A fast and
fully automated approach to segment optic nerves on MRI and its appli-
cation to radiosurgery. In: International Symposium on Biomedical
Imaging - ISBI 2015, pages 11021105. IEEE; 2015.
75. Yang X, Wu N, Cheng G, et al. Automated segmentation of the parotid
gland based on atlas registration and machine learning: a longitudinal
MRI study in head-and-neck radiation therapy. Int J Radiat Oncol Biol
Phys. 2014;90:12251233.
76. Fritscher K, Peroni M, Zaffino P, Spadea M, Schubert R, Sharp G.
Automatic segmentation of head and neck CT images for radiotherapy
treatment planning using multiple atlases. Statistical appearance
models, and geodesic active contours. Med Phys. 2014;41:051910.
Medical Physics, 47 (9), September 2020
e948 Vrtovec et al.: Auto-segmentation of OARs for H&N RT planning e948
77. Thomson D, Boylan C, Liptrot T, et al. Evaluation of an automatic seg-
mentation algorithm for definition of head and neck organs at risk.
Radiat Oncol. 2014;9:173.
78. Sj
oberg C, Johansson S, Ahnesj
o A. How much will linked deformable
registrations decrease the quality of multi-atlas segmentation fusions?
Radiat Oncol. 2014;9:251.
79. Harrigan R, Panda S, Asman A, et al. Robust optic nerve segmentation
on clinically acquired computed tomography. J Med Imaging.
2014;1:034006.
80. Walker G, Awan M, Tao R, et al. Prospective randomized double-blind study of atlas-based organ-at-risk autosegmentation-assisted radiation planning in head and neck cancer. Radiother Oncol. 2014;112:321–325.
81. Yang J, Amini A, Williamson R, et al. Automatic contouring of brachial plexus using a multi-atlas approach for lung cancer radiation therapy. Pract Radiat Oncol. 2013;3:e139–e147.
82. Zhu M, Bzdusek K, Brink C, et al. Multi-institutional quantitative evaluation and clinical validation of smart probabilistic image contouring engine (SPICE) autosegmentation of target structures and normal tissues on computed tomography images in the head and neck, thorax, liver, and male pelvis areas. Int J Radiat Oncol Biol Phys. 2013;87:809–816.
83. Cheng G, Yang X, Wu N, Xu Z, Zhao H, Wang Y, Liu T. Multi-atlas-based segmentation of the parotid glands of MR images in patients following head-and-neck cancer radiotherapy. In: Medical Imaging 2013: Computer-Aided Diagnosis, volume 8670. SPIE; 2013:86702Q.
84. Daisne J-F, Blumhofer A. Atlas-based automatic segmentation of head and neck organs at risk and nodal target volumes: a clinical validation. Radiat Oncol. 2013;8:154.
85. Chen A, Niermann K, Deeley M, Dawant B. Evaluation of multiple-atlas-based strategies for segmentation of the thyroid gland in head and neck CT images for IMRT. Phys Med Biol. 2012;57:93–111.
86. Qazi A, Pekar V, Kim J, Xie J, Breen S, Jaffray D. Auto-segmentation of normal and target structures in head and neck CT images: a feature-driven model-based approach. Med Phys. 2011;38:6160–6170.
87. Teguh D, Levendag P, Voet P, et al. Clinical validation of atlas-based auto-segmentation of multiple target volumes and normal tissue (swallowing/mastication) structures in the head and neck. Int J Radiat Oncol Biol Phys. 2011;81:950–957.
88. Noble J, Dawant B. An atlas-navigated optimal medial axis and deformable model algorithm (NOMAD) for the segmentation of the optic nerves and chiasm in MR and CT images. Med Image Anal. 2011;15:877–884.
89. Deeley M, Chen A, Datteri R, et al. Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions: a multi-expert study. Phys Med Biol. 2011;56:4557–4577.
90. Tsuji S, Hwang A, Weinberg V, Yom S, Quivey J, Xia P. Dosimetric evaluation of automatic segmentation for adaptive IMRT for head-and-neck cancer. Int J Radiat Oncol Biol Phys. 2010;77:707–714.
91. Pekar V, Allaire S, Qazi A, Kim J, Jaffray D. Head and neck auto-segmentation challenge: segmentation of the parotid glands. In: Medical Image Analysis for the Clinic: A Grand Challenge 2010. MICCAI; 2010:273–280.
92. Pekar V, Allaire S, Kim J, Jaffray D. Head and neck auto-segmentation
challenge. MIDAS J. 2009;5:5.
93. Sims R, Isambert A, Grégoire V, et al. A pre-clinical assessment of an atlas-based automatic segmentation tool for the head and neck. Radiother Oncol. 2009;93:474–478.
94. Isambert A, Dhermain F, Bidault F, et al. Evaluation of an atlas-based automatic segmentation software for the delineation of brain organs at risk in a radiation therapy clinical context. Radiother Oncol. 2008;87:93–99.
95. Han X, Hoogeman M, Levendag P, et al. Atlas-based auto-segmentation of head and neck CT images. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2008, volume 5242 of LNCS. Springer; 2008:434–441.
96. Bekes G, Máté E, Nyúl L, Kuba A, Fidrich M. Geometrical model-based segmentation of the organs of sight on CT images. Med Phys. 2008;35:735–743.
97. Fortunati V, Verhaart R, Niessen W, Veenland J, Paulides M, van Walsum T. Automatic tissue segmentation of head and neck MR images for hyperthermia treatment planning. Phys Med Biol. 2015;60:6547–6562.
98. Verhaart R, Fortunati V, Verduijn G, van Walsum T, Veenland J, Paulides M. CT-based patient modeling for head and neck hyperthermia treatment planning: manual versus automatic normal-tissue-segmentation. Radiother Oncol. 2014;111:158–163.
99. Fortunati V, Verhaart R, van der Lijn F, et al. Tissue segmentation of
head and neck CT images for treatment planning: a multiatlas
approach combined with intensity modeling. Med Phys. 2013;40:
071905.
100. Schneider U, Pedroni E, Lomax A. The calibration of CT Hounsfield units for radiotherapy treatment planning. Phys Med Biol. 1996;41:111–124.
101. Pereira G, Traughber M, Muzic R. The role of imaging in radiation therapy planning: past, present, and future. Biomed Res Int. 2014;2014:231090.
102. Brouwer C, Steenbakkers R, Bourhis J, et al. CT-based delineation of organs at risk in the head and neck region: DAHANCA, EORTC, GORTEC, HKNPCSG, NCIC CTG, NCRI, NRG Oncology and TROG consensus guidelines. Radiother Oncol. 2015;117:83–90.
103. Leibfarth S, Mönnich D, Welz S, et al. A strategy for multimodal deformable image registration to integrate PET/MR into radiotherapy treatment planning. Acta Oncol. 2013;52:1353–1359.
104. Fortunati V, Verhaart R, Angeloni F, et al. Feasibility of multimodal deformable registration for head and neck tumor treatment planning. Int J Radiat Oncol Biol Phys. 2014;90:85–93.
105. Joint Head and Neck MRI-Radiotherapy Development Cooperative. Prospective quantitative quality assurance and deformation estimation of MRI-CT image registration in simulation of head and neck radiotherapy patients. Clin Transl Radiat Oncol. 2019;18:120–127.
106. Peroni M, Ciardo D, Spadea M, et al. Automatic segmentation and online virtualCT in head-and-neck adaptive radiation therapy. Int J Radiat Oncol Biol Phys. 2012;84:e427–e433.
107. Hvid C, Elstrøm C, Jensen K, Alber M, Grau C. Accuracy of software-assisted contour propagation from planning CT to cone beam CT in head and neck radiotherapy. Acta Oncol. 2016;55:1324–1330.
108. Wang T, Bradshaw GB, Beitler J, et al. Optimal virtual monoenergetic image in TwinBeam dual-energy CT for organs-at-risk delineation based on contrast-noise-ratio in head-and-neck radiotherapy. J Appl Clin Med Phys. 2019;20:121–128.
109. Bhandare N, Mendenhall W. A literature review of late complications
of radiation therapy for head and neck cancers: incidence and dose
response. J Nucl Med Radiat Ther. 2012;S2:009.
110. Siddiqui F, Movsas B. Management of radiation toxicity in head and neck cancers. Semin Radiat Oncol. 2017;27:340–349.
111. Strojan P, Hutcheson K, Eisbruch A, et al. Treatment of late sequelae after radiotherapy for head and neck cancer. Cancer Treat Rev. 2017;59:79–92.
112. Clark K, Vendt B, Smith K, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging. 2013;26:1045–1057.
113. Prior F, Smith K, Sharma A, et al. The public cancer radiology
imaging collections of The Cancer Imaging Archive. Sci Data.
2017;4:170124.
114. Vallières M, Kay-Rivest E, Perrin L, et al. Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer. Sci Rep. 2017;7:10117.
115. Grossberg A, Mohamed A, Elhalawani H, et al. Imaging and clinical
data archive for head and neck squamous cell carcinoma patients treated
with radiotherapy. Sci Data. 2018;5:180173.
116. Cardenas C, Mohamed A, Yang J, et al. Head and neck cancer patient images for determining auto-segmentation accuracy in T2-weighted magnetic resonance imaging through expert manual segmentations. Med Phys. 2020;47:2317–2322.
117. Fedorov A, Clunie D, Ulrich E, et al. DICOM for quantitative imaging
biomarker development: a standards based approach to sharing clinical
data and structured PET/CT analysis results in head and neck cancer
research. PeerJ. 2016;4:e2057.
118. Beichel R, Smith BJ, Bauer C, et al. Multi-site quality and variability analysis of 3D FDG PET segmentations based on phantom and clinical image data. Med Phys. 2017;44:479–496.
119. La Macchia M, Fellin F, Amichetti M, et al. Systematic evaluation of three different commercial software solutions for automatic segmentation for adaptive therapy in head-and-neck, prostate and pleural cancer. Radiat Oncol. 2012;7:160.
120. Kearney V, Chan J, Valdes G, Solberg T, Yom S. The application of artificial intelligence in the IMRT planning process for head and neck cancer. Oral Oncol. 2018;87:111–116.
121. Van de Velde J, Audenaert E, Speleers B, et al. An anatomically validated brachial plexus contouring method for intensity modulated radiation therapy planning. Int J Radiat Oncol Biol Phys. 2013;87:802–808.
122. Sun Y, Yu XL, Luo W, et al. Recommendation for a contouring method and atlas of organs at risk in nasopharyngeal carcinoma patients receiving intensity-modulated radiotherapy. Radiother Oncol. 2014;110:390–397.
123. Kong F, Ritter T, Quint D, et al. Consideration of dose limits for organs at risk of thoracic radiotherapy: atlas for lung, proximal bronchial tree, esophagus, spinal cord, ribs, and brachial plexus. Int J Radiat Oncol Biol Phys. 2011;81:1442–1457.
124. Christianen M, Langendijk J, Westerlaan H, van de Water T, Bijl H. Delineation of organs at risk involved in swallowing for radiotherapy treatment planning. Radiother Oncol. 2011;101:394–402.
125. van de Water T, Bijl H, Westerlaan H, Langendijk J. Delineation guidelines for organs at risk involved in radiation-induced salivary dysfunction and xerostomia. Radiother Oncol. 2009;93:545–552.
126. Pacholke H, Amdur R, Schmalfuss I, Louis D, Mendenhall W. Contouring the middle and inner ear on radiotherapy planning scans. Am J Clin Oncol. 2005;28:143–147.
127. Hall W, Guiou M, Lee N, et al. Development and validation of a standardized method for contouring the brachial plexus: preliminary dosimetric analysis among patients treated with IMRT for head-and-neck cancer. Int J Radiat Oncol Biol Phys. 2008;72:1362–1367.
128. Chen W, Zhang H, Zhang W, et al. Development of a contouring guide for three different types of optic chiasm: a practical approach. J Med Imaging Radiat Oncol. 2019;63:657–664.
129. Taha A, Hanbury A. Metrics for evaluating 3D medical image segmentation:
analysis, selection, and tool. BMC Med Imaging. 2015;15:29.
130. Maier-Hein L, Eisenmann M, Reinke A, et al. Why rankings of
biomedical image analysis competitions should be interpreted with
care. Nat Commun. 2018;9:5217.
131. Armato S, Tahir B, Sharp G. AAPM grand challenges symposium. Med Phys. 2019;46:e485–e486.
132. Iglesias J, Sabuncu M. Multi-atlas segmentation of biomedical images: a survey. Med Image Anal. 2015;24:205–219.
133. Edmund J, Nyholm T. A review of substitute CT generation for MRI-
only radiation therapy. Radiat Oncol. 2017;12:28.
134. Adjeiwaah M, Bylund M, Lundman J, et al. Dosimetric impact of MRI distortions: a study on head and neck cancers. Int J Radiat Oncol Biol Phys. 2019;103:994–1003.
135. Raaymakers BW, Jürgenliemk-Schulz IM, Bol GH, et al. First patients treated with a 1.5 T MRI-Linac: clinical proof of concept of a high-precision, high-field MRI guided radiotherapy treatment. Phys Med Biol. 2017;62:L41–L50.
136. Lei Y, Harms J, Wang T, et al. MRI-only based synthetic CT generation using dense cycle consistent generative adversarial networks. Med Phys. 2019;46:3565–3581.
137. Klages P, Benslimane I, Riyahi S, et al. Patch-based generative adversarial neural network models for head and neck MR-only planning. Med Phys. 2020;47:626–642.
138. Comelli A, Stefano A, Bignardi S, et al. Active contour algorithm with discriminant analysis for delineating tumors in positron emission tomography. Artif Intell Med. 2019;94:67–78.
139. Schipaanboord B, Boukerroui D, Peressutti D, et al. Can atlas-based auto-segmentation ever be perfect? Insights from extreme value theory. IEEE Trans Med Imaging. 2019;38:99–106.
140. Larrue A, Gujral D, Nutting C, Gooding M. The impact of the number
of atlases on the performance of automatic multi-atlas contouring. Phys
Med. 2015;31:e30.
141. Ibtehaz N, Rahman M. MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020;121:74–87.
142. Zhang X, Wang L, Yang D, et al. Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE Trans Med Imaging. 2020 (in press).
143. Nelms B, Tomé W, Robinson G, Wheeler J. Variations in the contouring of organs at risk: test case from a patient with oropharyngeal cancer. Int J Radiat Oncol Biol Phys. 2012;82:368–378.
144. Brouwer C, Steenbakkers R, van den Heuvel E, et al. 3D variation
in delineation of head and neck organs at risk. Radiat Oncol.
2012;7:32.
145. Tao C-J, Yi J-L, Chen N-Y, et al. Multi-subject atlas-based auto-segmentation reduces interobserver variation and improves dosimetric parameter consistency for organs at risk in nasopharyngeal carcinoma: a multi-institution clinical study. Radiother Oncol. 2015;115:407–411.
146. Krayenbuehl J, Zamburlini M, Ghandour S, et al. Planning comparison
of five automated treatment planning solutions for locally advanced
head and neck cancer. Radiat Oncol. 2018;13:170.
147. Graves Y, Smith AA, McIlvena D, et al. A deformable head and neck phantom with in-vivo dosimetry for adaptive radiotherapy quality assurance. Med Phys. 2015;42:1490–1497.
148. Li J, Udupa J, Tong Y, Wang L, Torigian D. LinSEM: linearizing segmentation evaluation metrics for medical images. Med Image Anal. 2020;60:101601.
149. Loo S, Martin W, Smith P, Cherian S, Roques T. Interobserver variation in parotid gland delineation: a study of its impact on intensity-modulated radiotherapy solutions with a systematic review of the literature. Br J Radiol. 2012;85:1070–1077.
150. Voet P, Dirkx M, Teguh D, Hoogeman M, Levendag P, Heijmen B. Does atlas-based autosegmentation of neck levels require subsequent manual contour editing to avoid risk of severe target underdosage? A dosimetric analysis. Radiother Oncol. 2011;98:373–377.
151. Delaney A, Dahele M, Slotman B, Verbakel W. Is accurate contouring of salivary and swallowing structures necessary to spare them in head and neck VMAT plans? Radiother Oncol. 2018;127:190–196.
152. Lim T, Gillespie E, Murphy J, Moore K. Clinically oriented contour evaluation using dosimetric indices generated from automated knowledge-based planning. Int J Radiat Oncol Biol Phys. 2019;103:1251–1260.
153. Aliotta E, Nourzadeh H, Siebers J. Quantifying the dosimetric impact of organ-at-risk delineation variability in head and neck radiation therapy in the context of patient setup uncertainty. Phys Med Biol. 2019;64:135020.