Auto-segmentation of organs at risk for head and neck radiotherapy planning: From atlas-based to deep learning methods

Tomaž Vrtovec (a) and Domen Močnik
Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, Ljubljana SI-1000, Slovenia

Primož Strojan
Institute of Oncology Ljubljana, Zaloška cesta 2, Ljubljana SI-1000, Slovenia

Franjo Pernuš
Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, Ljubljana SI-1000, Slovenia

Bulat Ibragimov
Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, Ljubljana SI-1000, Slovenia
Department of Computer Science, University of Copenhagen, Universitetsparken 1, Copenhagen DK-2100, Denmark

(Received 26 October 2019; revised 27 May 2020; accepted for publication 29 May 2020; published 28 July 2020)
Radiotherapy (RT) is one of the basic treatment modalities for cancer of the head and neck (H&N), which requires a precise spatial description of the target volumes and organs at risk (OARs) to deliver a highly conformal radiation dose to the tumor cells while sparing the healthy tissues. For this purpose, target volumes and OARs have to be delineated and segmented from medical images. As manual delineation is a tedious and time-consuming task subjected to intra/interobserver variability, computerized auto-segmentation has been developed as an alternative. The field of medical imaging and RT planning has experienced an increased interest in the past decade, with new emerging trends that shifted the field of H&N OAR auto-segmentation from atlas-based to deep learning-based approaches. In this review, we systematically analyzed 78 relevant publications on auto-segmentation of OARs in the H&N region from 2008 to date, and provided critical discussions and recommendations from various perspectives: image modality – both computed tomography and magnetic resonance image modalities are being exploited, but the potential of the latter should be explored more in the future; OAR – the spinal cord, brainstem, and major salivary glands are the most studied OARs, but additional experiments should be conducted for several less studied soft tissue structures; image database – several image databases with the corresponding ground truth are currently available for methodology evaluation, but should be augmented with data from multiple observers and multiple institutions; methodology – current methods have shifted from atlas-based to deep learning auto-segmentation, which is expected to become even more sophisticated; ground truth – delineation guidelines should be followed and participation of multiple experts from multiple institutions is recommended; performance metrics – the Dice coefficient as the standard volumetric overlap metric should be accompanied by at least one distance metric, and combined with clinical acceptability scores and risk assessments; segmentation performance – the best performing methods achieve clinically acceptable auto-segmentation for several OARs, however, the dosimetric impact should also be studied to provide clinically relevant endpoints for RT planning. © 2020 American Association of Physicists in Medicine [https://doi.org/10.1002/mp.14320]
Key words: auto-segmentation, deep learning, head and neck, organs at risk, radiotherapy planning
1. INTRODUCTION
Cancer in the region of the head and neck (H&N), comprising malignancies of the lips, oral cavity, pharynx, larynx, nasal cavity and paranasal sinuses, salivary glands, and thyroid, has a yearly incidence of approximately 1.5 million worldwide,1 making it one of the most prominent cancers. In addition to surgery and chemotherapy, radiotherapy (RT) is an important treatment modality for H&N cancer, with an optimal utilization rate of around 80% in patients presenting with this malignancy.2 The aim of RT is to deliver a high radiation dose to the targeted cancerous cells to ensure the clinically required tumor control probability and, at the same time, spare the nearby healthy tissues to prevent acute radiation toxicity and serious late complications for the treated patient. The optimal radiation dose distribution is calculated in an optimization process using the inverse planning approach, which requires a precise spatial description of the target volumes as well as of the organs at risk (OARs). This knowledge is commonly obtained by trained radiation oncologists and, in some instances, also other experts from the field performing manual delineation, or segmentation, of the target volumes and OARs from the acquired three-dimensional (3D) images of the patient.

Medical image segmentation, the process of partitioning an image into multiple anatomical structures, is in general a challenging task that is hampered by the high variability of medical images. The source of variability is commonly represented by different imaging modalities revealing different characteristics of the human anatomy, for example, conventional radiography (x rays), computed tomography (CT), and magnetic resonance (MR) imaging; various imaging artifacts causing weak or missing boundaries, for example, noise, intensity inhomogeneity, partial volume effect, and motion; and variable image appearance of anatomical structures under segmentation, for example, due to pathological changes or the natural biological variability of the human anatomy. Nevertheless, image segmentation is important from the perspective of analyzing the properties of the obtained structures, and while manual delineation may still be the approach of choice, it is a time-consuming and tedious task subjected to intra/interobserver variability.3 Alternatively, computerized techniques based on medical image processing and analysis have been developed that replace manual with automated segmentation, or auto-segmentation,4,5 which eliminates the subjective bias of the observer, accelerates the whole process and, as a result, reduces the total workload in terms of human resources.
In the past decade, the field of computerized medical imaging has experienced an increased interest, with new emerging trends that are largely focused on deep learning (DL)6 as a subset of machine learning that mimics the data processing of the human brain for the purpose of decision-making. In comparison to traditional approaches based on conventional atlases, shape models, and feature classification, DL has shown superior image segmentation performance that was conveyed by several milestone auto-segmentation frameworks,7 for example, the U-Net,8 3D U-Net,9 V-Net,10 SegNet,11 DeepMedic,12 DeepLab,13 VoxResNet,14 and Mask R-CNN.15 Several ideas have been adopted for RT,16,17 including for image segmentation and detection, image phenotyping, radiomic signature discovery, clinical outcome prediction, image dose quantification, dose-response modeling, radiation adaptation, and image generation,18 and have therefore also impacted the area of auto-segmentation of OARs in the H&N region19–21 so as to provide a qualitative support for guiding critical treatment planning and delivery decisions. In this review, we provide a detailed overview of the existing studies for auto-segmentation of OARs in the H&N region by systematically outlining, analyzing, and categorizing the relevant publications in the field from 2008 to date.
2. METHODOLOGY
In May 2020, a search was conducted on the Web of Science (https://apps.webofknowledge.com) and PubMed (https://www.ncbi.nlm.nih.gov/pubmed/) on-line citation indexing services, with the topic keyword (auto OR automatic) AND (segmentation OR contouring OR delineation) AND (head AND neck) and a time span from 2008 to date. Studies not concerned with OAR auto-segmentation in the H&N region, as well as longitudinal studies and dosimetric studies without geometric validations, were excluded. The obtained relevant publications were further supplemented with selected publications found in their lists of references. A detailed analysis of the resulting publications was then conducted from the perspective of image modality, OAR, image database, methodology, ground truth, performance metrics, and segmentation performance.
3. RESULTS
In the field of OAR auto-segmentation for RT planning in the H&N region, the search on the Web of Science and PubMed yielded, respectively, 281 and 257 results. After reviewing their abstracts, 49 were considered to be relevant and were further supplemented with selected publications from their lists of references. In total, we collected 75 publications22–96 focused on RT planning and three studies focused on hyperthermia therapy planning97–99 from 2008 to date (Fig. 1), along with three review papers related to auto-segmentation in the H&N region.19–21 The results of analyzing these publications from different perspectives are presented in the following subsections.

FIG. 1. The chronological distribution of the 78 reviewed publications in the field of organ at risk auto-segmentation in the head and neck region.
3.A. Image modality
RT planning is primarily performed using CT imaging information because the data on electron density, required for the calculation of the radiation beam energy absorption and dose distribution, are derived directly from the CT image intensities.100,101 As a result, segmentation of the target volumes and OARs has to be generated from the planning CT images, therefore making CT the prevailing image modality also for auto-segmentation approaches (Table I). While CT images provide a good visibility of the bony anatomy, the contrast differences between various soft tissues are relatively low, and can be to a certain degree improved by using an intravenous contrast enhancement agent.68,84,95,98,99

On the other hand, MR imaging gained a broad adoption because of its superior soft tissue contrast resolution compared to CT images and various imaging setups. In the recent consensus for CT-based manual delineation guidelines for OARs in the H&N region,102 it is strongly recommended to use, besides CT, also MR images to facilitate the delineation of several soft tissue OARs. Auto-segmentation of OARs from MR images can also be performed independently,58,63,68,74,94,97 and the resulting segmentation masks are then propagated to the planning CT images by applying the geometric transformations of the corresponding MR-to-CT image registration. Alternatively, image registration can be performed first, and auto-segmentation is then performed simultaneously on both image modalities.57,88,89 While the obtained results combine the information of the CT and MR image modality, both approaches rely on an accurate intrapatient multimodal image registration.103–105
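The propagation step itself amounts to applying the recovered geometric transformation to the MR-based mask, with nearest-neighbor interpolation so that the binary labels are preserved on the CT grid. The following is a minimal sketch of this idea using SimpleITK; the rigid mutual-information registration setup and the file names are illustrative assumptions rather than the protocol of any reviewed study:

```python
# Hypothetical sketch: propagate an MR-based OAR mask to the planning CT
# via rigid intrapatient MR-to-CT registration (SimpleITK).
import SimpleITK as sitk

def propagate_mask(ct_image, mr_image, mr_mask):
    """Register MR to CT (rigid, mutual information) and warp the MR mask."""
    # Initialize with a geometry-centered rigid transform.
    initial = sitk.CenteredTransformInitializer(
        ct_image, mr_image, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)
    reg.SetMetricSamplingStrategy(reg.RANDOM)
    reg.SetMetricSamplingPercentage(0.1)
    reg.SetOptimizerAsRegularStepGradientDescent(
        learningRate=1.0, minStep=1e-4, numberOfIterations=200)
    reg.SetInterpolator(sitk.sitkLinear)
    reg.SetInitialTransform(initial, inPlace=False)
    transform = reg.Execute(sitk.Cast(ct_image, sitk.sitkFloat32),
                            sitk.Cast(mr_image, sitk.sitkFloat32))

    # Nearest-neighbor interpolation preserves the binary labels.
    return sitk.Resample(mr_mask, ct_image, transform,
                         sitk.sitkNearestNeighbor, 0, mr_mask.GetPixelID())

ct = sitk.ReadImage("planning_ct.nii.gz")        # hypothetical file names
mr = sitk.ReadImage("t1_mr.nii.gz")
mask = sitk.ReadImage("parotid_mask_mr.nii.gz")
sitk.WriteImage(propagate_mask(ct, mr, mask), "parotid_mask_ct.nii.gz")
```

In practice, a deformable registration stage is often added after the rigid initialization, and the same propagate-and-resample pattern applies to the CBCT-to-CT scenario described below.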
Similar challenges are present in the case of adaptive RT, when cone beam CT (CBCT) images are often obtained between sessions for verifying the patient setup or adjusting the treatment plan to anatomical changes, as they can be acquired faster and at lower radiation doses in comparison to classical CT images. As a pretreatment planning CT image is always acquired and segmented to plan the dose distribution, auto-segmentation of CBCT images can be obtained by CBCT-to-CT registration followed by propagation of presegmented OARs back to the CBCT images.106,107
Other image modalities can be optionally provided to obtain complementary information; for example, positron emission tomography (PET) images can be acquired simultaneously with CT or MR images, however, they are not used for OAR but rather for target volume auto-segmentation.68 On the other hand, specific OARs (e.g., the carotid artery) can be successfully auto-segmented only from ultrasound (US) images,25 while the feasibility of using dual-energy CT (DECT) has been recently explored from the perspective of selecting the optimal energy level for generating the virtual monoenergetic image,108 in which different H&N OARs can be segmented.29
3.B. Organ at risk
Auto-segmentation is commonly performed for OARs whose RT-induced damage proved to be linked to late complications that may endanger the life of the patient or considerably reduce its quality (Table II).109–111 Major salivary glands, that is, the parotid and submandibular glands, are among the most frequently delineated OARs because of their importance for a sufficient secretion and proper composition of saliva, and therefore for the prevention of xerostomia and associated problems with swallowing, speech, and oral health. The eyeballs, vitreous humor, optic chiasm, optic nerves, lens, sclera, cornea, and lacrimal glands have to be spared to prevent optic neuropathy leading to an impaired vision or even blindness, while the commonly delineated nervous tissues are the spinal cord and brain, including the brainstem, cerebrum, cerebellum, and pituitary gland. In particular, segmentation of the former is of critical importance due to potentially devastating consequences (i.e., tetraplegia) of its over-irradiation. The pharyngeal constrictor muscles and cervical esophagus with the cricopharyngeal inlet have to be spared to prevent swallowing dysfunction.

Other relevant OARs include the thyroid, larynx, trachea, cochlea, chewing muscles, oral cavity, mastoids, temporo-mandibular joints, mandible, and brachial plexus, as their malfunction is connected with a variety of problems (e.g., hypothyroidism, swallowing problems including aspiration with resulting pulmonary morbidity, hearing decrease, osteoradionecrosis, brachial plexopathy). Although the lips and carotid arteries are commonly delineated for the purpose of RT planning, reports on auto-segmentation of these OARs are very limited.25
3.C. Image database
Auto-segmentation methods are validated on a wide range of image databases (Table III). Several methods utilize a subset of all available samples as an atlas or as a training set, while the remaining samples then constitute the test set, which serves to evaluate the auto-segmentation performance and accuracy. When the set of all available samples is relatively small, cross-validation (k-fold or, when k equals the number of samples, leave-one-out) is commonly employed to enable all available samples to be used for testing.
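For illustration, the two validation schemes can be set up as follows (a sketch with hypothetical patient identifiers; scikit-learn is one common choice for generating the splits):

```python
# Illustrative k-fold and leave-one-out splits over a small image database
# (hypothetical case identifiers), as used when no held-out test set exists.
from sklearn.model_selection import KFold, LeaveOneOut

cases = [f"patient_{i:02d}" for i in range(20)]  # 20 available samples

# 5-fold cross-validation: each case is tested exactly once.
for fold, (train_idx, test_idx) in enumerate(
        KFold(n_splits=5, shuffle=True, random_state=0).split(cases)):
    train = [cases[i] for i in train_idx]
    test = [cases[i] for i in test_idx]
    # ... fit the auto-segmentation model on `train`, evaluate on `test` ...
    print(f"fold {fold}: {len(train)} train / {len(test)} test")

# Leave-one-out: k equals the number of samples.
for train_idx, test_idx in LeaveOneOut().split(cases):
    pass  # one held-out case per iteration
```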
Among the reviewed publications, one database36 stands out, as it was devised from CT images of 3495 patients, resulting in 825–1702 training set samples for each studied OAR. On the other hand, there are several databases of H&N images that are publicly available. The Cancer Imaging Archive (TCIA) (https://www.cancerimagingarchive.net/), an open-access resource platform of medical images for cancer research,112,113 currently contains 12 databases of the H&N region, for example, the Head-Neck Cetuximab (https://doi.org/10.7937/K9/TCIA.2015.7AKGJUPZ),22,30,46,60,66 Head-Neck-PET-CT (https://doi.org/10.7937/K9/TCIA.2017.8oje5q00),22,30,46,114 TCGA-HNSC (https://doi.org/10.7937/K9/TCIA.2016.LXKQ47MS),22,60 and Data from Head and Neck Cancer CT Atlas (https://doi.org/10.7937/K9/TCIA.2017.umz8dv6s)22,115 CT image databases, the RT-MAC (https://doi.org/10.7937/tcia.2019.bcfjqfqb)116 MR image database, or the QIN-HEADNECK (https://doi.org/10.7937/K9/TCIA.2015.K0F5CGLI)117,118 PET-CT image database.
TABLE I. Image modalities used for auto-segmentation of organs at risk in the head and neck region for the purpose of radiotherapy planning, and the corresponding references.

Computed tomography (CT)
  Conventional CT: 22–24,26–28,30–42,44–57,59–62,64–70,72,73,76–82,84–93,95,96,98,99
  Dual-energy CT (DECT): 29
Magnetic resonance (MR)
  T1-weighted MR: 38,40,43,57–59,63,68,74,88,89,94
  T2-weighted MR: 38,63,75,83,97
Ultrasound (US): 25
Although many TCIA databases include reference H&N OAR delineations, they are associated with considerable variability because of the lack of a standardized delineation protocol. As a result, some of them were augmented and/or combined into new publicly available databases, for example, the manual delineations of 28 OARs in 140 CT images from the Head-Neck Cetuximab and Head-Neck-PET-CT databases as well as in 175 CT images from an in-house database (https://github.com/uci-cbcl/UaNet#Data),30 the manual delineations of 21 OARs in 31 CT images from the Head-Neck Cetuximab and TCGA-HNSC databases forming the TCIA test & validation radiotherapy CT planning scan dataset (TCIA-RT) (https://github.com/deepmind/tcia-ct-scan-dataset) database,60 or the manual delineations of nine OARs in 48 CT images from the Head-Neck Cetuximab database forming the Public Domain Database for Computational Anatomy (PDDCA) (http://www.imagenglab.com/newsite/pddca/) database.66

Examples of publicly available databases that do not originate from TCIA include the StructSeg (https://structseg2019.grand-challenge.org/Dataset/) database, consisting of 50 CT images with 22 manually delineated OARs, and the MRI-RT (https://figshare.com/s/a5e09113f5c07b3047df) database,105 consisting of 15 CT and 15 MR images of the same patients with 23 manually delineated OARs from the H&N region.
3.D. Methodology
The most common approach for segmenting OARs from H&N images is atlas-based auto-segmentation (ABAS), which has been frequently implemented in commercial tools.5,66,119 In ABAS, the image undergoing segmentation is first registered to images with known reference segmentation masks that form the atlas, and then these reference masks are, according to the geometrical transformations obtained from the registration, propagated back and fused into the final segmentation. To improve the results of ABAS, contour and level set refinement methods were applied to enhance the boundaries of the segmented OARs. Also, models of intensity or models of shape and appearance were generated to restrain the registration, and machine learning techniques were used to improve feature classification (Table IV).
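As a sketch of this register-propagate-fuse pipeline (assuming the hypothetical propagate_mask helper from the sketch in Section 3.A, and simple majority voting as the fusion rule; actual ABAS implementations typically use deformable registration and more elaborate label fusion):

```python
# Minimal ABAS sketch: register every atlas image to the target, propagate
# its reference mask, and fuse the propagated masks by majority voting.
# `propagate_mask` is the hypothetical helper sketched in Section 3.A.
import numpy as np
import SimpleITK as sitk

def abas_segment(target_image, atlas):
    """atlas: list of (image, reference_mask) pairs."""
    votes = None
    for atlas_image, atlas_mask in atlas:
        warped = propagate_mask(target_image, atlas_image, atlas_mask)
        arr = sitk.GetArrayFromImage(warped) > 0
        votes = arr.astype(np.uint8) if votes is None else votes + arr
    # A voxel belongs to the OAR if at least half of the atlases agree.
    fused = (votes >= (len(atlas) + 1) // 2).astype(np.uint8)
    out = sitk.GetImageFromArray(fused)
    out.CopyInformation(target_image)
    return out
```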
Recently, DL techniques have been applied to various steps of the RT workflow, including auto-segmentation,17,18,120 resulting in a superior performance in comparison to other classification and regression methods. The most popular architecture for DL-based auto-segmentation of medical images is the U-Net,9 which originates from the fully convolutional neural networks (CNNs) and consists of a contracting path and an expansive path in the shape of the letter U. Through convolution, activation, and pooling, the contracting path reduces spatial while increasing feature information, and the expansive path performs up-convolutions of the feature and spatial information with lateral concatenations of low- and high-level feature maps. The architecture was released as open-source (https://lmb.informatik.uni-freiburg.de/resources/opensource/unet/) and was, with additional augmentations, extended to the 3D U-Net,10 V-Net,11 and AnatomyNet.46 On the other hand, the DeepMedic13 framework is based on 3D CNNs and consists of two parallel convolutional paths for processing the input at multiple scales to achieve a large receptive field for classification while using small convolutional kernels that are associated with relatively low computational costs. Although it was originally developed for segmenting brain lesions, it was also released as open-source (https://biomedia.doc.ic.ac.uk/software/deepmedic/) and consequently applied in many different fields, including H&N OAR auto-segmentation, as well as augmented into new architectures, such as the DeepVoxNet.15
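A minimal sketch of the U-Net idea (two resolution levels only, in PyTorch; the layer sizes are illustrative and much smaller than in the published architectures):

```python
# Minimal two-level 2D U-Net sketch (PyTorch): a contracting path that
# pools away spatial resolution while adding feature channels, and an
# expansive path with up-convolutions and lateral (skip) concatenations.
import torch
import torch.nn as nn

def double_conv(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, n_channels=1, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(n_channels, 32)
        self.enc2 = double_conv(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2)
        self.dec1 = double_conv(64, 32)       # 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, n_classes, kernel_size=1)

    def forward(self, x):
        s1 = self.enc1(x)                     # high-resolution features
        s2 = self.enc2(self.pool(s1))         # bottleneck
        x = torch.cat([self.up(s2), s1], 1)   # lateral concatenation
        return self.head(self.dec1(x))

logits = TinyUNet()(torch.randn(1, 1, 128, 128))  # -> (1, 2, 128, 128)
```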
TABLE II. Organs at risk in the head and neck region involved in auto-segmentation for the purpose of radiotherapy planning, and the corresponding references.

Parotid glands: 22–24,26–32,34–37,40,42,45–58,60,63–66,68–70,72,73,75–78,80,82–84,86,87,90,91,93,95
Submandibular glands: 22–24,26,30–32,34,35,40,42,46,50,51,53,55,60,65,66,69,70,77,78,80,82,86,87,95
Brainstem: 22–24,26,27,29–32,35,36,38,40,42,43,46–50,52–56,59,60,66,68,69,73,76,80,82,84,86,87,89,90,92–95,97–99
Brain, cerebrum and cerebellum: 23,36,60,82,94,97–99
Temporal lobes: 27,30
Hippocampus: 38
Pituitary gland: 30,33,94
Spinal cord and spinal canal: 22,23,26–28,30,32,34–36,42,47,48,51–53,58,60,63,65,68,73,80,82,87,90,95,97–99
Cerebrospinal fluid: 97
Eyeballs and vitreous humor: 22,29,30,33,36,38,43,47,48,59,60,62,65,68,73,79,82,89,94,96–99
Optic chiasm: 22,24,27,30,31,36,38,40,43,46,49,54,55,59,65,66,70,73,80,88,89,94
Optic nerves: 22,24,27,29–31,33,34,36–38,40,43,46,47,49,54,55,59,60,62,65,66,69,74,79,80,88,89,94,96,98,99
Lens: 29,30,33,36,47,59,60,96–99
Sclera: 97–99
Cornea: 99
Lacrimal glands: 60
Extraocular muscle: 62
Mandible: 23,24,26,28,30–32,34–36,39–42,44,46–49,51–56,58,60,65,66,69,78,80,82,86,90,92,93,95
Oral cavity: 23,26,28,30,32,35,42,47,50,52,53,80
Temporo-mandibular joints: 30,42,47
Mastoids: 47
Chewing muscles: 87,95
Pharyngeal constrictor muscles: 23,26,28,32,40,50–53,65,77,80,87
Cervical esophagus and cricopharyngeal inlet: 23,26,28,32,36,42,50–53,61
Thyroid: 23,30,37,44,85,98,99
Larynx: 26,28,30,32,35,40,42,47,50–53,65,77,80
Trachea: 30,52,63
Cochlea: 26,32,36,53,60,77,80
Brachial plexus: 30,67,71,81
Carotid artery: 23,25
Other DL architectures adopt specific mechanisms to improve the auto-segmentation of OARs in the H&N region. For example, the self-channel-and-spatial-attention neural network (SCSA-Net)24 is equipped with attention learning, a technique for strengthening the discriminative ability of the segmentation network with minimal or no additional layers, the DenseNet40 employs adversarial learning, a technique where two CNNs compete in generating more accurate predictions, while the regional CNN (R-CNN)28 can be used for rapidly detecting the location of OARs before actual segmentation.
3.E. Ground truth
The quality of the resulting auto-segmentation is evaluated by the comparison against the corresponding reference segmentation, often referred to as the ground truth. Manual delineation (contouring) of OARs in images performed by human experts (e.g., radiation oncologists, diagnostic radiologists) is the main approach for generating the ground truth. However, it is a time-consuming (e.g., 3–6 hours per image for up to 20 OARs19,87,98), tedious, and costly task that is limited by the subjective human interpretation of organ boundaries, which is manifested through the intra- and interobserver variability in the delineation (Table V). Most studies therefore rely on a single set of ground truth per image; nevertheless, studies report also two,32,60,63,79,88,93 three,22,25,41,58,75,97,99 four,71,98 five,77 or even eight89 independently obtained sets of ground truth per image. An anatomically validated ground truth was introduced for a single OAR, that is, the brachial plexus,6,121 so that its manual delineations obtained from high-resolution MR images of up to 12 cadavers were first validated by dissection and then registered to corresponding CT images to obtain the ground truth for the purpose of RT planning.
In some cases, multiple ground truth sets were combined into a consensus by generating probability maps,89 performing (weighted) majority voting,44,69 performing intensity-based patch-based label fusion (Patch),67 applying the simultaneous truth and performance level estimation (STAPLE) expectation-maximization algorithm67,77,81,89 that estimates the correct segmentation by weighting each input by its estimated performance level, or applying the similarity and truth estimation for propagated segmentations (STEPS) algorithm58 that introduces a spatially variant image similarity term into STAPLE. Alternatively, a less labor-intensive but relatively biased approach for generating the ground truth is to manually correct the auto-segmentation boundaries73,77,80,84,85,87,93 or to merge different auto-segmentation results with, for example, the STAPLE algorithm.96
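The principle behind STAPLE can be illustrated with a compact binary expectation-maximization sketch (a simplified illustration of the algorithm's principle, not the reference implementation; spatial priors and other details of the published method are omitted):

```python
# Compact sketch of binary STAPLE expectation-maximization: each rater j
# is weighted by an estimated sensitivity p[j] and specificity q[j], and
# the consensus probability W is re-estimated in turn.
import numpy as np

def staple(decisions, n_iter=50, prior=None):
    """decisions: (n_raters, n_voxels) binary array; returns P(label=1)."""
    D = decisions.astype(float)
    W = D.mean(axis=0)                      # initialize consensus by voting
    pi = W.mean() if prior is None else prior
    eps = 1e-10
    for _ in range(n_iter):
        # M-step: per-rater performance given the current consensus.
        p = (W * D).sum(axis=1) / (W.sum() + eps)            # sensitivity
        q = ((1 - W) * (1 - D)).sum(axis=1) / ((1 - W).sum() + eps)
        # E-step: posterior of the true label at every voxel.
        log_a = np.log(pi + eps) + (
            D * np.log(p[:, None] + eps) +
            (1 - D) * np.log(1 - p[:, None] + eps)).sum(axis=0)
        log_b = np.log(1 - pi + eps) + (
            (1 - D) * np.log(q[:, None] + eps) +
            D * np.log(1 - q[:, None] + eps)).sum(axis=0)
        W = 1.0 / (1.0 + np.exp(np.clip(log_b - log_a, -50, 50)))
    return W

# Three hypothetical raters over six voxels; threshold W at 0.5 for a mask.
votes = np.array([[1, 1, 0, 0, 1, 0],
                  [1, 0, 0, 0, 1, 1],
                  [1, 1, 1, 0, 0, 0]])
print(staple(votes).round(2))
```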
To mitigate the intra- and interobserver delineation variability, well-defined guidelines have been proposed102,121–128 that help ensure the consistency and accuracy of manual delineation. The most established consensus102 encompasses a complete set of OARs in the H&N region, with the expert recommendation to always include the parotid glands, submandibular glands, spinal cord, and pharyngeal constrictor muscles in the RT plan. Other guidelines are focused on OARs involved in the nasopharyngeal carcinoma (i.e., the temporal lobe, parotid glands, spinal cord, and inner and middle ear),122 swallowing (i.e., the pharyngeal constrictor muscles, cricopharyngeal muscle, esophagus inlet muscles, cervical esophagus, base of tongue, and larynx),124 salivary functioning (i.e., the parotid glands, submandibular glands, sublingual gland, and minor salivary glands in the soft palate, lips, and cheeks),125 hearing and balance (i.e., the inner and middle ear),126 brachial plexopathy (i.e., the brachial plexus and adjacent structures, esophagus, spinal cord, and trachea),121,123,127 and optic neuropathy (i.e., the optic chiasm).128
TABLE III. Number of samples included in image databases used for auto-segmentation of organs at risk in the head and neck region for the purpose of radiotherapy planning, and the corresponding references (in square brackets).

5–10: 5:L [83]; 7 [98]; 7 [38]; 10 [77]; 10:L [95]; 10:L [78]; 10:L [87]
11–18: 11:L [97]; 12:L [58]; 12 [67]; 13 [93]; 14:L [68]; 14:L [29]; 5|10 [74]; 15:5F [28]; 8|8, 16:L [72]; 16 [90]; 18:L [76]; 18:L [64]; 18:L [99]
20–25: 20:L [85]; 20 [89]; 20 [84]; 20 [59]; 20 [27]; 21:L [61]; 14|10 [88]; 25:L N [69]; 15|10 [86]; 15|10 [40]; 10|15 [91]
30–33: 30 [62]; 15|15 [63]; 30:L [79]; 20|10 N [70]; 32 [82]; 22|10 N [40,55]; 33:L N [41]; 33:2F N [56]
39–50: 25|14 N [49]; 25|15 N [66]; 25|15 N [39]; 30|10 [52]; 40 [80]; 41 [96]; 41 [25]; 42 [75]; 44:5F [57]; 45:L, 32 N [35]; 33|15 N [31,54]; 32+6|10 N [24]; 50:5F [65]; 50:5F [41]; 40|10 [33]
70–95: 70 [38]; 74 [24]; 48+12|20 [43]; 70|17 [26]; 10|80 [81]; 70|20 [53]; 70+10|15 [32]; 100:L [73]; 100:5F [48]
>100: 52+8|49 [39]; 100|10 [44]; 100+20|20 [37]; 142|15 [50]; 185:4F [47]; 160+20|20 [42]; 246* [51]; 234|20, 15 N [45]; 261|10 N [46]; 215|100 [30]; 328|20 [22]; 389+51|46, +6|24, 15 N • [60]; 475+5|20 [34]
>500: 549+40|104 [23]
>1000: (660+165–1362+340)|(48–168) [36]

Legend: n – number of cases with a model or without a training set; m|n – m cases for training, n cases for testing; m+k|n – m cases for training (if omitted, models are used), k cases for model selection, n cases for testing; n:kF – n cases with k-fold cross-validation; n:L – n cases with leave-one-out validation; * – for 30 patients, 2 or more images are available, together 36|262; N – evaluated on the PDDCA database;66 • – evaluated on the TCIA-RT database.60
TABLE IV. Methodology applied for auto-segmentation of organs at risk in the head and neck region for the purpose of radiotherapy planning, and the corresponding references.

Atlas: 27,29,34,44,52,58,59,61,68,69,71,73,78–81,84,85,87,89–91,93,94,99
  with shape/appearance models: 38,66,76,77,82,86,92,95
  with intensity models: 97–99
  with feature classification: 35,63,72,75,83,86
  with contour refinement: 72,76,92
  with level set refinement: 91
Feature classification: 64,74
Localization model and feature classification: 51,56
Level-set statistical model: 88,89
Shape models: 25,62,96
Deep learning: 23,24,37,40,47,49,54,57,65,70
  with U-Net and its versions: 22,28–31,33,36,39,41–43,45,46,50,55,60
  with DeepMedic and its versions: 26,32,53
TABLE V. Observer variability of manual delineations of organs at risk in the head and neck region, and the corresponding references (cf. Table VI for the list of metrics; references in square brackets).

Parotid glands
  DC (%): –m,f (o=5, p=10, S) [77]; 91 (o=2, p=32) [60]; 89±3 [32]; 87±3 (o=2, p=24, •) [60]; 84±4 (o=3, p=12) [58]; –m,f [22]; 83±2 (o=8, p=16) [145]; 81 (o=2, p=13) [63]; 77±8 (o=32, p=1) [143]
  SC (%): sDC: 94.4±2.8 (s=2.85 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: 10.7±4.4 (o=3, p=12) [58]; DTAmax: –m,f (o=5, p=10, S) [77]; HD95: –m,f [22], 5.0±1.7 (o=3, p=12) [58]
  ASD (mm): ASSD: 1.8±0.2 [32]; ASDn/a: 1.4±0.5 (o=3, p=12) [58]; DTAavg: –m,f (o=5, p=10, S) [77]

Submandibular glands
  DC (%): 91 (o=2, p=64) [60]; –m,f (o=5, p=10, S) [77]; 87±5 [32]; –m,f [22]; 83±20 (o=2, p=24, •) [60]; 77±5 (o=8, p=16) [145]
  SC (%): sDC: 89±21.2 (s=2.02 mm, o=2, p=24, •) [60]
  HD (mm): DTAmax: –m,f (o=5, p=10, S) [77]; HD95: –m,f [22]
  ASD (mm): ASSD: 1.5±0.2 [32]; DTAavg: –m,f (o=5, p=10, S) [77]

Brainstem
  DC (%): –m,f (o=3, p=11) [97]; 92 (o=2, p=45) [60]; 90±2 (o=2, p=24, •) [60]; –m,f [22]; 84(82,85) (intra, o=4, p=7) [98]; 83±3 (o=8, p=16) [145]; 83±10 (o=8, p=20) [89]; –m,f (o=3, p=13) [99]; 78(73,85) (o=4, p=7) [98]; 68±12 [32]; 66±17 (o=31, p=1) [143]
  SC (%): sDC: 96.7±2.5 (s=2.5 mm, o=2, p=24, •) [60]; sPPV: –m,f (s=2 mm, o=8, p=20) [89]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]; HD95: –m,f (o=3, p=11) [97], –m,f [22]
  ASD (mm): ASSD: 2.2±0.5 [32]; ASDmax: 1.1(0.9,1.2) (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 1.7(1.1,2.4) (o=4, p=7) [98]
  SSD (mm): SDTAavg: 0.8 (o=8, p=20, P) [89]; SDTAmin: −3.9 (o=8, p=20, P) [89]; SDTAmax: 7.5 (o=8, p=20, P) [89]

Brain, cerebrum (CBR) and cerebellum (CBE)
  DC (%): 99±0.3 (o=2, p=24, •) [60]; 99 (o=2, p=75) [60]; 99 (CBR, intra, o=4, p=7) [98]; 98±1 (o=10, p=1) [143]; –m,f (CBR, o=3, p=13) [99]; –m,f (CBR, o=3, p=11) [97]; 94(93,95) (CBR, o=4, p=7) [98]; –m,f (CBE, o=3, p=11) [97]; 94(91,95) (CBE, intra, o=4, p=7) [98]; –m,f (CBE, o=3, p=13) [99]; 86(84,88) (CBE, o=4, p=7) [98]
  SC (%): sDC: 96.2±1.1 (s=1.01 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: –m,f (CBE, o=3, p=13) [99], –m,f (CBR, o=3, p=13) [99]; HD95: –m,f (CBR, o=3, p=11) [97], –m,f (CBE, o=3, p=11) [97]
  ASD (mm): ASDmax: 0.4 (CBR, intra, o=4, p=7) [98], 0.9(0.6,1.2) (CBE, intra, o=4, p=7) [98], –m,f (CBR, o=3, p=13) [99], –m,f (CBE, o=3, p=13) [99], 2.2(1.8,2.5) (CBE, o=4, p=7) [98], 2.4(2.0,2.9) (CBR, o=4, p=7) [98]

Temporal lobes
  DC (%): 82±2 (o=8, p=16) [145]

Pituitary gland
  DC (%): 65±8 (o=8, p=16) [145]

Spinal cord and spinal canal
  DC (%): 95 (canal, o=2, p=23) [60]; 94±2 (canal, o=2, p=24, •) [60]; –m,f (o=2, p=15) [63]; –m,f (o=3, p=11) [97]; 88 (o=2, p=24) [60]; 85(84,87) (intra, o=4, p=7) [98]; 84±5 (o=2, p=24, •) [60]; –m,f [22]; 80±7 (o=29, p=1) [143]; 79±7 (o=3, p=12) [58]; 79(73,84) (o=4, p=7) [98]; –m,f (o=3, p=13) [99]; 77±4 (o=8, p=16) [145]; 71±7 [32]
  SC (%): sDC: 99.8±0.4 (s=2.93 mm, o=2, p=24, •) [60], 95±2 (canal, s=1.17 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: –m,f (o=3, p=13) [99], 7.1±5.2 (o=3, p=12) [58]; HD95: –m,f (o=3, p=11) [97], –m,f [22], 4.6±3.1 (o=3, p=12) [58]
  ASD (mm): ASSD: 4.4±1.9 [32]; ASDmax: 0.6 (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 1.0(0.81,1.3) (o=4, p=7) [98]; ASDn/a: 1.6±0.8 (o=3, p=12) [58]

Cerebrospinal fluid
  DC (%): –m,f (o=3, p=11) [97]
  HD (mm): HD95: –m,f (o=3, p=11) [97]

Eyeballs and vitreous humor (VH)
  DC (%): –m,f (VH, o=3, p=11) [97]; 95 (o=2, p=19) [60]; 93±2 (o=2, p=24, •) [60]; 91(90,92) (VH, intra, o=4, p=7) [98]; –m,f [22]; 89±1 (o=8, p=16) [145]; –m,f (VH, o=3, p=13) [99]; 86(82,89) (VH, o=4, p=7) [98]; 85±3 (+eye muscles, o=2, p=15) [79]; 83±9 (o=8, p=20) [89]
  SC (%): sDC: 96±3 (s=1.65 mm, o=2, p=24, •) [60]; sPPV: –m,f (s=2 mm, o=8, p=20) [89]
  HD (mm): HDreg: –m,f (VH, o=3, p=13) [99], 4.9±0.6 (+eye muscles, o=2, p=15) [79]; HD95: –m,f (VH, o=3, p=11) [97], –m,f [22]
  ASD (mm): ASDmax: 0.4 (VH, intra, o=4, p=7) [98], –m,f (VH, o=3, p=13) [99], 0.7(0.5,1.1) (VH, o=4, p=7) [98]; ASDn/a: 0.5±0.2 (+eye muscles, o=2, p=15) [79]
  SSD (mm): SDTAavg: 0.5 (o=8, p=20, P) [89]; SDTAmin: −2.8 (o=8, p=20, P) [89]; SDTAmax: 3.4 (o=8, p=20, P) [89]

Optic chiasm
  DC (%): –m,f (o=2, p=10) [88]; –m,f [22]; 39±23 (o=8, p=20) [89]; 38±8 (o=8, p=16) [145]
  SC (%): sPPV: –m,f (s=2 mm, o=8, p=20) [89]
  HD (mm): HDreg: –m,f (o=2, p=10) [88]; HD95: –m,f [22]
  ASD (mm): ASDn/a: –m,f (o=2, p=10) [88]
  SSD (mm): SDTAavg: 0.7 (o=8, p=20, P) [89]; SDTAmin: −2.0 (o=8, p=20, P) [89]; SDTAmax: 4.7 (o=8, p=20, P) [89]

Optic nerves
  DC (%): –m,f (o=2, p=10) [88]; 79±5 (o=2, p=24, •) [60]; 77±6 (o=2, p=17) [60]; 73±4 (o=2, p=15) [79]; 70(65,76) (intra, o=4, p=7) [98]; –m,f (o=3, p=13) [99]; 60(50,66) (o=4, p=7) [98]; –m,f [22]; 57±9 (o=8, p=16) [145]; 50±17 (o=8, p=20) [89]
  SC (%): sDC: 97±3 (s=2.5 mm, o=2, p=24, •) [60]; sPPV: –m,f (s=2 mm, o=8, p=20) [89]
  HD (mm): HDreg: –m,f (o=2, p=10) [88], 2.9±0.5 (o=2, p=15) [79], –m,f (o=3, p=13) [99]; HD95: –m,f [22]
  ASD (mm): ASDmax: 0.6(0.4,0.7) (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 0.9(0.6,1.7) (o=4, p=7) [98]; ASDn/a: –m,f (o=2, p=10) [88], 0.5±0.1 (o=2, p=15) [79]
  SSD (mm): SDTAavg: 0.3 (o=8, p=20, P) [89]; SDTAmin: −2.3 (o=8, p=20, P) [89]; SDTAmax: 4.0 (o=8, p=20, P) [89]

Lens
  DC (%): –m,f (o=3, p=11) [97]; 88±10 (o=2, p=73) [60]; 87±8 (o=2, p=24, •) [60]; 80(75,85) (intra, o=4, p=7) [98]; –m,f (o=3, p=13) [99]; 70±5 (o=8, p=16) [145]; 68(55,76) (o=4, p=7) [98]
  SC (%): sDC: 98±3 (s=0.98 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]; HD95: –m,f (o=3, p=11) [97]
  ASD (mm): ASDmax: 0.3(0.2,0.4) (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 0.7(0.4,1.2) (o=4, p=7) [98]

Sclera
  DC (%): –m,f (o=3, p=11) [97]; 63(62,67) (intra, o=4, p=7) [98]; –m,f (o=3, p=13) [99]; 48(30,56) (o=4, p=7) [98]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]; HD95: –m,f (o=3, p=11) [97]
  ASD (mm): ASDmax: 0.5 (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 0.9(0.6,1.8) (o=4, p=7) [98]

Cornea
  DC (%): –m,f (o=3, p=13) [99]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]
  ASD (mm): ASDmax: –m,f (o=3, p=13) [99]

Lacrimal glands
  DC (%): 67±10 (o=2, p=24, •) [60]; 63±13 (o=2, p=75) [60]
  SC (%): sDC: 93.9±4.7 (s=2.5 mm, o=2, p=24, •) [60]

Mandible
  DC (%): 95 (o=2, p=74) [60]; 94±2 (o=2, p=24, •) [60]; 94±3 [32]; 92 (o=3, p=50) [41]; 89±2 (o=8, p=16) [145]; 87±7 (o=18, p=1) [143]; 85±4 (o=3, p=12) [58]
  SC (%): sDC: 98±2 (s=1.01 mm, o=2, p=24, •) [60]
  HD (mm): HDreg: 8.9±3.2 (o=3, p=12) [58]; HD95: 3.9±1.6 (o=3, p=12) [58]
  ASD (mm): ASSD: 1.2±0.2 [32]; ASDn/a: 0.9±0.5 (o=3, p=12) [58]

Oral cavity
  DC (%): 94±5 [32]; 81±4 (o=8, p=16) [145]
  ASD (mm): ASSD: 2.9±0.6 [32]

Temporo-mandibular joints
  DC (%): 50±18 (o=8, p=16) [145]

Pharyngeal constrictor muscles
  DC (%): 76±8 (inferior) [32]; –m,f (o=5, p=10, S) [77]; 72±7 (middle) [32]; 54±8 (inferior) [32]; 50±8 (middle, o=8, p=16) [145]; 50±9 (inferior, o=8, p=16) [145]; 44±7 (superior, o=8, p=16) [145]
  HD (mm): DTAmax: –m,f (o=5, p=10, S) [77]
  ASD (mm): ASSD: 1.5±0.2 (middle) [32], 1.7±0.3 (inferior) [32], 2.1±0.3 (superior) [32]; DTAavg: –m,f (o=5, p=10, S) [77]

Cervical esophagus
  DC (%): 64±15 [32]
  ASD (mm): ASSD: 2.0±0.6 [32]

Thyroid
  DC (%): –m,f (o=3, p=13) [99]; 84(71,92) (intra, o=4, p=7) [98]; 82±3 (o=8, p=16) [145]; 76(53,89) (o=4, p=7) [98]
  HD (mm): HDreg: –m,f (o=3, p=13) [99]
  ASD (mm): ASDmax: 0.8(0.4,1.8) (intra, o=4, p=7) [98], –m,f (o=3, p=13) [99], 1.9(0.5,4.7) (o=4, p=7) [98]

Larynx
  DC (%): 86±11 (supraglottic) [32]; –m,f (o=5, p=10, S) [77]; 73±18 (glottic) [32]; 60±5 (supraglottic, o=8, p=16) [145]; 49±9 (glottic, o=8, p=16) [145]
  HD (mm): DTAmax: –m,f (o=5, p=10, S) [77]
  ASD (mm): ASSD: 1.4±0.4 [32], 1.8±0.4 (supraglottic) [32]; DTAavg: –m,f (o=5, p=10, S) [77]

Trachea
  DC (%): –m,f (o=2, p=12) [63]

Cochlea
  DC (%): 78±8 (o=2, p=24, •) [60]; 76±9 (o=2, p=8) [60]; –m,f (o=5, p=10, S) [77]; 50±13 [32]; 37±10 (o=8, p=16) [145]
  SC (%): sDC: 96±4 (s=1.25 mm, o=2, p=24, •) [60]
  HD (mm): DTAmax: –m,f (o=5, p=10, S) [77]
  ASD (mm): ASSD: 1.1±0.4 [32]; DTAavg: –m,f (o=5, p=10, S) [77]

Brachial plexus
  DC (%): 26 (o=5, p=1, S*) [71]
  VC (%): TPR: 36 (o=5, p=1, S*) [71]
  HD (mm): HD95: 22.2 (o=5, p=1, S*) [71]

Legend: –m,f – value available only as a median (m – median, average not reported) and/or estimated from a figure (f – exact value not reported); o – number of observers; p – number of patients; intra – intraobserver variability; S – compared against the STAPLE consensus among other physicians; S* – comparison of trainee contours against the STAPLE consensus among four other expert physicians; P – compared against the probability map consensus among other physicians; • – evaluated on the TCIA-RT database;60 +eye muscles – the eyes and eye muscles were segmented as one organ; s – size of the volumetric neighborhood.
3.F. Performance metrics
The agreement between the ground truth and the resulting auto-segmentation is quantitatively evaluated by various overlap and distance metrics,129 computed over the corresponding binary segmentation masks (Table VI). The overlap metrics originate from the statistical measures of the performance of a binary classification test, and the Dice coefficient is the standard and widely accepted metric for volumetric mask overlap that measures the harmonic average of the classification precision and recall (i.e., the F1 score). Variations of the volumetric coefficient include the sensitivity and positive predictive value (often referred to as the inclusion), which measure the ratio of correctly segmented voxels, while the specificity measures the ratio of correctly nonsegmented voxels and the false discovery rate measures the ratio of incorrectly segmented voxels. On the other hand, surface coefficients measure the overlap of the corresponding mask surfaces.
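As a worked identity behind this equivalence (standard definitions, with TP = |A∩B|, FP = |B∖A|, FN = |A∖B| for ground truth A and auto-segmentation B):

```latex
% Dice coefficient as the F1 score: with precision PPV = TP/(TP+FP)
% and recall TPR = TP/(TP+FN),
\mathrm{DC}(A,B)
  = \frac{2\,|A \cap B|}{|A| + |B|}
  = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}}
  = \frac{2\,\mathrm{PPV}\cdot\mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}}
  = F_1 .
```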
Contrary to the overlap metrics, the distance metrics evaluate the mutual proximity of the segmentation mask surfaces. Within this group, the most established are the Hausdorff distance and its variations, which measure the maximal distance between any voxel on the mask surface to the other mask surface, as well as variations of the average surface distance, which measure the distance between voxels on the mask surface to the closest voxels on the other mask surface.
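A sketch of how these metrics are typically computed over binary masks, using Euclidean distance transforms (assuming a and b are boolean NumPy arrays on the same grid; the percentile variant yields HD95):

```python
# Sketch of standard volumetric and distance metrics over binary masks
# (ground truth a, auto-segmentation b), using Euclidean distance
# transforms; `spacing` is the voxel size in millimeters.
import numpy as np
from scipy import ndimage

def dice(a, b):
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def _surface(mask):
    # Surface voxels: the mask minus its erosion.
    return mask & ~ndimage.binary_erosion(mask)

def surface_distances(a, b, spacing):
    """Directed distances from surface voxels of `a` to the surface of `b`."""
    dt_b = ndimage.distance_transform_edt(~_surface(b), sampling=spacing)
    return dt_b[_surface(a)]

def hausdorff(a, b, spacing, percentile=100):
    d_ab = surface_distances(a, b, spacing)
    d_ba = surface_distances(b, a, spacing)
    if percentile == 100:                    # regular Hausdorff distance
        return max(d_ab.max(), d_ba.max())
    return np.percentile(np.hstack([d_ab, d_ba]), percentile)  # e.g., HD95

def assd(a, b, spacing):
    d_ab = surface_distances(a, b, spacing)
    d_ba = surface_distances(b, a, spacing)
    return (d_ab.sum() + d_ba.sum()) / (d_ab.size + d_ba.size)
```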
3.G. Segmentation performance
The performance of different auto-segmentation methods from the perspective of different metrics and OARs is presented in Table VII, which summarizes the comparisons of auto-segmentation results to the corresponding ground truth obtained by manual delineation (Table VII does not report comparisons to ground truth obtained by manually corrected or merged auto-segmentation results;32,80,96 in the case the results were reported separately for multiple versions of a method, only the results for the best performing version are given). A systematic and relatively unbiased evaluation of different methods can be obtained through computational challenges, which have in the past decade gained increased popularity and become the standard for validation of methods in the field of biomedical image analysis.130 In such a competition-oriented setting, the challenge organizers first release images with the ground truth that are used by the participating teams for method development, and then the methods are evaluated on images for which the ground truth is known to organizers only.

To date, five H&N auto-segmentation challenges have been organized. In 2009, at the Head and Neck Auto-segmentation Challenge (part of the workshop 3D Segmentation in the Clinic: A Grand Challenge during the conference on Medical Image Computing and Computer Assisted Interventions, MICCAI 2009), five different teams attempted to segment the mandible and brainstem from 25 CT images (10 for training, 15 for testing).92 The second challenge was organized by the same group in 2010 (the Head and Neck Auto-segmentation Challenge: Segmentation of the Parotid Glands, part of the workshop Medical Image Analysis in the Clinic: A Grand Challenge during MICCAI 2010), when the same image database was used but six different teams attempted to segment the parotid glands instead.91 In 2015, six different teams participated in a challenge (the Head and Neck Auto-Segmentation Challenge 2015, held as a standalone satellite event during MICCAI 2015) to segment the brainstem, mandible, optic chiasm, optic nerves, parotid glands, and
submandibular glands from 40 CT images (25 for training, 15 for testing).66 In July 2019, at the AAPM RT-MAC challenge (part of the 2019 American Association of Physicists in Medicine (AAPM) Annual Meeting; https://www.aapm.org/GrandChallenge/RT-MAC/; http://aapmchallenges.cloudapp.net/competitions/34), 10 teams attempted to segment the parotid glands, submandibular glands, and lymph nodes from 55 MR images (31 for training, 24 for testing),131 however, detailed results of this challenge have not yet been published and are not publicly available. The last auto-segmentation challenge was carried out in October 2019 (StructSeg2019: Automatic Structure Segmentation for Radiotherapy Planning Challenge, held as a standalone satellite event during MICCAI 2019; https://structseg2019.grand-challenge.org; http://www.structseg-challenge.org), where 12 teams attempted to segment 13 OARs (i.e., the eyes, lens, optic nerves, optic chiasm, pituitary gland, brainstem, temporal lobes, spinal cord, parotid glands, inner ear, middle ear, temporo-mandibular joints, and mandible) as well as the
TABLE VI. Performance metrics applied for measuring the performance of auto-segmentation of organs at risk in the head and neck region for the purpose of radiotherapy planning, and the corresponding references and mathematical definitions.

Overlap metrics, reported in percents (%)

Standard volumetric coefficient
  DC – Dice coefficient (F1 score):22–63,92–95,97–99 DC = 2|A∩B| / (|A| + |B|)

Variations of the volumetric coefficient (VC)
  TPR – sensitivity:24,31,40,41,50,55,56,59,67,68,71,90,93,94,96 TPR = |A∩B| / |A|
  TNR – specificity:41,93,94,96 TNR = |(A∪B)^C| / |A^C|
  PPV – positive predictive value (inclusion):24,31,40,55,56,68 PPV = |A∩B| / |B|
  FDR – false discovery rate (segmented volume):50,59 FDR = |B∖A| / |B|

Variations of the surface coefficient (SC)
  sDC – surface overlap:60 sDC = (|∂A∩∂_sB| + |∂B∩∂_sA|) / (|∂A| + |∂B|)
  sPPV – surface positive predictive value (inclusion):78,89 sPPV = |∂B∩∂_sA| / |∂B|

Distance metrics, reported in millimeters (mm)

Variations of the Hausdorff distance (HD)
  HDreg – Hausdorff distance, regular:25,36,41,43,44,48,52,53,58,66,70,73,76,79,84,88,99 HDreg = max{max_{a∈∂A} d(a,∂B), max_{b∈∂B} d(b,∂A)}
  DTAmax – maximum distance to agreement:27,77 DTAmax = max_{b∈∂B} d(b,∂A)
  HD95 – 95-percentile Hausdorff distance:22,23,29–31,35,37–40,46,49,55,58,66,69,71,97 HD95 = K95_{a∈∂A, b∈∂B}{d(a,∂B), d(b,∂A)}, where K95 denotes the 95th percentile
  HD95mid – 95-percentile Hausdorff distance, mid-value:24,54,62 HD95mid = ½[K95_{a∈∂A} d(a,∂B) + K95_{b∈∂B} d(b,∂A)]
  HDsw – slice-wise Hausdorff distance:81,82,85,86,91,92 HDreg aggregated over two dimensions

Variations of the average surface distance (ASD)
  ASSD – average symmetric surface distance:26,53,57 ASSD = [Σ_{a∈∂A} d(a,∂B) + Σ_{b∈∂B} d(b,∂A)] / (|∂A| + |∂B|)
  ASDmax – average surface distance, maximum:35,64,66,72,75,76,98,99 ASDmax = max{Σ_{a∈∂A} d(a,∂B)/|∂A|, Σ_{b∈∂B} d(b,∂A)/|∂B|}
  ASDmid – average surface distance, mid-value:24,32,40,55,56,61,81 ASDmid = ½[Σ_{a∈∂A} d(a,∂B)/|∂A| + Σ_{b∈∂B} d(b,∂A)/|∂B|]
  ASDn/a – average surface distance, unspecified:39,58,75,79,88
  DTAavg – average distance to agreement:27,42,68,77,84,87 DTAavg = Σ_{b∈∂B} d(b,∂A) / |∂B|

Variations of the signed surface distance (SSD)
  SSDavg – signed surface distance, average:45 SSDavg = [Σ_{a∈∂A} d_s(a,∂B) + Σ_{b∈∂B} d_s(b,∂A)] / (|∂A| + |∂B|)
  SDTAavg – signed distance to agreement, average:89 SDTAavg = Σ_{b∈∂B} d_s(b,∂A) / |∂B|
  SDTAmin – signed distance to agreement, minimum:89 SDTAmin = min_{b∈∂B} d_s(b,∂A)
  SDTAmax – signed distance to agreement, maximum:89 SDTAmax = max_{b∈∂B} d_s(b,∂A)

Legend: |A| and |B| are the numbers of voxels in volumetric masks A (e.g., ground truth) and B (e.g., auto-segmentation), respectively, and |∂A| and |∂B| are the numbers of voxels in the corresponding subsets of surface voxels ∂A and ∂B, respectively. The Euclidean distances of voxels a and b to surfaces ∂B and ∂A are defined as d(a,∂B) = min_{b∈∂B} ‖a−b‖ and d(b,∂A) = min_{a∈∂A} ‖b−a‖, respectively. The signed Euclidean distance d_s(a,∂B) is defined as d(a,∂B) if a∈B^C and as −d(a,∂B) if a∈B, and d_s(b,∂A) is defined as d(b,∂A) if b∈A^C and as −d(b,∂A) if b∈A. The volumetric neighborhoods within distance s from surfaces ∂A and ∂B are defined as ∂_sA = {x∈ℝ³ : ∃a∈∂A, ‖x−a‖ ≤ s} and ∂_sB = {x∈ℝ³ : ∃b∈∂B, ‖x−b‖ ≤ s}, respectively.
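For completeness, the surface overlap (sDC) at tolerance s from Table VI admits an equally compact sketch under the same assumptions as the metric code above (boolean NumPy masks on a shared grid):

```python
# Sketch of the surface overlap (sDC) at tolerance s, following the
# Table VI definition: the fraction of surface voxels of each mask lying
# within distance s of the other mask's surface.
import numpy as np
from scipy import ndimage

def surface_dice(a, b, spacing, s=2.0):
    surf_a = a & ~ndimage.binary_erosion(a)
    surf_b = b & ~ndimage.binary_erosion(b)
    dt_a = ndimage.distance_transform_edt(~surf_a, sampling=spacing)
    dt_b = ndimage.distance_transform_edt(~surf_b, sampling=spacing)
    overlap = (dt_b[surf_a] <= s).sum() + (dt_a[surf_b] <= s).sum()
    return overlap / (surf_a.sum() + surf_b.sum())
```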
TABLE VII. Performance of auto-segmentation for the purpose of radiotherapy planning, and the corresponding references (cf. Table VI for the list of metrics; references in square brackets; N – evaluated on the PDDCA database;66 • – evaluated on the TCIA-RT database;60 –m,f – cf. the legend of Table V).

Parotid glands
  DC (%): 92±4 [37]; 91±2 [75]; 88±2 [46]; –m,f (N) [45]; 88 [53]; 87±3 (N) [60]; 87±4 (N) [24]; 87 [64]; 86±2 (N) [40]; 86±3 [48]; 86±4 [24]; 86±5 (N) [31]; 86±5 [42]; 86±5 [40]; 86±7 [93]; –m,f [72]; 85±2 [83]; 85±3 [26]; 85±4 [91]; 85±4 [30]; 85±5 [47]; –m,f (DL) [29]; 85 [60]; 84±3 [34]; 84±3 (N) [55]; 84±4 (•) [60]; 84±7 (N, IM) [66]; 84 [76]; –m,f [22]; –m,f [23]; 83±2 [50]; 83±3 [58]; 83±5 (•) [36]; 83±5 [86]; 83±6 [36]; 83±6 (N) [56]; –m,f [95]; –m,f [45]; 81±4 (N) [70]; 81±5 [28]; 81±8 (N) [49]; 81±8 [27]; 81 (N) [54]; –m,f [52]; –m,f (ABAS) [29]; 79 (MR) [68]; –m,f [77]; 79 [87]; 79 [57]; 77±6 [65]; –m,f (N) [69]; –m,f (N) [35]; 76±6 [63]; 76 (CT) [68]; –m,f [35]; 75 [51]; 72±10 [90]; 72±12 [82]; –m,f [84]; –m,f [78]; –m,f [73]
  VC (%): TPR: 97±4 [40], 91±9 [93], 88±5 (N) [40], 86±7 [24], 85±5 (N) [24], 85±7 (N) [31], 85±7 [50], 84 (MR) [68], 83±10 (N) [56], 82±5 (N) [55], 72±9 [90], 71 (CT) [68]; TNR: 91±7 [93]; PPV: 88±5 (N) [24], 87±3 (N) [40], 87±6 (N) [31], 86±2 (N) [55], 84±7 (N) [56], 83±7 [40], 83 (CT) [68], 80±6 [24], 77 (MR) [68]; FDR: 18±6 [50]
  SC (%): sDC: 95±3 (s=2.85 mm, N) [60], 90±6 (s=2.85 mm, •) [60]
  HD (mm): HDreg: 1.4±0.6 [36], 1.7±0.7 (•) [36], –m,f [73], 5.1±1.1 [48], –m,f [76], 10.7 [53], –m,f [84], 12.1±3.9 [58], –m,f [52], –m,f (N, IM) [66], 14.2±6.6 (N) [70]; DTAmax: 6.8±2.5 [27], –m,f (N) [35], –m,f [35], –m,f [77]; HD95: –m,f (N) [69], 2.6±1.4 [40], 2.7±1.1 (N) [31], 3.2±0.6 [37], 3.8±1.1 (N) [40], 4.0±2.2 (N) [55], –m,f [22], 4.6±1.2 [58], 5.0±2.4 (N, IM) [66], –m,f [23], 5.2±1.8 (N) [49], –m,f (DL) [29], 6.6±3.3 [30], –m,f [35], –m,f (N) [35], –m,f (ABAS) [29], 9.3±3.3 [46]; HD95mid: 3.3±1.0 (N) [24], 3.9±2.0 [24], 3.9 (N) [54]; HDsw: 5.0±1.0 [91], 5.8±1.6 [86], –m,f [82]
  ASD (mm): ASSD: 0.9±0.3 [26], 1.2 [53], 1.6 [57]; ASDmax: –m,f [76], –m,f [64], –m,f [72], –m,f (N, IM) [66], 3.6±1.4 [75]; ASDmid: 1.0±0.3 (N) [55], 1.0±0.4 [40], 1.2±0.3 (N) [24], 1.3±0.4 [24], 1.4±0.4 (N) [40], 1.8±0.6 (N) [56]; ASDn/a: 0.3±0.1 [75], 1.4±0.4 [58]; DTAavg: –m,f [77], 1.6±0.6 [27], 1.7±1.1 [42], –m,f [84], 2.5±2.8 [87], 4.8 (MR) [68], 6.2 (CT) [68]
  SSD (mm): SSDavg: –m,f [45], –m,f (N) [45]

Submandibular glands
  DC (%): –m,f [22]; 85±10 [42]; 85 [60]; 84±6 [24]; 83 [53]; 82±5 [86]; 82±5 (N) [40]; 82±7 [30]; 82±7 [50]; 81±4 [46]; 81±6 (N) [55]; 80±7 (N) [24]; 80±7 [26]; 80±8 (•) [60]; –m,f [77]; –m,f [23]; 78±7 (N) [60]; 78±8 (N, IM) [66]; 77±6 [34]; 75±13 (N) [31]; 73 [51]; 71±12 [65]; –m,f [95]; 70±12 [82]; 70 [87]; 65±8 (N) [70]; –m,f (N) [69]; –m,f (N) [35]; –m,f [35]; –m,f [78]
  VC (%): TPR: 87±5 [24], 85±6 (N) [55], 80±11 [50], 79±8 (N) [24], 79±9 (N) [40], 72±16 (N) [31]; PPV: 85±9 (N) [40], 83±11 [24], 82±9 (N) [24], 82±11 (N) [31], 80±8 (N) [55]; FDR: 14±8 [50]
  SC (%): sDC: 84±10 (s=2 mm, •) [60], 82±10 (s=2 mm, N) [60]
  HD (mm): HDreg: 6.6 [53], –m,f (N, IM) [66], 9.7±4.8 (N) [70]; DTAmax: –m,f [77], –m,f (N) [35], –m,f [35]; HD95: –m,f [22], 3.2±1.6 (N) [31], 4.0±2.7 (N) [40], –m,f [23], 4.8±1.8 (N, IM) [66], 4.8±1.7 (N) [55], –m,f (N) [69], –m,f (N) [35], 6.0±1.8 [46], 6.2±4.3 [30], –m,f [35]; HD95mid: 3.2±2.3 [24], 3.9±1.2 (N) [24]; HDsw: 3.8±1.0 [86], –m,f [82]
  ASD (mm): ASSD: 1.2 [53], 1.3±1.2 [26]; ASDmax: –m,f (N, IM) [66]; ASDmid: 0.9±0.5 (N) [55], 1.2±0.7 [24], 1.4±1.0 (N) [40], 2.0±1.9 (N) [24]; DTAavg: –m,f [77], 1.2±1.3 [42], 1.9±1.4 [87]

Brainstem
  DC (%): 93±1 [97]; 93±3 [27]; 92±3 [40]; 92 [53]; 91±1 [86]; 91±3 [43]; 90±1 [26]; 90±2 [48]; 90±2 [24]; 90±3 [47]; 90±4 (N) [56]; 89±3 [42]; 88±2 (N) [24]; 88±2 (N) [31]; 88±3 [92]; 88 [60]; 88±3 (•) [36]; 87±3 (N) [55]; 87±3 (N) [40]; 87±4 (N, IM) [66]; –m,f [22]; –m,f (DL) [29]; 86±4 [30]; 86±8 [36]; 86 [76]; 85(80,88) [94]; –m,f [38]; –m,f [95]; 84 (N) [54]; –m,f [23]; 83±6 [89]; –m,f [73]; –m,f [52]; 82±4 (N) [49]; –m,f (ABAS) [29]; 80±8 (N) [60]; 79±6 [59]; 79±10 (•) [60]; –m,f (N) [35]; 78 [99]; –m,f (N) [69]; 77±7 [93]; 77±8 [90]; –m,f [35]; 76(68,81) [98]; –m,f [84]; 75±12 [82]; 73 (MR) [68]; 69 (CT) [68]; 67±2 [46]; 64±16 [50]
  VC (%): TPR: 95±3 [40], 91±4 [24], 90±4 (N) [56], 90±4 [40], 89±3 (N) [24], 88±3 (N) [55], 88±6 (N) [40], 86±14 [50], 87±5 (N) [31], 79±9 [59], 75±14 [90], 69 (CT) [68], 64 (MR) [68], 63±10 [93]; TNR: 98±2 [93]; PPV: 91±4 (N) [56], 89±4 [24], 89±6 (N) [31], 89 (MR) [68], 88±4 (N) [40], 87±5 (N) [24], 85±2 (N) [55], 74 (CT) [68]; FDR: 15±8 [59], 42±23 [50]
  SC (%): sDC: 83±13 (s=2.5 mm, N) [60], 83±14 (s=2.5 mm, •) [60]; sPPV: –m,f (s=2 mm) [89]
  HD (mm): HDreg: 0.6±0.1 (•) [36], 0.9±2.0 [36], 2.7±0.9 [43], 2.9±0.3 [48], –m,f [73], –m,f [52], 6.5 [53], –m,f (N, IM) [66], –m,f [84], 8.7 [99], –m,f [76]; DTAmax: 3.5±1.2 [27], –m,f (N) [35], –m,f [35]; HD95: 1.3±0.5 [40], 2.0±0.3 (N) [31], –m,f [97], 3.6±0.8 (N) [40], –m,f [22], –m,f [23], 4.0±0.9 (N) [55], 4.0±2.0 (N, IM) [66], –m,f (ABAS) [29], 4.8±1.6 [30], –m,f [38], –m,f (N) [69], –m,f (DL) [29], –m,f [35], –m,f (N) [35], 6.4±2.4 [46], 12.4±26.3 (N) [49]; HD95mid: 2.6±0.8 [24], 2.9 (N) [54], 3.0±0.6 (N) [24]; HDsw: 2.8±0.5 [92], 2.8±0.5 [86], –m,f [82]
  ASD (mm): ASSD: 0.6±0.1 [26], 0.8 [53]; ASDmax: –m,f [76], –m,f (N, IM) [66], 2.1 [99], 2.2(1.7,3.1) [98]; ASDmid: 0.7±0.3 [40], 0.9±0.3 (N) [56], 1.0±0.2 [24], 1.2±0.6 (N) [55], 1.2±0.2 (N) [24], 1.4±0.3 (N) [40]; DTAavg: 0.9±0.4 [27], 1.0±0.5 [42], –m,f [84], 3.2 (MR) [68], 4.3 (CT) [68]
  SSD (mm): SDTAavg: 0.2 [89]; SDTAmin: −4.3 [89]; SDTAmax: 5.4 [89]

Brain, cerebrum (CBR) and cerebellum (CBE)
  DC (%): 99±0.2 (•) [60]; 99 [60]; 98±0.3 [36]; 98 (CBR) [99]; 97±0.5 (•) [36]; –m,f (CBR) [23]; 96±1 (CBR) [97]; 96±2 [82]; 94±1 (CBE) [97]; 94(93,95) (CBR) [98]; –m,f (CBE) [23]; 92 (CBE) [99]; 87(80,91) (CBE) [98]; 84(79,86) (CBE) [94]
  SC (%): sDC: 95±2 (s=1 mm, •) [60]
  HD (mm): HDreg: 1.2±1.5 [36], 3.6±0.2 (•) [36], 10.8 (CBE) [99], 18.4 (CBR) [99]; HD95: –m,f (CBR) [97], –m,f (CBE) [97], –m,f (CBR) [23], –m,f (CBE) [23]; HDsw: –m,f [82]
  ASD (mm): ASDmax: 0.8 (CBR) [99], 1.2 (CBE) [99], 1.9(1.3,3.4) (CBE) [98], 2.9(2.5,3.2) (CBR) [98]

Temporal lobes
  DC (%): 93±4 [27]; 84±3 [30]
  HD (mm): DTAmax: 4.7±2.2 [27]; HD95: 12.5±4.1 [30]
  ASD (mm): DTAavg: 1.1±0.6 [27]
Hippocampus
  DC (%): –m,f [38]
  HD (mm): HD95: –m,f [38]

Pituitary gland
  DC (%): 90 [33]; 64±9 [30]; 30(0,72) [94]
  HD (mm): HD95: 3.2±0.8 [30]

Spinal cord and spinal canal
  DC (%): 96 [53]; 95 (canal) [60]; 92±2 (canal, •) [60]; 91±1 [48]; 88±2 [27]; 88±7 [47]; 88 [60]; –m,f [23]; 87±3 [65]; 87±3 [42]; 86±6 [30]; 86±9 [97]; 85±2 [28]; 85 [99]; –m,f [35]; –m,f [52]; 83±6 [36]; –m,f [22]; 82±5 [34]; 80±5 [58]; 80±5 [63]; 80±8 (•) [60]; 80 (CT) [68]; 79±8 (•) [36]; 78 (+brainstem) [87]; 76±8 [90]; 76(66,82) [98]; –m,f [95]; 75 [51]; 74±8 [82]; 74±8 [26]; –m,f [73]; 37 (MR) [68]
  VC (%): TPR: 80 (CT) [68], 76±12 [90], 26 (MR) [68]; PPV: 93 (MR) [68], 81 (CT) [68]
  SC (%): sDC: 99±1 (s=2.93 mm, •) [60], 93±3 (canal, s=1.17 mm, •) [60]
  HD (mm): HDreg: 0.5±0.1 (•) [36], 0.7±1.3 [36], 1.7±0.2 [48], –m,f [73], –m,f [52], 4.3 [53], 6.6 [99], 10.4±3.8 [58]; DTAmax: 3.3±0.3 [27], –m,f [35]; HD95: –m,f [22], –m,f [35], 4.3±1.4 [58], –m,f [97], –m,f [23], 6.9±22.0 [30]; HDsw: –m,f [82]
  ASD (mm): ASSD: 0.4 [53], 2.6±1.6 [26]; ASDmax: 0.8 [99], 1.5(0.8,2.4) [98]; ASDn/a: 1.2±0.4 [58]; DTAavg: 0.9±0.1 [27], 1.6±0.9 [42], 2.3±1.4 (+brainstem) [87], 3.5 (CT) [68], 17.5 (MR) [68]

Cerebrospinal fluid
  DC (%): 82±7 [97]
  HD (mm): HD95: –m,f [97]

Eyeballs and vitreous humor (VH)
  DC (%): 96±1 (VH) [97]; 95 [60]; 95±2 [43]; 94 [33]; –m,f (DL) [29]; 93±1 [48]; 93±4 [47]; 92±2 (•) [60]; 92±2 [30]; 91±2 (•) [36]; –m,f (ABAS) [29]; 91 (MR) [68]; 89±4 [36]; –m,f [22]; 88±3 [65]; 87 (CT) [68]; –m,f [38]; 85±8 [82]; 84±5 [59]; 84±7 [89]; 84(19) (+eye muscles) [79]; 81±5 [62]; 81 (VH) [99]; 81(78,85) [94]; 80(72,84) (VH) [98]; –m,f [73]
  VC (%): TPR: 93 (MR) [68], 91 (CT) [68], 83±8 [59]; PPV: 89 (MR) [68], 84 (CT) [68]; FDR: 10±8 [59]
  SC (%): sDC: 95±3 (s=1.65 mm, •) [60]; sPPV: –m,f (s=2 mm) [89]
  HD (mm): HDreg: 0.3±0.1 (•) [36], 0.4±1.0 [36], 1.3±0.3 [43], 1.7±0.3 [48], –m,f (DL) [29], –m,f (ABAS) [29], 5.0 (VH) [99], 5.3(4.7) (+eye muscles) [79], –m,f [73]; HD95: –m,f (VH) [97], –m,f [22], –m,f [38]; HD95mid: 2.4±0.5 [62], 2.4±1.0 [30]; HDsw: –m,f [82]
  ASD (mm): ASDmax: 1.0 (VH) [99], 1.2(0.9,1.8) (VH) [98]; ASDn/a: 0.6(0.8) (+eye muscles) [79]; DTAavg: 2.0 (MR) [68], 3.3 (CT) [68]
  SSD (mm): SDTAavg: 0.8 [89]; SDTAmin: −2.3 [89]; SDTAmax: 3.8 [89]

Optic chiasm
  DC (%): –m,f [88]; 71±9 [43]; 64±16 [30]; 62±17 [27]; 61±6 (N) [24]; 59±7 [40]; 59±10 (N) [40]; 59±14 [24]; 58±10 (N) [55]; 58±17 (N) [54]; 57±13 (N, UB) [66]; –m,f [73]; 53±15 [46]; 52±11 (N) [70]; –m,f [22]; 45±17 (N) [31]; 42±17 (N) [49]; –m,f [38]; 41(0,58) [94]; 41±14 [36]; 37±13 [65]; 37±18 [89]; –m,f (N) [35]; 24±15 [59]
  VC (%): TPR: 68±8 (N) [40], 64±11 (N) [24], 64±15 [24], 61±5 [40], 61±10 (N) [55], 50±25 (N) [31], 48±31 [59]; PPV: 65±8 [40], 61±12 (N) [24], 56±10 (N) [55], 56±11 (N) [40], 56±16 [24], 47±18 (N) [31]; FDR: 77±24 [59]
  SC (%): sPPV: –m,f (s=2 mm) [89]
  HD (mm): HDreg: –m,f [88], 1.0±0.4 [36], 2.5±1.0 [43], –m,f (N, UB) [66], 5.6±1.6 (N) [70], –m,f [73]; DTAmax: 3.7±1.4 [27], –m,f (N) [35]; HD95: 2.1±1.4 [40], 2.2±1.0 (N) [55], 2.6±0.8 (N, UB) [66], 2.8±1.4 (N) [31], 3.8±1.2 (N) [40], 4.4±3.0 (N) [49], 4.6±2.4 [30], –m,f [22], –m,f (N) [35], 5.8±2.5 [46], –m,f [38]; HD95mid: 2.7±0.5 (N) [24], 2.8±1.6 (N) [54], 3.9±2.2 [24]
  ASD (mm): ASDmax: –m,f (N, UB) [66]; ASDmid: 0.7±0.2 (N) [55], 0.8±0.4 [40], 0.9±0.2 (N) [24], 1.3±0.3 (N) [40], 1.5±0.7 [24]; ASDn/a: –m,f [88]; DTAavg: 1.1±0.7 [27]
  SSD (mm): SDTAavg: 0.04 [89]; SDTAmin: −2.4 [89]; SDTAmax: 3.0 [89]

Optic nerves
  DC (%): 90±4 [37]; 82±6 [43]; 81 [33]; 79±6 [62]; –m,f [88]; 78±5 (•) [60]; 77±6 [60]; 76±7 [30]; 76(73,82) [74]; 75±5 (•) [36]; 74±6 [24]; 74±8 (N) [31]; 74(41) [79]; 72±4 [40]; 72±5 (N) [24]; 72±6 [34]; 72±6 (N) [60]; 72±6 [46]; 71±8 (N) [54]; 70±4 (N) [40]; 69±5 (N) [55]; 69±9 [36]; 69±10 [47]; –m,f (ABAS) [29]; 64±7 [65]; 64±8 (N) [49]; 63±10 (N, UB) [66]; 62 [99]; 60±12 [27]; –m,f [22]; 58(49,63) [98]; –m,f [38]; 52±14 [89]; –m,f (DL) [29]; 48±11 [59]; –m,f (N) [69]; 38(0,53) [94]; –m,f [35]
  VC (%): TPR: 85±8 (N) [40], 80±8 (N) [24], 77±11 (N) [31], 74±6 (N) [55], 71±10 [24], 70±6 [40], 64±16 [59]; PPV: 80±9 [24], 76±7 [40], 72±9 (N) [31], 70±8 (N) [40], 66±8 (N) [24], 64±6 (N) [55]; FDR: 57±12 [59]
  SC (%): sDC: 98±3 (s=2.5 mm, •) [60], 92±6 (s=2.5 mm, N) [60]; sPPV: –m,f (s=2 mm) [89]
  HD (mm): HDreg: 0.5±0.3 (•) [36], 0.7±0.8 [36], –m,f [88], 1.8±0.7 [43], 3.8(6.9) [79], –m,f (N, UB) [66], 6.5 [99]; DTAmax: 3.7±1.0 [27], –m,f (N) [35]; HD95: 1.4±0.4 [40], 2.0±0.5 (N) [40], 2.1±0.3 [37], 2.3±2.4 (N) [31], 2.5±1.0 (N) [55], 2.6±0.4 (N) [49], 3.0±1.0 (N, UB) [66], –m,f (ABAS) [29], –m,f (N) [69], 3.7±1.1 [30], 4.8±4.3 [46], –m,f (DL) [29], –m,f [38], –m,f [22], –m,f [35]; HD95mid: 1.9±1.9 (N) [24], 1.9±1.3 [24], 2.2±0.9 (N) [54], 3.3±1.6 [62]
  ASD (mm): ASDmax: –m,f (N, UB) [66], 1.0(0.8,1.4) [98], 1.0 [99]; ASDmid: 0.4±0.3 [40], 0.6±0.3 [24], 0.7±0.2 (N) [24], 0.7±0.2 (N) [40], 1.1±0.8 (N) [55]; ASDn/a: –m,f [88], 0.6(2.0) [79]; DTAavg: 1.2±0.5 [27]
  SSD (mm): SDTAavg: −0.4 [89]; SDTAmin: −2.7 [89]; SDTAmax: 2.4 [89]
Medical Physics, 47 (9), September 2020
e939 Vrtovec et al.: Auto-segmentation of OARs for H&N RT planning e939
TABLE VII. Continued.
Results
Lens
DC (%) 88 5,
97
84 7,
47
84
33
,82 6
30
,81 12
60
,91
m;f(DL)
29
,80 18 ()
60
,79 11 ()
36
,72 14
36
,67
99
, 50(37,66)
98
,91
m;f(ABAS)
29
,
35 25
59
VC (%) TPR: 50 32
59
; FDR: 73 21
59
SC (%) sDC: 93 20 (s=0.98mm,)
60
HD (mm) HD91m;f: 0.2 0.1 ()
36
, 0.4 0.9
36
, 3.7
99
; HD91m;f:91
m;f
97
, 2.0 1.1
30
,91
m;f(DL)
29
,91
m;f(ABAS)
29
ASD (mm) ASD91m;f:1.0
99
, 1.6(0.7,2.9)
98
Sclera
DC (%) 69 5
97
,46
99
, 38(24,55)
98
HD (mm) HD91m;f: 5.9
99
; HD91m;f:91
m;f
97
ASD (mm) ASD91m;f:1.1
99
, 1.8(1.0,3.8)
98
Cornea
DC (%) 43
99
HD (mm) HD91m;f: 6.4
99
ASD (mm) ASD91m;f:1.7
99
Lacrimal glands
DC (%) 70 12
60
,62 13 ()
60
SC (%) sDC: 92 7(s=2.5mm,)
60
Extraocular muscle
DC (%) 76 6
62
HD (mm) HD91m;f:2.1 0.5
62
Mandible
DC (%) 96
60
,96
53
,91
m;f
23
,94 1(N)
56
,94 1(N)
55
,94 1(N)
40
,94 1(N)
24
,94 2()
60
,94 2(N)
60
,94(N)
41
,94
41
,93 1
30
,93 1
92
,
93 1
86
,93 1(N,IM)
66
,93 1(N)
39
,93 1
24
,93 2
46
,93 2(N)
31
,92 1
44
,92 2
48
,92 2
26
,92(N)
54
,91 2(N)
49
,
91 4
47
,91 9
42
,90 2()
36
,90 4
65
,91
m;f
95
,89 4
82
,88 3
28
,89
51
,88
39
,91
m;f(N)
69
,87 3
36
,85 2
34
,91
m;f
35
,91
m;f
78
,
91m;f
52
,91
m;f(N)
35
,82 4
93
,82 4
40
,80 4
58
,78 8
90
VC (%) TPR: 95 2(N)
56
,95(N)
41
,93 2(N)
24
,93
41
,92 2(N)
55
,92 3
24
,92 3(N)
31
,91 3(N)
40
,87 5
40
,83 13
93
,79 11
90
;
TNR: 100 (N)
41
,100
41
,95 3
93
; PPV: 97 2(N)
40
,95 2(N)
24
,95 2(N)
31
,95 5(N)
55
,94 2(N)
56
,94 3
24
,79 4
40
SC (%) sDC: 97 2(s=1mm,)
60
,97 2(s=1mm,N)
60
HD (mm) HD91m;f:1.3 1. 0
36
,1.3 0.4 ()
36
, 2.4 0.4
48
,91
m;f(N,IM)
66
, 4.6 (N)
41
, 6.4
41
, 6.5
53
, 6.7 1.3
44
,91
m;f
52
, 10.9 2.1
58
; DTA91m;f:
91m;f
35
,91
m;f(N)
35
; HD91m;f:91
m;f
23
,1.3 0.5 (N)
31
,1.4 0.6 (N)
39
,1.5 0.3 (N)
55
,1.7 0.6 (N,IM)
66
,1.9 0.6 (N)
40
, 2.4 0.6
(N)
49
, 2.5 0.8
30
, 2.7 1.7
40
,91
m;f
35
,91
m;f(N)
35
,91
m;f(N)
69
, 4.3 1.1
58
, 6.3 2.2
46
; HD91m;f:1.30.1
24
,1.4 0.02 (N)
24
, 1.9 (N)
54
;
HD91m;f:2.1 0.1
92
, 2.6 0.6
86
,91
m;f
82
ASD (mm) ASSD: 0.2 0.1
26
, 0.6
53
; ASD91m;f:91
m;f(N,IM)
66
; ASD91m;f: 0.4 0.1 (N)
55
, 0.4 0.1 (N)
56
, 0.5 0.1 (N)
24
, 0.5 0.1
24
, 0.5 0.1
(N)
40
,1.1 0.7
40
; ASD91m;f: 0.6
39
,1.1 0.3
58
; DTA91m;f: 0.7 0.3
42
Oral cavity
DC (%) 93 3
47
,91 2
30
,89 2
26
,89 2
28
,91
m;f
35
,87 5
42
,91
m;f
52
,787
50
VC (%) TPR: 68 11
50
; FDR: 5 3
50
HD (mm) HD91m;f:91
m;f
52
; DTA91m;f:91
m;f
35
; HD91m;f:91
m;f
35
,7.4 2.1
30
ASD (mm) ASSD: 1.0 0.3
26
; DTA91m;f: 0.8 0.4
42
Temporo-mandibular joints
DC (%) 87 3
30
,87 6
42
,85 5
47
HD (mm) HD91m;f: 2.8 0.9
30
ASD (mm) DTA91m;f: 0.4 0.3
42
Mastoids
DC (%) 82 6
47
Chewing muscles
DC (%) 91m;f(pterygoid)
95
,91
m;f(masseter)
95
,71
87
ASD (mm) DTA91m;f:1.6 1. 4
87
Pharyngeal constrictor muscles (PCM), cricopharynx (CP), orohypopharynx constrictor muscle (OPCM)
DC (%) 81 4 (PCM)
28
,73 11 (CP)
50
,71 8 (PCM)
40
,69 6 (PCM)
65
,68 9 (PCM)
50
,91
m;f(PCM)
23
,91
m;f
52
, 61 (middle) & 58 (inferior) &
46 (superior)
53
, 58 (OPCM)
51
,54 26 (inferior) & 58 18 (middle) & 52 11 (superior) (PCM)
26
,91
m;f(PCM)
77
, 50 (PCM)
87
VC (%) TPR: 78 7 (PCM)
40
,70 11 (CP)
50
,66 9 (PCM)
50
; PPV: 69 8 (PCM)
40
; FDR: 20 16 (CP)
50
,29 9 (PCM)
50
HD (mm) HD91m;f: 9.6 (inferior) & 12.7 (middle) & 14.7 (superior)
53
,91
m;f
52
; DTA91m;f:91
m;f(PCM)
77
; HD91m;f: 2.8 1.3 (PCM)
40
,91
m;f(PCM)
23
tumor gross target volumes of nasopharyngeal cancer from 60 CT images (50 for training, 10 for testing). While detailed results for this challenge are yet to be published, the publicly available data indicate that the best-ranking method achieved an average Dice coefficient of 81% and a 95-percentile Hausdorff distance of 2.8 mm across all OARs. Moreover, a new edition of this challenge is scheduled for October 2020.**
4. DISCUSSION
The field of RT planning in the H&N region expands beyond auto-segmentation of OARs that was presented in this review, for example to (auto-)segmentation of target volumes (including gross target volume, clinical target volume, and planning target volume), analysis of commercial solutions for RT planning, dosimetric evaluations, and longitudinal studies. For additional information, we kindly refer the reader to specific reviews that include the topics of segmentation methodology,8,21 target volume segmentation,20 ABAS,19,132 commercial segmentation tools,5,66,119 MR-only RT planning,133 and observer variability in OAR delineation.3
In this review, we focused on auto-segmentation of OARs in the H&N region, and provided a comprehensive and systematic overview with a complete list of relevant references from 2008 to date, along with a systematic analysis from different perspectives that we consider relevant: image modality, OAR, image database, methodology, ground truth, performance metrics, and segmentation performance. In this section we discuss the advantages and limitations of the
TABLE VII. Continued.

Results

Pharyngeal constrictor muscles (continued)
ASD (mm): ASSD: 1.6±1.7 (inferior) & 1.9±1.7 (middle) & 3.7±5.2 (superior) (PCM),[26] 2.0 (middle) & 2.0 (inferior) & 2.1 (superior)[53]; ASD: 1.0±0.5 (PCM)[40]; DTA: –m,f (PCM),[77] 2.0±1.9 (PCM)[87]

Cervical esophagus with the cricopharyngeal inlet, upper esophageal sphincter (UES)
DC (%): 86±3,[42] 82±6,[28] 81±14 (UES),[50] 81±7,[36] 70±7,[61] 69±10,[26] 62,[51] 60±11,[50] –m,f,[52] –m,f,[23] 35[53]
VC (%): TPR: 80±16 (UES),[50] 50±15[50]; FDR: 15±14 (UES),[50] 21±14[50]
HD (mm): HD: 1.1±1.1,[36] –m,f,[52] 35.8[53]; HD: –m,f[23]
ASD (mm): ASSD: 1.3±0.6,[26] 7.7[53]; ASD: 1.9±0.7[61]; DTA: 1.0±0.7[42]

Thyroid
DC (%): 92±3.7,[37] 86±5,[30] –m,f,[23] 80,[85] 79±6,[44] 68,[99] 57(37,80)[98]
HD (mm): HD: 10.2±2.9,[44] 17.5[99]; HD: 2.7±0.6,[37] –m,f,[23] 3.9±2.4[30]; HD: –m,f[85]
ASD (mm): ASD: 2.5,[99] 5.1(1.1,9.3)[98]

Larynx
DC (%): 89±3,[30] 87±4,[47] 86±4,[65] 86±7,[42] 83±8,[28] 80±5,[40] 78±4,[50] –m,f,[52] 77±7,[26] 74,[51] –m,f,[35] 71,[53] –m,f[77]
VC (%): TPR: 88±6,[40] 83±8[50]; PPV: 77±6[40]; FDR: 25±10[50]
HD (mm): HD: 11.1,[53] –m,f[52]; DTA: –m,f,[35] –m,f[77]; HD: 3.2±2.7,[40] 6.2±5.8,[30] –m,f[35]
ASD (mm): ASSD: 1.0±0.4,[26] 2.2[53]; ASD: 1.7±1.6[40]; DTA: 1.3±1.0,[42] –m,f[77]

Trachea
DC (%): 84±8,[63] 81±5,[30] –m,f[52]
HD (mm): HD: –m,f[52]; HD: 20.9±9.0[30]

Cochlea
DC (%): 95±10,[60] 82±7 (•),[60] 74,[53] 66±13,[36] 65±7,[26] 41±8 (•),[36] –m,f[77]
SC (%): sDC: 99±2 (s=1.25 mm, •)[60]
HD (mm): HD: 0.5±0.4,[36] 0.7±0.1 (•),[36] 1.7[53]; DTA: –m,f[77]
ASD (mm): ASSD: 0.4,[53] 0.6±0.2[26]; DTA: –m,f[77]

Brachial plexus
DC (%): 77,[81] 56±11,[30] 53±12,[67] 32[71]
VC (%): TPR: 49,[71] 47±12[67]
HD (mm): HD: 15.4[71]; HD: –m,f[81]
ASD (mm): ASD: 1.6[81]

Carotid artery
DC (%): 91,[25] –m,f[23]
HD (mm): HD: 0.9[25]; HD: –m,f,[23] 18.3±14.5[30]

Legend: –m,f: value not reported as an exact mean (m: median, average not reported; f: value estimated from a figure, exact value not reported); o1/o2: compared against observer 1/observer 2; N: evaluated on the PDDCA database;[66] •: evaluated on the TCIA-RT database;[60] CT, MR: the results in ref. [68] are obtained from CT or MR images, respectively; IM, UB: winning teams of the 2015 computational challenge;[66] +brainstem: the spinal cord and brainstem were segmented as one organ; +eye muscles: the eyes and eye muscles were segmented as one organ; +chiasm: optic nerves and optic chiasm were segmented as one organ; s: size of the volumetric neighborhood.
** The Automatic Structure Segmentation for Radiotherapy Planning Challenge 2020 is planned as a standalone satellite event during MICCAI 2020 (https://miccai2020.org/en/MICCAI-2020-CHALLENGES.html).
reviewed methods, and provide corresponding recommenda-
tions from the relevant perspectives.
4.A. Image modality
For the purpose of RT planning, CT images are always
acquired because they contain information about the electron
density that is required to calculate the interaction of radia-
tion beams with tissues, and further used to define radiation
dose distribution maps. Although MR images proved to be
advantageous for RT planning because they can provide
anatomical information complementary to CT images, espe-
cially in the case of soft tissues, they are not commonly used
in clinical practice. Moreover, the structures in MR images
may be subjected to geometrical distortions,
134
for example,
due to the magnetic field inhomogeneities.
101
However, as
MR imaging has become more accessible in the past decade,
it can be expected that its utilization will increase toward
making MR images an integral part of RT planning, and that
auto-segmentation approaches exploring both CT and MR
image modalities simultaneously will be further developed.
The start of this trend is already indicated by the recent
increase in the number of studies that include the MR image
modality.
38,40,43,57-59
In a single study where OARs were
independently auto-segmented from CT and MR images of
the same patients, the results for MR images outperformed
those for CT images in the case of the parotid glands, eye-
balls, and brainstem.
68
Although methods for MR-only RT planning are being
developed,
135
their routine clinical implementation is still
very limited, as challenges remain of how to assign data on
electron density to MR images for the purpose of dose calcu-
lation
133
by means of synthetic CT image generation
136
or
MR-to-CT image registration.
105,137
In general, better perfor-
mance is achieved by applying deformable (i.e., nonrigid)
image registration and using rigid registration as the first
step,
103,104
however, this may not always be the case.
105
To
further improve the registration process, DL approaches have
recently started to emerge.
137
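As an illustration of this two-step strategy, the following minimal Python sketch uses the open-source SimpleITK library; the file names, metric, and optimizer settings are our own assumptions for demonstration, not a validated clinical protocol:

    import SimpleITK as sitk

    # Hypothetical inputs: the planning CT (fixed) and the MR image (moving).
    fixed = sitk.ReadImage("planning_ct.nii.gz", sitk.sitkFloat32)
    moving = sitk.ReadImage("mr.nii.gz", sitk.sitkFloat32)

    reg = sitk.ImageRegistrationMethod()
    reg.SetMetricAsMattesMutualInformation(numberOfHistogramBins=50)  # multimodal metric
    reg.SetOptimizerAsGradientDescent(learningRate=1.0, numberOfIterations=100)
    reg.SetInterpolator(sitk.sitkLinear)

    # Step 1: rigid alignment, used as the initialization.
    initial = sitk.CenteredTransformInitializer(
        fixed, moving, sitk.Euler3DTransform(),
        sitk.CenteredTransformInitializerFilter.GEOMETRY)
    reg.SetInitialTransform(initial, inPlace=False)
    rigid = reg.Execute(fixed, moving)

    # Step 2: deformable (B-spline) refinement on top of the rigid result.
    reg.SetMovingInitialTransform(rigid)
    reg.SetInitialTransform(
        sitk.BSplineTransformInitializer(fixed, [8, 8, 8]), inPlace=False)
    deformable = reg.Execute(fixed, moving)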
Complementary information can be obtained from PET-
CT and PET-MR scanners, which combine the CT or MR
with the PET modality and acquire coregistered images.
However, as PET images enable functional investigation
through the radiolabeling of tissues with a high metabolic
activity (i.e., cancerous cells), they are more appropriate for
target volume than for OAR segmentation.
118,138
On the other
hand, monoenergetic images generated from DECT were
shown to be adequate for H&N OAR segmentation
108
because they can exhibit superior image quality in compar-
ison to classical 120 kVp CT, especially in terms of a better
contrast-to-noise ratio, reduced influence of the beam harden-
ing phenomenon and metal artifact suppression. For several
OARs, it was shown that ABAS and DL-based auto-segmen-
tation can be successfully applied to monoenergetic images
of 40 and 70 keV.
29
However, a study on a larger DECT data-
base with a complete set of OARs and comparison to
classical CT images needs to be performed in order to objec-
tively assess and identify eventual advantages.
To conclude, both CT and MR image modalities are being
explored for H&N OAR auto-segmentation, but the potential
of the MR image modality for auto-segmentation of several
soft tissues should be explored more in the future.
4.B. Organ at risk
The relatively small area of the H&N region comprises a
large number of OARs with a relatively complex and variable
anatomy. The decision of which OAR needs to be delineated is
based on a number of factors, including the proximity of the
OAR to the tumor, its susceptibility to the radiation and impor-
tance for life functions. Auto-segmentation was therefore com-
monly performed for OARs whose RT-induced damage
proved to be linked to post-RT complications that may endan-
ger the life of the patient or notably jeopardize its quality.
109-111
Due to the potentially devastating morbidity resulting
from over-irradiation of the spinal cord and brainstem,
delineation of these two anatomical structures is a manda-
tory part of any segmentation process in the H&N
region.
102
The parotid and submandibular glands are by far
the most represented of the remaining OARs, although their
poor boundary distinction in CT images makes segmenta-
tion very challenging. On the other hand, the optic chiasm
and optic nerves are also demanding to segment because of
their small size and tubular geometry. The mandible is the
only well visible bony structure, and due to its excellent
visibility in CT images it can act as a spatial reference for
segmenting other neighboring OARs.
51,66
As the definition
of exact OAR boundaries is subjected to observer interpre-
tation, new studies should adhere to existing delineation
guidelines.
102
Nevertheless, with the introduction of addi-
tional image modalities, such as the MR, the boundaries of
OARs should become easier to interpret.
To conclude, the spinal cord, brainstem and major salivary
glands (the parotid and submandibular glands) are the most
studied OARs in the H&N region, however, more experi-
ments should be conducted in the future for auto-segmenta-
tion of the pharyngeal constrictor muscles, larynx and
cervical esophagus with the cricopharyngeal inlet that are
important for RT planning.
4.C. Image database
To account for the anatomical and disease-related vari-
ability among different patients as well as for the variability
in the image acquisition settings, auto-segmentation methods
must be validated on a preferably large number of images
and patients to ensure reliable statistical results. In general,
the current trend shows an increasing number of cases being
included in evaluation databases, which is mostly due to the
application of state-of-the-art machine learning methods,
such as DL, which require relatively large training datasets.
Image databases should include representative clinical
samples, with images from various acquisition setups and of
patients with different tumors according to their localization
and stage. However, images should retain certain common
characteristics (e.g., imaging sequence, field of view, image
noise), otherwise auto-segmentation may become too chal-
lenging. Still, objective comparison of different auto-seg-
mentation methods is often difficult, because they were
evaluated on different image databases, or on a different set
of annotations representing reference OAR delineations. As
the construction of a representative set of samples requires a
lot of effort, many such databases remain proprietary and
represent a valuable research advantage.
Besides using proprietary databases, evaluation should be
performed also on publicly available image databases to
ensure an objective comparison to existing approaches.
Among the publicly available CT image databases, PDDCA66 has already been used in several studies45,54–56,60,69,70
because it was devised for a computational challenge that set
benchmarks for auto-segmentation of OARs in the H&N
region, while TCIA-RT
60
and StructSeg have yet to gain visi-
bility. As it was shown that MR images provide valuable sup-
port to CT image auto-segmentation, or can be treated as
standalone in the case of MR-only RT planning, public MR
image databases have recently surfaced, such as the RT-
MAC
116
or MRI-RT,
105
which is augmented with CT images
of the same patients.
To conclude, several image databases with the correspond-
ing ground truth are currently publicly available and should
be used for an independent performance evaluation of OAR
auto-segmentation approaches. In the future, there is a need
for such databases to evolve, that is, to include a large number
of cases and reference delineations, preferably performed by
multiple observers from different institutions and at multiple
times, so as to enable a proper evaluation of multimodal
auto-segmentation methods.
4.D. Methodology
For OAR auto-segmentation in the H&N region, ABAS
is still the prevailing methodological approach, and has
been as such implemented in several commercial tools for
RT planning.
5,66,119
However, its segmentation performance
highly depends on the range of anatomical variations that
can be observed in the library of atlases, which can be
built up from previously treated patients or, if used, built
into the commercial software. As a result, ABAS may per-
form poorly for cases that differ from the library of
atlases,
5
therefore making the selection of the most appro-
priate atlases a challenging task. For most OARs, perfect
ABAS results cannot be reasonably expected, however, the
performance of a level corresponding to clinical quality
can be consistently expected given a large atlas database
under the assumption of perfect atlas selection.
139
It was
shown that ABAS reaches its upper performance limit with
the inclusion of 10–20 atlases,23,67,140
and that it generally
underperforms for small and/or thin OARs (e.g., swallow-
ing muscles).
87
Another drawback is its long execution
time due to atlas registration, which limits on-line clinical
applications.
Recently, the focus has shifted toward machine learning,
with DL approaches for H&N OAR auto-segmentation start-
ing to emerge as early as in 2016,
70
and have been consider-
ably increasing in number since (Fig. 1). When compared to
ABAS, DL-based auto-segmentation requires considerably
less time for on-line applications, but is associated with a
high computational burden in the off-line training phase,
where currently up to a few days or more may be required to
complete the model training. Moreover, the training set of
images has to be quite large, but the actual number depends
on image quality and representativeness, and can be reduced
by applying different training set augmentation techniques
(e.g., intensity and geometrical transformations of original
images). The underlying DL model is, in comparison to
ABAS, also more robust because it can be trained with all
available data, including patients with metal artifacts and
diverse anatomy.
7
The main advantage of DL-based auto-
segmentation is in its ability to systematically learn the most
adequate features for segmentation from a set of annotated
training images, and then automatically search for the same
features in a previously unseen image. Although this proved
to result in the best overall segmentation performance,
49
it is
not without drawbacks. For example, the most popular DL-
based medical image auto-segmentation architecture, the U-
Net,
8
can result in many false positives if the approximate
location and size of the observed OAR is not constrained
beforehand. As a result, state-of-the-art techniques from the
field of artificial intelligence (e.g., attention learning,
24
adversarial learning
40
) are constantly being explored and uti-
lized to improve its performance.
141
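As a simple illustration of such training set augmentation, consider the following minimal Python sketch; the transformation types follow the examples above, while the parameter ranges, helper names, and library choices are our own assumptions rather than settings from any reviewed study:

    import numpy as np
    from scipy.ndimage import rotate

    def augment(image, mask, rng):
        # Geometric transformation: a small random rotation applied identically
        # to the image and its reference delineation (nearest neighbor for labels).
        angle = rng.uniform(-10.0, 10.0)
        image = rotate(image, angle, reshape=False, order=1, mode="nearest")
        mask = rotate(mask, angle, reshape=False, order=0, mode="nearest")
        # Intensity transformation: global scaling and shift of the image only.
        image = image * rng.uniform(0.9, 1.1) + rng.uniform(-20.0, 20.0)
        return image, mask

    rng = np.random.default_rng(0)
    slice_hu = rng.normal(0.0, 100.0, size=(128, 128))  # stand-in for a CT slice
    labels = np.zeros((128, 128), dtype=int)
    labels[40:80, 50:90] = 1                            # stand-in OAR mask
    augmented_image, augmented_mask = augment(slice_hu, labels, rng)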
Both ABAS and DL-based auto-segmentation are based
on reference OAR delineations in the given image database,
which may, however, not represent the ground truth. If the
cases included in the image database are not representative
for the actual OAR segmentation task, or if the corresponding
manual delineations are of low quality and inconsistent, the
underlying DL model will either fail to train or produce
inconsistent segmentations. Therefore, attention needs to be
given to the choice of image database and to reduce the intra-
and interobserver variability of reference delineations, for
example, by including publicly available databases
112,113
and
adhering to OAR delineation guidelines.
102
To conclude, while ABAS was the dominating approach for
segmenting OARs in the H&N region in the past, current
approaches have shifted to DL, resulting in a superior segmen-
tation performance. Moreover, DL-based auto-segmentation is
expected to become even more sophisticated through the
inclusion of methodological advances in the field of artificial
intelligence,
142
and even more powerful from the perspective
of being trained on larger and more diverse image databases.
4.E. Ground truth
To generate the ground truth, manual delineation of OARs
by human experts is still the most common approach,
although it has been recognized as a very tedious and time-
consuming task. For the delineation of ground truth contours,
it is strongly recommended to follow the recently introduced
guidelines,
102
which have been formed as a consensus of dif-
ferent professional associations and groups,
††
and also incor-
porate guidelines that have been introduced in the
past.
124,125,127
However, even if guidelines are followed, the
delineation is still biased by subjective observer interpreta-
tion, and therefore it is strongly recommended to perform
basic observer training with joint delineation review ses-
sions,
143,144
and to include additional modalities to improve
the visibility of structure boundaries.
144
Moreover, to increase the reliability of statistical results
related to the methodology testing in the clinical context, the
ground truth should be provided from multiple experts per-
forming the delineation on multiple time occasions, therefore
enabling the evaluation of the variability among and within
the observers, that is, the inter- and intraobserver variability,
respectively. In a study where manual H&N OAR delin-
eations of eight different observers from CT and MR images
of 20 subjects were compared to ABAS, it was reported that
manual delineations and ABAS generated structures of simi-
lar volume with no statistically significant difference in vol-
ume overlap, however, the observers exhibited higher
variation with respect to tubular structures (e.g., optic chiasm,
optic nerves).
89
On the other hand, a different study evaluated
32 multi-institution delineations of six OARs from a single
CT image, and reported a significant delineation variability
among observers that consequently caused large differences
in the planned radiation doses, with the most variable organs
being the brainstem and the two parotid glands.
143
Similarly,
in a multi-institutional study where eight observers manually
delineated 20 OARs from 16 CT images, statistically signifi-
cant interobserver delineation variability as well as differ-
ences in dosimetric parameters were reported for all OARs,
however, both could be reduced for most OARs by manually
editing the results of ABAS, in particular for the brainstem,
spinal cord, cochleae, temporo-mandibular joints, larynx, and
pharyngeal constrictor muscles.
145
On the other hand, a high
agreement was reported for auto-segmentations of 13 OARs
from 125 CT images that were independently obtained at
seven different institutions with the same commercial RT
planning system but with different institution-specific set-
tings.
82
Nevertheless, the variability in manual as well as auto-seg-
mentation results cannot be completely eliminated because
each individual observer is exposed to his/her subjective bias
that is conditioned by experience (i.e., novice vs expert), and
because imaging protocols and setups as well as RT protocols
and planning systems vary greatly across institutions.
146
For a
particular OAR, the observer variability imposes the upper
limit for auto-segmentation performance, as we cannot expect
any auto-segmentation result to overcome the obtained con-
sensus among the ground truth delineations. Although man-
ual correction of auto-segmentation boundaries is a less labor
intensive approach for ground truth generation, it contains
auto-segmentation bias and is therefore not the most appro-
priate reference for performing auto-segmentation evaluation.
On the other hand, the ground truth can be relatively easily
obtained by using phantom objects, synthetic images, or
cadaver sections,
67,89,121,147
however, they represent unrealis-
tic surrogates for patient imaging and were in fact not present
in the reviewed studies.
To conclude, delineation guidelines should be followed for
the ground truth generation, and participation of multiple
experts from multiple institutions is recommended for a reli-
able reporting of the intra/interobserver variability.
4.F. Performance metrics
When reporting the geometric accuracy of auto-segmenta-
tion results, there is unfortunately no universal consensus
about the corresponding performance metrics. Moreover, var-
ious mutually incompatible definitions and different nomen-
clatures make the comparison of auto-segmentation results
relatively difficult.
129
As there is a strong need for an agreed-
upon metrics, which would allow an exact comparison of
results and eliminate the need for specifying its definition in
each new study, we would recommend the nomenclature and
definitions presented in Table VI.
For reporting the volumetric overlap of two segmentation
masks, we advise a mandatory use of the Dice coefficient.
Although the Jaccard index is an established volumetric coef-
ficient and has been reported in a few studies,
59,67,96
it is
redundant because it can be calculated from the Dice coeffi-
cient
‡‡
. Other variations of the volumetric coefficient provide
additional insight into the segmentation performance from
the perspective of binary classification, specifically the
degree of over- or under-segmentation, but their interpretation
may be ambiguous. For example, in the case of reporting the
specificity, a dilemma about the calculation of true negatives
(the set complement in its definition in Table VI) may arise.
94
On the other hand, sensitivity is the metrics of choice in the
case we want to reduce the number of voxels that are missing
from the resulting segmentation (i.e., false negatives), even if
at the expense of adding voxels (i.e., false positives).
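For concreteness, the volumetric coefficients discussed above can be computed from two binary masks as in the following minimal Python sketch (function and variable names are illustrative, not taken from a specific reviewed study):

    import numpy as np

    def overlap_metrics(auto_mask, ref_mask):
        a = auto_mask.astype(bool)
        b = ref_mask.astype(bool)
        tp = np.logical_and(a, b).sum()        # true positives, |A ∩ B|
        dc = 2.0 * tp / (a.sum() + b.sum())    # Dice coefficient, 2|A∩B| / (|A| + |B|)
        ji = dc / (2.0 - dc)                   # Jaccard index, redundant given DC
        tpr = tp / b.sum()                     # sensitivity: fraction of reference recovered
        ppv = tp / a.sum()                     # precision: fraction of result that is correct
        return {"DC": dc, "JI": ji, "TPR": tpr, "PPV": ppv}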
Although volumetric metrics may result in a high overlap,
clinically relevant differences between segmentation bound-
aries may still exist, which are important in RT planning
because they are used to compute the radiation dose distribu-
tion. The mismatches in boundary segments that encompass
†† Radiotherapy Oncology Group for Head and Neck (GORTEC), France; The Danish Head and Neck Cancer Group (DAHANCA), Denmark; Head and Neck Cancer Group of the European Organization for Research and Treatment of Cancer (EORTC), European Union; Hong Kong Nasopharyngeal Cancer Study Group (HKNPCSG), Hong Kong; National Cancer Research Institute (NCRI), UK; National Cancer Institute of Canada Clinical Trials Group (NCIC CTG), Canada; NRG Oncology Group (NRG), USA; Trans Tasman Radiation Oncology Group (TROG), Australia.
‡‡ Jaccard index: JI = |A∩B| / |A∪B|; Dice coefficient: DC = 2|A∩B| / (|A| + |B|); hence DC = 200%·JI / (100% + JI) and JI = DC / (200% − DC).
a volumetrically small but eventually important regions of
interest can be, to a certain degree, captured by surface coef-
ficients,
60
which measure the overlap of the corresponding
mask surfaces. While surface coefficients may gain a wider
adoption among the overlap metrics in the future, especially
if different values of the neighborhood distance sare
explored simultaneously, a consensus needs to be made about
their usage, with the surface Dice coefficient being the most
appropriate due to its bidirectional (i.e., symmetric) proper-
ties.
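The following minimal Python sketch illustrates how a surface Dice coefficient at tolerance s can be computed from two binary masks; isotropic voxel spacing and the helper construction below are our simplifying assumptions:

    import numpy as np
    from scipy.ndimage import binary_erosion
    from scipy.spatial import cKDTree

    def surface_points(mask, spacing_mm=1.0):
        # One-voxel-thick surface: mask voxels removed by a single erosion.
        border = mask & ~binary_erosion(mask)
        return np.argwhere(border) * spacing_mm

    def surface_dice(auto_mask, ref_mask, s_mm, spacing_mm=1.0):
        pa = surface_points(auto_mask.astype(bool), spacing_mm)
        pb = surface_points(ref_mask.astype(bool), spacing_mm)
        da = cKDTree(pb).query(pa)[0]   # distances from surface A to surface B
        db = cKDTree(pa).query(pb)[0]   # distances from surface B to surface A
        # Fraction of surface points within tolerance, counted in both directions.
        return ((da <= s_mm).sum() + (db <= s_mm).sum()) / (len(da) + len(db))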
Any overlap metrics should be accompanied with at least
one distance metrics, which provides complementary infor-
mation about the segmentation boundaries by measuring the
spatial separation between the corresponding surfaces. The
Hausdorff distance measures the maximum point-to-point
distance between two segmentation masks, and it originates
from a proper mathematical metrics to measure the distance
between two subsets in a metric space. However, because it is
very sensitive to outliers, the 95-percentile version of this
metrics may be alternatively used to robustly suppress their
influence. On the other hand, two-dimensional computation
of metrics, such as in the case of the slice-wise Hausdorff dis-
tance, is not appropriate for volumetric segmentation. In the
case of the average surface distance, we recommend to report
the average symmetric surface distance because it equally
takes into account all possible point-to-surface distances and
is bidirectional (i.e., symmetric). On the other hand, both the
maximum and mid-value versions of the average surface dis-
tance unnecessarily use two different point-to-surface weight-
ing factors, while the average distance to agreement is
unidirectional. The variations of the signed surface distance
can be used to deduce consistent over- or under-segmenta-
tion, however, they are unable to detect the overall boundary
mismatch when either over- or under-segmentation regions
are present in an approximately equal quantity, because they
cancel out. In general, distance metrics perform better when
the observed structures are small, and are especially efficient
for structures with a high surface-to-volume ratio (e.g., tubu-
lar structures such as the spinal cord, optic nerve and optic
chiasm, and the pharyngeal constrictor muscles) and cases
where otherwise acceptable small boundary variations result
in a large relative volume discrepancy (e.g., the pharyngeal
constrictor muscles). Other reported metrics, such as the vol-
ume difference
35,93,94
or distance/variation of mass cen-
ters,
29,52,94
do not represent meaningful overlap or distance
measurements, and are therefore not proper to evaluate seg-
mentation results.
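Reusing the surface_points() helper from the sketch above, the recommended distance metrics can be computed as follows (again under the simplifying assumption of isotropic voxels):

    import numpy as np
    from scipy.spatial import cKDTree

    def distance_metrics(auto_mask, ref_mask, spacing_mm=1.0):
        pa = surface_points(auto_mask.astype(bool), spacing_mm)
        pb = surface_points(ref_mask.astype(bool), spacing_mm)
        da = cKDTree(pb).query(pa)[0]   # point-to-surface distances, A to B
        db = cKDTree(pa).query(pb)[0]   # point-to-surface distances, B to A
        d = np.concatenate([da, db])
        return {
            "HD": max(da.max(), db.max()),   # maximum (Hausdorff) distance
            "HD95": np.percentile(d, 95),    # 95-percentile version, robust to outliers
            "ASSD": d.mean(),                # average symmetric surface distance
        }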
It has to be noted that, for a specific OAR, the reported per-
formance metrics only evaluate how close is the obtained seg-
mentation mask to its corresponding ground truth. Although
they represent a powerful tool for general method comparison,
they overlook the potential consequences of segmentation
errors from the clinical perspective. However, a method named
LinSEM
148
has been recently developed from the premise that
an ideal segmentation metrics should reflect the degree of clin-
ical acceptability directly from its values, and show the same
acceptability meaning with the same value for structures of
different shape, size, and form. The method combines, in a lin-
ear manner, the commonly used segmentation performance
metrics (i.e., the Dice coefficient, Jaccard index, and Haus-
dorff distance) with the clinical acceptability, which was pro-
vided by an expert observer (i.e., a subjective score from 1 to
5). By performing experiments on CT images including OARs
from the H&N region (i.e., the right parotid gland, mandible,
and cervical esophagus), it was concluded that the Jaccard
index has the most linear relationship with the acceptability
before actual linearization, while the Dice coefficient and
Hausdorff distance exhibit a significant improvement in
acceptability meaning from the perspective of an ideal met-
rics-to-acceptability relationship.
148
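The published method is described in reference 148; the sketch below only illustrates its underlying premise of linearly relating a geometric metric to expert acceptability scores, using hypothetical data and an ordinary least-squares fit of our own choosing:

    import numpy as np

    dc = np.array([0.55, 0.68, 0.74, 0.81, 0.88, 0.93])  # hypothetical Dice values
    score = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical expert scores (1-5)

    slope, intercept = np.polyfit(dc, score, deg=1)      # linear metric-to-acceptability map
    acceptability = np.clip(slope * 0.85 + intercept, 1.0, 5.0)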
To conclude, the Dice coefficient is the standard volumet-
ric coefficient for reporting the overlap of two segmentation
masks, and it should be always accompanied with at least one
distance metrics, preferably the Hausdorff distance (or its 95-
percentile version) and the average symmetric surface dis-
tance. Future research should focus on combining existing
geometrical performance metrics with clinical acceptability
scores and risk assessments into a new class of metrics for
the purpose of augmenting the quantitative evaluation of seg-
mentation performance.
4.G. Segmentation performance
Although the auto-segmentation methods do not always
provide clinically acceptable results, their performance is
constantly improving due to the application of new technolo-
gies. The auto-segmentation of OARs and subsequent manual
corrections require considerably less time than direct manual
delineation
19,119
and reduce the intra/interobserver variabil-
ity.
145
However, a direct comparison of the segmentation per-
formance among different methods is difficult, mostly
because they were, in general, not evaluated on the same
image databases. The comparison is therefore often affected
by different image acquisition setups (e.g., imaging sequence,
field of view), image properties (e.g., size, resolution, noise),
manual delineation guidelines and patient cohorts. Moreover,
the studies report different performance metrics, focus on dif-
ferent OARs or even do not provide a detailed statistical
description of the corresponding ground truth.
The results reported by state-of-the-art techniques indicate
that auto-segmentation of OARs in the H&N region is feasi-
ble to be clinically implemented into an automated RT plan-
ning system. However, from the perspective of RT, both
target volume and OAR segmentation has direct clinical
implications. Apart from the geometrical agreement with the
corresponding ground truth, auto-segmentation results have
to be evaluated also from the perspective of their dosimetric
impact, because even if the geometric differences are small,
the impact on the final dose distribution may still be clinically
relevant. As a result, the geometrical performance metrics are
not sufficient to predict the dosimetric impact of auto-seg-
mentation inaccuracies. For example, it was shown that the
interobserver variability in manual delineations of OARs
from the H&N region (e.g., the brainstem, brain, parotid
glands, mandible, and spinal cord) can lead to substantially
different dosimetric plans.
143,145,149
However, for several
OARs (e.g., the brainstem, spinal cord, cochlea, temporo-
mandibular joint, larynx and pharyngeal constrictor muscles),
the consistency in dosimetric plans can be improved by
reducing the interobserver variability, for example, by manu-
ally editing the results of ABAS,
90,145,150
which was shown to
produce clinically acceptable RT plans from the perspective
of dosimetric impact.
58
Similar conclusions were drawn in a
study that applied DL-based auto-segmentation,
50
and
reported little effect on the OAR dose despite the variation in
the Dice coefficient, indicating that imperfect geometrical
performance metrics do not necessarily result in inferior
OAR dosimetry.
50
Although the average radiation dose was,
for specific OARs (i.e., the pharyngeal constrictor muscles),
significantly higher for the DL-based than for manually
defined RT plans, these differences were not considered to be
clinically relevant.
50
On the other hand, a study evaluated RT
plans, obtained from expert manual delineations of several
H&N OARs, against those obtained by a knowledge-based
planning system, which is based on a preconfigured model
inferred from a cohort of past RT plans that were judged as
optimal.
151,152
A weak correlation between the geometric per-
formance metrics (i.e., the Dice coefficient, Hausdorff dis-
tances, volume differences, and centroid distances) and
dosimetric indices (i.e., dose to the hottest 98% of the planning
target volume and mean OAR dose) was reported, indicating
that the geometric performance metrics are not appropriate for
estimating the dosimetric impact.
152
However, besides obser-
ver variability in manual delineation, other factors may affect
the RT plan, such as the changes in the location and size of the
observed OARs due to RT effects, or the random and system-
atic patient setup errors due to multiple RT sessions. In a study
where reference manual delineations were randomly perturbed
to simulate delineation variability and combined with simu-
lated patient setup variability at random magnitudes, it was
concluded that the dosimetric impact of the delineation vari-
ability is overstated when considered in isolation from the
setup variability, and that it depends largely on the OAR dis-
tance from the target volume.
153
Nevertheless, it has to be
noted that the dosimetric impact of OAR auto-segmentation is
always compared to the dosimetric impact of manual OAR
delineation, which is inherently subjected to observer variabil-
ity. Future studies on H&N OAR auto-segmentation should
therefore report, besides multiple geometric performance met-
rics, also metrics related to the dosimetric impact to encom-
pass clinically relevant endpoints for RT planning.
Nevertheless, the analysis of the reported results indicates
that the performance of OAR auto-segmentation in the H&N
region is, if we consider as clinically acceptable the results
with the Dice coefficient above 90% and average surface dis-
tance below 1.5 mm, currently adequate for several OARs,
including the parotid glands, brainstem, brain, cerebrum and
cerebellum, temporal lobes, spinal cord, eyeballs and vitreous
humor, mandible, oral cavity, and cochlea (Table VII).
48,60,97
According to the reported interobserver variability, there may
still be room for improvements in auto-segmentation of the
salivary glands, especially if performed on MR images.
68
On
the other hand, the eyeballs can be segmented relatively accu-
rately due to their spherical geometry, while the optic nerves
and optic chiasm can come close to the ground truth in terms
of the distance but not overlap metrics.
66,88
For the pharyn-
geal constrictor muscles, larynx and cervical esophagus with
the cricopharyngeal inlet, unfortunately not enough studies
have been conducted to draw relevant conclusions. Therefore,
it is expected that these OARs will receive more focus in the
future, especially because of their importance in the process
of the H&N RT planning. On the other hand, it has to be
again pointed out that all auto-segmentation results are com-
pared to corresponding reference segmentations, and their
definition is subjected to observer variability, meaning that
the reasonably achievable performance is not ideal segmenta-
tion, for example, it is not realistic to expect that the Dice
coefficient will reach 100% or that the Hausdorff and average
surface distance will drop to zero.
To conclude, the best performing methods achieve clini-
cally acceptable auto-segmentation for several H&N OARs,
even if manual corrections may still be needed, but certainly
they reduce the overall delineation time and observer variabil-
ity. To better evaluate the segmentation performance, future
studies should focus also on the dosimetric impact to provide
clinically relevant endpoints for RT planning.
5. CONCLUSIONS
We performed a systematic review of OAR auto-segmenta-
tion for H&N RT planning from 2008 to date. Besides outlin-
ing, analyzing and categorizing the relevant publications
within this field, we have provided also a critical discussion
of the corresponding advantages and limitations. The main
conclusions that may not only assist in the introduction to the
field but also be a valuable resource for studying existing or
developing new methods and evaluation strategies are as fol-
lows: (a) Image modality: Both CT and MR image modalities are being exploited for the task, but the potential of the MR image modality for auto-segmentation of several soft tissues should be explored more in the future. (b) OAR: The spinal cord, brainstem, and major salivary glands (the parotid and submandibular glands) are the most studied OARs, however, more experiments should be conducted for auto-segmentation of the pharyngeal constrictor muscles, larynx, and cervical esophagus with the cricopharyngeal inlet that are important for RT planning. (c) Image database: Several image databases with the corresponding ground truth are currently publicly available and should be used for an independent performance evaluation of OAR auto-segmentation approaches, however, they should be augmented with data from multiple observers and multiple institutions. (d) Methodology: While ABAS was dominating in the past, current approaches have shifted to DL, which resulted in superior performance, and are expected to become even more methodologically sophisticated and trained on larger image databases. (e) Ground truth: Delineation guidelines should be followed for the ground truth generation, and participation of multiple experts from multiple institutions is recommended for a reliable reporting of the intra/interobserver variability. (f) Performance metrics: The Dice coefficient as the standard volumetric overlap metrics should be always accompanied with at least one distance metrics, preferably the Hausdorff distance (or its 95-percentile version) and the average symmetric surface distance, and future research should focus on combining them with clinical acceptability scores and risk assessments. (g) Segmentation performance: The best performing methods achieve clinically acceptable auto-segmentation for several OARs, even if manual corrections may still be needed, but certainly they reduce the overall delineation time and observer variability, however, future studies should focus also on the dosimetric impact to provide clinically relevant endpoints for RT planning.
ACKNOWLEDGMENTS
This work was supported by the Slovenian Research
Agency (ARRS) under grants J2-1732, P2-0232 and P3-0307.
CONFLICT OF INTEREST
The authors declare no conflict of interest.
a)
Author to whom correspondence should be addressed. Electronic mail:
tomaz.vrtovec@fe.uni-lj.si.
REFERENCES
1. Bray F, Ferlay J, Soerjomataram I, Siegel R, Torre L, Jemal A. Global
cancer statistics 2018: GLOBOCAN estimates of incidence and mortal-
ity worldwide for 36 cancers in 185 countries. CA Cancer J Clin.
2018;68:394424.
2. Borras J, Barton M, Grau C, et al. The impact of cancer incidence and
stage on optimal utilization of radiotherapy: methodology of a popula-
tion based analysis by the ESTRO-HERO project. Radiother Oncol.
2015;116:4550.
3. Vinod S, Jameson M, Min M, Holloway L. Uncertainties in volume
delineation in radiation oncology: a systematic review and recommen-
dations for future studies. Radiother Oncol. 2016;121:169179.
4. Chaney E, Pizer S. Autosegmentation of images in radiation oncology.
J Am Coll Radiol. 2009;6:455458.
5. Sharp G, Fritscher K, Pekar V, et al. Vision 20/20: perspectives on
automated image segmentation for radiotherapy. Med Phys.
2014;41:050902.
6. Sahiner B, Pezeshk A, Hadjiiski L, et al. Deep learning in medical
imaging and radiation therapy. Med Phys. 2019;46:e1e36.
7. Seo H, Khuzani M, Vasudevan V, et al. Machine learning techniques for
biomedical image segmentation: an overview of technical aspects and
introduction to state-of-art applications. Med Phys. 2020;47:e148e167.
8. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional neural net-
works for biomedical image segmentation. In:Medical Image Comput-
ing and Computer-Assisted Intervention - MICCAI 2015. Volume 9351
of LNCS. Springer; 2015:234241.
9. C
ßicßek O, Abdulkadir A, Lienkamp S, Brox T, Ronneberger O. 3D U-
Net: learning dense volumetric segmentation from sparse annotation.
In: Medical Image Computing and Computer-Assisted Intervention -
MICCAI 2016, volume 9901 of LNCS. Springer; 2016:424432.
10. Milletari F, Navab N, Ahmadi S-A. V-Net: fully convolutional neural
networks for volumetric medical image segmentation. In: Fourth Inter-
national Conference on 3D Vision - 3DV 2016. IEEE; 2016:565571.
11. Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional
encoder-decoder architecture for image segmentation. IEEE Trans Pat-
tern Anal Mach Intell. 2017;39:24812495.
12. Kamnitsas K, Ledig C, Newcombe V, et al. Efficient multi-scale 3D
CNN with fully connected CRF for accurate brain lesion segmentation.
Med Image Anal. 2017 36:6178.
13. Chen L-C, Papandreou G, Kokkinos I, Murphy K, Yuille A. DeepLab:
semantic image segmentation with deep convolutional nets. Atrous con-
volution, and fully connected CRFs. IEEE Trans Pattern Anal Mach
Intell. 2018;40:834848.
14. Chen H, Dou Q, Yu L, Qin J, Heng P-A. VoxResNet: deep voxelwise
residual networks for brain segmentation from 3D MR images. Neu-
roimage. 2018;170:446455.
15. He K, Gkioxari G, Doll
ar P, Girshick R. Mask R-CNN. IEEE Trans
Pattern Anal Mach Intell. 2020;42:386397.
16. Meyer P, Noblet V, Mazzara C, Lallement A. Survey on deep learning
for radiotherapy. Comput Biol Med. 2018;98:126146.
17. Thompson R, Valdes G, Fuller C, et al. Artificial intelligence in
radiation oncology imaging. Int J Radiat Oncol Biol Phys.2018;
102:11591161.
18. Boldrini L, Bibault J-E, Masciocchi C, Shen Y, Bittner MI. Deep
learning: a review for the radiation oncologist. Front Oncol.2019;
9:977.
19. Lim J, Leech M. Use of auto-segmentation in the delineation of target
volumes and organs at risk in head and neck. Acta Oncol. 2016;55:
799806.
20. Kosmin M, Ledsam J, Romera-Paredes B, et al. Rapid advances in
auto-segmentation of organs at risk and target volumes in head and
neck cancer. Radiother Oncol. 2019;135:130140.
21. Cardenas C, Yang J, Anderson B, Court L, Brock K. Advances in auto-
segmentation. Semin Radiat Oncol. 2019;29:185197.
22. Wong J, Fong A, McVicar N, et al. Comparing deep learning-based
auto-segmentation of organs at risk and clinical target volumes to
expert inter-observer variability in radiotherapy planning. Radiother
Oncol. 2020;144:152158.
23. van Dijk L, Van den Bosch L, Aljabar P et al. Improving automatic
delineation for head and neck organs at risk by deep learning contour-
ing. Radiother Oncol. 2020;142:115123.
24. Gou S, Tong N, Qi S, Yang S, Chin R, Sheng K. Self-channel-and-spa-
tial-attention neural network for automated multi-organ segmentation
on head and neck CT images. Phys Med Biol. 2020.
25. de Ruijter J, van Sambeek M, van de Vosse F, Lopata R. Automated 3D
geometry segmentation of the healthy and diseased carotid artery in free-
hand, probe tracked ultrasound images. Med Phys. 2020;47:10341047.
26. Vandewinckele L, Willems S, Robben D, etal. Segmentation of head-
and-neck organs-at-risk in longitudinal CT scans combining deformable
registrations and convolutional neural networks. Comput Methods Bio-
mech Biomed Eng Imaging Vis. 2020.
27. Fung N, Hung W, Sze C, Lee M, Ng W. Automatic segmentation for
adaptive planning in nasopharyngeal carcinoma IMRT: time, geometri-
cal, and dosimetric analysis. Med Dosim. 2020;45:6065.
28. Lei Y, Harms J, Dong X, et al. Organ-at-risk (OAR) segmentation in
head and neck CT using U-RCNN. In: SPIE Medical Imaging 2020:
Computer-Aided Diagnosis. Volume 11314. SPIE; 2020:1131444.
29. van der Heyden B, Wohlfahrt P, Eekers D, et al. Dual-energy CT for
automatic organs-at-risk segmentation in brain-tumor patients using a
multi-atlas and deep-learning approach. Sci Rep. 2019;9:4126.
30. Tang H, Chen X, Liu Y, et al. Clinically applicable deep learning
framework for organs at risk delineation in CT images. Sci Rep.
2019;1:480491.
31. Wang Y, Zhao L, Song Z, Wang M. Organ at risk segmentation in head
and neck CT images by using a two-stage segmentation framework
based on 3D U-Net. IEEE Access. 2019;7:144591144602.
32. van der Veen J, Willems S, Deschuymer S, et al. Benefits of deep
learning for delineation of organs at risk in head and neck cancer.
Radiother Oncol. 2019;138:6874.
33. Sun Y, Shi H, Zhang S, Wang P, Zhao W, Zhou X, Yuan K. Accurate
and rapid CT image segmentation of the eyes and surrounding organs
for precise radiotherapy. Med Phys. 2019;46:22142222.
34. Huang C, Badiei M, Seo H, et al. Atlas based segmentations via semi-
supervised diffeomorphic registrations. arXiv 1911.10417; 2019.
Medical Physics, 47 (9), September 2020
e947 Vrtovec et al.: Auto-segmentation of OARs for H&N RT planning e947
35. Haq R, Berry S, Deasy J, Hunt M, Veeraraghavan H. Dynamic multi-
atlas selection based consensus segmentation of head and neck struc-
tures from CT images. Med Phys. 2019;46:56125622.
36. Rhee D, Cardenas C, Elhalawani H, et al. Automatic detection of con-
touring errors using convolutional neural networks. Med Phys.
2019;46:50865097.
37. Zhong T, Huang X, Tang F, Liang S, Deng X, Zhang Y. Boosting-based
cascaded convolutional neural networks for the segmentation of CT
organs-at-risk in nasopharyngeal carcinoma. Med Phys. 2019;46:5602
5611.
38. Agn M, Rosensch
old P, Puonti O, et al. A modality-adaptive method
for segmenting brain tumors and organs-at-risk in radiation therapy
planning. Med Image Anal. 2019;54:220237.
39. Qiu B, Guo J, Kraeima J. Automatic segmentation of the mandible
from computed tomography scans for 3D virtual surgical planning
using the convolutional neural network. Phys Med Biol.
2019;64:1750.
40. Tong N, Gou S, Yang S, Cao M, Sheng K. Shape constrained fully con-
volutional DenseNet with adversarial training for multiorgan segmenta-
tion on head and neck CT and low-field MR images. Med Phys.
2019;46:26692682.
41. Torosdagli N, Liberton D, Verma P, Sincan M, Lee J, Bagci U. Deep
geodesic learning for segmentation and anatomical landmarking. IEEE
Trans Med Imaging. 2019;38:919931.
42. Chan J, Kearney V, Haaf S, et al. A convolutional neural network algo-
rithm for automatic segmentation of head and neck organs-at-risk using
deep lifelong learning. Med Phys. 2019;46:22042213.
43. Chen H, Lu W, Chen M, et al. A recursive ensemble organ segmenta-
tion (REOS) framework: application in brain radiotherapy. Phys Med
Biol. 2019;64:025015.
44. Lee H, Lee E, Kim N, et al. Clinical evaluation of commercial atlas-
based auto-segmentation in the head and neck region. Front Oncol.
2019;9:239.
45. H
ansch A, Schwier M, Gass T, et al. Evaluation of deep learning meth-
ods for parotid gland segmentation from CT images. J Med Imaging.
2019;6:011005.
46. Zhu W, Huang Y, Zeng L, et al. AnatomyNet: deep learning for fast
and fully automated whole-volume segmentation of head and neck
anatomy. Med Phys. 2019;46:576589.
47. Liang S, Tang F, Huang X, et al. Deep-learning-based detection and
segmentation of organs at risk in nasopharyngeal carcinoma computed
tomographic images for radiotherapy planning. Eur Radiol. 2019;29:
1961 19 67.
48. Men K, Geng H, Cheng C, et al. Technical note: more accurate and
efficient segmentation of organs-at-risk in radiotherapy with convolu-
tional neural networks cascades. Med Phys. 2019;46:286292.
49. Tappeiner E, Pr
oll S, H
onig M, et al. Multi-organ segmentation of the
head and neck area: an efficient hierarchical neural networks approach.
Int J Comput Assist Radiol Surg. 2019;14:745754.
50. van Rooij W, Dahele M, Ribeiro Brandao Het al. Deep learning-based
delineation of head and neck organs-at-risk: geometric and dosimetric
evaluation. Int J Radiat Oncol Biol Phys. 2019;104:677684.
51. Wu X, Udupa J, Tong Y, et al. AAR-RT a system for auto-contouring
organs at risk on CT images for radiation therapy planning: principles,
design, and large-scale evaluation on head-and-neck and thoracic cancer
cases. Med Image Anal. 2019;54:4562.
52. Ayyalusamy A, Vellaiyan S, Subramanian S, et al. Auto-segmentation
of head and neck organs at risk in radiotherapy and its dependence on
anatomic similarity. Radiat Oncol J. 2019;37:134142.
53. Willems S, Crijns W, La Greca Saint-Esteven A, et al. Clinical imple-
mentation of DeepVoxNet for auto-delineation of organs at risk in head
and neck cancer patients in radiotherapy. In: Clinical Image-Based Pro-
cedures: Translational Research in Medical Imaging - CLIP 2018, vol-
ume 11041 of LNCS. Springer; 2018:223232.
54. Ren X, Xiang L, Nie D, et al. Interleaved 3D-CNNs for joint segmenta-
tion of small-volume structures in head and neck CT images. Med Phys.
2018;45:20632075.
55. Tong N, Gou S, Yang S, Ruan D, Sheng K. Fully automatic multi-organ
segmentation for head and neck cancer radiotherapy using shape repre-
sentation model constrained fully convolutional neural networks. Med
Phys. 2018;45:45584567.
56. Wang Z, Wei L, Wang L, Gao Y, Chen W, Shen D. Hierarchical vertex
regression-based segmentation of head and neck CT images for
radiotherapy planning. IEEE Trans Image Process. 2018;27:
923937.
57. Mo
cnik D, Ibragimov B, Xing L, et al. Segmentation of parotid glands
from registered CT and MR images. Phys Med. 2018;52:3341.
58. Kieselmann J, Kamerling C, Burgos N, et al. Geometric and dosimetric
evaluations of atlas-based segmentation methods of MR images in the
head and neck region. Phys Med Biol. 2018;63:145007.
59. Meillan N, Bibault J-E, Vautier J, et al. Automatic intracranial segmen-
tation: is the clinician still needed? Technol Cancer Res Treat.
2018;17:17.
60. Nikolov S, Blackwell S, Mendes R, et al. Deep learning to achieve clin-
ically applicable segmentation of head and neck anatomy for radiother-
apy. arXiv 1809.04430; 2018.
61. Yang J, Haas B, Fang R, et al. Atlas ranking and selection for automatic
segmentation of the esophagus from CT scans. Phys Med Biol.
2017;62:91409158.
62. Aghdasi N, Li Y, Berens A, Harbison R, Moe K, Hannaford B. Effi-
cient orbital structures segmentation with prior anatomical knowledge.
J Med Imaging. 2017;4:034501.
63. Urban S, Tan
acs A. Atlas-based global and local RF segmentation of
head and neck organs on multimodal MRI images. In: International
Symposium on Image Signal Processing Analysis - ISPA 2017. IEEE;
2017:99103.
64. Wachinger C, Brennan M, Sharp G, Golland P. Efficient descriptor-
based segmentation parotid glands with nonlocal means. IEEE Trans
Biomed Eng. 2017;64:14921502.
65. Ibragimov B, Xing L. Segmentation of organs-at-risks in head and neck
CT images using convolutional neural networks. Med Phys.
2017;44:547557.
66. Raudaschl P, Zaffino P, Sharp GC, et al. Evaluation of segmentation
methods on head and neck CT: auto-segmentation challenge 2015. Med
Phys. 2017;44:20202036.
67. Van de Velde J, Wouters J, Vercauteren T, et al. Optimal number of
atlases and label fusion for automatic multi-atlas-based brachial plexus
contouring in radiotherapy treatment planning. Radiat Oncol.
2016;11:1.
68. Wardman K, Prestwich R, Gooding M, Speight R. The feasibility of
atlas-based automatic segmentation of MRI for H&N radiotherapy
planning. J Appl Clin Med Phys. 2016;17:146154.
69. Zaffino P, Raudaschl P, Fritscher K, Sharp G, Spadea M. Technical
note: plastimatch mabs, an open source tool for automatic image seg-
mentation. Med Phys. 2016;43:5155.
70. Fritscher K, Raudaschl P, Zaffino P, Spadea M, Sharp G. Deep neural
networks for fast segmentation of 3D medical images. In: Medical
Image Computing and Computer-Assisted Intervention - MICCAI 2016,
volume 9901 of LNCS. Springer; 2016:158165.
71. Awan M, Dyer B, Kalpathy-Cramer J, et al. Auto-segmentation of the
brachial plexus assessed with TaCTICS a software platform for rapid
multiple-metric quantitative evaluation of contours. Acta Oncol.
2015;54:562566.
72. Wachinger C, Fritscher K, Sharp G, Golland P. Contour-driven
atlas-based segmentation. IEEE Trans Med Imaging. 2015;34:2492
2505.
73. Hoang DA, Eminowicz G, Mendes R, et al. Validation of clinical
acceptability of an atlas-based segmentation algorithm for the delin-
eation of organs at risk in head and neck cancer. Med Phys. 2015;42:
50275034.
74. Dolz J, Leroy H, Reyns N, Massoptier L, Vermandel M. A fast and
fully automated approach to segment optic nerves on MRI and its appli-
cation to radiosurgery. In: International Symposium on Biomedical
Imaging - ISBI 2015, pages 11021105. IEEE; 2015.
75. Yang X, Wu N, Cheng G, et al. Automated segmentation of the parotid
gland based on atlas registration and machine learning: a longitudinal
MRI study in head-and-neck radiation therapy. Int J Radiat Oncol Biol
Phys. 2014;90:12251233.
76. Fritscher K, Peroni M, Zaffino P, Spadea M, Schubert R, Sharp G.
Automatic segmentation of head and neck CT images for radiotherapy
treatment planning using multiple atlases. Statistical appearance
models, and geodesic active contours. Med Phys. 2014;41:051910.
Medical Physics, 47 (9), September 2020
e948 Vrtovec et al.: Auto-segmentation of OARs for H&N RT planning e948
77. Thomson D, Boylan C, Liptrot T, et al. Evaluation of an automatic seg-
mentation algorithm for definition of head and neck organs at risk.
Radiat Oncol. 2014;9:173.
78. Sj
oberg C, Johansson S, Ahnesj
o A. How much will linked deformable
registrations decrease the quality of multi-atlas segmentation fusions?
Radiat Oncol. 2014;9:251.
79. Harrigan R, Panda S, Asman A, et al. Robust optic nerve segmentation
on clinically acquired computed tomography. J Med Imaging.
2014;1:034006.
80. Walker G, Awan M, Tao R, et al. Prospective randomized double-blind study of atlas-based organ-at-risk autosegmentation-assisted radiation planning in head and neck cancer. Radiother Oncol. 2014;112:321–325.
81. Yang J, Amini A, Williamson R, et al. Automatic contouring of brachial plexus using a multi-atlas approach for lung cancer radiation therapy. Pract Radiat Oncol. 2013;3:e139–e147.
82. Zhu M, Bzdusek K, Brink C, et al. Multi-institutional quantitative evaluation and clinical validation of smart probabilistic image contouring engine (SPICE) autosegmentation of target structures and normal tissues on computed tomography images in the head and neck, thorax, liver, and male pelvis areas. Int J Radiat Oncol Biol Phys. 2013;87:809–816.
83. Cheng G, Yang X, Wu N, Xu Z, Zhao H, Wang Y, Liu T. Multi-atlas-based segmentation of the parotid glands of MR images in patients following head-and-neck cancer radiotherapy. In: Medical Imaging 2013: Computer-Aided Diagnosis, volume 8670. SPIE; 2013:86702Q.
84. Daisne J-F, Blumhofer A. Atlas-based automatic segmentation of head and neck organs at risk and nodal target volumes: a clinical validation. Radiat Oncol. 2013;8:154.
85. Chen A, Niermann K, Deeley M, Dawant B. Evaluation of multiple-atlas-based strategies for segmentation of the thyroid gland in head and neck CT images for IMRT. Phys Med Biol. 2012;57:93–111.
86. Qazi A, Pekar V, Kim J, Xie J, Breen S, Jaffray D. Auto-segmentation of normal and target structures in head and neck CT images: a feature-driven model-based approach. Med Phys. 2011;38:6160–6170.
87. Teguh D, Levendag P, Voet P, et al. Clinical validation of atlas-based auto-segmentation of multiple target volumes and normal tissue (swallowing/mastication) structures in the head and neck. Int J Radiat Oncol Biol Phys. 2011;81:950–957.
88. Noble J, Dawant B. An atlas-navigated optimal medial axis and deformable model algorithm (NOMAD) for the segmentation of the optic nerves and chiasm in MR and CT images. Med Image Anal. 2011;15:877–884.
89. Deeley M, Chen A, Datteri R, et al. Comparison of manual and automatic segmentation methods for brain structures in the presence of space-occupying lesions: a multi-expert study. Phys Med Biol. 2011;56:4557–4577.
90. Tsuji S, Hwang A, Weinberg V, Yom S, Quivey J, Xia P. Dosimetric evaluation of automatic segmentation for adaptive IMRT for head-and-neck cancer. Int J Radiat Oncol Biol Phys. 2010;77:707–714.
91. Pekar V, Allaire S, Qazi A, Kim J, Jaffray D. Head and neck auto-segmentation challenge: segmentation of the parotid glands. In: Medical Image Analysis for the Clinic: A Grand Challenge 2010. MICCAI; 2010:273–280.
92. Pekar V, Allaire S, Kim J, Jaffray D. Head and neck auto-segmentation
challenge. MIDAS J. 2009;5:5.
93. Sims R, Isambert A, Grégoire V, et al. A pre-clinical assessment of an atlas-based automatic segmentation tool for the head and neck. Radiother Oncol. 2009;93:474–478.
94. Isambert A, Dhermain F, Bidault F, et al. Evaluation of an atlas-based automatic segmentation software for the delineation of brain organs at risk in a radiation therapy clinical context. Radiother Oncol. 2008;87:93–99.
95. Han X, Hoogeman M, Levendag P, et al. Atlas-based auto-segmentation of head and neck CT images. In: Medical Image Computing and Computer-Assisted Intervention - MICCAI 2008, volume 5242 of LNCS. Springer; 2008:434–441.
96. Bekes G, Máté E, Nyúl L, Kuba A, Fidrich M. Geometrical model-based segmentation of the organs of sight on CT images. Med Phys. 2008;35:735–743.
97. Fortunati V, Verhaart R, Niessen W, Veenland J, Paulides M, van Walsum T. Automatic tissue segmentation of head and neck MR images for hyperthermia treatment planning. Phys Med Biol. 2015;60:6547–6562.
98. Verhaart R, Fortunati V, Verduijn G, van Walsum T, Veenland J, Paulides M. CT-based patient modeling for head and neck hyperthermia treatment planning: manual versus automatic normal-tissue-segmentation. Radiother Oncol. 2014;111:158–163.
99. Fortunati V, Verhaart R, van der Lijn F, et al. Tissue segmentation of
head and neck CT images for treatment planning: a multiatlas
approach combined with intensity modeling. Med Phys. 2013;40:
071905.
100. Schneider U, Pedroni E, Lomax A. The calibration of CT Hounsfield units for radiotherapy treatment planning. Phys Med Biol. 1996;41:111–124.
101. Pereira G, Traughber M, Muzic R. The role of imaging in radiation therapy planning: past, present, and future. Biomed Res Int. 2014;2014:231090.
102. Brouwer C, Steenbakkers R, Bourhis J, et al. CT-based delineation of organs at risk in the head and neck region: DAHANCA, EORTC, GORTEC, HKNPCSG, NCIC CTG, NCRI, NRG Oncology and TROG consensus guidelines. Radiother Oncol. 2015;117:83–90.
103. Leibfarth S, Mönnich D, Welz S, et al. A strategy for multimodal deformable image registration to integrate PET/MR into radiotherapy treatment planning. Acta Oncol. 2013;52:1353–1359.
104. Fortunati V, Verhaart R, Angeloni F, et al. Feasibility of multimodal deformable registration for head and neck tumor treatment planning. Int J Radiat Oncol Biol Phys. 2014;90:85–93.
105. Joint Head and Neck MRI-Radiotherapy Development Cooperative. Prospective quantitative quality assurance and deformation estimation of MRI-CT image registration in simulation of head and neck radiotherapy patients. Clin Transl Radiat Oncol. 2019;18:120–127.
106. Peroni M, Ciardo D, Spadea M, et al. Automatic segmentation and online virtualCT in head-and-neck adaptive radiation therapy. Int J Radiat Oncol Biol Phys. 2012;84:e427–e433.
107. Hvid C, Elstrøm C, Jensen K, Alber M, Grau C. Accuracy of software-assisted contour propagation from planning CT to cone beam CT in head and neck radiotherapy. Acta Oncol. 2016;55:1324–1330.
108. Wang T, Bradshaw GB, Beitler J, et al. Optimal virtual monoenergetic image in TwinBeam dual-energy CT for organs-at-risk delineation based on contrast-noise-ratio in head-and-neck radiotherapy. J Appl Clin Med Phys. 2019;20:121–128.
109. Bhandare N, Mendenhall W. A literature review of late complications
of radiation therapy for head and neck cancers: incidence and dose
response. J Nucl Med Radiat Ther. 2012;S2:009.
110. Siddiqui F, Movsas B. Management of radiation toxicity in head and neck cancers. Semin Radiat Oncol. 2017;27:340–349.
111. Strojan P, Hutcheson K, Eisbruch A, et al. Treatment of late sequelae after radiotherapy for head and neck cancer. Cancer Treat Rev. 2017;59:79–92.
112. Clark K, Vendt B, Smith K, et al. The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository. J Digit Imaging. 2013;26:1045–1057.
113. Prior F, Smith K, Sharma A, et al. The public cancer radiology
imaging collections of The Cancer Imaging Archive. Sci Data.
2017;4:170124.
114. Vallières M, Kay-Rivest E, Perrin L, et al. Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer. Sci Rep. 2017;7:10117.
115. Grossberg A, Mohamed A, Elhalawani H, et al. Imaging and clinical
data archive for head and neck squamous cell carcinoma patients treated
with radiotherapy. Sci Data. 2018;5:180173.
116. Cardenas C, Mohamed A, Yang J, et al. Head and neck cancer patient images for determining auto-segmentation accuracy in T2-weighted magnetic resonance imaging through expert manual segmentations. Med Phys. 2020;47:2317–2322.
117. Fedorov A, Clunie D, Ulrich E, et al. DICOM for quantitative imaging
biomarker development: a standards based approach to sharing clinical
data and structured PET/CT analysis results in head and neck cancer
research. PeerJ. 2016;4:e2057.
118. Beichel R, Smith BJ, Bauer C, et al. Multi-site quality and variability analysis of 3D FDG PET segmentations based on phantom and clinical image data. Med Phys. 2017;44:479–496.
119. La Macchia M, Fellin F, Amichetti M, et al. Systematic evaluation of three different commercial software solutions for automatic segmentation for adaptive therapy in head-and-neck, prostate and pleural cancer. Radiat Oncol. 2012;7:160.
120. Kearney V, Chan J, Valdes G, Solberg T, Yom S. The application of artificial intelligence in the IMRT planning process for head and neck cancer. Oral Oncol. 2018;87:111–116.
121. Van de Velde J, Audenaert E, Speleers B, et al. An anatomically validated brachial plexus contouring method for intensity modulated radiation therapy planning. Int J Radiat Oncol Biol Phys. 2013;87:802–808.
122. Sun Y, Yu XL, Luo W, et al. Recommendation for a contouring method and atlas of organs at risk in nasopharyngeal carcinoma patients receiving intensity-modulated radiotherapy. Radiother Oncol. 2014;110:390–397.
123. Kong F, Ritter T, Quint D, et al. Consideration of dose limits for organs at risk of thoracic radiotherapy: atlas for lung, proximal bronchial tree, esophagus, spinal cord, ribs, and brachial plexus. Int J Radiat Oncol Biol Phys. 2011;81:1442–1457.
124. Christianen M, Langendijk J, Westerlaan H, van de Water T, Bijl H. Delineation of organs at risk involved in swallowing for radiotherapy treatment planning. Radiother Oncol. 2011;101:394–402.
125. van de Water T, Bijl H, Westerlaan H, Langendijk J. Delineation guidelines for organs at risk involved in radiation-induced salivary dysfunction and xerostomia. Radiother Oncol. 2009;93:545–552.
126. Pacholke H, Amdur R, Schmalfuss I, Louis D, Mendenhall W. Contouring the middle and inner ear on radiotherapy planning scans. Am J Clin Oncol. 2005;28:143–147.
127. Hall W, Guiou M, Lee N, et al. Development and validation of a standardized method for contouring the brachial plexus: preliminary dosimetric analysis among patients treated with IMRT for head-and-neck cancer. Int J Radiat Oncol Biol Phys. 2008;72:1362–1367.
128. Chen W, Zhang H, Zhang W, et al. Development of a contouring guide for three different types of optic chiasm: a practical approach. J Med Imaging Radiat Oncol. 2019;63:657–664.
129. Taha A, Hanbury A. Metrics for evaluating 3D medical image segmentation:
analysis, selection, and tool. BMC Med Imaging. 2015;15:29.
130. Maier-Hein L, Eisenmann M, Reinke A, et al. Why rankings of
biomedical image analysis competitions should be interpreted with
care. Nat Commun. 2018;9:5217.
131. Armato S, Tahir B, Sharp G. AAPM grand challenges symposium. Med Phys. 2019;46:e485–e486.
132. Iglesias J, Sabuncu M. Multi-atlas segmentation of biomedical images: a survey. Med Image Anal. 2015;24:205–219.
133. Edmund J, Nyholm T. A review of substitute CT generation for MRI-
only radiation therapy. Radiat Oncol. 2017;12:28.
134. Adjeiwaah M, Bylund M, Lundman J, et al. Dosimetric impact of MRI distortions: a study on head and neck cancers. Int J Radiat Oncol Biol Phys. 2019;103:994–1003.
135. Raaymakers BW, Jürgenliemk-Schulz IM, Bol GH, et al. First patients treated with a 1.5 T MRI-Linac: clinical proof of concept of a high-precision, high-field MRI guided radiotherapy treatment. Phys Med Biol. 2017;62:L41–L50.
136. Lei Y, Harms J, Wang T, et al. MRI-only based synthetic CT generation using dense cycle consistent generative adversarial networks. Med Phys. 2019;46:3565–3581.
137. Klages P, Benslimane I, Riyahi S, et al. Patch-based generative adversarial neural network models for head and neck MR-only planning. Med Phys. 2020;47:626–642.
138. Comelli A, Stefano A, Bignardi S, et al. Active contour algorithm with discriminant analysis for delineating tumors in positron emission tomography. Artif Intell Med. 2019;94:67–78.
139. Schipaanboord B, Boukerroui D, Peressutti D, et al. Can atlas-based auto-segmentation ever be perfect? Insights from extreme value theory. IEEE Trans Med Imaging. 2019;38:99–106.
140. Larrue A, Gujral D, Nutting C, Gooding M. The impact of the number
of atlases on the performance of automatic multi-atlas contouring. Phys
Med. 2015;31:e30.
141. Ibtehaz N, Rahman M. MultiResUNet: rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. 2020;121:74–87.
142. Zhang X, Wang L, Yang D, et al. Generalizing deep learning for medical image segmentation to unseen domains via deep stacked transformation. IEEE Trans Med Imaging. 2020 (in press).
143. Nelms B, Tomé W, Robinson G, Wheeler J. Variations in the contouring of organs at risk: test case from a patient with oropharyngeal cancer. Int J Radiat Oncol Biol Phys. 2012;82:368–378.
144. Brouwer C, Steenbakkers R, van den Heuvel E, et al. 3D variation
in delineation of head and neck organs at risk. Radiat Oncol.
2012;7:32.
145. Tao C-J, Yi J-L, Chen N-Y, et al. Multi-subject atlas-based auto-segmentation reduces interobserver variation and improves dosimetric parameter consistency for organs at risk in nasopharyngeal carcinoma: a multi-institution clinical study. Radiother Oncol. 2015;115:407–411.
146. Krayenbuehl J, Zamburlini M, Ghandour S, et al. Planning comparison
of five automated treatment planning solutions for locally advanced
head and neck cancer. Radiat Oncol. 2018;13:170.
147. Graves Y, Smith AA, McIlvena D, et al. A deformable head and neck phantom with in-vivo dosimetry for adaptive radiotherapy quality assurance. Med Phys. 2015;42:1490–1497.
148. Li J, Udupa J, Tong Y, Wang L, Torigian D. LinSEM: linearizing segmentation evaluation metrics for medical images. Med Image Anal. 2020;60:101601.
149. Loo S, Martin W, Smith P, Cherian S, Roques T. Interobserver variation in parotid gland delineation: a study of its impact on intensity-modulated radiotherapy solutions with a systematic review of the literature. Br J Radiol. 2012;85:1070–1077.
150. Voet P, Dirkx M, Teguh D, Hoogeman M, Levendag P, Heijmen B. Does atlas-based autosegmentation of neck levels require subsequent manual contour editing to avoid risk of severe target underdosage? A dosimetric analysis. Radiother Oncol. 2011;98:373–377.
151. Delaney A, Dahele M, Slotman B, Verbakel W. Is accurate contouring of salivary and swallowing structures necessary to spare them in head and neck VMAT plans? Radiother Oncol. 2018;127:190–196.
152. Lim T, Gillespie E, Murphy J, Moore K. Clinically oriented contour evaluation using dosimetric indices generated from automated knowledge-based planning. Int J Radiat Oncol Biol Phys. 2019;103:1251–1260.
153. Aliotta E, Nourzadeh H, Siebers J. Quantifying the dosimetric impact of organ-at-risk delineation variability in head and neck radiation therapy in the context of patient setup uncertainty. Phys Med Biol. 2019;64:135020.