ORIGINAL ARTICLE
Development of machine learning models for the prediction of positive surgical margins in transoral robotic surgery (TORS)
Andrea Costantino MD (1,2,3) | Claudio Sampieri MD (3,4,5) | Francesca Pirola MD (1,2,3) | Armando De Virgilio MD, PhD (1,2) | Se-Heon Kim MD, PhD (3)
1 Department of Biomedical Sciences, Humanitas University, Pieve Emanuele (MI), Italy
2 Otorhinolaryngology Unit, IRCCS Humanitas Research Hospital, Rozzano (MI), Italy
3 Department of Otorhinolaryngology, Yonsei University College of Medicine, Seoul, Korea
4 Unit of Otorhinolaryngology – Head and Neck Surgery, IRCCS Ospedale Policlinico San Martino, Genoa, Italy
5 Department of Surgical Sciences and Integrated Diagnostics (DISC), University of Genoa, Genoa, Italy
Correspondence
Se-Heon Kim, Department of Otorhinolaryngology, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul 03722, Republic of Korea.
Email: shkimmd@yuhs.ac
Abstract
Purpose: To develop machine learning (ML) models for predicting positive margins in patients undergoing transoral robotic surgery (TORS).
Methods: Data from 453 patients with laryngeal, hypopharyngeal, and oropharyngeal squamous cell carcinoma were retrospectively collected at a tertiary referral center to train (n=316) and validate (n=137) six two-class supervised ML models employing 14 variables available pre-operatively.
Results: The accuracy of the six ML models ranged between 0.67 and 0.75, and the AUC between 0.68 and 0.75. The ML algorithms showed high specificity (range: 0.75–0.89) and low sensitivity (range: 0.26–0.64) in detecting patients with positive margins after TORS. NPV was higher (range: 0.73–0.83) than PPV (range: 0.45–0.63). T classification and tumor site were the most important predictors of positive surgical margins.
Conclusions: ML algorithms can identify patients at low risk of positive margins, who are therefore amenable to TORS.
KEYWORDS
artificial intelligence, head and neck cancer, personalized medicine, robotic surgical procedures, squamous cell carcinoma
1 | INTRODUCTION
Transoral robotic surgery (TORS) was approved by the Food and Drug Administration (FDA) in 2009, progressively changing the treatment paradigm of head and neck cancers [1]. TORS is now the most common primary treatment for early-stage oropharyngeal cancer [2–4], and it can also be proposed for minimally invasive resections of laryngeal and hypopharyngeal tumors [5–7]. Recent studies showed that TORS is feasible in selected locally advanced (T3–T4) tumors, particularly when neoadjuvant chemotherapy (NCT) is applied, as proposed by our group [8,9].
The reduced surgical invasiveness that accompanies robotic surgery allows radical treatment with excellent functional results compared with most open approaches [10,11]. However, treatment-related toxicity also depends on adjuvant treatments, which are administered in up to 70% of patients based on pathological adverse features that are not always predictable before surgery [3]. Although doses of postoperative radiotherapy (RT) are usually lower than those of definitive treatment, an almost full RT dose with the addition of concurrent chemotherapy (CT) [12,13] is recommended in cases of positive surgical margins. In this context, although treatment intensification might confer a survival advantage in patients with locally advanced tumors [14], multi-modality treatment should be limited to reduce the effects of its greater toxicity [15]. This applies especially to patients with a higher preoperative risk of positive margins, who might benefit more from upfront chemo-radiotherapy (CRT). Although the importance of negative surgical margins is clear, few data are available in the current literature about predictors of margin status after TORS [16,17]. Moreover, no study has built a prediction model able to guide the decision-making process in a more personalized approach.
Andrea Costantino and Claudio Sampieri contributed equally.
Received: 29 August 2022 Revised: 10 November 2022 Accepted: 5 December 2022
DOI: 10.1002/hed.27283
Head & Neck. 2022;1–10. wileyonlinelibrary.com/journal/hed © 2022 Wiley Periodicals LLC.
10970347, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/hed.27283 by Universita Di Firenze Sistema, Wiley Online Library on [27/12/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
In this setting, machine learning (ML) models have recently been proposed to improve the management of patients with head and neck cancer [18,19]. ML is being used to enhance current modeling by providing more accurate and precise predictions for outcomes of interest. In particular, ML algorithms enable computers to learn from data and experience, and to make predictions about previously unanalyzed data [20–23]. Since the presence of positive margins is key to the indication for adjuvant therapy, and knowing this risk in advance might lead to a preferable change in the treatment strategy, the purpose of the present study was to develop different ML models to predict the surgical margin status in patients undergoing TORS.
2 | METHODS
This study adheres to the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) reporting guideline [24].
2.1 | Study design and population
A single-center retrospective study was carried out at the Department of Otorhinolaryngology at Severance Hospital, Yonsei University, Seoul (Republic of Korea). The study followed the principles of the Declaration of Helsinki and was approved by the Institutional Review Board. Informed consent was obtained from all patients.
All patients who underwent TORS between April 2008 and May 2022 were eligible. The inclusion criteria were as follows: (1) age 18 years or older; (2) oropharyngeal, hypopharyngeal, or laryngeal squamous cell carcinoma; (3) successfully completed TORS; (4) surgical margin status available from the final histopathological report. Exclusion criteria were as follows: (1) occult primary tumors; (2) re-resections for margin clearance; (3) margin status not available or unclear. Patients who underwent either upfront TORS or received NCT before surgery (according to the previously described protocol [8,9]) were included, and this variable was taken into account in the analysis. All robotic procedures were performed by the senior surgeon (S.H.K.), which allowed the impact of the surgeon's experience on surgical margin status to be assessed.
2.2 | Data collection and pre-processing
Variables included in the analysis were: age, gender, smoking and alcohol status, tumor site and subsite, clinical T classification, mouth opening (in mm) and exposure (3-point Likert scale as previously described [25]), previous NCT and number of cycles, previous radiotherapy, the sequence number of the procedure performed by the surgeon, and the robot type used during surgery (da Vinci Si/Xi or da Vinci SP). Missing data for variables of interest were analyzed and classified as “missing completely at random,” “missing at random,” or “missing not at random.” Variables determined to be missing at random were imputed using linear regression for continuous variables and logistic regression for categorical variables [26,27]. The final histopathological report was used to define the surgical margin status. A margin was labeled as positive if invasive carcinoma was found at the inked margin of the resected specimen.
The dataset was randomly split using a 70:30 ratio: the ML algorithms were trained on 70% of the available cases (training dataset) and tested on the remaining 30% (testing dataset).
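As an illustration of the 70:30 split, the following minimal Python sketch draws 70% of each margin class at random. Stratifying by class and flooring the per-class training count are assumptions made here for illustration; the text only states that the split was random. The function name `stratified_split` and the seed are hypothetical.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=42):
    """Return (train_idx, test_idx), sampling train_fraction of each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for _, indices in sorted(by_class.items()):
        rng.shuffle(indices)
        n_train = int(train_fraction * len(indices))  # floor within each class (assumption)
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)

# Margin status counts from Table 1: 315 negative (0) and 138 positive (1).
labels = [0] * 315 + [1] * 138
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Under these assumptions the sketch yields 316 training and 137 testing cases, with 42 positive margins in the testing set, which agrees with the sizes reported in Section 3.3 and Table 3. This agreement does not establish that the study used this exact procedure.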
2.3 | Model training and validation
Six two-class supervised ML models, selected because they are among the most frequently adopted predictive model types in the current literature, were used to predict the presence of positive margins in the final histopathological report. In particular, the training dataset was used to train the following six ML-based models: C5.0 decision trees (C50) [28], flexible discriminant analysis (FDA) [29], k-nearest neighbor (kNN) [30], random forest (RF) [31], support vector machines (SVM) [32], and extreme gradient boosting (XGB) [33].
To ensure model stability and reduce bias, 10-fold cross-validation was performed during the training of all ML algorithms. A data augmentation technique (Synthetic Minority Oversampling Technique, or SMOTE) was adopted during the training process to generate additional samples for the minority class and thereby correct for class imbalance [34]. Random hyperparameter tuning was performed to maximize the area under the receiver operating characteristic (ROC) curve [35].
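The idea behind SMOTE can be sketched in a few lines of Python: each synthetic case is placed at a random position on the segment joining a minority-class case and one of its nearest minority-class neighbours. This is a simplified illustration for continuous features only, not the implementation used in the study; the function name `smote`, the parameter values, and the example points are hypothetical. Categorical predictors would need a dedicated variant.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical two-feature minority class (illustrative values only).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5), (1.8, 2.9), (2.4, 2.6)]
new_points = smote(minority, n_new=4, k=3)
```

Oversampling of this kind is normally applied to the training folds only, so that the testing data keep the original class balance.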
The classification performance of the ML algorithms was measured on the testing data by comparing accuracy, area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score (a standard metric for ML classifiers that combines precision and recall). A positive margin was treated as the “positive” class for the classification algorithms. The results were plotted using ROC curves.
TABLE 1 Characteristics of included patients
Variable | Overall (N=453) | Negative margin (N=315) | Positive margin (N=138) | p-value
Age (years) | 61.8 (9.7) | 61.2 (9.4) | 63.2 (10.4) | 0.072
Gender | | | | 0.008
  Female | 54 (12%) | 46 (15%) | 8 (5.8%) |
  Male | 399 (88%) | 269 (85%) | 130 (94%) |
Smoking | 327 (72%) | 219 (70%) | 108 (78%) | 0.056
Alcohol | 278 (61%) | 189 (60%) | 89 (64%) | 0.37
Site | | | | <0.001
  Oropharynx | 268 (59%) | 207 (66%) | 61 (44%) |
  Hypopharynx | 104 (23%) | 60 (19%) | 44 (32%) |
  Larynx | 81 (18%) | 48 (15%) | 33 (24%) |
Subsite | | | | <0.001
  Ariepiglottic fold | 18 (4.0%) | 11 (3.5%) | 7 (5.1%) |
  Base of tongue | 66 (15%) | 47 (15%) | 19 (14%) |
  Epiglottis | 11 (2.4%) | 5 (1.6%) | 6 (4.3%) |
  False vocal cord | 10 (2.2%) | 5 (1.6%) | 5 (3.6%) |
  Postcricoid area | 5 (1.1%) | 1 (0.3%) | 4 (2.9%) |
  Posterior pharyngeal wall | 25 (5.5%) | 18 (5.7%) | 7 (5.1%) |
  Pyriform sinus | 76 (17%) | 41 (13%) | 35 (25%) |
  Soft palate | 4 (0.9%) | 3 (1.0%) | 1 (0.7%) |
  Tonsil | 198 (44%) | 157 (50%) | 41 (30%) |
  True vocal cord | 40 (8.8%) | 27 (8.6%) | 13 (9.4%) |
T classification | | | | <0.001
  1 | 117 (26%) | 95 (30%) | 22 (16%) |
  2 | 174 (38%) | 127 (40%) | 47 (34%) |
  3 | 107 (24%) | 64 (20%) | 43 (31%) |
  4 | 55 (12%) | 29 (9.2%) | 26 (19%) |
Exposure | | | | 0.15
  Good | 101 (22%) | 76 (24%) | 25 (18%) |
  Fair | 265 (58%) | 175 (56%) | 90 (65%) |
  Poor | 87 (19%) | 64 (20%) | 23 (17%) |
Mouth opening (mm) | 43.7 (14.5) | 43.9 (13.9) | 43.5 (15.8) | 0.14
NCT | 331 (73%) | 234 (74%) | 97 (70%) | 0.38
NCT cycles | 2.4 (1.7) | 2.4 (1.6) | 2.4 (1.8) | 0.48
Previous RT | 13 (2.9%) | 7 (2.2%) | 6 (4.3%) | 0.23
Robot | | | | 0.14
  Multi-port | 279 (62%) | 187 (59%) | 92 (67%) |
  Single-port | 174 (38%) | 128 (41%) | 46 (33%) |
Note: Data are presented as counts and percentages, or as means and standard deviations (SD).
Abbreviations: NCT, neo-adjuvant chemotherapy; RT, radiotherapy.
TABLE 2 Results of the univariable and multivariable binary logistic regression model
Variable | Univariable OR | Univariable 95% CI | Univariable p-value | Multivariable OR | Multivariable 95% CI | Multivariable p-value
Procedure | 1 | 1.00–1.00 | 0.2 | | |
Age | 1.02 | 1.00–1.04 | 0.047 | 1 | 0.98–1.02 | 0.9
Gender | | | | | |
  Female | – | – | | – | – |
  Male | 2.78 | 1.34–6.51 | 0.01 | 1.88 | 0.87–4.56 | 0.13
Smoking | | | | | |
  No | – | – | | | |
  Yes | 1.58 | 1.00–2.56 | 0.057 | | |
Alcohol | | | | | |
  No | – | – | | | |
  Yes | 1.21 | 0.80–1.84 | 0.4 | | |
Site | | | | | |
  Oropharynx | – | – | | – | – |
  Hypopharynx | 2.49 | 1.53–4.04 | <0.001 | 2.48 | 1.43–4.32 | 0.001
  Larynx | 2.33 | 1.37–3.95 | 0.002 | 2.60 | 1.39–4.88 | 0.003
Subsite | | | | | |
  Tonsil | – | – | | | |
  Ariepiglottic fold | 2.44 | 0.85–6.59 | 0.083 | | |
  Base of tongue | 1.55 | 0.81–2.89 | 0.2 | | |
  Epiglottis | 4.6 | 1.32–16.7 | 0.016 | | |
  False vocal cord | 3.83 | 1.02–14.4 | 0.041 | | |
  Postcricoid area | 15.3 | 2.19–304 | 0.016 | | |
  Posterior pharyngeal wall | 1.49 | 0.55–3.67 | 0.4 | | |
  Pyriform sinus | 3.27 | 1.86–5.79 | <0.001 | | |
  Soft palate | 1.28 | 0.06–10.3 | 0.8 | | |
  True vocal cord | 1.84 | 0.85–3.84 | 0.11 | | |
T classification | | | | | |
  1 | – | – | | – | – |
  2 | 1.6 | 0.91–2.87 | 0.11 | 1.86 | 1.03–3.42 | 0.04
  3 | 2.9 | 1.60–5.38 | <0.001 | 2.17 | 1.17–4.12 | 0.02
  4 | 3.87 | 1.93–7.91 | <0.001 | 5.08 | 2.43–10.9 | <0.001
Exposure | | | | | |
  Good | – | – | | | |
  Fair | 1.56 | 0.94–2.66 | 0.091 | | |
  Poor | 1.09 | 0.56–2.11 | 0.8 | | |
Mouth opening | 1 | 0.98–1.01 | 0.8 | | |
NCT | | | | | |
  No | – | – | | | |
  Yes | 0.82 | 0.53–1.28 | 0.4 | | |
NCT cycles | 1.02 | 0.91–1.15 | 0.7 | | |
Previous RT | | | | | |
  No | – | – | | | |
  Yes | 2 | 0.63–6.13 | 0.2 | | |
A permutation feature importance score was computed using the testing dataset to identify the most important variables used in model prediction [36]. Permutation feature importance is defined as the decrease in a model score when the values of a single feature are randomly shuffled; it is a model inspection technique that can be used for any fitted estimator. Here, the permutation feature importance scores were calculated as the difference in model performance, measured by the AUC, before and after permuting a given independent (input) variable.
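The definition above can be made concrete with a small Python sketch: the AUC is computed once on the intact testing data, then again after shuffling one column at a time, and the mean drop is taken as the importance of that column. The model (`predict`), the data, the number of repeats, and the seed are hypothetical and chosen only to illustrate the mechanics.

```python
import random

def auc(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in AUC after shuffling each column of X in turn."""
    rng = random.Random(seed)
    baseline = auc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Hypothetical fitted model: the score depends on feature 0 only.
def predict(row):
    return row[0]

X = [[0.1, 5.0], [0.2, 1.0], [0.3, 4.0], [0.4, 2.0],
     [0.6, 3.0], [0.7, 5.0], [0.8, 1.0], [0.9, 2.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
baseline, importances = permutation_importance(predict, X, y)
```

Because the toy model ignores the second feature, shuffling it leaves the AUC unchanged and its importance is exactly zero, whereas shuffling the first feature can only lower the AUC.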
2.4 | Statistical analysis
All the abovementioned data were collected and stored in a Microsoft Excel® spreadsheet. Categorical variables were summarized as counts and percentages, while continuous variables were reported as means ± standard deviations (SD), after confirming the normal distribution using the Shapiro–Wilk normality test. Differences between patients with negative and positive margins were assessed with the Wilcoxon rank-sum test for continuous variables and with the chi-squared test or Fisher's exact test, as appropriate, for categorical variables.
The association between margin status and the included variables was assessed using univariable binary logistic regression (LR) models. Parameters with a p-value <0.05 in the univariable analysis were included in the multivariable LR analysis to define independent predictors of positive surgical margins. When variables held similar clinical information (e.g., site and subsite), the covariate with the greater effect size based on the z score was included in the multivariable analysis. Results were summarized as odds ratios (OR) with 95% confidence intervals (CIs).
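For a single categorical predictor, the univariable OR equals the crude odds ratio of the corresponding 2×2 table, so some univariable estimates can be checked directly from the counts in Table 1. The Python sketch below does this with a Wald confidence interval; the Wald interval is an assumption for illustration, and the interval method used in the study is not stated. The function name `odds_ratio` is hypothetical.

```python
import math

def odds_ratio(exposed_pos, exposed_neg, ref_pos, ref_neg, z=1.96):
    """Crude odds ratio with a Wald 95% confidence interval."""
    or_ = (exposed_pos * ref_neg) / (exposed_neg * ref_pos)
    se = math.sqrt(1 / exposed_pos + 1 / exposed_neg + 1 / ref_pos + 1 / ref_neg)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Counts from Table 1, given as (positive margin, negative margin) per group.
male = odds_ratio(130, 269, 8, 46)         # male vs. female
hypopharynx = odds_ratio(44, 60, 61, 207)  # hypopharynx vs. oropharynx
larynx = odds_ratio(33, 48, 61, 207)       # larynx vs. oropharynx

for name, (or_, lower, upper) in [("Male", male), ("Hypopharynx", hypopharynx), ("Larynx", larynx)]:
    print(f"{name}: OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The point estimates round to 2.78, 2.49, and 2.33, matching the univariable ORs in Table 2. The Wald limits are close to, but not identical with, the reported intervals, which is expected if a different interval method was used.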
A conventional predictive binary LR model was employed as a benchmark to assess the actual benefit of using ML models for margin status prediction; it was therefore trained and validated using the same training and testing datasets. The accuracy, AUC, sensitivity, specificity, PPV, NPV, and F1 score were measured. Finally, pairwise comparisons of the AUC between the LR model and each ML model were performed using the method described by DeLong et al. [37]
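A compact Python sketch of the DeLong comparison of two correlated AUCs, both evaluated on the same testing cases, is given below. It follows the usual formulation based on structural components (placement values); it is not the code used in the study, and the function name `delong_test` and the example scores are hypothetical.

```python
import math

def _psi(p, n):
    return 1.0 if p > n else 0.5 if p == n else 0.0

def delong_test(y_true, scores_a, scores_b):
    """Two-sided DeLong test for two correlated AUCs on the same cases."""
    pos = [i for i, y in enumerate(y_true) if y == 1]
    neg = [i for i, y in enumerate(y_true) if y == 0]
    m, n = len(pos), len(neg)
    aucs, v10, v01 = [], [], []
    for s in (scores_a, scores_b):
        # Structural components: one value per positive and per negative case.
        v10.append([sum(_psi(s[i], s[j]) for j in neg) / n for i in pos])
        v01.append([sum(_psi(s[i], s[j]) for i in pos) / m for j in neg])
        aucs.append(sum(v10[-1]) / m)

    def cov(u, v, mean_u, mean_v):
        return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v)) / (len(u) - 1)

    var = 0.0
    for comp, size in ((v10, m), (v01, n)):
        s_aa = cov(comp[0], comp[0], aucs[0], aucs[0])
        s_bb = cov(comp[1], comp[1], aucs[1], aucs[1])
        s_ab = cov(comp[0], comp[1], aucs[0], aucs[1])
        var += (s_aa + s_bb - 2 * s_ab) / size
    if var <= 0:  # identical rankings: no evidence of a difference
        return aucs[0], aucs[1], 0.0, 1.0
    z = (aucs[0] - aucs[1]) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return aucs[0], aucs[1], z, p_value

# Hypothetical predicted probabilities from two models on ten test cases.
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
a = [0.1, 0.2, 0.3, 0.4, 0.6, 0.5, 0.7, 0.8, 0.9, 0.95]
b = [0.3, 0.1, 0.6, 0.5, 0.7, 0.2, 0.4, 0.8, 0.9, 0.65]
auc_a, auc_b, z, p_value = delong_test(y, a, b)
```

The covariance term between the two models is what distinguishes this test from a comparison of independent AUCs: both models are scored on the same patients, so their errors are correlated.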
Statistical analyses were performed using R software for statistical computing (R version 4.0.1, R Foundation for Statistical Computing, Vienna, Austria). A value of p<0.05 was considered to indicate statistical significance.
3 | RESULTS
3.1 | Patient characteristics
After applying the abovementioned inclusion and exclusion criteria, a total of 453 patients (males: 88%; mean age: 61.8 ± 9.7 years) who underwent TORS were included in the study. Patient characteristics are shown in Table 1. Positive margins were identified in 138 (30.5%) cases. Patients with positive margins were more commonly male (94% vs. 85%; p<0.05). Compared with patients with negative margins, those with positive margins more often had tumors of the hypopharynx (32% vs. 19%) or larynx (24% vs. 15%) and less often tumors of the oropharynx (44% vs. 66%) (p<0.05). Advanced clinical T classification (cT3–4) was more common in cases with positive margins (50% vs. 29.2%; p<0.05). The other features were not significantly different between the two groups.
TABLE 2 (Continued)
Variable | Univariable OR | Univariable 95% CI | Univariable p-value | Multivariable OR | Multivariable 95% CI | Multivariable p-value
Robot | | | | | |
  Multi-port | – | – | | | |
  Single-port | 0.73 | 0.48–1.11 | 0.14 | | |
Abbreviations: OR, odds ratio; CI, confidence interval; NCT, neo-adjuvant chemotherapy; RT, radiotherapy.
FIGURE 1 ROC curves showing the accuracy in the prediction of positive surgical margins in patients undergoing TORS. The dashed diagonal line (black) represents the identity line (no-discrimination line). Abbreviations: C50, C5.0 decision trees; FDA, flexible discriminant analysis; LR, logistic regression; kNN, k-nearest neighbor; RF, random forest; SVM, support vector machine; XGB, extreme gradient boosting. [Color figure can be viewed at wileyonlinelibrary.com]
3.2 | Binary logistic regression model
The results of the binary logistic regression model are shown in Table 2. The multivariable logistic regression analysis showed that hypopharyngeal (OR: 2.48, 95% CI: 1.43–4.32) and laryngeal (OR: 2.60, 95% CI: 1.39–4.88) tumor sites, and cT classification (T2, OR: 1.86, 95% CI: 1.03–3.42; T3, OR: 2.17, 95% CI: 1.17–4.12; T4, OR: 5.08, 95% CI: 2.43–10.9) were independent predictors of positive surgical margins. The predictive LR model showed an accuracy of 0.68 and an AUC of 0.69 (Figure 1). In particular, the model showed high specificity (0.87) and low sensitivity (0.24) in detecting patients with positive margins after TORS.
3.3 | Performance of ML models
The dataset was randomly divided into a training dataset and a testing dataset consisting of 316 (70%) and 137 (30%) patients, respectively. ML model development was performed on the training dataset using the 14 available variables already mentioned. Each trained model was then applied to the testing dataset to predict the margin status after TORS. Table 3 shows the confusion matrices, reporting the correct and incorrect predictions of each model against the actual class. Table 4 summarizes the classification performance of each prediction algorithm. The accuracy of the six ML models ranged between 0.67 and 0.75, and the AUC between 0.68 and 0.75. Figure 1 shows the ROC curves of the developed models. Overall, the ML algorithms showed high specificity (range 0.75–0.89) and low sensitivity (range 0.26–0.64) in detecting patients with positive margins after TORS. Accordingly, NPV was higher (range 0.73–0.83) than PPV (range 0.45–0.63). None of the ML models showed a statistically significantly better AUC than the LR model based on the DeLong test.
TABLE 3 Confusion matrices of the ML models (actual class, n=137)
Model | Predicted class | Actual negative margin (n=95) | Actual positive margin (n=42)
C50 | Negative margin | 85 | 31
C50 | Positive margin | 10 | 11
FDA | Negative margin | 83 | 22
FDA | Positive margin | 12 | 20
kNN | Negative margin | 84 | 30
kNN | Positive margin | 11 | 12
XGB | Negative margin | 78 | 28
XGB | Positive margin | 17 | 14
RF | Negative margin | 83 | 31
RF | Positive margin | 12 | 11
SVM | Negative margin | 71 | 15
SVM | Positive margin | 24 | 27
LR | Negative margin | 83 | 32
LR | Positive margin | 12 | 10
TABLE 4 Classification performance of the ML models
Model | Accuracy | AUC | Sensitivity | Specificity | PPV | NPV | F1 score
C50 | 0.7 | 0.7 | 0.26 | 0.89 | 0.52 | 0.73 | 0.35
FDA | 0.75 | 0.75 | 0.48 | 0.87 | 0.63 | 0.79 | 0.54
kNN | 0.7 | 0.7 | 0.29 | 0.89 | 0.52 | 0.74 | 0.37
RF | 0.69 | 0.72 | 0.26 | 0.87 | 0.48 | 0.73 | 0.34
SVM | 0.72 | 0.75 | 0.64 | 0.75 | 0.53 | 0.83 | 0.58
XGB | 0.67 | 0.68 | 0.33 | 0.82 | 0.45 | 0.74 | 0.38
LR | 0.68 | 0.69 | 0.24 | 0.87 | 0.45 | 0.72 | 0.31
Abbreviations: AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; C50, C5.0 decision trees; FDA, flexible discriminant analysis; LR, logistic regression; kNN, k-nearest neighbor; RF, random forest; SVM, support vector machine; XGB, extreme gradient boosting.
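Apart from the AUC, the metrics in Table 4 follow directly from the confusion matrices in Table 3. The short Python sketch below recomputes them for the SVM model, taking a positive margin as the positive class; the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard two-class metrics from the four cells of a confusion matrix."""
    sensitivity = tp / (tp + fn)  # recall for positive margins
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }

# SVM counts from Table 3: 27 true positives, 24 false positives,
# 15 false negatives, and 71 true negatives.
svm = classification_metrics(tp=27, fp=24, fn=15, tn=71)
for name, value in svm.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these values reproduce the SVM row of Table 4 (accuracy 0.72, sensitivity 0.64, specificity 0.75, PPV 0.53, NPV 0.83, F1 score 0.58).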
Figure 2 shows the permutation feature importance scores of the six ML models. Variables contributing to the model are displayed in descending order according to their corresponding importance scores. The absolute magnitude of a permutation feature importance score reflects the impact of a single variable on the overall performance.
4 | DISCUSSION
To the best of our knowledge, this is the first study focusing on the development of prediction models to define the risk of positive margins in patients undergoing TORS. Six different ML algorithms were trained and tested in a single-center cohort of 453 patients. The classification performance showed an AUC between 0.68 and 0.75, and the highest accuracy was measured for the FDA (0.75) and SVM (0.72) models. Overall, the ML models demonstrated high specificity (range 0.75–0.89) and low sensitivity (range 0.26–0.64) in detecting patients with positive margins after TORS.
The classification performance of ML algorithms should be analyzed according to the specific clinical context: depending on the clinical situation, models with either high sensitivity or high specificity should be preferred. The high specificity and negative predictive value measured in our models allow the identification of patients at low risk of positive margins, which has important benefits in terms of customized adjuvant treatment. In fact, the main purpose of TORS is the complete resection of the tumor with minimal functional impairment compared to open surgery [38]. However, the toxicity of the treatment also depends on the amount of radiation delivered post-operatively [15]. For these reasons, and especially for early-stage tumors, TORS should be selected when there is a high chance of completing the treatment without the need for adjuvant therapies. Although recent literature data showed that the prognostic role of surgical margins is yet to be defined in TORS [16,17,39,40], current guidelines still require adjuvant CRT with a regimen that is comparable to the upfront treatment protocol (66 Gy vs. 70 Gy) in the presence of a positive margin [41]. From this perspective, ML models could help ensure that TORS is proposed only if negative surgical margins are likely to be obtained, reducing the risk of a multi-modality treatment with increased toxicity. In this scenario, further studies are needed to better define the risk of post-operative RT in patients undergoing TORS. Other clinical and radiological variables may be employed in the development of more complex prediction models [42] to precisely estimate the best treatment protocol for each patient, from the standpoint of a “precision medicine approach” in head and neck oncology [43].
FIGURE 2 Permutation feature importance scores of the six ML models. Variables are displayed in descending order according to their corresponding importance score. Abbreviations: C50, C5.0 decision trees; FDA, flexible discriminant analysis; kNN, k-nearest neighbor; RF, random forest; SVM, support vector machine; XGB, extreme gradient boosting; NCT, neo-adjuvant chemotherapy; RT, radiotherapy. [Color figure can be viewed at wileyonlinelibrary.com]
We developed different prediction models based on 14 demographic and clinical variables available pre-operatively. Multivariable LR analysis showed that tumor site and T classification were the only independent predictors of positive surgical margins. Tumors arising from the hypopharyngeal and laryngeal areas are more difficult to reach through a transoral route [25], and anatomical boundaries limit surgical excisions that are wide enough yet cause only minor functional impairment. For the same reasons, locally advanced tumors are more difficult to resect, with a consequently higher risk of positive margins. TORS is usually performed in small T1–T2 tumors according to current literature data [3]. However, we demonstrated that this conservative surgery can also be proposed in selected T3–T4 tumors, particularly after NCT, while still achieving optimal oncological results [8,9]. In this scenario, our prediction models might be most beneficial for selecting cases truly amenable to complete tumor excision.
A permutation feature importance score was also calculated using the testing dataset to identify the most important variables used in the different ML models [36]. Overall, T classification and tumor site/subsite were confirmed to be important predictors of TORS margin status. In addition, other variables had a relevant impact on the classification performance, such as exposure, robot type, and NCT administration. On the other hand, previous RT, smoking, alcohol status, and gender played a minor role in determining the correct prediction. The permutation feature importance score is computed to describe how each ML model uses its input variables. However, the absolute value of the score has no clear interpretation, and the only reliable information regarding individual input variables is their ranking. In addition, the overall interpretability of the ML algorithms remains uncertain, as non-negligible heterogeneity is evident among the different models.
This study has limitations. The retrospective nature of the study entails the risk of various biases, especially regarding patient selection. In addition, retrospectively collected data can be incorrectly or poorly classified, affecting the quality of the model. Our inclusion criteria regarding tumor and patient characteristics were deliberately not strict, in order to obtain wider applicability of the prediction models. In fact, the applicability of these models to a variety of clinical scenarios can potentially be broadened by the inclusion of tumors arising from all laryngeal and pharyngeal subsites, regardless of the T classification. On the other hand, the accuracy of the models might be improved by further studies that apply stricter selection criteria and employ other variables. In particular, a specific subgroup analysis based on tumor site and subsite was not performed in our study because of the small sample size. As already mentioned, tumor location is an important predictor of surgical margin status, and some differences might be detected in the accuracy of ML models among different tumor sites. In the current study, data were obtained from a single tertiary referral center, and the large amount of data allowed internal validity to be tested in order to exclude model overfitting. However, multicenter studies are recommended to assess the external validity of the predictive models.
Another limitation of ML is the lack of transparency in the analysis, which makes the process difficult to interpret [44,45]. Predictions generated by an ML algorithm are based on multiple layers of analysis, but the specific process is not directly accessible. As already stated, the impact of individual variables and the relationships among them cannot be displayed in a comprehensible format. In particular, ML does not generate measures of the effect size of individual variables, such as the ORs provided by a multivariable logistic regression model. ML algorithms can recognize complex patterns of non-linear combinations of the input variables to improve the classification performance. However, future studies should also focus on a better categorization of the included variables to improve the interpretability of the algorithms.
5 | CONCLUSIONS
Six ML prediction models were developed and validated to predict surgical margin positivity in patients undergoing TORS, employing 14 clinical features. The classification performance of the ML algorithms showed high specificity and NPV, which allow the preoperative identification of patients at lower risk of positive margins. External validation cohorts are mandatory to confirm our results and to improve the accuracy of ML models in the future. Furthermore, prospective studies are needed to establish the ability of the developed models to personalize patients' treatment based on individual risk estimates, in the context of a precision medicine approach in head and neck oncology.
CONFLICT OF INTEREST
The authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available
from the corresponding author upon reasonable request.
ORCID
Andrea Costantino https://orcid.org/0000-0001-5551-7785
Claudio Sampieri https://orcid.org/0000-0002-7699-2291
Francesca Pirola https://orcid.org/0000-0001-8925-6341
Armando De Virgilio https://orcid.org/0000-0003-0738-8223
Se-Heon Kim https://orcid.org/0000-0002-6407-5859
REFERENCES
1. Weinstein GS, O'Malley BW, Cohen MA, Quon H. Transoral
robotic surgery for advanced oropharyngeal carcinoma. Arch
Otolaryngol Head Neck Surg. 2010;136(11):1079-1085. doi:10.
1001/archoto.2010.191
2. Weinstein GS, O'Malley BW, Magnuson JS, et al. Transoral
robotic surgery: a multicenter study to assess feasibility, safety,
and surgical margins. Laryngoscope. 2012;122(8):1701-1707.
doi:10.1002/lary.23294
3. De Virgilio A, Costantino A, Mercante G, et al. Transoral
robotic surgery and intensity-modulated radiotherapy in the
treatment of the oropharyngeal carcinoma: a systematic review
and meta-analysis. Eur Arch Otorhinolaryngol. 2021;278(5):
1321-1335. doi:10.1007/s00405-020-06224-z
4. de Almeida JR, Li R, Magnuson JS, et al. Oncologic outcomes
after transoral robotic surgery: a multi-institutional study.
JAMA Otolaryngol Head Neck Surg. 2015;141(12):1043-1051.
doi:10.1001/jamaoto.2015.1508
5. Wang CC, Liu SA, Wu SH, et al. Transoral robotic surgery for
early T classification hypopharyngeal cancer. Head Neck. 2016;
38(6):857-862. doi:10.1002/hed.24160
6. Hans S, Chekkoury-Idrissi Y, Circiu MP, Distinguin L, Crevier-
Buchman L, Lechien JR. Surgical, oncological, and functional
outcomes of transoral robotic supraglottic laryngectomy.
Laryngoscope. 2021;131(5):1060-1065. doi:10.1002/lary.28926
7. Hanna J, Brauer PR, Morse E, Judson B, Mehra S. Is robotic
surgery an option for early T-stage laryngeal cancer? Early
Nationwide Results. Laryngoscope. 2020;130(5):1195-1201. doi:
10.1002/lary.28144
8. Park YM, Jung CM, Cha D, et al. A new clinical trial of
neoadjuvant chemotherapy combined with transoral robotic surgery
and customized adjuvant therapy for patients with T3 or T4
oropharyngeal cancer. Ann Surg Oncol. 2017;24(11):3424-3429.
doi:10.1245/s10434-017-6001-5
9. Park YM, Keum KC, Kim HR, et al. A clinical trial of
combination neoadjuvant chemotherapy and transoral robotic surgery
in patients with T3 and T4 laryngo-hypopharyngeal cancer.
Ann Surg Oncol. 2018;25(4):864-871.
doi:10.1245/s10434-017-6208-5
10. De Virgilio A, Costantino A, Mercante G, Di Maio P, Iocca O,
Spriano G. Trans-oral robotic surgery in the management of
parapharyngeal space tumors: a systematic review. Oral Oncol.
2020;103:104581. doi:10.1016/j.oraloncology.2020.104581
11. Ford SE, Brandwein-Gensler M, Carroll WR, Rosenthal EL,
Magnuson JS. Transoral robotic versus open surgical
approaches to oropharyngeal squamous cell carcinoma by
human papillomavirus status. Otolaryngol Head Neck Surg.
2014;151(4):606-611. doi:10.1177/0194599814542939
12. Bernier J, Domenge C, Ozsahin M, et al. Postoperative
irradiation with or without concomitant chemotherapy for locally
advanced head and neck cancer. N Engl J Med.
2004;350(19):1945-1952. doi:10.1056/NEJMoa032641
13. Cooper JS, Pajak TF, Forastiere AA, et al. Postoperative
concurrent radiotherapy and chemotherapy for high-risk squamous-cell
carcinoma of the head and neck. N Engl J Med.
2004;350(19):1937-1944. doi:10.1056/NEJMoa032646
14. Dabas S, Gupta K, Sharma AK, Shukla H, Ranjan R,
Sharma DK. Oncological outcome following initiation of
treatment for stage III and IV HPV negative oropharyngeal cancers
with transoral robotic surgery (TORS). Eur J Surg Oncol.
2019;45(11):2137-2142. doi:10.1016/j.ejso.2019.06.027
15. Achim V, Bolognone RK, Palmer AD, et al. Long-term
functional and quality-of-life outcomes after transoral robotic
surgery in patients with oropharyngeal cancer. JAMA Otolaryngol
Head Neck Surg. 2018;144(1):18-27. doi:10.1001/jamaoto.2017.1790
16. Gorphe P, Simon C. A systematic review and meta-analysis of
margins in transoral surgery for oropharyngeal carcinoma. Oral
Oncol. 2019;98:69-77. doi:10.1016/j.oraloncology.2019.09.017
17. Sampieri C, Costantino A, Spriano G, Peretti G, De Virgilio A,
Kim SH. Role of surgical margins in transoral robotic surgery:
a question yet to be answered. Oral Oncol. 2022;133:106043.
doi:10.1016/j.oraloncology.2022.106043
18. Crowson MG, Ranisau J, Eskander A, et al. A contemporary
review of machine learning in otolaryngology-head and neck
surgery. Laryngoscope. 2020;130(1):45-51. doi:10.1002/lary.27850
19. Bur AM, Shew M, New J. Artificial intelligence for the
otolaryngologist: a state of the art review. Otolaryngol Head Neck
Surg. 2019;160(4):603-611. doi:10.1177/0194599819827507
20. Obermeyer Z, Emanuel EJ. Predicting the future - big data,
machine learning, and clinical medicine. N Engl J Med.
2016;375(13):1216-1219. doi:10.1056/NEJMp1606181
21. Chen JH, Asch SM. Machine learning and prediction in
medicine - beyond the peak of inflated expectations. N Engl J Med.
2017;376(26):2507-2509. doi:10.1056/NEJMp1702071
22. Davenport T, Kalakota R. The potential for artificial
intelligence in healthcare. Future Healthc J. 2019;6(2):94-98.
doi:10.7861/futurehosp.6-2-94
23. Yu KH, Beam AL, Kohane IS. Artificial intelligence in
healthcare. Nat Biomed Eng. 2018;2(10):719-731.
doi:10.1038/s41551-018-0305-z
24. Moons KGM, Altman DG, Reitsma JB, et al. Transparent
reporting of a multivariable prediction model for individual
prognosis or diagnosis (TRIPOD): explanation and elaboration.
Ann Intern Med. 2015;162(1):W1-W73. doi:10.7326/M14-0698
25. De Virgilio A, Park YM, Kim WS, Baek SJ, Kim SH. How to
optimize laryngeal and hypopharyngeal exposure in transoral
robotic surgery. Auris Nasus Larynx. 2013;40(3):312-319.
doi:10.1016/j.anl.2012.07.017
26. Pedersen AB, Mikkelsen EM, Cronin-Fenton D, et al. Missing
data and multiple imputation in clinical epidemiological
research. Clin Epidemiol. 2017;9:157-166.
doi:10.2147/CLEP.S129785
27. Graham JW. Missing data analysis: making it work in the real
world. Annu Rev Psychol. 2009;60:549-576.
doi:10.1146/annurev.psych.58.110405.085530
28. Frank E, Wang Y, Inglis S, Holmes G, Witten IH. Using model
trees for classification. Machine Learning. 1998;32(1):63-76.
doi:10.1023/A:1007421302149
29. Hastie T, Tibshirani R, Buja A. Flexible discriminant analysis
by optimal scoring. J Am Stat Assoc. 1994;89(428):1255-1270.
doi:10.2307/2290989
30. Dudani SA. The distance-weighted k-nearest-neighbor rule.
IEEE Trans Syst Man Cybern. 1976;4:325-327.
31. Breiman L. Random forests. Machine Learning. 2001;45(1):5-32.
doi:10.1023/A:1010933404324
32. Noble WS. What is a support vector machine? Nat Biotechnol.
2006;24(12):1565-1567. doi:10.1038/nbt1206-1565
33. Friedman JH. Greedy function approximation: a gradient
boosting machine. Ann Stat. 2001;29(5):1189-1232.
34. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE:
synthetic minority over-sampling technique. J Artif Intell Res.
2002;16:321-357. doi:10.1613/jair.953
35. Probst P, Wright MN, Boulesteix AL. Hyperparameters and
tuning strategies for random forest. WIREs Data Min Knowl
Discov. 2019;9(3):e1301. doi:10.1002/widm.1301
36. Altmann A, Toloşi L, Sander O, Lengauer T. Permutation
importance: a corrected feature importance measure. Bioinformatics.
2010;26(10):1340-1347. doi:10.1093/bioinformatics/btq134
37. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the
areas under two or more correlated receiver operating
characteristic curves: a nonparametric approach. Biometrics.
1988;44(3):837-845.
38. Park DA, Lee MJ, Kim SH, Lee SH. Comparative safety and
effectiveness of transoral robotic surgery versus open surgery
for oropharyngeal cancer: a systematic review and
meta-analysis. Eur J Surg Oncol. 2020;46(4 Pt A):644-649.
doi:10.1016/j.ejso.2019.09.185
39. Holcomb AJ, Herberg M, Strohl M, et al. Impact of surgical
margins on local control in patients undergoing
single-modality transoral robotic surgery for HPV-related
oropharyngeal squamous cell carcinoma. Head Neck.
2021;43(8):2434-2444. doi:10.1002/hed.26708
40. Warner L, O'Hara JT, Lin DJ, et al. Transoral robotic surgery
and neck dissection alone for head and neck squamous cell
carcinoma: influence of resection margins on oncological
outcomes. Oral Oncol. 2022;130:105909.
doi:10.1016/j.oraloncology.2022.105909
41. National Comprehensive Cancer Network. (2022). Head and
Neck Cancers (version 2.2022).
https://www.nccn.org/professionals/physician_gls/pdf/head-and-neck.pdf.
42. Park YM, Lim JY, Koh YW, Kim SH, Choi EC. Machine
learning and magnetic resonance imaging radiomics for predicting
human papilloma virus status and prognostic factors in
oropharyngeal squamous cell carcinoma. Head Neck.
2022;44(4):897-903. doi:10.1002/hed.26979
43. De Virgilio A, Costantino A, Mercante G, et al. Present and
future of de-intensification strategies in the treatment of
oropharyngeal carcinoma. Curr Oncol Rep. 2020;22(9):91.
doi:10.1007/s11912-020-00948-1
44. Smith JB, Shew M, Karadaghy OA, et al. Predicting salvage
laryngectomy in patients treated with primary nonsurgical therapy
for laryngeal squamous cell carcinoma using machine learning.
Head Neck. 2020;42(9):2330-2339. doi:10.1002/hed.26246
45. Shew M, New J, Bur AM. Machine learning to predict delays in
adjuvant radiation following surgery for head and neck cancer.
Otolaryngol Head Neck Surg. 2019;160(6):1058-1064.
doi:10.1177/0194599818823200
How to cite this article: Costantino A,
Sampieri C, Pirola F, De Virgilio A, Kim S-H.
Development of machine learning models for the
prediction of positive surgical margins in transoral
robotic surgery (TORS). Head & Neck. 2022;1‐10.
doi:10.1002/hed.27283