Long-term frailty modeling using a non-proportional hazards model: Application with a melanoma dataset

Abstract

The semiparametric Cox regression model is often fitted in the modeling of survival data. One of its main advantages is the ease of interpretation, as long as the hazard ratio for two individuals does not vary over time. In practice, the proportional hazards assumption may not hold. In addition, survival data often include a proportion of units that are not susceptible to the event of interest, even when followed for a sufficiently long time; such units are termed immune or “cured.” In this context, several cure rate models are available to deal with long-term survivors. Here, we consider the generalized time-dependent logistic (GTDL) model with a power variance function (PVF) frailty term introduced in the hazard function to control for unobservable heterogeneity in patient populations. It allows for non-proportional hazards, as well as survival data with long-term survivors. Parameter estimation was performed using the maximum likelihood method, and Monte Carlo simulation was conducted to evaluate the performance of the models. Its practical relevance is illustrated in a real medical dataset from a population-based study of incident cases of melanoma diagnosed in the state of São Paulo, Brazil.
Vinicius F Calsavara (1), Eder A Milani (2), Eduardo Bertolli (3) and Vera Tomazella (4)
Keywords
Cure fraction, frailty model, generalized time-dependent logistic model, non-proportional hazard, melanoma, power
variance function distribution, survival model
1 Introduction
In survival analysis, the standard approach to the analysis of censored survival data is to consider the semiparametric proportional hazards model proposed by Cox [1], which assumes that hazard ratios are constant over time. However, in some situations the covariate effects may change over time and the Cox regression model may not be adequate. In clinical studies, the effects of prognostic factors such as treatment may disappear with time. For example, some types of cancer may respond well to chemotherapy initially, but the cancer cells may develop some tolerance to the treatment through genetic mechanisms, resulting in loss of the treatment effect over time. Such situations clearly represent non-proportional hazards scenarios. Schemper [2] noted that the Cox model has undoubtedly been used in many cases in which proportionality assumptions are violated, with consequences for the results.
(1) Department of Epidemiology and Statistics, A.C.Camargo Cancer Center, São Paulo, Brazil
(2) Institute of Mathematics and Statistics, Federal University of Goiás, Goiânia, Brazil
(3) Skin Cancer Department, A.C.Camargo Cancer Center, São Paulo, Brazil
(4) Department of Statistics, Federal University of São Carlos, São Carlos, Brazil
Corresponding author:
Vinicius F Calsavara, Department of Epidemiology and Statistics, A.C.Camargo Cancer Center, Prof. Antônio Prudente Street, 211, Liberdade, São Paulo 01509-010, SP, Brazil.
Email: vinicius.calsavara@accamargo.org.br
Statistical Methods in Medical Research
0(0) 1–19
© The Author(s) 2019
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/0962280219883905
journals.sagepub.com/home/smm
In practice, one usually fits a Cox proportional hazards model and assesses the proportionality assumption based on the Schoenfeld residuals [3–5]. Another approach, suggested by Klein and Moeschberger [6], is to plot the log of the cumulative hazard functions against time and check for parallelism. Several graphical methods for assessment of the proportional hazards assumption have been proposed. Hess [7] considered eight graphical methods for the detection of assumption violations using three real datasets with single binary covariates.
In the analysis of survival data, when departures from the assumption are detected, several workarounds can be applied, such as redefinition of covariates, model stratification by a covariate with a non-proportional hazard, use of time-dependent covariate terms, use of separate models for disjoint time periods [2], and fitting of a non-proportional hazards model.
Several techniques have been proposed to deal with non-proportional hazards; they include the non-parametric accelerated failure time model proposed by Prentice [8] and Kalbfleisch and Prentice [9]; the hybrid hazard model described by Etezadi-Amoli and Ciampi [10]; the extension of hybrid hazard models proposed by Louzada et al. [11,12]; and the generalized time-dependent logistic (GTDL) model proposed by Mackenzie [13]. Louzada-Neto et al. [14] presented a Bayesian approach to the GTDL model, and Milani et al. [15] extended the GTDL model by including a gamma frailty term in the modeling. These models have been applied successfully to problems in which all units are susceptible to the event of interest, that is, in which a cure fraction cannot exist in the population.
However, in several studies some subjects will never fail (experience the event of interest) because they are “cured.” The cure rate class of models considers such situations and has been studied by several authors in recent years. The most popular cure rate model is the standard mixture model, proposed by Boag [16] and modified by Berkson and Gage [17]. In this model, the population survival function is $S(t) = p + (1 - p)S_0(t)$, where $p \in (0, 1)$ is referred to as the cured fraction and $S_0(t)$ is a proper survival function for uncured patients. Common choices for $S_0(t)$ are the exponential, Weibull, log-logistic, and log-normal distributions, among others.
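To make the standard mixture model concrete, the following minimal Python sketch evaluates the population survival function with a Weibull choice for the survival of uncured patients. The function names and the numerical values of the cured fraction, shape, and scale are hypothetical and chosen only for illustration.

```python
import math

def weibull_survival(t, shape, scale):
    """Proper survival function S0(t) of a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))

def mixture_cure_survival(t, p, shape, scale):
    """Population survival S(t) = p + (1 - p) * S0(t) of the standard mixture model."""
    return p + (1.0 - p) * weibull_survival(t, shape, scale)

# Hypothetical values: cured fraction 0.3, Weibull shape 1.5 and scale 2.0.
p, shape, scale = 0.3, 1.5, 2.0
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, mixture_cure_survival(t, p, shape, scale))
```

As follow-up time grows, the printed survival values level off at the cured fraction instead of decreasing to zero, which is the defining feature of a cure rate model.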
Traditional cure rate models implicitly assume a homogeneous population for the susceptible units, but covariate information can be included to explain the observable heterogeneity. However, a portion of unobserved heterogeneity can be induced by several factors, such as environmental or genetic factors, or information that was not considered in planning. Hougaard [18] showed the advantages of considering two sources of heterogeneity [observable (given by covariates) and unobservable] in a model. The unobservable component is a random effect, called frailty, that can be incorporated into the hazard function to control for unobservable heterogeneity of the units under study. In this context, the frailty model is used widely [19]. This model is characterized by the inclusion of a random effect, that is, an unobservable random variable that represents the information that cannot be or has not been observed. The frailty term not only explains heterogeneity among individuals, but also enables assessment of the effects of covariates that were not considered. The omission of an important covariate from a model will increase the amount of unobservable heterogeneity, affecting inferences about the model parameters. The inclusion of a frailty term can help to alleviate this problem.
A frailty term can be included in an additive form in a model. However, a multiplicative effect of the frailty term on the baseline hazard function is often used, as a generalization of the proportional hazards model introduced by Cox [1]. This approach has been studied by several authors [20–24]. Other authors [25–31] have considered cure rate models with frailty terms.
Another possibility for cure rate modeling is through defective models, which offer a strategy for the modeling of survival data in the context of a cure fraction. Balka et al. [32,33], Rocha et al. [34–36], Scudilio et al. [37], and Calsavara et al. [38,39] recently popularized the term “defective,” although the same idea had appeared in previous papers. Instead of estimating the cure fraction $p$ directly, as in a standard mixture model, the defective model provides an alternative for the modeling of lifetime data with long-term survivors: a standard distribution becomes a cure rate model when the usual domains of its parameters are changed. When a probability distribution has this property, it is termed “defective.” In a defective distribution, the integral of the density function is not 1, but a value in the range (0, 1), which leads to a proportion of immunes in the population. The survival curve stabilizes at $p \in (0, 1)$, meaning that the survival function is improper. However, the impropriety of a survival function does not necessarily imply a defective distribution.
In this paper, we consider another way to model survival datasets under non-proportional hazards and with the possibility of a cure fraction in the population. Our strategy is to consider the GTDL model with a power variance function frailty term included in the modeling, which is an extension of the model proposed by Milani et al. [15]. We illustrate the applicability of this model using a real medical dataset from a population-based study of incident cases of melanoma diagnosed in the state of São Paulo, Brazil. Although melanoma is one of the skin cancers best known to the general population, skin carcinomas have a higher incidence; the survival of patients with melanoma is nevertheless worse because of its potential for metastatic dissemination. According to the Brazilian National Institute of Cancer, about 6000 new cases of melanoma were expected to be identified in 2018 [40]; according to the International Agency for Research on Cancer, this number is about 7000 [41]. An estimated 2000 deaths per year in Brazil are attributable to melanoma [40,41]. In the study from which our dataset is drawn, patients diagnosed with melanoma were enrolled from 2000 to 2014, with follow-up conducted until 2018; death due to cancer was the event of interest.
Our paper is organized as follows. In section 2, we present the GTDL and GTDL frailty distributions, their probability density, survival and hazard functions, and their cure rate versions. Inference methods based on the likelihood function are presented in section 3. In section 4, we consider a simulation study under different scenarios. We numerically evaluate the asymptotic properties of the estimators, as well as the performance of the GTDL frailty model in relation to several discrimination criteria. In addition, we evaluate the performance of the likelihood ratio statistic for testing the inclusion of the frailty term in the GTDL model, considering several configurations of sample size and degree of unobservable heterogeneity in the population. In section 5, we apply these procedures to the real melanoma dataset. We offer final remarks in section 6.
2 Background
In this section, we present the GTDL regression model and its frailty version, as well as their hazard, survival, and probability density functions, and the cases in which these models become cure rate models. These models are useful for modeling data with non-proportional hazards and with the possibility of long-term survivors in the population.
2.1 GTDL regression model
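Let $T > 0$ be a random variable representing the failure time and $h_0(t)$ the instantaneous failure rate or baseline hazard function. According to Mackenzie [13], the hazard function of the GTDL model is given by

$$h_0(t; x_1) = \frac{\lambda \exp(\alpha t + x_1^\top \beta)}{1 + \exp(\alpha t + x_1^\top \beta)} \tag{1}$$

where $\lambda > 0$ is a scalar, $\alpha$ is a measure of the time effect, and $x_1^\top = (x_1, \ldots, x_p)$ and $\beta^\top = (\beta_1, \ldots, \beta_p)$ are the sets of covariates and their regression coefficients, respectively.
The corresponding probability density function $f_0(t; x_1)$ and survival function $S_0(t; x_1)$ are, respectively, as follows

$$f_0(t; x_1) = \left[\frac{\lambda \exp(\alpha t + x_1^\top \beta)}{1 + \exp(\alpha t + x_1^\top \beta)}\right] \left[\frac{1 + \exp(\alpha t + x_1^\top \beta)}{1 + \exp(x_1^\top \beta)}\right]^{-\lambda/\alpha}$$

and

$$S_0(t; x_1) = \left[\frac{1 + \exp(\alpha t + x_1^\top \beta)}{1 + \exp(x_1^\top \beta)}\right]^{-\lambda/\alpha} \tag{2}$$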
The ratio of the hazard functions for two individuals, $i$ and $j$, with $i \neq j$, where $i, j = 1, \ldots, n$, with different covariate vectors is given by

$$\tau(t; x_{1i}, x_{1j}) = \frac{h_0(t; x_{1i})}{h_0(t; x_{1j})} = \frac{\lambda \exp(\alpha t + x_{1i}^\top \beta)}{1 + \exp(\alpha t + x_{1i}^\top \beta)} \cdot \frac{1 + \exp(\alpha t + x_{1j}^\top \beta)}{\lambda \exp(\alpha t + x_{1j}^\top \beta)} = \frac{1 + \exp(\alpha t + x_{1j}^\top \beta)}{1 + \exp(\alpha t + x_{1i}^\top \beta)} \exp\left[(x_{1i} - x_{1j})^\top \beta\right] \tag{3}$$

Note that the time effect does not disappear in equation (3), and hence the non-proportionality becomes evident. As mentioned by Mackenzie [13], equation (1) is neither a proportional hazards model nor an accelerated life model, but it will approach the proportional hazards model when $[1 + \exp(\alpha t + x_1^\top \beta)] \to 1$ and, when this condition holds, the estimates of the regression parameters $\beta$ should be similar in both models.
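The time dependence of the hazard ratio in equation (3) can be checked numerically. The sketch below is illustrative only: it assumes a single scalar covariate, and the function names and parameter values are hypothetical rather than taken from the paper.

```python
import math

def gtdl_hazard(t, x, alpha, lam, beta):
    """GTDL hazard of equation (1) for a single scalar covariate x."""
    eta = alpha * t + x * beta
    return lam * math.exp(eta) / (1.0 + math.exp(eta))

def gtdl_hazard_ratio(t, x_i, x_j, alpha, lam, beta):
    """Hazard ratio of equation (3) between covariate values x_i and x_j."""
    return gtdl_hazard(t, x_i, alpha, lam, beta) / gtdl_hazard(t, x_j, alpha, lam, beta)

# Hypothetical parameter values; x = 1 versus x = 0.
alpha, lam, beta = 0.2, 0.2, 1.0
for t in (0.0, 5.0, 10.0, 20.0):
    print(t, gtdl_hazard_ratio(t, 1, 0, alpha, lam, beta))
```

With these values the ratio changes with time, so the two hazards are not proportional; under a proportional hazards model the printed values would all be equal.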
Model (1) is suited to modeling lifetime data and phenomena with monotone failure rates. The shape of the hazard function depends on the value of parameter $\alpha$. When $\alpha > 0$, the hazard function is increasing; when $\alpha < 0$ it is decreasing; and when $\alpha = 0$ the hazard function is constant over time.
The survival function is proper for $\alpha \geq 0$, but when the time effect $\alpha$ is negative, the GTDL model naturally acquires an improper distribution, which is useful for the modeling of survival data in the presence of a surviving fraction. The proportion of long-term survivors in the population is calculated as the limit of the survival function (2) when $\alpha < 0$, given by

$$p(x_1) = \lim_{t \to \infty} S_0(t; x_1) = \lim_{t \to \infty} \left[\frac{1 + \exp(\alpha t + x_1^\top \beta)}{1 + \exp(x_1^\top \beta)}\right]^{-\lambda/\alpha} = \left[1 + \exp(x_1^\top \beta)\right]^{\lambda/\alpha} \in (0, 1) \tag{4}$$
Figure 1 plots the baseline hazard and survival functions for different parameter values for the GTDL model considering a group variable as the covariate.
The GTDL model has the advantage of allowing a cure rate without requiring extra parameters, unlike traditional cure rate models. An additional advantage over cure rate models is that it does not make assumptions about the existence of the cure rate. In the literature, models with this property, which accommodate a cure fraction depending on the value of a single parameter, have recently been termed “defective” [32–37].
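The survival function of the GTDL model and its limiting cure fraction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: it uses a single scalar covariate, and the function names and parameter values are hypothetical.

```python
import math

def gtdl_survival(t, x, alpha, lam, beta):
    """GTDL survival function for a scalar covariate x; requires alpha != 0."""
    ratio = (1.0 + math.exp(alpha * t + x * beta)) / (1.0 + math.exp(x * beta))
    return ratio ** (-lam / alpha)

def gtdl_cure_fraction(x, alpha, lam, beta):
    """Limit of the survival function as t grows; 0 when alpha >= 0 (proper case)."""
    if alpha >= 0:
        return 0.0
    return (1.0 + math.exp(x * beta)) ** (lam / alpha)

# Hypothetical values with a negative time effect, so a cure fraction exists.
alpha, lam, beta = -0.5, 0.2, 1.0
for x in (0, 1):
    print(x, gtdl_cure_fraction(x, alpha, lam, beta), gtdl_survival(200.0, x, alpha, lam, beta))
```

For a negative time effect the survival curve evaluated at a large time agrees with the cure fraction, and no additional cure parameter is involved.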
Figure 1. Baseline hazard (left panel) and survival (right panel) functions from the GTDL model, plotted against time. The parameter values used are: Group 1, $\alpha = 0.2$, $\lambda = 0.2$, and $\beta = 1$; Group 2, $\alpha = 0.5$, $\lambda = 0.2$, and $\beta = 1$; and Group 3, $\alpha = 0.001$, $\lambda = 0.2$, and $\beta = 1$. The subscript numerals indicate the values of the fixed covariates. (For interpretation of the references to color in this figure legend, the reader is referred to the online version of this article.)

2.2 GTDL frailty model
The concept of frailty provides a convenient way of introducing unobserved heterogeneity and associations into models for survival data. The role of frailty in the univariate time scenario is to measure a possible heterogeneity in order to identify the influence of covariates that were not incorporated into the modeling or cannot be measured. From the GTDL model given in equation (1), the hazard function of the $i$th individual with a multiplicative frailty term $v_i$ is given by [15]

$$h_i(t; x_{1i}, v_i) = v_i h_0(t; x_{1i}) = v_i \frac{\lambda \exp(\alpha t + x_{1i}^\top \beta)}{1 + \exp(\alpha t + x_{1i}^\top \beta)} \tag{5}$$

interpreted as the conditional hazard function of the $i$th individual given the frailty term $v_i$, which is characterized as the frailty of the $i$th subject. The conditional hazard function is greater than the baseline hazard function for $v_i > 1$ and smaller than baseline for $v_i < 1$; in the special case of degenerate frailty $v_i = 1$, the frailty model reduces to the GTDL model (1). Its conditional survival function is easily obtained, given by

$$S_i(t; x_{1i}, v_i) = S_0(t; x_{1i})^{v_i} = \left[\frac{1 + \exp(\alpha t + x_{1i}^\top \beta)}{1 + \exp(x_{1i}^\top \beta)}\right]^{-\lambda v_i/\alpha} \tag{6}$$
According to the way in which the frailty term acts on the hazard function, natural frailty distribution candidates are supposed to be non-negative, continuous, and time independent (e.g. gamma, log-normal, inverse Gaussian, positive stable, and power variance function distributions, among others). In the literature, the gamma distribution with mean 1 and variance $\theta$ has been widely used, as it permits easy algebraic treatment.
In this paper, we consider the family of power variance function (PVF) distributions, as it includes the gamma, inverse Gaussian, and positive stable distributions as particular cases. The PVF distribution was suggested by Tweedie [42] and derived independently by Hougaard [43]. Let $V$ be a random variable following a PVF distribution with parameters $\mu$, $\psi$, and $\gamma$, with density function written as [44]

$$f(v; \mu, \psi, \gamma) = e^{-\psi(1-\gamma)\left(\frac{v}{\mu} - \frac{1}{\gamma}\right)} \frac{1}{\pi} \sum_{k=1}^{\infty} (-1)^{k+1} \frac{[\psi(1-\gamma)]^{k(1-\gamma)} \mu^{k\gamma}\, \Gamma(k\gamma + 1)}{\gamma^{k} k!}\, v^{-k\gamma - 1} \sin(k\gamma\pi)$$

where $\mu > 0$, $\psi > 0$, and $0 < \gamma \leq 1$.
Following the historical definition of frailty originally introduced in the field of demography [19], we use the restriction $E[V] = \mu = 1$ such that $\mathrm{Var}[V] = \mu^2/\psi = 1/\psi := \theta$, where $\theta$ is interpretable as a measure of unobserved heterogeneity.
Note that if we built the likelihood function using the hazard and survival functions given in equations (5) and (6), respectively, we would have more parameters than observations, considering univariate times. Thus, the random effect can be integrated out to obtain a likelihood function that does not depend on unobserved quantities; consequently, the marginal survival function is given by

$$S(t) = E_V[S(t; x_1, v)] = \int_0^{\infty} S(t; x_1, v) f_v(v)\, dv = L_v\left[-\log S_0(t; x_1)\right]$$

where $f_v(\cdot)$ is the probability density of the corresponding frailty distribution, $S_0(\cdot)$ is the baseline survival function, and $L_v[\cdot]$ denotes the Laplace transform of the frailty distribution.
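Taking the GTDL survival function given in equation (2) as the baseline, the unconditional survival, probability density, and hazard functions in the PVF frailty model are, respectively

$$S(t; x_1) = \exp\left[\frac{1-\gamma}{\gamma\theta}\left(1 - \left\{1 + \frac{\lambda\theta}{\alpha(1-\gamma)} \log\left[\frac{1 + \exp(\alpha t + x_1^\top \beta)}{1 + \exp(x_1^\top \beta)}\right]\right\}^{\gamma}\right)\right] \tag{7}$$

$$f(t; x_1) = \frac{\lambda \exp(\alpha t + x_1^\top \beta)}{1 + \exp(\alpha t + x_1^\top \beta)} \left\{1 + \frac{\lambda\theta}{\alpha(1-\gamma)} \log\left[\frac{1 + \exp(\alpha t + x_1^\top \beta)}{1 + \exp(x_1^\top \beta)}\right]\right\}^{\gamma-1} \exp\left[\frac{1-\gamma}{\gamma\theta}\left(1 - \left\{1 + \frac{\lambda\theta}{\alpha(1-\gamma)} \log\left[\frac{1 + \exp(\alpha t + x_1^\top \beta)}{1 + \exp(x_1^\top \beta)}\right]\right\}^{\gamma}\right)\right]$$

and

$$h(t; x_1) = \frac{\lambda \exp(\alpha t + x_1^\top \beta)}{1 + \exp(\alpha t + x_1^\top \beta)} \left\{1 + \frac{\lambda\theta}{\alpha(1-\gamma)} \log\left[\frac{1 + \exp(\alpha t + x_1^\top \beta)}{1 + \exp(x_1^\top \beta)}\right]\right\}^{\gamma-1} = h_0(t; x_1) \left\{1 + \frac{\lambda\theta}{\alpha(1-\gamma)} \log\left[\frac{1 + \exp(\alpha t + x_1^\top \beta)}{1 + \exp(x_1^\top \beta)}\right]\right\}^{\gamma-1} \tag{8}$$

where $h_0(\cdot; x_1)$ is the hazard function from the GTDL model given in equation (1).
Henceforth, we will refer to the model whose survival function is shown in equation (7) as the GTDL PVF frailty model. Note that the usual GTDL model (2) is obtained as $\theta \to 0$. In addition, the GTDL PVF frailty model is flexible in the sense that it includes many other frailty models as special cases. For instance, the GTDL gamma frailty model is obtained if $\gamma \to 0$. In the case of $\gamma = 0.5$, the GTDL inverse Gaussian frailty model is derived. The GTDL positive stable frailty model is also a special case of the GTDL PVF frailty model, although some asymptotic considerations are necessary to show this fact. We refer interested readers to Wienke [44].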
It is evident that the hazard function in equation (8) depends on time; consequently, the GTDL PVF frailty model is also a non-proportional hazards model. Like the GTDL model, the GTDL PVF frailty model allows negative values for the time effect ($\alpha < 0$). Thus, the corresponding proportion of long-term survivors is

$$p(x_1) = \lim_{t \to \infty} S(t; x_1) = \exp\left[\frac{1-\gamma}{\gamma\theta}\left(1 - \left\{1 - \frac{\lambda\theta \log\left[1 + \exp(x_1^\top \beta)\right]}{\alpha(1-\gamma)}\right\}^{\gamma}\right)\right] \in (0, 1) \tag{9}$$

If parameter $\alpha$ is estimated to be negative, then the cure fractions for the GTDL and GTDL PVF frailty models can be obtained from equations (4) and (9), respectively. If parameter $\alpha$ is estimated to be positive, then there is no cure rate according to the two models, and functions (2) and (7) are proper survival functions.
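The marginal survival function and cure fraction of the GTDL PVF frailty model can be written as a composition of the GTDL cumulative hazard with the PVF Laplace transform. The following sketch assumes a single scalar covariate and $0 < \gamma < 1$; the function names and the parameter values are hypothetical and serve only to illustrate the formulas.

```python
import math

def gtdl_cum_hazard(t, x, alpha, lam, beta):
    """Cumulative hazard -log S0(t; x) of the GTDL model; requires alpha != 0."""
    ratio = (1.0 + math.exp(alpha * t + x * beta)) / (1.0 + math.exp(x * beta))
    return (lam / alpha) * math.log(ratio)

def pvf_laplace(s, theta, gamma):
    """Laplace transform of a PVF frailty with mean 1, variance theta, 0 < gamma < 1."""
    inner = 1.0 + theta * s / (1.0 - gamma)
    return math.exp((1.0 - gamma) / (gamma * theta) * (1.0 - inner ** gamma))

def gtdl_pvf_survival(t, x, alpha, lam, beta, theta, gamma):
    """Marginal survival: the Laplace transform evaluated at the GTDL cumulative hazard."""
    return pvf_laplace(gtdl_cum_hazard(t, x, alpha, lam, beta), theta, gamma)

def gtdl_pvf_cure_fraction(x, alpha, lam, beta, theta, gamma):
    """Limit of the marginal survival as t grows; 0 when alpha >= 0 (proper case)."""
    if alpha >= 0:
        return 0.0
    s_inf = -(lam / alpha) * math.log(1.0 + math.exp(x * beta))
    return pvf_laplace(s_inf, theta, gamma)

# Hypothetical values; gamma = 0.5 corresponds to the inverse Gaussian frailty case.
print(gtdl_pvf_cure_fraction(0, -0.3, 0.5, 1.0, 0.8, 0.5))
# A gamma close to 0 approximates the gamma frailty form (1 + theta*s)^(-1/theta).
print(pvf_laplace(2.0, 0.5, 1e-6), (1.0 + 0.5 * 2.0) ** (-1.0 / 0.5))
```

Writing the model this way keeps the baseline and the frailty distribution separate, so a different frailty family only requires a different Laplace transform.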
As previously mentioned, the GTDL model (1) accommodates only monotone failure rates and does not provide a reasonable parametric fit for phenomena with non-monotone failure rates, such as the bathtub-shaped and the unimodal failure rates, which are common in reliability and biological studies. In this sense, an advantage of the proposed model (8) over the traditional GTDL model is the ability to accommodate various forms of the hazard function that can be used in a variety of problems for modeling lifetime data. Figure 2 plots the hazard and survival functions for different parameter values in the GTDL PVF frailty model considering a group variable as a covariate.
Figure 2. Hazard (left panel) and survival (right panel) functions from the GTDL PVF frailty model, plotted against time. The parameter values fixed are: Group 1, $\alpha = 0.1$, $\lambda = 0.6$, $\beta = 1$, $\gamma = 0.01$, and $\theta = 1$; Group 2, $\alpha = 0.2$, $\lambda = 0.7$, $\beta = 1$, $\gamma = 0.5$, and $\theta = 0.5$; Group 3, $\alpha = 0.15$, $\lambda = 0.3$, $\beta = 5$, $\gamma = 0.9$, and $\theta = 0.3$; and Group 4, $\alpha = 0.05$, $\lambda = 1$, $\beta = 0.5$, $\gamma = 0.9$, and $\theta = 1$. The subscripted numerals indicate the values of the fixed covariates. (For interpretation of the references to color in this figure legend, the reader is referred to the online version of this article.)
In this paper, we also incorporate explanatory variables in the GTDL and GTDL PVF frailty models through parameter $\alpha$, which is a more reasonable approach because it can directly reflect the effect of a treatment. For instance, for some treatment A, if the treatment effect is good, then some patients will be cured and the estimate of $\alpha$ will be negative; if the treatment is not sufficient, the estimate will be positive. Given this capacity, the GTDL and GTDL PVF frailty models are more flexible than standard approaches [13,15].
In this sense, explanatory variables are incorporated in the model through the hazard function (1) and the scale parameter $\alpha$ with a set of two covariate vectors, $x_1 \in \mathbb{R}^p$ and $x_2 \in \mathbb{R}^{q+1}$, such that $x^\top = (x_1^\top, x_2^\top) \in \mathbb{R}^w$ is a $w$-dimensional covariate vector, where $w = p + q + 1$. Importantly, parameter $\alpha$ can be estimated to be negative (which leads to cure) or positive (indicating the absence of a cure rate). Thus, to guarantee $\alpha \in \mathbb{R}$, we use an identity link function, such as

$$\alpha(x_{2i}) = x_{2i}^\top \alpha$$

where $x_{2i}^\top = (1, x_{2i1}, x_{2i2}, \ldots, x_{2iq})$ and $\alpha^\top = (\alpha_0, \alpha_1, \ldots, \alpha_q)$ are the sets of covariates and their regression coefficients, respectively. In practice, the covariate vectors may be the same, i.e. $x = x_1 = x_2$, but if the researcher has prior knowledge about the variables that may be associated with the cure rate, we suggest linking this subset of variables to the $\alpha$ parameter.
An advantage of the GTDL and GTDL PVF frailty models over alternative models is the lack of assumption about the existence of the cure rate; the values of the time effect determine whether the distribution is proper or improper. Thus, these models are flexible and can be applied in situations with and without a cure fraction.
3 Inference
In this section, we describe the inferential procedure, which is based on the maximum likelihood approach and the asymptotic large sample theory. Let $T > 0$ be a random variable representing the time until the occurrence of the event of interest. Furthermore, let $\delta_i$ be the censoring indicator variable, that is, $\delta_i = 0$ if the observed time is censored and $\delta_i = 1$ otherwise, $i = 1, \ldots, n$. The observed dataset is $D = (t, \delta, X)$, where $t = (t_1, \ldots, t_n)^\top$ are the observed lifetimes, $\delta = (\delta_1, \ldots, \delta_n)^\top$ are the censoring indicators, and $X$ is a matrix containing the covariate information. Consider that the $T_i$s are independent and identically distributed random variables with survival and hazard functions specified, respectively, by $S(\cdot; \vartheta, x_1, x_2)$ and $h(\cdot; \vartheta, x_1, x_2)$, where $\vartheta$ denotes a vector of unknown parameters. We assume that $T$ is independent of the censoring time. Thus, the likelihood function of $\vartheta$ under non-informative censoring is expressed as [6]

$$L(\vartheta; D) \propto \prod_{i=1}^{n} h(t_i; \vartheta, x_{1i}, x_{2i})^{\delta_i}\, S(t_i; \vartheta, x_{1i}, x_{2i})$$

The corresponding log-likelihood function, $\ell(\vartheta) = \log L(\vartheta; D)$, is given, up to an additive constant, by

$$\ell(\vartheta) = \sum_{i=1}^{n} \delta_i \log h(t_i; \vartheta, x_{1i}, x_{2i}) + \sum_{i=1}^{n} \log S(t_i; \vartheta, x_{1i}, x_{2i})$$
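Thus, for the GTDL regression model the log-likelihood function for $\vartheta = (\alpha, \beta, \lambda)^\top$ is

$$\begin{aligned}
\ell(\vartheta) ={}& \log\lambda \sum_{i=1}^{n} \delta_i + \sum_{i=1}^{n} x_{2i}^\top \alpha\, \delta_i t_i - \sum_{i=1}^{n} \delta_i \log\left[1 + \exp(x_{2i}^\top \alpha\, t_i + x_{1i}^\top \beta)\right] \\
&- \lambda \sum_{i=1}^{n} \left\{\frac{1}{x_{2i}^\top \alpha} \log\left[\frac{1 + \exp(x_{2i}^\top \alpha\, t_i + x_{1i}^\top \beta)}{1 + \exp(x_{1i}^\top \beta)}\right]\right\} + \sum_{i=1}^{n} \delta_i x_{1i}^\top \beta
\end{aligned} \tag{10}$$

For the GTDL PVF frailty regression model the log-likelihood function for $\vartheta = (\alpha, \beta, \lambda, \theta, \gamma)^\top$ is

$$\begin{aligned}
\ell(\vartheta) ={}& \log\lambda \sum_{i=1}^{n} \delta_i + \sum_{i=1}^{n} x_{2i}^\top \alpha\, \delta_i t_i + \sum_{i=1}^{n} \delta_i x_{1i}^\top \beta - \sum_{i=1}^{n} \delta_i \log\left[1 + \exp(x_{2i}^\top \alpha\, t_i + x_{1i}^\top \beta)\right] \\
&+ (\gamma - 1) \sum_{i=1}^{n} \delta_i \log\left(1 + \frac{\theta\lambda}{x_{2i}^\top \alpha (1-\gamma)} \log\left[\frac{1 + \exp(x_{2i}^\top \alpha\, t_i + x_{1i}^\top \beta)}{1 + \exp(x_{1i}^\top \beta)}\right]\right) \\
&+ \sum_{i=1}^{n} \frac{1-\gamma}{\gamma\theta} \left(1 - \left\{1 + \frac{\theta\lambda}{x_{2i}^\top \alpha (1-\gamma)} \log\left[\frac{1 + \exp(x_{2i}^\top \alpha\, t_i + x_{1i}^\top \beta)}{1 + \exp(x_{1i}^\top \beta)}\right]\right\}^{\gamma}\right)
\end{aligned} \tag{11}$$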
Maximum likelihood estimates (MLEs) for parameters from the GTDL and GTDL PVF frailty models are obtained by numerically maximizing log-likelihood functions (10) and (11), respectively. Many routines are available for numerical maximization; we used the optim routine in the R software [45].
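As an illustration of the two objective functions, the sketch below evaluates the GTDL and GTDL PVF frailty log-likelihoods in Python as the sum of the event indicator times the log hazard plus the log survival. It is not the authors' R code: it assumes one scalar covariate that enters both the hazard and the time effect through $\alpha_0 + \alpha_1 x$, a nonzero time effect for every observation, and $0 < \gamma < 1$. The function names and the toy data are hypothetical, and the maximization step itself is left to a numerical optimizer.

```python
import math

def _gtdl_pieces(t, x, alpha, lam, beta):
    """Return (log h0, H0): GTDL log hazard and cumulative hazard at time t."""
    eta_t = alpha * t + x * beta
    log_h0 = math.log(lam) + eta_t - math.log1p(math.exp(eta_t))
    cum_h0 = (lam / alpha) * (math.log1p(math.exp(eta_t)) - math.log1p(math.exp(x * beta)))
    return log_h0, cum_h0

def gtdl_loglik(params, times, events, x):
    """GTDL log-likelihood with alpha(x) = a0 + a1 * x; params = (a0, a1, lam, beta)."""
    a0, a1, lam, beta = params
    total = 0.0
    for t, d, xi in zip(times, events, x):
        log_h0, cum_h0 = _gtdl_pieces(t, xi, a0 + a1 * xi, lam, beta)
        total += d * log_h0 - cum_h0  # delta * log h + log S
    return total

def gtdl_pvf_loglik(params, times, events, x):
    """GTDL PVF frailty log-likelihood; params = (a0, a1, lam, beta, theta, gamma)."""
    a0, a1, lam, beta, theta, gamma = params
    total = 0.0
    for t, d, xi in zip(times, events, x):
        log_h0, cum_h0 = _gtdl_pieces(t, xi, a0 + a1 * xi, lam, beta)
        inner = 1.0 + theta * cum_h0 / (1.0 - gamma)
        log_h = log_h0 + (gamma - 1.0) * math.log(inner)
        log_s = (1.0 - gamma) / (gamma * theta) * (1.0 - inner ** gamma)
        total += d * log_h + log_s
    return total

# Toy data (hypothetical): observed times, event indicators, group covariate.
times, events, x = [1.0, 2.5, 4.0], [1, 0, 1], [0, 1, 1]
print(gtdl_loglik((-0.5, 0.1, 0.2, 1.0), times, events, x))
print(gtdl_pvf_loglik((-0.5, 0.1, 0.2, 1.0, 0.5, 0.5), times, events, x))
```

When the frailty variance is taken very close to zero, the second function returns essentially the same value as the first, mirroring the fact that the GTDL model is the limiting case of the frailty model.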
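The asymptotic properties of maximum likelihood estimators are needed to build confidence intervals and to test hypotheses about the model parameters. Under certain regularity conditions, $\hat{\vartheta}$ has an asymptotic multivariate normal distribution with mean $\vartheta$ and variance-covariance matrix $\Sigma(\hat{\vartheta})$, which is estimated by

$$\hat{\Sigma}(\hat{\vartheta}) = \left\{-\left.\frac{\partial^2 \ell(\vartheta; D)}{\partial\vartheta\, \partial\vartheta^\top}\right|_{\vartheta = \hat{\vartheta}}\right\}^{-1}$$

Thus, an approximate $100(1 - a)\%$ confidence interval for $\vartheta_i$ is $\left(\hat{\vartheta}_i - z_{a/2}\sqrt{\hat{\Sigma}_{ii}},\ \hat{\vartheta}_i + z_{a/2}\sqrt{\hat{\Sigma}_{ii}}\right)$, where $\hat{\Sigma}_{ii}$ denotes the $i$th diagonal element of $\hat{\Sigma}(\hat{\vartheta})$, the inverse of the observed information matrix, and $z_a$ denotes the $100(1 - a)$ percentile of the standard normal random variable.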
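Given a point estimate and the corresponding diagonal element of the estimated variance-covariance matrix, the interval above is straightforward to compute. The following sketch uses only the Python standard library; the function name and the numerical inputs are hypothetical.

```python
from statistics import NormalDist

def wald_interval(estimate, variance, level=0.95):
    """Approximate Wald confidence interval: estimate -/+ z * sqrt(variance)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    half_width = z * variance ** 0.5
    return estimate - half_width, estimate + half_width

# Hypothetical estimate and estimated variance of one parameter.
print(wald_interval(-0.36, 0.01, level=0.95))
print(wald_interval(-0.36, 0.01, level=0.90))
```

For a time-effect estimate, an interval lying entirely below zero would be consistent with the presence of a cure fraction under the models discussed here.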
The asymptotic normality of MLEs holds only under certain regularity conditions, which are not easy to verify for our models. In the next section, we describe a simulation study performed to determine whether the usual asymptotic properties of the MLEs hold. Many authors have performed simulations to assess the asymptotic behavior of MLEs, especially when the analytical investigation is not trivial [35,38,39]. We also conducted a simulation study to assess the impact of unobservable heterogeneity on the cured fraction, and to evaluate the performance of the models in correctly estimating a negative parameter $\alpha$ when there is in fact a group of long-term survivors. In addition, we investigate the performance of the GTDL frailty model in terms of several discrimination criteria when compared with the standard GTDL model, that is, the model without the frailty term.
4 Simulation study
In this section, we evaluate the performance of MLEs of the GTDL PVF frailty and GTDL model parameters considering different sample sizes. To assess the covariate effects on the hazard function and time effect, we divide the sample into two groups ($X$): control (group 0) and treatment (group 1). Subjects in the control and treatment groups are assigned covariate values of 0 and 1, respectively. We introduce two regression parameters for $\alpha$, that is, $\alpha(x) = \alpha_0 + \alpha_1 x$, where $\alpha_0$ is the intercept and $\alpha_1$ is the coefficient associated with the group variable. The cure fractions from the GTDL and GTDL PVF frailty models are functions of the parameters, as shown in equations (4) and (9), respectively. To introduce random censoring, the distribution of censoring times is assumed to be exponential with rate $\tau$, which is set to control the proportion of right-censored observations. Datasets $(t_i, \delta_i, x_i)$ from the GTDL PVF frailty and GTDL models are generated using the steps shown in the Supplementary material.
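The generation steps themselves are given in the Supplementary material and are not reproduced here. As a rough illustration of one way such data could be generated for the GTDL model without frailty, the sketch below inverts the survival function, treats draws that fall at or below the cure fraction as cured (infinite lifetime), and applies exponential censoring. The inversion scheme, the function names, the alternating group assignment, and the parameter values are assumptions made for this example and are not the authors' algorithm.

```python
import math
import random

def gtdl_inverse_survival(u, x, alpha, lam, beta):
    """Solve S0(t; x) = u for t; math.inf when u is at or below the cure fraction."""
    base = 1.0 + math.exp(x * beta)
    arg = base * u ** (-alpha / lam) - 1.0
    if arg <= 0.0:
        return math.inf  # cured subject: the event never occurs
    return (math.log(arg) - x * beta) / alpha

def simulate_gtdl(n, a0, a1, lam, beta, censor_rate, seed=2019):
    """Generate (time, event indicator, group) triples with exponential censoring."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        x = i % 2                  # alternate control (0) and treatment (1)
        u = 1.0 - rng.random()     # uniform draw on (0, 1]
        event_time = gtdl_inverse_survival(u, x, a0 + a1 * x, lam, beta)
        censor_time = rng.expovariate(censor_rate)
        observed = min(event_time, censor_time)
        delta = 1 if event_time <= censor_time else 0
        data.append((observed, delta, x))
    return data

# Hypothetical settings; both groups have a negative time effect here.
sample = simulate_gtdl(200, -0.36, -0.24, 1.4, -2.7, censor_rate=0.1)
print(sum(d for _, d, _ in sample), "events out of", len(sample))
```

Raising the censoring rate increases the share of right-censored observations, which is the role the rate parameter plays in the simulation design described above.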
4.1 Model fitting
We carried out an extensive Monte Carlo simulation considering sample sizes of $n = 50$, 100, 150, 200, 300, 500, 1000, 2000, 5000, and 10,000. For each scenario (combination of parameter values and sample size), we computed the average MLEs of the parameters, their standard deviations (SDs), biases, and root mean square errors (RMSEs), and the empirical coverage probabilities (CPs) of the 90% and 95% confidence intervals. The standard error for the cure rate parameter was estimated using the delta method with a first-order Taylor approximation. All simulations were performed with the R software [45] and 1000 Monte Carlo runs. Estimates were obtained using the BFGS maximization algorithm, which is an option for the optim function in R. In our simulation studies, we fixed the parameter $\gamma \to 0$ (GTDL gamma frailty model) for all fitted models, for consistency with the results obtained in the application section.
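The summary measures reported for each scenario can be computed from the replicate estimates and their standard errors. The sketch below shows one conventional way to do so; the function name, the dictionary keys, and the four replicate values are hypothetical, and a real study would use the 1000 Monte Carlo replicates.

```python
import math

def monte_carlo_summary(estimates, std_errors, true_value, z=1.959964):
    """Average estimate, SD, bias, RMSE, and empirical coverage of Wald intervals."""
    m = len(estimates)
    mean = sum(estimates) / m
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (m - 1))
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
    covered = sum(1 for e, se in zip(estimates, std_errors)
                  if abs(e - true_value) <= z * se)
    return {"mean": mean, "sd": sd, "bias": mean - true_value,
            "rmse": rmse, "coverage": covered / m}

# Four hypothetical replicates of one parameter whose true value is 1.0.
print(monte_carlo_summary([0.9, 1.1, 1.0, 1.2], [0.1, 0.1, 0.1, 0.1], 1.0))
```

Because the squared RMSE is approximately the squared SD plus the squared bias, RMSE and SD being close to each other indicates that the bias is small.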
Results for the GTDL and GTDL gamma frailty models are summarized in Table 1. The estimation method worked very well: as the sample size increased, the bias approached 0 for all parameters. The RMSEs and SDs decreased toward 0 as the sample size increased, and the two measures were close to each other when the sample size was $n \geq 150$. Empirical CPs for all parameters appeared to be reasonably close to the nominal level with increasing sample size, regardless of model. Under the GTDL gamma frailty model, the empirical CPs for $\lambda$ and $\theta$ were below the nominal level for $n \leq 100$. Considering the scenario of $n = 2000$, the empirical distributions of parameter estimates for both models are shown in the Supplementary material. The plots indicate that the normal distribution provides reasonable approximations for estimator distributions (see Figure 1 in the supplementary material).
In the supplementary material, we also illustrate the behavior of MLEs of the parameters from the GTDL gamma frailty model when a continuous covariate is considered in the data generation process, as well as when two covariates (one binary and one continuous) are considered in the GTDL gamma frailty model. The results are similar to those obtained previously, suggesting that the models can separate the effects on the two components ($\alpha$ and $\beta$ parameters).
4.2 Impact of frailty on the cured fraction
To evaluate the impact of the frailty term in the estimation of the cure rates, we conducted a simulation considering several degrees of unobservable heterogeneity in the population. For each simulated dataset, the GTDL gamma frailty and GTDL models were fitted and then the cure rates were compared with the fixed true values. We generated datasets from the GTDL gamma frailty model considering different sample sizes and parameter values. We fixed $\alpha_0 = -0.36$, $\alpha_1 = -0.24$, $\lambda = 1.4$, $\beta_1 = -2.7$, and $\theta \in \{0, 0.05, 0.1, 0.3, 0.7, 1.1, 1.5\}$. The corresponding cure rates for groups 0 and 1 are $p_0 = \{0.068, 0.080, 0.092, 0.139, 0.220, 0.286, 0.340\}$ and $p_1 = \{0.859, 0.860, 0.860, 0.862, 0.866, 0.869, 0.872\}$, respectively. Based on 1000 datasets, we calculated the RMSEs of the MLEs of the cure rates.
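The listed cure rates can be recomputed from the fixed parameter values. The sketch below uses the gamma frailty limit of the cure fraction, $(1 + \theta s_\infty)^{-1/\theta}$ with $s_\infty = -(\lambda/\alpha)\log[1 + \exp(x\beta)]$, which reduces to the GTDL expression when $\theta = 0$. The negative signs on $\alpha_0$, $\alpha_1$, and $\beta_1$ are an assumption here: they are the choice under which these formulas reproduce the listed values to three decimal places. The function name is hypothetical.

```python
import math

def gamma_frailty_cure_fraction(x, a0, a1, lam, beta, theta):
    """Cure fraction under gamma frailty (gamma -> 0 limit) with alpha(x) = a0 + a1*x."""
    alpha = a0 + a1 * x
    if alpha >= 0:
        return 0.0  # proper survival function, no cure fraction
    s_inf = -(lam / alpha) * math.log(1.0 + math.exp(x * beta))
    if theta == 0:
        return math.exp(-s_inf)  # GTDL model without frailty
    return (1.0 + theta * s_inf) ** (-1.0 / theta)

# Parameter values of the simulation; the negative signs are assumed (see above).
a0, a1, lam, beta1 = -0.36, -0.24, 1.4, -2.7
for theta in (0, 0.05, 0.1, 0.3, 0.7, 1.1, 1.5):
    p0 = gamma_frailty_cure_fraction(0, a0, a1, lam, beta1, theta)
    p1 = gamma_frailty_cure_fraction(1, a0, a1, lam, beta1, theta)
    print(theta, p0, p1)
```

The printout makes the pattern in the listed values visible: for fixed regression parameters, a larger frailty variance raises the cure fraction, and the effect is much stronger in group 0 than in group 1.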
Figure 3 shows the RMSEs of the MLEs of the cure fractions obtained from the GTDL gamma frailty and
GTDL models, considering several degrees of heterogeneity in the sample.
As expected, RMSE decreased to 0 as the sample size increased, regardless of group. However, RMSEs
increased with the degree of unobservable heterogeneity, mainly in group 0. For both models, the estimated
cure fractions are close to the true values, as indicated by the RMSEs of the cure rates.
4.3 Impact of cure rate and sample size on the estimation of the $\alpha$ parameter
As previously mentioned, if the parameter $\alpha$ is estimated to be negative, then the cure fraction is computed as a function
of the model parameters. It is therefore important to evaluate how reliably the model identifies a value of
$\alpha$ less than zero when there is in fact an immune group. To evaluate the impact of small cure probabilities, we
conducted a simulation considering several sample sizes and cure rates $p = \{0.01, 0.03, 0.045, 0.11, 0.20\}$.
For each simulated dataset, we fitted the models, and then, based on 1000 datasets, we calculated the percentage of
cases in which the estimate of $\alpha$ was less than zero. The results are shown in Table 2. As expected, for a fixed cure rate, the percentage
of cases in which the model correctly identifies the cure fraction (i.e., $\alpha < 0$) increased with the sample size. As the cure
probability decreased, the GTDL gamma frailty model had difficulty identifying an immune group. However,
the percentages increased with sample size, regardless of the model. With the cure rate fixed at 0.01, the
GTDL model correctly identified the sign of $\alpha$ in 100% of the cases when $n = 1000$, whereas the GTDL gamma frailty model did so in
87.8% of the cases. For sample size $n = 2000$, the rate was higher than 95% (omitted from the table). When the cure rate is
greater than or equal to 0.11, both models correctly identified the cure fraction in more than
92% of the cases, even for small samples.
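The simulations in this section start from lifetimes drawn from a model with an immune group. A minimal sketch of one way to draw such lifetimes is given below: it inverts the GTDL cumulative hazard, optionally scaled by a gamma frailty. The function name, the seed, and the absence of censoring are assumptions of this illustration, and the model-fitting step needed to count how often the estimate of $\alpha$ is negative is not shown.

```python
import math
import random

def draw_gtdl_time(alpha, lam, eta, theta, rng):
    """Draw one lifetime from a GTDL gamma frailty model; math.inf means cured.

    alpha: time effect (nonzero; alpha < 0 gives a cure fraction); lam: scale;
    eta: linear predictor x'beta; theta: frailty variance (0 means no frailty).
    """
    # Gamma frailty with mean 1 and variance theta (shape 1/theta, scale theta).
    z = rng.gammavariate(1.0 / theta, theta) if theta > 0 else 1.0
    # Solve z * H0(t) = E for t, where E is a standard exponential draw.
    e = rng.expovariate(1.0) / z
    arg = math.log1p(math.exp(eta)) + e * alpha / lam
    if alpha < 0 and arg <= 0:
        return math.inf  # the cumulative hazard never reaches e: cured
    return (math.log(math.expm1(arg)) - eta) / alpha

rng = random.Random(2024)  # arbitrary seed for reproducibility
n = 20000
draws = [draw_gtdl_time(-0.36, 1.4, 0.0, 1.1, rng) for _ in range(n)]
prop_cured = sum(math.isinf(t) for t in draws) / n
print(round(prop_cured, 3))
```

With $\alpha = -0.36$, $\lambda = 1.4$, $\eta = 0$, and $\theta = 1.1$, the simulated proportion of cured units should lie close to the corresponding cure rate of about 0.286.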
4.4 Model discrimination
In the fourth simulation, we investigated the performance of various discrimination criteria in the selection of the
correct model from the GTDL gamma frailty and GTDL models. We considered the Akaike information criterion
(AIC), corrected Akaike information criterion (AICc), Bayesian information criterion (BIC), Hannan–Quinn
information criterion (HQIC), and consistent Akaike information criterion (CAIC). These functions were computed
as follows: $\mathrm{AIC} = -2\ell + 2k$, $\mathrm{AICc} = \mathrm{AIC} + 2k(k+1)/(n-k-1)$, $\mathrm{BIC} = -2\ell + k\log n$,
$\mathrm{HQIC} = -2\ell + 2k\log(\log n)$, and $\mathrm{CAIC} = -2\ell + k(\log n + 1)$, where $\ell$ is the maximized log-likelihood function value, $k$ is the
Table 1. Bias, square roots of the mean squared errors (RMSEs), and standard deviations (SDs) of the maximum likelihood estimates,
and empirical coverage probabilities (CPs) of 90% and 95% confidence intervals for the simulated data.
n  Measure     GTDL model: $\alpha_0$  $\alpha_1$  $\lambda$  $\beta_1$     GTDL frailty model: $\alpha_0$  $\alpha_1$  $\lambda$  $\beta_1$  $\theta$
   True value  -0.6  0.4  1  -2.4  -0.36  0.24  1.4  -2.7  1.1
50 Bias 0.025 0.071 0.047 0.063 0.071 0.390 0.134 0.015 0.139
RMSE 0.206 1.713 0.340 0.727 0.235 1.277 0.818 1.171 0.985
SD 0.204 1.712 0.337 0.725 0.224 1.216 0.807 1.172 0.975
CP(90%) 0.902 0.919 0.894 0.914 0.915 0.953 0.868 0.939 0.805
CP(95%) 0.941 0.962 0.925 0.965 0.956 0.980 0.912 0.980 0.837
100 Bias 0.023 0.005 0.037 0.004 0.060 0.114 0.061 0.031 0.139
RMSE 0.130 0.147 0.234 0.455 0.163 0.551 0.522 0.743 0.765
SD 0.128 0.147 0.231 0.456 0.152 0.540 0.519 0.743 0.752
CP(90%) 0.906 0.908 0.900 0.902 0.888 0.942 0.871 0.915 0.816
CP(95%) 0.955 0.953 0.947 0.948 0.940 0.972 0.902 0.968 0.856
150 Bias 0.005 0.003 0.020 0.007 0.032 0.054 0.035 0.007 0.090
RMSE 0.100 0.113 0.184 0.359 0.121 0.326 0.399 0.588 0.634
SD 0.100 0.113 0.183 0.359 0.117 0.322 0.398 0.588 0.628
CP(90%) 0.904 0.908 0.904 0.898 0.910 0.929 0.893 0.901 0.885
CP(95%) 0.951 0.954 0.946 0.958 0.947 0.964 0.931 0.954 0.919
200 Bias 0.011 0.003 0.022 0.007 0.029 0.031 0.024 0.016 0.095
RMSE 0.090 0.101 0.162 0.315 0.107 0.256 0.363 0.501 0.560
SD 0.089 0.101 0.161 0.315 0.103 0.255 0.362 0.501 0.553
CP(90%) 0.888 0.894 0.908 0.898 0.911 0.918 0.885 0.905 0.888
CP(95%) 0.945 0.945 0.942 0.952 0.949 0.968 0.927 0.950 0.924
300 Bias 0.006 0.001 0.011 0.001 0.020 0.020 0.010 0.020 0.063
RMSE 0.070 0.080 0.125 0.254 0.088 0.193 0.284 0.409 0.471
SD 0.069 0.080 0.124 0.254 0.086 0.192 0.284 0.408 0.467
CP(90%) 0.917 0.892 0.915 0.891 0.905 0.911 0.907 0.909 0.910
CP(95%) 0.956 0.942 0.959 0.948 0.945 0.959 0.950 0.955 0.945
500 Bias 0.003 0.001 0.011 0.000 0.011 0.008 0.020 0.005 0.034
RMSE 0.054 0.060 0.100 0.190 0.069 0.134 0.234 0.311 0.392
SD 0.053 0.060 0.100 0.190 0.068 0.134 0.233 0.311 0.391
CP(90%) 0.896 0.891 0.893 0.918 0.894 0.910 0.898 0.903 0.904
CP(95%) 0.949 0.947 0.949 0.959 0.949 0.965 0.942 0.953 0.950
1000 Bias 0.002 0.002 0.007 0.004 0.007 0.004 0.007 0.001 0.026
RMSE 0.037 0.042 0.069 0.131 0.049 0.095 0.159 0.218 0.275
SD 0.037 0.042 0.068 0.131 0.048 0.095 0.159 0.218 0.274
CP(90%) 0.908 0.900 0.905 0.917 0.906 0.884 0.902 0.904 0.915
CP(95%) 0.953 0.954 0.953 0.962 0.951 0.941 0.960 0.953 0.958
2000 Bias 0.001 0.001 0.004 0.006 0.005 0.000 0.000 0.001 0.020
RMSE 0.025 0.028 0.049 0.096 0.035 0.066 0.109 0.151 0.194
SD 0.025 0.028 0.048 0.096 0.035 0.066 0.109 0.151 0.193
CP(90%) 0.910 0.932 0.914 0.889 0.899 0.892 0.916 0.915 0.899
CP(95%) 0.960 0.968 0.959 0.948 0.942 0.953 0.958 0.960 0.960
5000 Bias 0.001 0.001 0.001 0.001 0.002 0.000 0.001 0.003 0.006
RMSE 0.017 0.018 0.032 0.061 0.021 0.039 0.073 0.099 0.122
SD 0.017 0.018 0.032 0.061 0.02 0.039 0.074 0.099 0.122
CP(90%) 0.906 0.904 0.903 0.908 0.904 0.917 0.896 0.902 0.901
CP(95%) 0.960 0.949 0.951 0.953 0.958 0.962 0.944 0.952 0.95
10,000 Bias 0.000 0.000 0.000 0.002 0.000 0.001 0.001 0.001 0.000
RMSE 0.012 0.014 0.023 0.043 0.015 0.029 0.052 0.069 0.087
SD 0.012 0.014 0.023 0.043 0.015 0.029 0.052 0.069 0.087
CP(90%) 0.886 0.894 0.889 0.900 0.900 0.890 0.895 0.903 0.901
CP(95%) 0.942 0.944 0.936 0.952 0.952 0.950 0.954 0.950 0.958
[Figure 3 plots: four panels (Group 0 and Group 1 for the GTDL frailty model; Group 0 and Group 1 for the GTDL model), each showing the RMSE of the cure fraction (vertical axis) against sample size, from 50 to 10,000 (horizontal axis).]
Figure 3. RMSEs of MLEs of the cure fraction obtained from the GTDL gamma frailty and GTDL models. (For interpretation of the
references to color in this figure legend, the reader is referred to the online version of this article.)
Table 2. Percentage of cases in which the estimates of $\alpha$ are less than zero when there is in fact a group of long-term survivors.

                              Sample size
Model          Cure rate      50      100     150     200     300     400     500     1000
GTDL           0.20           99.7    100     100     100     100     100     100     100
               0.11           92.9    98.8    100     100     100     100     100     100
               0.045          83.3    93.0    96.9    99.6    99.8    100     100     100
               0.03           79.5    89.2    96.3    98.8    99.5    99      100     100
               0.01           71.2    83.9    91.4    96.2    98.6    99.1    99.5    100
GTDL frailty   0.20           99.2    99.9    100     100     100     100     100     100
               0.11           93.2    99.3    100     100     100     100     100     100
               0.045          62.9    83.9    93.2    96.7    99.4    99.7    100     100
               0.03           52.6    73.4    84.2    90.7    96.2    98.0    99.4    100
               0.01           30.8    40.3    49.5    56.4    62.2    70.1    73.8    87.8
number of parameters in the fitted model, and $n$ is the sample size. Given a set of candidate models, the preferred
model is the one with the minimum criterion value.
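The five criteria can be computed directly from the maximized log-likelihood, the number of parameters, and the sample size. The sketch below is illustrative only. The function name is hypothetical, the log-likelihood values are those reported in Table 5 taken as negative (the sign that reproduces the reported AIC values), and the parameter counts of 4 and 5 are inferred from the rows of that table.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, AICc, BIC, HQIC and CAIC for a fitted model.

    loglik: maximized log-likelihood; k: number of parameters; n: sample size.
    """
    aic = -2.0 * loglik + 2.0 * k
    return {
        "AIC": aic,
        "AICc": aic + 2.0 * k * (k + 1) / (n - k - 1),
        "BIC": -2.0 * loglik + k * math.log(n),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
        "CAIC": -2.0 * loglik + k * (math.log(n) + 1.0),
    }

# GTDL model (4 parameters) and GTDL gamma frailty model (5 parameters), n = 7166.
gtdl = information_criteria(-7161.299, k=4, n=7166)
gamma_frailty = information_criteria(-7150.065, k=5, n=7166)
print(round(gtdl["AIC"], 3), round(gamma_frailty["AIC"], 3))
```

Under these inputs the AIC values coincide with those reported in Table 5 (14330.598 and 14310.130), and the model with the smaller value is preferred.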
To evaluate the performance of these information criteria, we generated datasets from the GTDL gamma
frailty and GTDL ($\theta = 0$) models using a binary explanatory variable, fixing the same parameter values and
$n = 50, 100, 300, 500$, and $1000$.
For each simulated dataset, we calculated discrimination criteria from fits of the GTDL gamma frailty model
and the corresponding GTDL model without a frailty term. Based on 1000 datasets for each situation, we
calculated the mean differences in information criterion values between the GTDL and GTDL gamma frailty
models, as well as the observed selection proportions (with the GTDL gamma frailty model preferred) for each of
the five criteria. Figure 4 shows the mean differences and selection proportions for the discrimination criteria. A
positive mean difference means that, on average, the information criterion value from the fitted GTDL gamma
frailty model is smaller than that from the GTDL model, indicating an advantage of the GTDL gamma frailty
model. Note that the mean difference is always positive for n1000 and h0:7, regardless of information cri-
terion. These results show that the information criteria can distinguish between the models in the presence of high
heterogeneity, as the mean differences for the correct model were always greater than those for the incorrect
model. Selection proportions for the correct model were always high, especially when the AIC was used. We will
use the AIC in Section 5 because this criterion presented the greatest mean difference (Figure 4).
4.5 Hypothesis testing
As we are interested in estimating the degree of unobservable heterogeneity in the model including the frailty term,
we assessed whether the inclusion of this term in the GTDL model is necessary using the null hypothesis
$H_0: \theta = 0$. The statistic most commonly used for this purpose is the likelihood ratio. Asymptotically, this statistic
has the $\chi^2_1$ distribution, but under $H_0$ the parameter value is on the boundary of the parameter space, and
problems can occur when testing the null hypothesis. The likelihood ratio test (LRT) statistic is given by
$\Lambda = 2\{\ell(\hat{\vartheta}) - \ell(\hat{\vartheta}_0)\}$, where $\hat{\vartheta}_0$ is the maximum likelihood estimator of $\vartheta$ under $H_0$. Under certain regularity
conditions, Maller and Zhou [46] showed that the distribution of $\Lambda$ is a 50%/50% mixture of
a chi-squared distribution with one degree of freedom and a point mass at 0, that is, $P[\Lambda \leq \xi] = 0.5 + 0.5\,P[\chi^2_1 \leq \xi]$.
To evaluate the performance of the LRT in testing the null hypothesis (equivalent to the absence of heterogeneity),
datasets were simulated considering $\theta = \{0, 0.05, 0.1, 0.3, 0.7, 1.1, 1.5\}$, $\alpha_0 = -0.36$,
$\alpha_1 = 0.24$, $\lambda = 1.4$, $\beta_1 = -2.7$, and several sample sizes. For each configuration, we calculated the rate of rejection
of the null hypothesis. The size and power of the test are presented in Table 3. As expected, the rejection
rate of $H_0$ decreases as $\theta$ approaches 0 and increases with both the sample size and $\theta$. If the null hypothesis is true, the
rejection rate is below the 5% nominal significance level for small samples, but it approaches the nominal level as the sample size increases.
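Because the null value of the frailty variance lies on the boundary, the p-value of the LRT is half the usual chi-squared tail probability whenever the statistic is positive. The sketch below shows this computation using only the standard library; the function names are hypothetical, and the chi-squared tail with one degree of freedom is obtained from the complementary error function.

```python
import math

def chi2_1_sf(x):
    """Upper tail probability P[chi2_1 > x] for one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def lrt_statistic(loglik_full, loglik_null):
    """Likelihood ratio statistic 2 * (l(full) - l(null))."""
    return 2.0 * (loglik_full - loglik_null)

def lrt_boundary_pvalue(stat):
    """p-value under the 50:50 mixture of chi2_1 and a point mass at 0."""
    if stat <= 0:
        return 1.0
    return 0.5 * chi2_1_sf(stat)

# At the 5% level the mixture rejects for statistics above about 2.706,
# whereas a plain chi2_1 reference distribution would require about 3.841.
p_mixture = lrt_boundary_pvalue(2.705543)
p_naive = chi2_1_sf(3.841459)
print(round(p_mixture, 4), round(p_naive, 4))
```

Using the plain $\chi^2_1$ reference distribution instead of the mixture would make the test conservative, since it would demand a larger statistic for the same nominal level.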
5 Application
To illustrate the applicability of the proposed model, we consider a real cancer dataset. We fitted the GTDL and
GTDL PVF frailty models and their special cases to the dataset and compared the fitted survival curves with the
estimates obtained using the Kaplan–Meier estimator [47]. For each fitted model, we provide the MLEs, standard
error estimates, 95% confidence intervals for the parameters, and the AIC value. Estimates of the
standard error for the cure fraction were obtained using the delta method with a first-order Taylor
approximation.
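As an illustration of the delta method mentioned above, write the cure fraction as a function of the parameters, $\hat{p} = g(\hat{\vartheta})$, and let $\widehat{\Sigma}$ denote the estimated covariance matrix of the MLEs. The worked case below is a sketch for group 0 of the GTDL model without frailty, assuming the cure fraction takes the form $p_0 = 2^{\lambda/\alpha_0}$ for $\alpha_0 < 0$; this form is not restated in this section, but it is consistent with the GTDL estimates reported in Table 5.

```latex
\widehat{\operatorname{Var}}(\hat{p}) \approx
  \nabla g(\hat{\vartheta})^{\top}\, \widehat{\Sigma}\, \nabla g(\hat{\vartheta}),
\qquad
\widehat{\operatorname{SE}}(\hat{p}) = \sqrt{\widehat{\operatorname{Var}}(\hat{p})},

% Group 0 of the GTDL model (assumed form of the cure fraction):
p_0 = g(\lambda, \alpha_0) = 2^{\lambda/\alpha_0}, \qquad
\frac{\partial g}{\partial \lambda} = \frac{\log 2}{\alpha_0}\, p_0, \qquad
\frac{\partial g}{\partial \alpha_0} = -\frac{\lambda \log 2}{\alpha_0^{2}}\, p_0 .
```

Only the rows and columns of $\widehat{\Sigma}$ corresponding to $\lambda$ and $\alpha_0$ enter this particular case, because $p_0$ does not depend on the remaining parameters.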
5.1 Melanoma cancer data
The melanoma dataset is from a retrospective survey of 7166 records of patients diagnosed with melanoma in the
state of São Paulo, Brazil, between 2000 and 2014, with follow-up conducted until 2018. It was provided by the
Fundação Oncocentro de São Paulo (FOSP), which is responsible for coordinating the Hospital Cancer Registry of
the State of São Paulo. The FOSP is a public institution connected to the State Health Secretariat; it assists in
the preparation and implementation of healthcare policies in the field of oncology and serves as an instrument
with which oncology hospitals can prepare their own protocols and improve their care practices [48].
Death due to cancer was defined as the event of interest. The main goal was to assess the impact of surgery on
cancer-specific survival. Of the 7166 patients, 6307 underwent surgery and 859 did not. A total of 2067 events occurred
during the follow-up period: 1561 (24.75%) occurred among patients who underwent surgery and 506 (58.9%)
[Figure 4 plots: for each of $\theta = 0$, $0.3$, $0.7$, and $1.1$, one panel showing the mean difference in information criteria (vertical axis) and one panel showing the observed selection proportions based on information criteria, in % (vertical axis), each against sample size (50, 100, 300, 500, 1000; horizontal axis).]
Figure 4. Top row: Mean differences in information criteria obtained from fitted GTDL gamma frailty and GTDL models, when data were generated from the GTDL gamma frailty
model. Bottom row: Observed selection proportions (correct model) based on information criteria. (For interpretation of the references to color in this figure legend, the reader is
referred to the version of this article.)
occurred among those who did not undergo surgery. The maximum observation time was approximately
18.54 years and the median follow-up time was 5.24 years.
The staging system proposed by the American Joint Committee on Cancer (AJCC) is commonly used worldwide
for several solid tumors, including melanoma. According to the latest edition [49], clinical stages I and II
correspond to melanoma limited to the skin, which is associated with a better prognosis. These patients are
normally treated with surgery, and the great majority will be alive after 10 years of follow-up. Clinical stage III
corresponds to nodal spread of the melanoma; in this scenario, surgery is routinely combined with radiotherapy
and/or some modality of systemic treatment, such as immunotherapy or targeted therapies [50]. In the
literature, the melanoma-specific survival after 10 years in these patients varies from 24% to 88% [50].
Clinical stage IV corresponds to metastatic disease, which carries the worst prognosis [49]. Even though several
new treatment modalities have been reported in recent years, treating these patients is still challenging [51].
Some of these patients may undergo surgery at some time, but it is more likely that they will need systemic
treatment [52]. In our study, based on the available information about clinical stage and surgery, around 75% of the patients
who underwent surgery were in clinical stages I and II, while 65% of the patients who did not undergo surgery were
in clinical stages III and IV.
Figure 5 shows a plot of the log cumulative baseline hazard rates against time (follow-up period) for the “treatment
received” variable. According to Klein and Moeschberger [6], if the proportionality assumption holds, then these
curves should be approximately parallel, with constant vertical separation between them. The plot suggests that
the hazards are non-proportional; in particular, the proportional hazards assumption is questionable before five years.
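A graphical check of this kind can be sketched by estimating the cumulative hazard separately in each group and taking logarithms. The code below uses the Nelson–Aalen estimator per group as a simple stand-in; it is not necessarily the estimator behind Figure 5, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter

def nelson_aalen(times, events):
    """Nelson-Aalen estimate as a list of (event time, cumulative hazard)."""
    deaths = Counter(t for t, d in zip(times, events) if d)
    exits = Counter(times)  # events and censorings leaving the risk set
    at_risk = len(times)
    cum_hazard, curve = 0.0, []
    for t in sorted(exits):
        if deaths[t]:
            cum_hazard += deaths[t] / at_risk
            curve.append((t, cum_hazard))
        at_risk -= exits[t]
    return curve

def log_cumulative_hazard(times, events):
    """Log of the Nelson-Aalen cumulative hazard at each event time."""
    return [(t, math.log(h)) for t, h in nelson_aalen(times, events)]

# Toy data (hypothetical): follow-up times in years and event indicators.
toy_times = [1, 2, 2, 3, 4]
toy_events = [1, 1, 0, 1, 0]
toy_curve = nelson_aalen(toy_times, toy_events)
print(toy_curve)
```

Applying `log_cumulative_hazard` to each treatment group and plotting the two curves against time gives the diagnostic described above: roughly parallel curves support proportional hazards, whereas converging or crossing curves do not.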
Table 3. Rates of rejection of the null hypothesis (absence of unobservable heterogeneity) at the 5% nominal significance level for several
sample sizes and degrees of unobservable heterogeneity.

             Sample size
$\theta$     50      100     300     500     1000    2000    5000    10000
0            0.033   0.049   0.052   0.041   0.042   0.038   0.049   0.049
0.05         0.022   0.036   0.015   0.033   0.016   0.038   0.101   0.233
0.1          0.029   0.056   0.057   0.085   0.098   0.173   0.404   0.608
0.3          0.051   0.109   0.168   0.294   0.483   0.732   0.973   0.998
0.7          0.092   0.200   0.453   0.665   0.899   0.995   1       1
1.1          0.159   0.257   0.666   0.852   0.992   1       1       1
1.5          0.194   0.322   0.733   0.924   0.999   1       1       1
[Figure 5 plots. Left panel: log(cumulative hazard function) (vertical axis) against time in years (horizontal axis), by treatment received (no surgery, surgery). Right panel: Beta(t) for x (vertical axis) against time in years (horizontal axis).]
Figure 5. Left panel: Plot of log cumulative baseline hazard rates versus time on study for the treatment received variable. Right
panel: Standardized Schoenfeld residuals + $\hat{\beta}$ for the covariate treatment received, plotted from the fitted Cox model.
Figure 5 also shows a plot of the standardized Schoenfeld residuals against time for this covariate. The results of
the test of the proportional hazards assumption for the fitted Cox regression model [5] are displayed in Table 4; they provide
strong evidence that this variable has a non-constant effect over time.
To evaluate the effect of surgery on the lifetime, we fitted the GTDL and GTDL PVF frailty models to
the dataset. For illustrative purposes, we link the parameter $\alpha$ to the treatment received through an identity link
function. Thus,

$$\alpha(x_i) = \alpha_0 + x_i \alpha_1,$$

where $x_i$ is a group indicator, with $x_i = 1$ and $x_i = 0$ indicating patients who did and did not undergo surgery,
respectively, for $i = 1, \ldots, 7166$, and $\boldsymbol{\alpha}^{\top} = (\alpha_0, \alpha_1)$ represents the regression coefficients. The results of the fitted GTDL
and GTDL PVF frailty models are given in Table 5. Notice that the estimate of $\gamma$ is close to zero, indicating that a
GTDL gamma frailty model can be considered. We therefore also fitted the main special cases, the GTDL inverse
Gaussian ($\gamma = 0.5$) and gamma ($\gamma \to 0$) frailty models. According to the AIC values, the GTDL gamma frailty
model seems to be the best choice among the four models.
The results suggest a significant effect of treatment on the lifetime, regardless of the model, as the 95% confidence
interval for $\beta_1$ does not include 0. In addition, the time effect differs between groups ($\alpha_0$ and $\alpha_1$ are
significant). Note that $\hat{\alpha}_0 < 0$ and $\hat{\alpha}_0 + \hat{\alpha}_1 < 0$ in the four models, which means that the distributions were
Table 4. Test of the proportional hazards assumption.

Variable              $\rho$    $\chi^2$    p-value
Treatment received    0.282     156         <0.0001
Table 5. Maximum likelihood estimates (MLEs), standard errors (SEs), 95% asymptotic confidence intervals (CIs), maximum of the
log-likelihood function [$\max \ell(\vartheta)$], and AIC values obtained by fitting the GTDL and GTDL frailty models to the melanoma dataset.

                        GTDL model                                GTDL PVF frailty model
Parameter               MLE          SE      Lower    Upper       MLE          SE      Lower    Upper
$\theta$                                                          1.033        0.284   0.476    1.590
$\gamma$                                                          0.020        0.466   0.001    0.933
$\lambda$               1.058        0.061   0.939    1.178       1.345        0.139   1.072    1.618
$\alpha_0$              -0.607       0.037   -0.679   -0.535      -0.380       0.106   -0.588   -0.173
$\alpha_1$              0.430        0.039   0.354    0.505       0.252        0.092   0.072    0.433
$\beta_1$               -2.432       0.074   -2.576   -2.287      -2.644       0.113   -2.865   -2.423
$p_0$                   0.299        0.018   0.263    0.334       0.293        0.020   0.254    0.333
$p_1$                   0.605        0.013   0.579    0.630       0.583        0.019   0.547    0.620
$\max \ell(\vartheta)$  -7161.299                                 -7150.338
AIC                     14330.598                                 14312.676

                        GTDL inverse Gaussian frailty model       GTDL gamma frailty model
Parameter               MLE          SE      Lower    Upper       MLE          SE      Lower    Upper
$\theta$                0.823        0.286   0.268    1.388       1.153        0.244   0.675    1.632
$\lambda$               1.260        0.104   1.055    1.465       1.418        0.126   1.171    1.666
$\alpha_0$              -0.476       0.047   -0.567   -0.384      -0.358       0.061   -0.477   -0.239
$\alpha_1$              0.333        0.043   0.249    0.416       0.237        0.054   0.132    0.341
$\beta_1$               -2.583       0.091   -2.762   -2.404      -2.701       0.099   -2.895   -2.507
$p_0$                   0.303        0.019   0.266    0.340       0.290        0.020   0.251    0.329
$p_1$                   0.595        0.016   0.563    0.627       0.579        0.018   0.543    0.616
$\max \ell(\vartheta)$  -7154.150                                 -7150.065
AIC                     14318.300                                 14310.130
improper, leading to cure rates in the two groups. The results also show that the estimated proportions of long-term survivors in
the four models are similar, as seen in the simulation study.
As mentioned previously, of the four fitted models, the GTDL gamma frailty model gave the best fit according
to the AIC value. Although the GTDL PVF frailty model can also be considered, the difference between the AIC
values is small and the parameter estimates are similar.
Taking into account the AIC, the $\max \ell(\cdot)$ values, and the number of parameters in each model, we select the
GTDL gamma frailty model as our working model. Note that $\hat{\theta} = 1.153$, which indicates a reasonable degree of
unobserved heterogeneity in the sample. In addition, the estimated time effects from the GTDL gamma frailty model
were $\hat{\alpha}_0 = -0.358$, CI(95%) $= [-0.477, -0.239]$, in the no surgery group and $\hat{\alpha}_0 + \hat{\alpha}_1 = -0.121$,
CI(95%) $= [-0.153, -0.089]$, in the surgery group. These estimates provide evidence that the time effect is not the same in
both groups. As the time effects are negative, the model suggests that there are long-term survivors, as can be seen
in the estimated proportions, $\hat{p}_0 = 0.290$ with standard error 0.02 (no surgery) and $\hat{p}_1 = 0.579$ with standard error
0.018 (surgery).
[Figure 6 plots. Left panel: survival function (vertical axis) against time in years (horizontal axis). Right panel: hazard function (vertical axis) against time in years (horizontal axis). Curves are shown by treatment received (no surgery, surgery) and by model (GTDL, GTDL frailty).]
Figure 6. Left panel: Estimated survival curve obtained via Kaplan–Meier (black line) for melanoma dataset, and estimated survival
function according to GTDL (red line) and GTDL gamma frailty models (green line). Right panel: Estimated hazard function according
to GTDL model and GTDL gamma frailty model. (For interpretation of the references to color in this figure legend, the reader is
referred to the web version of this article.)
[Figure 7 plots: Kaplan–Meier estimates (vertical axis) against predicted values (horizontal axis), in separate panels for the no surgery and surgery groups, by model (GTDL, GTDL frailty).]
Figure 7. Plots of the Kaplan–Meier estimates for the survival function versus the respective predicted values obtained from the
GTDL (red points) and GTDL gamma frailty models (green points) stratified by treatment received. (For interpretation of the
references to color in this figure legend, the reader is referred to the web version of this article.)
Overall, the models fit the Kaplan–Meier curves reasonably well. However, the GTDL frailty model enables the
quantification of unobserved heterogeneity, which is of great importance in clinical practice. We thus tested the
suitability of the frailty term in the GTDL model using the LRT, as described for the simulation study. We obtained
$\Lambda = 40.936$ (p-value < 0.0001), which provides evidence in favor of the inclusion of the frailty term.
Figure 6 shows the estimated survival and hazard functions from the GTDL and GTDL frailty models. In both
models, but more so in the GTDL frailty model, the survival function estimates are close to the Kaplan–Meier
curves. In addition, the hazard function curves are higher for patients who did not undergo surgery, mainly in the
first five years of follow-up, regardless of the model. In both models, the fitted hazard functions decrease over time;
the curves also cross over time. Such crossing does not occur in the traditional GTDL model (1), which is a
disadvantage.
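The crossing of the hazard curves can be illustrated numerically with the gamma frailty estimates from Table 5. The sketch below assumes the usual GTDL hazard, $h_0(t) = \lambda e^{\alpha t + \eta}/(1 + e^{\alpha t + \eta})$, a gamma frailty with mean 1 and variance $\theta$, and the parameter signs implied by the confidence limits in Table 5; the function names are hypothetical.

```python
import math

# Gamma frailty estimates from Table 5; signs follow the confidence limits
# (an assumption of this sketch).
LAM, A0, A1, B1, THETA = 1.418, -0.358, 0.237, -2.701, 1.153

def baseline(t, x):
    """GTDL baseline hazard and cumulative hazard at time t for group x (0 or 1)."""
    alpha, eta = A0 + x * A1, x * B1
    lin = alpha * t + eta
    h0 = LAM / (1.0 + math.exp(-lin))  # equals LAM * e^lin / (1 + e^lin)
    H0 = (LAM / alpha) * (math.log1p(math.exp(lin)) - math.log1p(math.exp(eta)))
    return h0, H0

def population_survival(t, x):
    """Marginal survival after integrating out the gamma frailty."""
    _, H0 = baseline(t, x)
    return (1.0 + THETA * H0) ** (-1.0 / THETA)

def population_hazard(t, x):
    """Marginal hazard after integrating out the gamma frailty."""
    h0, H0 = baseline(t, x)
    return h0 / (1.0 + THETA * H0)

for t in (1, 5, 10, 15):
    print(t, round(population_hazard(t, 0), 4), round(population_hazard(t, 1), 4))
```

Under these assumptions the hazard is higher without surgery early in follow-up and lower late in follow-up, so the two curves cross, and the survival functions level off near the cure fractions of about 0.290 (no surgery) and 0.579 (surgery) reported in Table 5.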
Finally, Figure 7 compares the empirical Kaplan–Meier estimates with the corresponding
predicted values obtained from the GTDL and GTDL gamma frailty models. This comparison shows that
the GTDL gamma frailty model provided predicted values closer to the Kaplan–Meier estimates than the GTDL model did, regardless of the
type of treatment received, which indicates a satisfactory fit of the proposed model. As can be seen in the
plots, some predictions deviated from the Kaplan–Meier estimates (over- or underestimation),
but these differences are acceptable: the largest difference between the predicted values and the Kaplan–Meier
estimates was 0.049 in the no surgery group and 0.012 in the surgery group. In clinical practice,
such small differences are acceptable.
6 Concluding remarks
In this paper, we considered the GTDL model with a PVF frailty term for right-censored data with the potential
existence of long-term survivors in the population. An advantage of the studied model over alternatives is that it
does not make assumptions about the existence of a cure rate, since the value of the parameter $\alpha$ determines whether the distribution is proper
($\alpha > 0$) or improper ($\alpha < 0$); this makes the model flexible and applicable to situations with and
without cure fractions. If the parameter $\alpha$ is estimated to be negative, then the cure fraction is computed as a function
of the GTDL model parameters. In addition, the inclusion of a frailty term in the hazard function enables the
quantification of unobserved heterogeneity by means of the parameter $\theta$. In our simulation study, conducted to
illustrate the frequentist properties of the MLEs of the parameters, the bias and RMSEs approached
0 as the sample size increased. The simulation study showed that the GTDL frailty model is not
recommended for small ($n \leq 150$) samples. In practice, the model is often chosen based on a selection criterion.
Therefore, we evaluated the performance of the GTDL frailty model against that of the GTDL model using
several such criteria. A simulation revealed that, on average, the AIC value from the fitted GTDL model was
smallest and, consequently, that this model performed best. The practical relevance and applicability of the
studied models were demonstrated using a real dataset. Although further research on this approach must be
conducted, our initial results suggest that this model enhances the analysis of non-proportional hazards in the
presence or absence of long-term survivors.
Acknowledgements
The authors thank the Fundação Oncocentro de São Paulo for providing the melanoma dataset. They also thank the two
referees for their comments, which greatly improved this paper.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this
article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
ORCID iD
Vinicius F Calsavara https://orcid.org/0000-0003-2332-5863
Supplemental material
Supplemental material for this article is available online.
References
1. Cox DR. Regression models and life-tables. J Royal Stat Soc B 1972; 34: 187–220.
2. Schemper M. Cox analysis of survival data with non-proportional hazard functions. Statistician 1992; 41: 455–465.
3. Schoenfeld D. Partial residuals for the proportional hazards regression model. Biometrika 1982; 69: 239–241.
4. Pettitt A and Bin Daud I. Investigating time dependence in Cox’s proportional hazards model. Appl Stat 1990; 39:
313–329.
5. Grambsch PM and Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika
1994; 81: 515–526.
6. Klein JP and Moeschberger ML. Survival analysis: Statistical methods for censored and truncated data. New York, NY:
Springer Verlag, 2003.
7. Hess KR. Graphical methods for assessing violations of the proportional hazards assumption in Cox regression. Stat Med
1995; 14: 1707–1723.
8. Prentice RL. Linear rank tests with right censored data. Biometrika 1978; 65: 167–179.
9. Kalbfleisch JD and Prentice RL. The statistical analysis of failure time data. Hoboken, NJ: John Wiley & Sons, 2011.
10. Etezadi-Amoli J and Ciampi A. Extended hazard regression for censored survival data with covariates: a spline approx-
imation for the baseline hazard function. Biometrics 1987; 43: 181–192.
11. Louzada-Neto F. Extended hazard regression model for reliability and survival analysis. Lifetime Data Analysis 1997; 3:
367–381.
12. Louzada-Neto F. Polyhazard models for lifetime data. Biometrics 1999; 55: 1281–1285.
13. Mackenzie G. Regression models for survival data: the generalized time-dependent logistic family. Statistician 1996; 45:
21–34.
14. Louzada-Neto F, Cremasco CP and MacKenzie G. Sampling-based inference for the generalized time-dependent logistic
hazard model. J Stat Theory Appl 2010; 9: 169–184.
15. Milani EA, Tomazella VL, Dias TC et al. The generalized time-dependent logistic frailty model: an application to a
population-based prospective study of incident cases of lung cancer diagnosed in Northern Ireland. Brazil J Probabil Stat
2015; 29: 132–144.
16. Boag JW. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J Royal Stat Soc B 1949;
11: 15–53.
17. Berkson J and Gage RP. Survival curve for cancer patients following treatment. J Am Stat Assoc 1952; 47: 501–515.
18. Hougaard P. Modelling heterogeneity in survival data. J Appl Probabil 1991; 28: 695–701.
19. Vaupel JW, Manton KG and Stallard E. The impact of heterogeneity in individual frailty on the dynamics of mortality.
Demography 1979; 16: 439–454.
20. Clayton D. A model for association in bivariate life tables and its application in epidemiological studies of familial
tendency in chronic disease incidence. Biometrika 1978; 65: 141–151.
21. Andersen P. Statistical models based on counting processes. New York, NY: Springer Verlag, 1993.
22. Hougaard P. Frailty models for survival data. Lifetime Data Analysis 1995; 1: 255–273.
23. Sinha D and Dey D. Semiparametric Bayesian analysis of survival data. J Am Stat Assoc 1997; 92: 1195–1212.
24. Oakes D. A model for association in bivariate survival data. J Royal Stat Soc B 1982; 44: 414–422.
25. Aalen OO. Heterogeneity in survival analysis. Stat Med 1988; 7: 1121–1137.
26. Hougaard P, Myglegaard P and Borch-Johnsen K. Heterogeneity models of disease susceptibility, with application to
diabetic nephropathy. Biometrics 1994; 50: 1178–1188.
27. Price DL and Manatunga AK. Modelling survival data with a cured fraction using frailty models. Stat Med 2001; 20:
1515–1527.
28. Peng Y, Taylor JMG and Yu B. A marginal regression model for multivariate failure time data with a surviving fraction.
Lifetime Data Analysis 2007; 13: 351–369.
29. Yu B and Peng Y. Mixture cure models for multivariate survival data. Computat Stat Data Analys 2008; 52: 1524–1532.
30. Calsavara VF, Tomazella VLD and Fogo JC. The effect of frailty term in the standard mixture model. Chilean J Stat 2013;
4: 95–109.
31. Calsavara VF, Rodrigues AS, Tomazella VLD, et al. Frailty models power variance function with cure fraction and latent
risk factors negative binomial. Commun Stat-Theory Meth 2017; 46: 9763–9776.
32. Balka J, Desmond AF and McNicholas PD. Review and implementation of cure models based on first hitting times for
Wiener processes. Lifetime Data Analysis 2009; 15: 147–176.
33. Balka J, Desmond AF and McNicholas PD. Bayesian and likelihood inference for cure rates based on defective inverse
Gaussian regression models. J Appl Stat 2011; 38: 127–144.
34. Rocha R, Nadarajah S, Tomazella V, et al. Two new defective distributions based on the Marshall–Olkin extension.
Lifetime Data Analysis 2016; 22: 216–240.
35. Rocha R, Nadarajah S, Tomazella V, et al. A new class of defective models based on the Marshall–Olkin family of
distributions for cure rate modeling. Computat Stat Data Analys 2017; 107: 48–63.
36. Rocha R, Nadarajah S, Tomazella V, et al. New defective models based on the Kumaraswamy family of distributions with
application to cancer data sets. Stat Meth Med Res 2017; 26: 1737–1755.
37. Scudilio J, Calsavara VF, Rocha R, et al. Defective models induced by gamma frailty term for survival data with cured
fraction. J Appl Stat 2019; 46: 484–507.
38. Calsavara VF, Rodrigues AS, Rocha R, et al. Defective regression models for cure rate modeling with interval-censored
data. Biometric J 2019; 61: 841–859.
39. Calsavara VF, Rodrigues AS, Rocha R, et al. Zero-adjusted defective regression models for modeling lifetime data. J Appl
Stat 2019; 46: 2434–2459.
40. Coordenação de Prevenção e Vigilância. Instituto Nacional de Cancer José Alencar Gomes da Silva. Estimativa 2018:
Incidência de Cancer no Brasil. Coordenação de Prevenção e Vigilância, Rio de Janeiro, http://www1.inca.gov.br/
estimativa/2018/ (2017).
41. Ervik M, Lam F, Ferlay J, et al. Cancer today. Lyon, France: International Agency for Research on Cancer. Cancer Today,
http://gco.iarc.fr/today (2016, accessed 01 February 2019).
42. Tweedie MCK. An index which distinguishes between some important exponential families. In: Statistics: applications and
new directions. Proc. Indian Statistical Institute golden jubilee international conference, pp. 579–604.
43. Hougaard P. Survival models for heterogeneous populations derived from stable distributions. Biometrika 1986; 73:
387–396.
44. Wienke A. Frailty models in survival analysis. Boca Raton, FL: Chapman & Hall/CRC, 2011.
45. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical
Computing, 2018.
46. Maller R and Zhou X. Survival analysis with long-term survivors. Hoboken, NJ: John Wiley & Sons, 1996.
47. Kaplan EL and Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958; 53: 457–481.
48. Andrade CTd, Magedanz AMPCB, Escobosa DM, et al. The importance of a database in the management of healthcare
services. Einstein (São Paulo) 2012; 10: 360–365.
49. Gershenwald JE, Scolyer RA, Hess KR, et al. Melanoma staging: evidence-based changes in the American joint committee
on cancer eighth edition cancer staging manual. CA: A Cancer J Clin 2017; 67: 472–492.
50. Eggermont AM and Dummer R. The 2017 complete overhaul of adjuvant therapies for high-risk melanoma and its
consequences for staging and management of melanoma patients. Eur J Cancer 2017; 86: 101–105.
51. Ascierto PA, Flaherty K and Goff S. Emerging strategies in systemic therapy for the treatment of melanoma. Am Soc Clin
Oncol Educ Book 2018; 38: 751–758.
52. Puza CJ, Bressler ES, Terando AM, et al. The emerging role of surgery for patients with advanced melanoma treated with
immunotherapy. J Surg Res 2019; 236: 209–215.
... Yin and Ibrahim 20 proposed a cure fraction model that provides zero and nonzero cure fraction estimates, obviating the need to assume a cure fraction. Along the same line, Calsavara et al. 21 proposed a nonproportional hazards model with a frailty term that allows the existence or nonexistence of long-term survivors in a population. Defective models, recently popularized by Rocha et al., [22][23][24] Scudilio et al., 25 and Calsavara et al., 26,27 can also be used to model presence or absence of long-term survivors. ...
... However, skin carcinomas are more incident than melanoma. 21 In 2018, about 6000 new cases of melanoma were expected to be identified, according to the Brazilian National Institute of Cancer; 35 the International Agency for Research on Cancer estimated that this number was about 7000. 36 About 2000 deaths per year in Brazil are attributable to melanoma. ...
... Death due to cancer was defined as the event of interest. This dataset was initially studied by Calsavara et al., 21 who evaluated only the effect of the surgery covariate on lifetime using a nonproportional hazards model with a frailty term. Recently, Molina et al., 60 Leão et al., 61 and Rodrigues et al. 62 used this dataset considering surgery, as well as other available covariates. ...
Article
In this article, we discuss an extension of the classical negative binomial cure rate model with piecewise exponential distribution of the time to event for concurrent causes, which enables the modeling of monotonic and non-monotonic hazard functions (ie, the shape of the hazard function is not assumed as in traditional parametric models). This approach produces a flexible cure rate model, depending on the choice of time partition. We discuss the local influence on this negative binomial power piecewise exponential model. We report on Monte Carlo simulation studies and the application of the model to real melanoma and leukemia datasets.
... However, the GTDL model has advantages over these others because (i) it allows long-term survivors without the need for extra parameters, and (ii) it does not make assumptions about the existence of long-term survivors in the study population. As the hazard function of the GTDL model only accommodates monotone (constant, decreasing, or increasing) shapes, extensions of the GTDL model have been made by Milani et al. (2015) and Calsavara et al. (2020) in order to give more flexibility in the fitting of the survival data. ...
... Usually, in a retrospective data collection study, these factors are unknown and cannot be included in the analysis. The amount of unobservable heterogeneity will increase if significant covariates are omitted or not considered in the planning, and the inclusion of a frailty term can help alleviate this problem (Calsavara et al., 2020). ...
... As done by Calsavara et al. (2020), we also incorporated explanatory variables in the GTDL-RWLF model through the parameter α, making the model more flexible. For example, when a treatment is effective, patients can be long-term survivors and the insertion of covariates in this parameter will reflect an estimate of α < 0. Thus, covariates can be included in the hazard function (12) as follows: ...
Article
With advancements in medical treatments for cancer, an increase in the life expectancy of patients undergoing new treatments is expected. Consequently, the field of statistics has evolved to present increasingly flexible models to explain such results better. In this paper, we present a lung cancer dataset with some covariates that exhibit nonproportional hazards (NPHs). Besides, the presence of long‐term survivors is observed in subgroups. The proposed modeling is based on the generalized time‐dependent logistic model with each subgroup's effect time and a random term effect (frailty). In practice, essential covariates are not observed for several reasons. In this context, frailty models are useful in modeling to quantify the amount of unobservable heterogeneity. The frailty distribution adopted was the weighted Lindley distribution, which has several interesting properties, such as the Laplace transform function on closed form, flexibility in the probability density function, among others. The proposed model allows for NPHs and long‐term survivors in subgroups. Parameter estimation was performed using the maximum likelihood method, and Monte Carlo simulation studies were conducted to evaluate the estimators' performance. We exemplify this model's use by applying data of patients diagnosed with lung cancer in the state of São Paulo, Brazil.
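As a rough numerical illustration of the long-term-survivor property described in the snippets above, the sketch below assumes the commonly stated form of the GTDL hazard, h(t | x) = λ exp(αt + x'β) / (1 + exp(αt + x'β)), with no frailty term. The function names, the symbol `eta` for the linear predictor x'β, and the parameter values are illustrative assumptions and are not taken from the cited papers. Under this assumed form, the survival function tends to a positive limit exactly when α < 0, which is why no extra cure parameter is needed.

```python
import math


def _softplus(z):
    # log(1 + exp(z)), arranged so that exp() never overflows
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def _sigmoid(z):
    # exp(z) / (1 + exp(z)), arranged so that exp() never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gtdl_hazard(t, alpha, lam, eta):
    """Assumed GTDL hazard: lam * exp(alpha*t + eta) / (1 + exp(alpha*t + eta))."""
    return lam * _sigmoid(alpha * t + eta)


def gtdl_survival(t, alpha, lam, eta):
    """Survival function obtained by integrating the assumed hazard from 0 to t."""
    if alpha == 0.0:
        # Constant hazard: the model reduces to an exponential distribution
        return math.exp(-gtdl_hazard(0.0, alpha, lam, eta) * t)
    cum_hazard = (lam / alpha) * (_softplus(alpha * t + eta) - _softplus(eta))
    return math.exp(-cum_hazard)


def gtdl_cure_fraction(alpha, lam, eta):
    """Limit of S(t) as t grows; it is positive only when alpha < 0."""
    if alpha >= 0.0:
        return 0.0
    # (1 + exp(eta)) ** (lam / alpha), written on the log scale
    return math.exp((lam / alpha) * _softplus(eta))


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration
    alpha, lam, eta = -0.5, 1.0, 0.0
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, gtdl_survival(t, alpha, lam, eta))
    print("cure fraction:", gtdl_cure_fraction(alpha, lam, eta))
```

With these hypothetical values the limiting survival probability is (1 + e^0)^(1/(-0.5)) = 2^(-2) = 0.25, and the survival curve flattens toward that plateau instead of reaching zero.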
... It is part of a study of skin cancer in 6749 patients diagnosed with melanoma in the state of São Paulo, Brazil. This dataset was initially studied by Calsavara et al. (2020), who considered only one observed covariate (surgery) in the modeling. Recently, Molina et al. (2021), Rodrigues et al. (2021), Leão et al. (2021) and Gómez et al. (2021) considered other covariates available in the medical records, such as sex, clinical stage, and type of treatment (radiotherapy and chemotherapy). ...
... In routine clinical practice, the staging system proposed by the American Joint Committee on Cancer (AJCC) is used to stage melanoma cases. As mentioned by Calsavara et al. (2020), the early clinical stages (I or II) are associated with a better prognosis. The great majority of patients with stage I or II melanoma will be alive after 10 years of follow-up, since most of these cases are treated with surgery. ...
Article
Over the last decades, the challenges in survival models have been changing considerably and full probabilistic modeling is crucial in many medical applications. Motivated by a new biological interpretation of cancer metastasis, we introduce a general method for obtaining more flexible cure rate models. The proposed model extends the promotion time cure rate model. Furthermore, it includes several well-known models as special cases and defines many new special models. We derive several properties of the hazard function for the proposed model and establish mathematical relationships with the promotion time cure rate model. We consider a frequentist approach to perform inferences, and the maximum likelihood method is employed to estimate the model parameters. Simulation studies are conducted to evaluate its performance with a discussion of the obtained results. A real dataset from a population-based study of incident cases of melanoma diagnosed in the state of São Paulo, Brazil, is discussed in detail.
... where h 0 (·) is the unconditional hazard rate function of the GTDL regression model defined in (1). The non-proportionality of this model is evident due to the presence of the hazard function of the GTDL model. ...
... As done by [1], we also incorporated explanatory variables in the GTDL-RWLF model through the parameter α, making the model more flexible. For example, when a treatment is effective, patients can be long-term survivors and the insertion of covariates in this parameter will reflect an estimate of α < 0. Thus, covariates can be included in ...
Conference Paper
In this paper, we extend the non-proportional GTDL regression model by including a frailty term in the hazard function for right-censored data to quantify the amount of unobserved heterogeneity among individuals under study. The frailty term is modeled by a weighted Lindley (WL) distribution. Similar to some standard frailty distributions, the WL frailty distribution also has a simple Laplace transform, which permits closed-form expressions for the marginal survival and hazard functions, in addition to simplifying parameter estimation. For our model to be identifiable, we employ a WL distribution with mean one as the frailty distribution by using the parameterization of the WL distribution in terms of the mean. To demonstrate the applicability of the proposed model, we analyze a medical dataset from a population-based study of incident cases of lung cancer diagnosed in the state of São Paulo, Brazil.
... Many studies have been conducted on the mixture cure frailty models [16][17][18]. For example, Calsavara introduced the long-term frailty model using a non-proportional risk model and applied this model to melanoma datasets [19]. Other studies have used the mixture cure model for competing risks [20][21][22]. ...
Article
Background: HIV is one of the deadliest epidemics and one of the most critical global public health issues. Among people living with HIV, some are susceptible to death while others survive longer. The aim of the present study is to use mixture cure models to estimate factors affecting short- and long-term survival of HIV patients. Methods: The total sample size was 2170 HIV-infected people referred to the disease counseling centers in Kermanshah Province, in the west of Iran, from 1998 to 2019. A semiparametric PH mixture cure model and a mixture cure frailty model were fitted to the data. Also, a comparison between these two models was performed. Results: Based on the results of the mixture cure frailty model, antiretroviral therapy, tuberculosis infection, history of imprisonment, and mode of HIV transmission influenced short-term survival time (p-value < 0.05). On the other hand, prison history, antiretroviral therapy, mode of HIV transmission, age, marital status, gender, and education were significantly associated with long-term survival (p-value < 0.05). The concordance criterion (K-index) value was 0.65 for the mixture cure frailty model and 0.62 for the semiparametric PH mixture cure model. Conclusion: This study showed that the mixture cure frailty model is more suitable when the studied population consists of two groups, susceptible and non-susceptible to the event of death. People with a prison history, those who received ART, and those who contracted HIV through injection drug use survive longer. Health professionals should pay more attention to these findings in HIV prevention and treatment.
... There are many possible extensions of the current work to consider further. For instance, one can investigate the use of non-parametric baseline hazard functions as well as other parametric baseline hazard functions (see Nielsen et al. 1992;Barker and Henderson 2005;Calsavara et al. 2020). In addition, other parameter estimation methods could be developed and compared (see Ibrahim et al. 2001;Parner 1998). ...
Article
In this paper, we propose a novel frailty model for modeling unobserved heterogeneity present in survival data. Our model is derived by using a weighted Lindley distribution as the frailty distribution. The respective frailty distribution has a simple Laplace transform function which is useful to obtain marginal survival and hazard functions. We assume hazard functions of the Weibull and Gompertz distributions as the baseline hazard functions. A classical inference procedure based on the maximum likelihood method is presented. Extensive simulation studies are further performed to verify the behavior of maximum likelihood estimators under different proportions of right-censoring and to assess the performance of the likelihood ratio test to detect unobserved heterogeneity in different sample sizes. Finally, to demonstrate the applicability of the proposed model, we use it to analyze a medical dataset from a population-based study of incident cases of lung cancer diagnosed in the state of São Paulo, Brazil.
... Then, the survival function is improper, and in each case, we have a natural tail-deficit or cure rate model. We do not pursue the technical details of cure models in the XGTDL family here, but draw attention to related work by Milani et al. (2015), who studied the tail-deficit GTDL and the tail-deficit GTDL with Gamma frailty, while Rasouli et al. (2016) studied a GTDL model as a classical cure rate mixture model and Calsavara et al. (2020) analysed a melanoma dataset. 1 ...
Article
Non-PH parametric survival modelling is developed within the framework of the multiple logistic function. The family considered comprises three basic models: (a) a PH model, (b) an accelerated life model and (c) a model which is non-proportional hazards and non-accelerated life. The last model, the generalised time-dependent logistic model, was described first by the author in 1996 and this model gives its name to the entire family. The family is generalised by means of a Gamma frailty extension which is shown to accommodate crossing hazards data. A further generalisation is the inclusion of a dispersion model. These extensions lead naturally to the concept of a multi-parameter regression model described by Burke and MacKenzie in which the scale and shape parameters are modelled simultaneously as functions of covariates. Where possible, we include the MPR extension in the XGTDL family. Following a simulation study, the new models are used to analyse two sets of survival data and the methods are discussed.
Thesis
In this work, we propose different statistical modeling for survival data based on a reparameterized weighted Lindley distribution. Initially, we present this distribution and study its mathematical properties, maximum likelihood estimation, and numerical simulations. Then, we propose a novel frailty model by using the reparameterized weighted Lindley distribution for modeling unobserved heterogeneity in univariate survival data. The frailty is introduced multiplicatively on the baseline hazard function. We obtain unconditional survival and hazard functions through the Laplace transform function of the frailty distribution. We assume hazard functions of the Weibull and Gompertz distributions as the baseline hazard functions and use the maximum likelihood method for estimating the resulting model parameters. Simulation studies are further performed to verify the behavior of maximum likelihood estimators under different proportions of right-censoring and to assess the performance of the likelihood ratio test to detect unobserved heterogeneity in different sample sizes. Also, we propose a frailty long-term model where the frailties are described by reparameterized weighted Lindley distribution. An advantage of the proposed model is to jointly model the heterogeneity among patients by their frailties and the presence of a cured fraction of them. We assume that the unknown number of competing causes that can influence the survival time follows a negative binomial distribution and that the time for the $k$-th competing cause to produce the event of interest follows the reparameterized weighted Lindley frailty model with Weibull baseline distribution. Some special cases of the model are presented. The cure fraction is modeled by using the logit link function. Again, we use the maximum likelihood method under random right-censoring to estimate the proposed model parameters. 
Further, we present a Monte Carlo simulation study to verify the maximum likelihood estimators' behavior assuming different sample sizes and censoring proportions. Finally, we extend the non-proportional generalized time-dependent logistic regression model by incorporating reparameterized weighted Lindley frailties. This proposed modeling has several important characteristics, such as non-proportional hazards, identifies the presence of long-term survivors without the addition of new parameters, captures the unobserved heterogeneity, allows the intersection of survival curves, and allows decreasing or unimodal hazard function. Again, parameter estimation is performed using the maximum likelihood method. Monte Carlo simulation studies are conducted to evaluate the asymptotic properties of the estimators as well as some properties of the model. The potentiality of all the proposed models is analyzed by employing real datasets and model comparisons are performed.
Article
Regression models in survival analysis are most commonly applied for right‐censored survival data. In some situations, the time to the event is not exactly observed, although it is known that the event occurred between two observed times. In practice, the moment of observation is frequently taken as the event occurrence time, and the interval‐censored mechanism is ignored. We present a cure rate defective model for interval‐censored event‐time data. The defective distribution is characterized by a density function whose integration assumes a value less than one when the parameter domain differs from the usual domain. We use the Gompertz and inverse Gaussian defective distributions to model data containing cured elements and estimate parameters using the maximum likelihood estimation procedure. We evaluate the performance of the proposed models using Monte Carlo simulation studies. Practical relevance of the models is illustrated by applying datasets on ovarian cancer recurrence and oral lesions in children after liver transplantation, both of which were derived from studies performed at A.C. Camargo Cancer Center in São Paulo, Brazil.
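The idea of a defective distribution described in the abstract above can be sketched numerically with the Gompertz case. The sketch assumes the usual parameterization with hazard h(t) = b exp(at), so that S(t) = exp(-(b/a)(exp(at) - 1)); the function names and parameter values are illustrative assumptions, and censoring and covariates are ignored. For a > 0 this is a proper distribution, while for a < 0 (outside the usual domain) the survival function levels off at exp(b/a), which is read as the cured proportion.

```python
import math


def gompertz_survival(t, a, b):
    """Survival function for the assumed hazard h(t) = b * exp(a * t), with b > 0."""
    if a == 0.0:
        # Constant hazard b: exponential distribution
        return math.exp(-b * t)
    if a * t > 700.0:
        # exp(a * t) would overflow; the survival probability is numerically zero
        return 0.0
    return math.exp(-(b / a) * math.expm1(a * t))


def gompertz_cure_fraction(a, b):
    """Limit of S(t) as t grows: exp(b / a) in (0, 1) when a < 0, otherwise 0."""
    return math.exp(b / a) if a < 0.0 else 0.0


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration
    a, b = -1.0, 1.0
    for t in (0.0, 1.0, 5.0, 50.0):
        print(t, gompertz_survival(t, a, b))
    print("cure fraction:", gompertz_cure_fraction(a, b))
```

With a = -1 and b = 1 the plateau is exp(-1), so the total probability mass assigned to finite event times is 1 - exp(-1), which is less than one. That shortfall is what makes the distribution defective.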
Article
Recent years have seen major improvements in survival of patients with advanced melanoma with the advent of various novel systemic immunotherapies and targeted therapies. As our understanding of these agents and their various mechanisms of action improves, even more impressive outcomes are being achieved through use of various combination strategies, including the combining of different immunotherapies with one another as well as with other modalities. However, despite the improved outcomes that have been achieved in advanced melanoma, responses to treatment are heterogeneous and may not always be durable. Additional advances in therapy are required, and several emerging strategies are a focus of interest. These include the investigation of several new immunotherapy and/or targeted therapy combinations, such as checkpoint inhibitors (anti-PD-1/anti-CTLA-4) with other immunotherapies (e.g., indoleamine 2,3 dioxygenase [IDO] inhibitors, antilymphocyte activation 3 [anti-LAG-3], histone deacetylase [HDAC] inhibitors, Toll-like receptor 9 [TLR-9] agonists, antiglucocorticoid-induced tumor necrosis factor receptor [anti-GITR], pegylated interleukin-2 [IL-2]), combined targeted therapies (e.g., MEK and CDK4/6 coinhibition), and combined immunotherapy and targeted therapy (e.g., the triplet combination of BRAF/MEK inhibition with anti-PD-1s). The identification of novel therapeutic targets in the MAP kinase pathway also offers opportunities to improve outcomes by overcoming de novo and acquired resistance to BRAF/MEK inhibition (e.g., the development of ERK inhibitors). In addition, adoptive cell transfer, the infusion of large numbers of activated autologous lymphocytes, may have a potential role in patients whose disease has progressed after immunotherapy. Taken together, these new approaches offer further potential to increase systemic treatment options and improve long-term outcomes for patients with advanced melanoma.
Article
Background: The emergence of immune checkpoint inhibitors (ICIs) has improved survival for patients with metastatic melanoma. The types of disease-response patterns to ICI therapy can be more complex relative to traditional chemotherapy and include mixed responses, pseudoprogression, and oligoprogression. The potential benefit of surgery after incomplete response to ICI therapy has not been explored. The purpose of this study was to explore outcomes of surgery after ICI therapy in patients with metastatic melanoma. Methods: A retrospective study was conducted at two centers and included patients with melanoma who underwent surgery after treatment with monotherapy or combination therapy with anti-programmed cell death protein (PD) 1 and/or anti-cytotoxic T-lymphocyte associated protein (CTLA)-4 checkpoint blockade. Results: Of 25 patients, nine received anti-CTLA-4 therapy, eight received anti-PD-1 therapy, and eight received both anti-CTLA-4 and anti-PD-1 therapies before surgery. Five patients were treated in the adjuvant setting and developed new lesions, whereas 20 patients were treated for metastatic disease and underwent surgery for persistent disease on imaging after ICI therapy. Twenty-five patients underwent 30 operations without complications. Twenty-seven of 30 masses were confirmed to be melanoma on pathology, one was a desmoid tumor and two were necrosis. At a median follow-up of 14.2 months, 2 patients died, 8 were alive with a known disease, and 15 continued to have no further evidence of disease. Conclusions: Surgery was well tolerated in this cohort of patients receiving ICI therapy for melanoma. Surgery may benefit select patients with an oligoprogressive disease after ICI therapy. After a mixed response, surgery remains the only definitive method to render some patients free of disease.
Article
In this paper, we introduce a defective regression model for survival data modeling with a proportion of early failures or zero-adjusted. Our approach enables us to accommodate three types of units, that is, patients with ‘zero’ survival times (early failures) and those who are susceptible or not susceptible to the event of interest. Defective distributions are obtained from standard distributions by changing the domain of the parameters of the latter in such a way that their survival functions are limited to p∈(0,1). We consider the Gompertz and inverse Gaussian defective distributions, which allow modeling of data containing a cure fraction. Parameter estimation is performed by maximum likelihood estimation, and Monte Carlo simulation studies are conducted to evaluate the performance of the proposed models. We illustrate the practical relevance of the proposed models on two real data sets. The first is from a study of occlusion of endoscopic stenting in patients with pancreatic cancer performed at A.C.Camargo Cancer Center, and the other is from a study on insulin use in pregnant women diagnosed with gestational diabetes performed at São Paulo University Medical School. Both studies were performed in São Paulo, Brazil.
Article
In this paper, we propose a defective model induced by a frailty term for modeling the cured proportion. Unlike most cure rate models, defective models have the advantage of modeling the cure rate without adding any extra parameter to the model. The introduction of unobserved heterogeneity among individuals brings advantages for the estimated model. The influence of unobserved covariates is incorporated using a proportional hazards model. The frailty term, assumed to follow a gamma distribution, is introduced on the hazard rate to control the unobservable heterogeneity of the patients. We assume that the baseline distribution follows a Gompertz or an inverse Gaussian defective distribution. Thus we propose and discuss two defective distributions: the defective gamma-Gompertz and gamma-inverse Gaussian regression models. Simulation studies are performed to verify the asymptotic properties of the maximum likelihood estimator. Lastly, in order to illustrate the proposed model, we present three applications to real data sets, one of which is used here for the first time and comes from a study of breast cancer at the A.C.Camargo Cancer Center, São Paulo, Brazil.
Article
The application of Cox's (1972) regression model for censored survival data to epidemiological studies of chronic disease incidence is discussed. A related model for association in bivariate survivorship time distributions is proposed for the analysis of familial tendency in disease incidence. The possible extension of the model to general multivariate survivorship distributions is indicated.
Article
To update the melanoma staging system of the American Joint Committee on Cancer (AJCC) a large database was assembled comprising >46,000 patients from 10 centers worldwide with stages I, II, and III melanoma diagnosed since 1998. Based on analyses of this new database, the existing seventh edition AJCC stage IV database, and contemporary clinical trial data, the AJCC Melanoma Expert Panel introduced several important changes to the Tumor, Nodes, Metastasis (TNM) classification and stage grouping criteria. Key changes in the eighth edition AJCC Cancer Staging Manual include: 1) tumor thickness measurements to be recorded to the nearest 0.1 mm, not 0.01 mm; 2) definitions of T1a and T1b are revised (T1a, <0.8 mm without ulceration; T1b, 0.8-1.0 mm with or without ulceration or <0.8 mm with ulceration), with mitotic rate no longer a T category criterion; 3) pathological (but not clinical) stage IA is revised to include T1b N0 M0 (formerly pathologic stage IB); 4) the N category descriptors "microscopic" and "macroscopic" for regional node metastasis are redefined as "clinically occult" and "clinically apparent"; 5) prognostic stage III groupings are based on N category criteria and T category criteria (ie, primary tumor thickness and ulceration) and increased from 3 to 4 subgroups (stages IIIA-IIID); 6) definitions of N subcategories are revised, with the presence of microsatellites, satellites, or in-transit metastases now categorized as N1c, N2c, or N3c based on the number of tumor-involved regional lymph nodes, if any; 7) descriptors are added to each M1 subcategory designation for lactate dehydrogenase (LDH) level (LDH elevation no longer upstages to M1c); and 8) a new M1d designation is added for central nervous system metastases. This evidence-based revision of the AJCC melanoma staging system will guide patient treatment, provide better prognostic estimates, and refine stratification of patients entering clinical trials.
Article
The spectacular outcomes of the phase III trials regarding nivolumab versus ipilimumab in fully resected stage IIIB/C-IV and of the combination of dabrafenib (D) plus trametinib (T) in BRAF-mutant stage III patients demonstrate that effective treatments in advanced melanoma are also highly effective in the adjuvant setting. In 2016, an overall survival benefit with adjuvant high-dose ipilimumab was demonstrated, and the European Organisation for Research and Treatment of Cancer trial 1325 comparing pembrolizumab versus placebo will complete the picture in the early 2018. Toxicity profiles are in line with the experience in advanced melanoma, i.e. favourable for the anti-PD1 agents and for D + T and problematic for ipilimumab. The 2017 outcomes are practice changing and put an end to the use of interferon (IFN) and ipilimumab. In countries with only access to IFN, its use can be restricted to patients with ulcerated melanoma, based on the individual patient data meta-analysis recently published. Because of the results of the Melanoma Sentinel Lymph node Trial-2 (MSLT-2) trial, completion lymph node dissection (CLND) will decrease sharply, leading to a lack of optimal prognostic information. Prognosis in sentinel node-positive stage IIIA/B patients is extremely heterogeneous with 5-year survival rates varying from 90% to 40% and depends mostly on the number of positive nodes identified by CLND. This information is crucial for clinical decision-making. How to guarantee optimal staging information needs to be discussed urgently. Further improvements of adjuvant therapies will have to address all these questions as well as the exploration of neoadjuvant use of active drugs and combination approaches. Important paradigm shifts in the management of high-risk melanoma patients are upon us.