Scandinavian Journal of Statistics, Vol. 39: 554–567, 2012
doi: 10.1111/j.1467-9469.2012.00804.x
©2012 Board of the Foundation of the Scandinavian Journal of Statistics. Published by Wiley Publishing Ltd.
Checking the Short-Term and Long-Term
Hazard Ratio Model for Survival Data
SONG YANG
Office of Biostatistics Research, National Heart, Lung and Blood Institute
YICHUAN ZHAO
Department of Mathematics and Statistics, Georgia State University
ABSTRACT. The short-term and long-term hazard ratio model includes the proportional hazards
model and the proportional odds model as submodels, and allows a wider range of hazard ratio
patterns compared with some of the more traditional models. We propose two omnibus tests for
checking this model, based, respectively, on the martingale residuals and the contrast between the
non-parametric and model-based estimators of the survival function. These tests are shown to be
consistent against any departure from the model. The empirical behaviours of the tests are studied
in simulations, and the tests are illustrated with some real data examples.
Key words: censoring, goodness of fit, martingale residuals, model checking, omnibus test,
survival data
1. Introduction
In clinical trials, life testing and reliability studies, the problem of comparing two groups of data is often encountered. Usually a summary measure is used to capture the difference between the two groups, and the baseline distribution is left unspecified. For survival data, the Cox proportional hazards model (Cox, 1972) is the most widely used and its parameter has an appealing interpretation as the hazard ratio, or the relative risk, for the two groups. The proportional hazards model and the derived hazard ratio estimate provide good approximations in many situations when the hazards of the two groups are nearly proportional. When the assumption of a constant hazard ratio is in doubt, common alternatives include the accelerated failure time model (Kalbfleisch & Prentice, 2002) and the proportional odds model (Bennett, 1983). The proportional hazards model and the proportional odds model also belong to the class of transformation models (Bickel et al., 1993). These alternative models have been studied extensively in the literature, in works such as Ying (1993), Cheng et al. (1995), Murphy et al. (1997), Yang & Prentice (1999) and Chen et al. (2002), among others. In addition to these more established models, other semiparametric models have also been proposed (e.g. Tsodikov, 2002; Chen & Cheng, 2006; Zeng & Lin, 2007).
Yang & Prentice (2005) proposed a model which includes the proportional hazards model and the proportional odds model as submodels. Assume absolutely continuous failure time distributions and label the two groups control and treatment, with cumulative hazard functions $\Lambda_C(t)$ and $\Lambda_T(t)$, and hazard rate functions $\lambda_C(t)$ and $\lambda_T(t)$, respectively. The model of Yang & Prentice (2005) postulates that

$$\lambda_T(t) = \frac{\theta_1\theta_2}{\theta_1 + (\theta_2 - \theta_1)\,S_C(t)}\,\lambda_C(t), \qquad (1)$$

almost everywhere for $t < \tau_0$, where $S_C$ is the survivor function of the control group, $\theta_1, \theta_2 > 0$, and $\tau_0$ is the upper boundary of the support of the control distribution: $\tau_0 = \sup\{x : \int_0^x \lambda_C(t)\,dt < \infty\}$. Under this model, $\theta_1, \theta_2$ can be interpreted as the short-term and long-term hazard ratio, respectively, and various patterns of the hazard ratio can be realized, such as proportional hazards, no initial effect, disappearing effect or crossing hazards. The survivor functions may also cross, a phenomenon not possible under the linear transformation models and the accelerated failure time model.

Note that this model has an asymmetric flavour in that interchange of the control and treatment groups will not result in a model of the same form. Also, in clinical applications, heavy censoring is often present and there may be no data available near $\tau_0$. Thus in applications one is interested in using model (1) for $t$ in $[0, \tau]$, where $\tau < \tau_0$, a range of interest with adequate data available.
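As a quick illustration of the hazard ratio patterns allowed by (1), the following Python sketch (ours, for illustration only; the function names are not from the paper) evaluates the ratio $\lambda_T(t)/\lambda_C(t) = \theta_1\theta_2/\{\theta_1+(\theta_2-\theta_1)S_C(t)\}$ for a standard exponential control distribution:

    import numpy as np

    def yp_hazard_ratio(t, theta1, theta2, S_C):
        # lambda_T(t) / lambda_C(t) under model (1); equals theta1 at t = 0
        # (where S_C = 1) and tends to theta2 as S_C(t) -> 0.
        s = S_C(t)
        return theta1 * theta2 / (theta1 + (theta2 - theta1) * s)

    S_C = lambda t: np.exp(-t)                 # standard exponential control group
    t = np.linspace(0.0, 5.0, 6)
    print(yp_hazard_ratio(t, 2.0, 0.5, S_C))   # decays from 2 towards 0.5
    # With theta1 > 1 > theta2 the ratio crosses 1, i.e. crossing hazards,
    # a pattern a constant-hazard-ratio model cannot capture.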
In this article, we propose two omnibus tests for checking model (1), based, respectively, on the martingale residuals and the contrast between the non-parametric and model-based estimators of the survival function. In the literature, the martingale residuals have often been used to detect departures from the assumed model (Therneau et al., 1990; Lin et al., 1993). Lin et al. (1993) studied various partial-sum processes of the martingale residuals, and used simulated Gaussian processes to approximate the distributions of those processes under the proportional hazards regression model. The martingale residual-based test proposed here is related to the test for the proportional hazards regression model in Lin et al. (1993), but different in that we will consider integrals of the martingale residuals.

While the martingale residual-based test and related graphical methods have been shown in the literature to be extremely useful for the Cox proportional regression model, the partial-sum processes of martingale residuals themselves do not have a simple interpretation. The contrast-based test uses the non-parametric survival function estimate, and has a very simple and direct interpretation as the difference between the estimated survival probabilities. Under the null hypothesis that model (1) holds, the distributions of both the martingale residual-based test and the contrast-based test can be approximated through simulations. We will show that both tests are consistent against any departure from model (1). Various numerical simulations show that the proposed tests have good empirical size and power. The proposed procedures will be illustrated in real data examples.

We organize the article as follows: In section 2, distributional results for the stochastic processes used for the tests are established. Then the p-values of the tests are studied and the consistency of the tests against departure from the model is established. In section 3, simulation results and data examples are presented. Some concluding remarks are given in section 4. Proofs of the asymptotic results are placed in appendix S1.
2. Distributional results
Let $T_1,\ldots,T_n$ be the pooled lifetimes of the two groups, and $C_1,\ldots,C_n$ be the corresponding censoring variables. Let the sample sizes of the two groups be $n_1$ and $n_2$, respectively, and arrange the indices such that $T_1,\ldots,T_{n_1}$, $n_1 < n$, constitute the control group. Let $Z_i = I(i > n_1)$, $i = 1,\ldots,n$, where $I(\cdot)$ is the indicator function. We assume that $T_1,\ldots,T_n, C_1,\ldots,C_n$ are independent. The available data are the triplets $(X_i, \delta_i, Z_i)$, $i = 1,\ldots,n$, where $X_i = T_i \wedge C_i$ and $\delta_i = I(T_i \le C_i)$, with $\wedge$ denoting the minimum.

To simplify the presentation, throughout this article, we will assume the following conditions.
Condition 1. $\lim n_1/n = \rho \in (0, 1)$.
Condition 2. The data range of interest is $[0, \tau]$, where $\tau < \tau_0$. The survivor function $G_i$ of the censoring variable $C_i$ given $Z_i$ is continuous and satisfies $\sum_{i\le n_1} G_i(t)/n_1 \to \gamma_C(t)$, $\sum_{i>n_1} G_i(t)/n_2 \to \gamma_T(t)$, uniformly in $t \le \tau$, for some functions $\gamma_C(t), \gamma_T(t)$ satisfying $\gamma_C(\tau)\gamma_T(\tau) > 0$.
Condition 3. The survivor functions $S_C$ and $S_T$ of the two comparison groups are absolutely continuous, and $S_C(\tau)S_T(\tau) > 0$.
Define $R(t) = 1/S_C(t) - 1$, $t \le \tau$, the odds function of the control group. Let $\beta = (\beta_1, \beta_2)^T$, where $T$ denotes transpose and $\beta_1 = \log\theta_1$, $\beta_2 = \log\theta_2$. With this parameterization, we consider the model

$$\lambda_i(t) = \frac{1}{\xi_{1i}(\beta) + \xi_{2i}(\beta)R(t)}\,\frac{dR(t)}{dt}, \qquad i = 1,\ldots,n, \qquad (2)$$

almost everywhere on $t \in [0, \tau]$, where $\lambda_i(t)$ is the hazard function for $T_i$ given $Z_i$, and $\xi_{ji}(b) = \exp(-b_j Z_i)$, $j = 1, 2$, for $b = (b_1, b_2)^T$.
To develop the tests for model (2), we need an estimator for $\beta$. Although it is possible to adopt the pseudo-likelihood approach in Yang & Prentice (2005), here we will use a simpler estimator. If $R$ were known, the log-likelihood function of model (2) would be proportional to

$$-\sum_{i=1}^{n}\left[\delta_i \ln\{\xi_{1i}(b) + \xi_{2i}(b)R(X_i)\} + \ln\{1 + \xi_{2i}(b)R(X_i)/\xi_{1i}(b)\}/\xi_{2i}(b)\right].$$

This motivates the following estimator for $\beta$. Let $\hat S_C$ be the Kaplan–Meier estimator of $S_C$ and $\hat R = 1/\hat S_C - 1$ (Kaplan & Meier, 1958). Throughout this article, $c/d$ is defined to be zero when $d = 0$. Let $\xi_j(b) = \exp(-b_j)$, $j = 1, 2$. For $\tau$ satisfying conditions 2 and 3, define

$$p(b) = \sum_{i>n_1} I(X_i \le \tau)\left[\delta_i \ln\{\xi_1(b) + \xi_2(b)\hat R(X_i)\} + \ln\{1 + \xi_2(b)\hat R(X_i)/\xi_1(b)\}/\xi_2(b)\right].$$
Then we will use the minimizer $\hat\beta$ of $p(b)$ to estimate $\beta$. Equivalently, $\hat\beta$ is the zero of the gradient $\nabla p(b)$ of $p(b)$. Define

$$A(b) = \int_0^\tau \ln\{\xi_1(b) + \xi_2(b)R(t)\}\,\gamma_T(t)S_T(t)\,d\Lambda_T(t) + \int_0^\tau \frac{\{1 + R(t)\}\,\gamma_T(t)S_T(t)}{\xi_1(b) + \xi_2(b)R(t)}\,d\Lambda_C(t).$$

It can be checked that, under model (2), $\nabla A(b)$ is zero at $\beta$. We will also assume the following condition.

Condition 4. The function $A(b)$ has a unique minimum that occurs at the unique zero of $\nabla A$.
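A minimal Python sketch (ours) of this estimation step, assuming $\xi_j(b) = \exp(-b_j)$ as in our reading of the parameterization above, and using a simplified right-continuous step lookup for the Kaplan–Meier curve of the controls:

    import numpy as np
    from scipy.optimize import minimize

    def km_curve(x, d):
        # Kaplan-Meier survivor estimate of the controls: ordered jump times
        # and the survivor values just after each time.
        order = np.argsort(x)
        xs, ds = x[order], d[order]
        at_risk = len(xs) - np.arange(len(xs))   # risk-set size at each ordered time
        surv = np.cumprod(1.0 - ds / at_risk)    # factor equals 1 where censored
        return xs, surv

    def p_objective(b, x_trt, d_trt, R_hat, tau):
        # p(b): sum over treatment subjects with X_i <= tau of
        # delta_i*log(xi1 + xi2*R_hat(X_i)) + log(1 + xi2*R_hat(X_i)/xi1)/xi2.
        xi1, xi2 = np.exp(-b[0]), np.exp(-b[1])
        keep = x_trt <= tau
        d, R = d_trt[keep], R_hat[keep]
        return np.sum(d * np.log(xi1 + xi2 * R) + np.log1p(xi2 * R / xi1) / xi2)

    # x, d, z: observed times, event indicators, group labels (z = 1 for treatment)
    x = np.array([0.3, 0.8, 1.1, 1.9, 0.5, 0.9, 1.4, 2.2])
    d = np.array([1, 1, 0, 0, 1, 0, 1, 1])
    z = np.array([0, 0, 0, 0, 1, 1, 1, 1])

    t_km, s_km = km_curve(x[z == 0], d[z == 0])
    # evaluate S_C_hat at the treatment times (right-continuous step lookup)
    idx = np.searchsorted(t_km, x[z == 1], side="right") - 1
    S_C_hat = np.where(idx >= 0, s_km[np.clip(idx, 0, None)], 1.0)
    R_hat = 1.0 / S_C_hat - 1.0                  # estimated control odds at X_i
    beta_hat = minimize(p_objective, np.zeros(2),
                        args=(x[z == 1], d[z == 1], R_hat, x.max())).x

The choice tau = x.max() anticipates the recommendation in section 3 that all data be used when computing $\hat\beta$.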
Let $M_i(t) = \delta_i I(X_i \le t) - \int_0^t I(X_i \ge s)\,dR(s)/\{\xi_{1i}(\beta) + \xi_{2i}(\beta)R(s)\}$ be the martingale associated with the $i$th study subject, $1 \le i \le n$, where throughout the article we use the notation $\int_l^u = \int_{(l,u]}$. Now define the martingale residuals

$$\hat M_i(t) = \delta_i I(X_i \le t) - \int_0^t \frac{I(X_i \ge s)\,d\hat R(s)}{\xi_{1i}(\hat\beta) + \xi_{2i}(\hat\beta)\hat R(s)}, \qquad 1 \le i \le n.$$

Note that $\hat M_i(t)$ does not involve $\hat\beta$ for $i \le n_1$. It will be shown later that, under model (2) and conditions 1–4, $\hat\beta$ is strongly consistent for $\beta$. Thus $\sum_{i>n_1}\hat M_i(t)$ will be close to $\sum_{i>n_1} M_i(t)$, and will fluctuate around zero under model (2). This leads us to define a martingale residual-based test that rejects model (2) if $\sup_{t\le\tau}\big|\sum_{i>n_1}\int_0^t \phi(s)\,d\hat M_i(s)\big|/\sqrt n$ is large, where $\phi$ is a data-dependent function. The choice of $\phi$ will be discussed further below.
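As a computational aside (ours, not part of the paper), the residual processes are easy to evaluate once $\hat R$ is stored as a step function through its jump times, values and jump sizes; $\xi_{ji}(\hat\beta) = \exp(-\hat\beta_j Z_i)$ below reflects our reading of the parameterization:

    import numpy as np

    def martingale_residuals(x, d, z, jump_t, R_at, dR, beta_hat, grid):
        # M_hat_i(t) = delta_i * 1{X_i <= t}
        #            - sum over jumps s <= min(t, X_i) of dR(s)/(xi_1i + xi_2i*R(s)),
        # where (jump_t, R_at, dR) give the jump times of R_hat, its value at
        # each jump, and the jump sizes.
        M = np.empty((len(x), len(grid)))
        for i in range(len(x)):
            xi1 = np.exp(-beta_hat[0] * z[i])
            xi2 = np.exp(-beta_hat[1] * z[i])
            contrib = dR / (xi1 + xi2 * R_at)    # one term per jump of R_hat
            for k, t in enumerate(grid):
                at_risk = jump_t <= min(t, x[i])
                M[i, k] = d[i] * (x[i] <= t) - contrib[at_risk].sum()
        return M

    # The statistic sup_t |sum_{i > n1} integral of phi dM_hat_i| / sqrt(n) is then
    # accumulated over the same grid once the weight phi below has been chosen.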
The asymptotic distribution of the test involves that of $\hat\beta$. To describe these asymptotic distributions, let

$$\bar M_1(t) = \frac{1}{n}\sum_{i\le n_1} M_i(t), \qquad \bar M_2(t) = \frac{1}{n}\sum_{i>n_1} M_i(t),$$

$$K_1(t) = \sum_{i\le n_1} I(X_i \ge t), \qquad K_2(t) = \sum_{i>n_1} I(X_i \ge t),$$

$$N_1(t) = \sum_{i\le n_1} \delta_i I(X_i \le t), \qquad N_2(t) = \sum_{i>n_1} \delta_i I(X_i \le t).$$
Define $D(g, b) = \xi_1(b) + \xi_2(b)g$ for $g > 0$ and let

$$W(g, b) = \left(\frac{\xi_1(b)}{D(g,b)},\; \frac{\xi_2(b)\,g}{D(g,b)}\right)^T, \qquad \Omega(b) = -\{H(\tilde p/n,\, b)\}^{-1},$$

where $H(f, b)$ is the Hessian matrix of the function $f$ at $b$ and

$$\tilde p(b) = \int_0^\tau \ln\{\xi_1(b) + \xi_2(b)\hat R(t)\}\,dN_2(t) + \int_0^\tau \frac{K_2(t)\,d\hat R(t)}{\xi_1(b) + \xi_2(b)\hat R(t)}.$$
Define

$$W_T(t) = W(R(t), \beta),$$

$$W_C(t) = \frac{K_2(t)\,W_T(t)}{K_1(t)\,D(R(t),\beta)\,\hat S_C(t)} + \frac{1}{K_1(t)}\int_t^\tau \frac{K_2(s)\,W_T(s)}{D(R(s),\beta)}\left\{\frac{\xi_2(\beta)}{D(\hat R(s),\hat\beta)\,\hat S_C(s)} - 1\right\}d\hat R(s). \qquad (3)$$

Let $\hat W_T$ be the estimator of $W_T$ defined by replacing $\beta$ and $R$ with $\hat\beta$ and $\hat R$, respectively. Similarly, define an estimator $\hat W_C$ of $W_C$. We first establish the following results.
Theorem 1. Suppose that conditions 1–4 hold. Then, under model (2),

(i) $\hat\beta$ is strongly consistent for $\beta$;

(ii) $\sqrt n(\hat\beta - \beta)$ has a limiting zero mean normal distribution. A strongly consistent estimator of the limiting covariance matrix is given by

$$\Omega(\hat\beta)\left\{\int_0^\tau \hat W_T\hat W_T^T\,\frac{dN_2}{n} + \int_0^\tau \hat W_C\hat W_C^T\,\frac{dN_1}{n}\right\}\Omega(\hat\beta).$$
For the integrand $\phi$ in the martingale residual-based test, while many choices are possible, based on a trial-and-error process of simulation studies and real data applications, we will work with the choice $\phi(t) = \phi_1(t)\phi_2(t)$, where

$$\phi_1(t) = \frac{K_1(t)}{K_1(t) + K_2(t)} \qquad\text{and}\qquad \phi_2(t) = 1 + 4\,\frac{K_1(t) + K_2(t)}{n}\left\{1 - \frac{K_1(t) + K_2(t)}{n}\right\}.$$

The factor $\phi_1(t)$ is used to help stabilize the integral near the upper tail. The function $\phi_2(t)$ assigns weights between 1 and 2 to all data points, with more weight in the central region and less weight towards the boundaries of the data range.
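In code the weight is straightforward; this sketch (ours) evaluates $\phi = \phi_1\phi_2$ on a grid, with a small guard against $0/0$ after the last observed time that is not part of the paper's definition:

    import numpy as np

    def phi_weight(x, z, grid):
        # phi(t) = phi1(t) * phi2(t), with K1, K2 the control/treatment
        # risk-set counts and n the total sample size.
        n = len(x)
        K1 = np.array([np.sum((x >= t) & (z == 0)) for t in grid])
        K2 = np.array([np.sum((x >= t) & (z == 1)) for t in grid])
        phi1 = K1 / np.maximum(K1 + K2, 1)     # guard: ours, avoids 0/0
        p = (K1 + K2) / n
        phi2 = 1.0 + 4.0 * p * (1.0 - p)       # between 1 (boundaries) and 2 (centre)
        return phi1 * phi2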
Let

$$U_n(t) = \int_0^t \phi(s)\,d\bar M_2(s) - \int_0^t \psi(s)\,d\bar M_1(s) - \left\{\int_0^t \Gamma(s)^T\,d\hat R(s)\right\}\Omega(\beta)\,Q, \qquad (4)$$

where

$$\psi(s) = \frac{\phi(s)K_2(s)}{K_1(s)\,D(R(s),\beta)\,\hat S_C(s)} - \frac{1}{K_1(s)}\int_s^t \frac{\phi(x)K_2(x)}{D(R(x),\beta)}\left\{\frac{\xi_2(\beta)}{D(\hat R(x),\hat\beta)\,\hat S_C(x)} - 1\right\}d\hat R(x)$$

and

$$\Gamma(s) = \frac{\phi(s)K_2(s)}{n\,D(\hat R(s),\hat\beta)}\,W(R(s),\beta).$$

Let $\hat\psi$ be the estimator of $\psi$ defined by replacing $\beta$ and $R$ with $\hat\beta$ and $\hat R$, respectively, and define the estimator $\hat\Gamma$ of $\Gamma$ analogously. For the asymptotic distribution of the martingale residual-based test, we have the following result.
Theorem 2. Suppose that conditions 1–4 hold. Then, under model (2), the process $\sum_{i>n_1}\int_0^t \phi(s)\,d\hat M_i(s)/\sqrt n$, $0 \le t \le \tau$, is asymptotically equivalent to the process $U_n(t)$, $0 \le t \le \tau$, defined in (4), which converges weakly to a zero mean Gaussian process. A strongly consistent estimator of the covariance process of the limiting Gaussian process is given by

$$\begin{aligned}
\hat\sigma_1(s,t) ={}& \int_0^s \hat\phi^2(x)\,\frac{dN_2(x)}{n} + \int_0^s \hat\psi^2(x)\,\frac{dN_1(x)}{n}\\
&+ \left\{\int_0^s \hat\Gamma(x)^T\,d\hat R(x)\right\}\Omega(\hat\beta)\left\{\int_0^\tau \hat W_T\hat W_T^T\,\frac{dN_2}{n} + \int_0^\tau \hat W_C\hat W_C^T\,\frac{dN_1}{n}\right\}\Omega(\hat\beta)\left\{\int_0^t \hat\Gamma(x)\,d\hat R(x)\right\}\\
&- \left\{\int_0^s \hat\Gamma(x)^T\,d\hat R(x)\right\}\Omega(\hat\beta)\left\{\int_0^t \hat\phi\,\hat W_T\,\frac{dN_2}{n} - \int_0^t \hat\psi\,\hat W_C\,\frac{dN_1}{n}\right\}\\
&- \left\{\int_0^t \hat\Gamma(x)^T\,d\hat R(x)\right\}\Omega(\hat\beta)\left\{\int_0^s \hat\phi\,\hat W_T\,\frac{dN_2}{n} - \int_0^s \hat\psi\,\hat W_C\,\frac{dN_1}{n}\right\}, \qquad 0 \le s \le t \le \tau.
\end{aligned}$$
In the literature, the martingale residuals have been very useful for checking the Cox proportional regression model. However, the partial-sum processes of martingale residuals themselves do not have a simple interpretation. As an alternative to the martingale residual-based test, we can also obtain a test by contrasting the non-parametric and model-based estimators for the survivor function $S_T$ of the treatment group. Define $\hat S_T = \exp(-\hat\Lambda_T)$, where $\hat\Lambda_T$ is the Nelson–Aalen estimator for the cumulative hazard function $\Lambda_T$ (Nelson, 1969; Aalen, 1975). Note that under model (2), we have $S_T(t) = \{1 + \xi_2(\beta)R(t)/\xi_1(\beta)\}^{-1/\xi_2(\beta)}$. From this we can define

$$\tilde S_T(t) = \{1 + \xi_2(\hat\beta)\hat R(t)/\xi_1(\hat\beta)\}^{-1/\xi_2(\hat\beta)}$$

to be a model-based estimator for $S_T$. Intuitively, model (2) holds if the model-based survival estimator is close to the non-parametric estimator. This leads us to define a contrast-based test using $\hat S_T(t) - \tilde S_T(t)$, the difference between the estimated survival probabilities. Note that the Kaplan–Meier estimator could also be used in the contrast-based test. In various simulations, $\hat S_T$ results in a better performance for small samples, hence it is used in defining the test instead of the Kaplan–Meier estimator.
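The expression for $S_T$ under model (2) follows by integrating the treatment-group hazard, using $R(0) = 0$; a one-line derivation:

$$\Lambda_T(t) = \int_0^t \frac{dR(s)}{\xi_1(\beta) + \xi_2(\beta)R(s)} = \frac{1}{\xi_2(\beta)}\,\ln\left\{1 + \frac{\xi_2(\beta)}{\xi_1(\beta)}R(t)\right\}, \qquad S_T(t) = e^{-\Lambda_T(t)} = \left\{1 + \frac{\xi_2(\beta)}{\xi_1(\beta)}R(t)\right\}^{-1/\xi_2(\beta)},$$

and $\tilde S_T$ is exactly this expression with $(\beta, R)$ replaced by $(\hat\beta, \hat R)$.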
Let

$$V_n(t) = \hat S_T(t)\int_0^t \frac{\sqrt n\,d\bar M_2(s)}{K_2(s)} + \alpha(t)\int_0^t \frac{\sqrt n\,d\bar M_1(s)}{K_1(s)} + \tilde S_T(t)\,B(t)^T\,\Omega(\hat\beta)\,Q, \qquad (5)$$

where

$$\alpha(t) = \frac{\tilde S_T(t)}{D(\hat R(t), \hat\beta)\,\hat S_C(t)}, \qquad B(t) = \left(\frac{\hat R(t)}{D(\hat R(t),\hat\beta)},\; \ln\{\tilde S_T(t)\} + \frac{\hat R(t)}{D(\hat R(t),\hat\beta)}\right)^T.$$
The following result establishes the weak convergence of $\sqrt n\{\hat S_T(t) - \tilde S_T(t)\}$ on $t \in [0, \tau]$.
Theorem 3. Suppose that conditions 1–4 are satisfied. Then, under model (2), the process $\sqrt n\{\hat S_T(t) - \tilde S_T(t)\}$, $0 \le t \le \tau$, is asymptotically equivalent to the process $V_n$, $0 \le t \le \tau$, defined in (5), which converges weakly to a zero mean Gaussian process. A strongly consistent estimator of the covariance process of the limiting Gaussian process is given by

$$\begin{aligned}
\hat\sigma_2(s,t) ={}& \tilde S_T(s)\tilde S_T(t)\,B(s)^T\Omega(\hat\beta)\left\{\int_0^\tau \hat W_T(x)\hat W_T^T(x)\,\frac{dN_2(x)}{n} + \int_0^\tau \hat W_C(x)\hat W_C^T(x)\,\frac{dN_1(x)}{n}\right\}\Omega(\hat\beta)\,B(t)\\
&+ \hat S_T(s)\hat S_T(t)\int_0^s \frac{n}{K_2^2(x)}\,dN_2(x) + \alpha(s)\alpha(t)\int_0^s \frac{n}{K_1^2(x)}\,dN_1(x)\\
&- \hat S_T(t)\tilde S_T(s)\,B(s)^T\Omega(\hat\beta)\int_0^t \frac{\hat W_T(x)}{K_2(x)}\,dN_2(x) - \hat S_T(s)\tilde S_T(t)\,B(t)^T\Omega(\hat\beta)\int_0^s \frac{\hat W_T(x)}{K_2(x)}\,dN_2(x)\\
&+ \alpha(t)\tilde S_T(s)\,B(s)^T\Omega(\hat\beta)\int_0^t \frac{\hat W_C(x)}{K_1(x)}\,dN_1(x) + \alpha(s)\tilde S_T(t)\,B(t)^T\Omega(\hat\beta)\int_0^s \frac{\hat W_C(x)}{K_1(x)}\,dN_1(x), \qquad 0 \le s \le t \le \tau. \qquad (6)
\end{aligned}$$
With the asymptotic results above, we define a contrast-based test that rejects model (2) if $\sqrt n\,\sup_{t\le\tau}\phi_2(t)\,|\hat S_T(t) - \tilde S_T(t)|/\{\hat\sigma_2(t,t)\}^{1/2}$ is large. Note that we use the standardized process in defining the test. Furthermore, the weight function $\phi_2(t)$ is used to moderate the influence of data points near the boundaries of the data range. Use of the standardized process and the weight function is based on various numerical studies that indicate better performance of this setup.
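Given $\hat S_T$, $\tilde S_T$, $\phi_2$ and the diagonal $\hat\sigma_2(t,t)$ evaluated on a common grid of time points in $[0, \tau]$, the statistic is a one-liner; a sketch (ours), with a numerical floor that is not part of the paper's definition:

    import numpy as np

    def contrast_statistic(S_hat, S_tilde, phi2, sigma2_diag, n):
        # sqrt(n) * sup_t phi2(t) * |S_hat_T(t) - S_tilde_T(t)| / sigma2_hat(t,t)^{1/2}
        sd = np.sqrt(np.maximum(sigma2_diag, 1e-12))  # guard against degenerate variance
        return np.sqrt(n) * np.max(phi2 * np.abs(S_hat - S_tilde) / sd)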
The p-values of the tests are difficult to obtain analytically. The bootstrap method provides a practical alternative; it is, however, very time-consuming. The normal resampling approximation method of Lin et al. (1993) reduces computing time significantly, and has become a standard method. We will modify the method for our problem here.
Let $\epsilon_i$, $i = 1,\ldots,n$, be independent variables that are also independent from the data, and let $N_i(t) = \delta_i I(X_i \le t)$. For $t \le \tau$, define the process

$$\begin{aligned}
\hat U(t) ={}& \frac{1}{\sqrt n}\left[\sum_{i>n_1}\int_0^t \hat\phi\,d(\epsilon_i N_i) - \sum_{i\le n_1}\int_0^t \hat\psi\,d(\epsilon_i N_i)\right.\\
&\left.\quad - \left\{\int_0^t \hat\Gamma^T\,d\hat R\right\}\Omega(\hat\beta)\left\{\sum_{i\le n_1}\int_0^\tau \hat W_C\,d(\epsilon_i N_i) + \sum_{i>n_1}\int_0^\tau \hat W_T\,d(\epsilon_i N_i)\right\}\right]\\
={}& \frac{1}{\sqrt n}\left[\sum_{i>n_1}\epsilon_i\delta_i\,\hat\phi(X_i)\,I(X_i \le t) - \sum_{i\le n_1}\epsilon_i\delta_i\,\hat\psi(X_i)\,I(X_i \le t)\right.\\
&\left.\quad - \left\{\int_0^t \hat\Gamma^T\,d\hat R\right\}\Omega(\hat\beta)\left\{\sum_{i\le n_1}\epsilon_i\delta_i\,\hat W_C(X_i)\,I(X_i \le \tau) + \sum_{i>n_1}\epsilon_i\delta_i\,\hat W_T(X_i)\,I(X_i \le \tau)\right\}\right], \qquad (7)
\end{aligned}$$
and

$$\begin{aligned}
\hat V(t) ={}& \frac{1}{\sqrt n}\left[\hat S_T(t)\sum_{i>n_1}\epsilon_i\delta_i\,\frac{n}{K_2(X_i)}\,I(X_i \le t) + \alpha(t)\sum_{i\le n_1}\epsilon_i\delta_i\,\frac{n}{K_1(X_i)}\,I(X_i \le t)\right.\\
&\left.\quad + \tilde S_T(t)\,B(t)^T\Omega(\hat\beta)\left\{\sum_{i\le n_1}\epsilon_i\delta_i\,\hat W_C(X_i)\,I(X_i \le \tau) + \sum_{i>n_1}\epsilon_i\delta_i\,\hat W_T(X_i)\,I(X_i \le \tau)\right\}\right]. \qquad (8)
\end{aligned}$$
In the approach of Lin et al. (1993), the $\epsilon_i$ values are standard normal variables. The standard normal variables sometimes result in inflated empirical size in various simulation studies for our problem here. Thus we need to make some adjustment. Specifically, we will choose $\epsilon_i$, $i = 1,\ldots,n$, to be independent normal variables that are independent of the data, with mean zero and variance $c_n^2$ such that $\sup_n c_n < \infty$ and $c_n \to 1$. Conditional on the observed data $(X_i, \delta_i, Z_i)$, $i = 1,\ldots,n$, the processes $\hat U$ and $\hat V$ are sums of $n$ independent Gaussian processes. It can be shown that, given the data, the processes $\hat U$ and $\hat V$ converge weakly and have the same limiting processes as those of $U_n$ and $V_n$, respectively.
Let $r_o, c_o$ be the observed values of

$$\sup_{t\in[0,\tau]}\left|\sum_{i>n_1}\int_0^t \phi(s)\,d\hat M_i(s)\right|\Big/\sqrt n \qquad\text{and}\qquad \sup_{t\in[0,\tau]}\sqrt n\,\phi_2(t)\,|\hat S_T(t) - \tilde S_T(t)|/\{\hat\sigma_2(t,t)\}^{1/2},$$

respectively. The p-values

$$P\left[\sup_{t\in[0,\tau]}\left|\sum_{i>n_1}\int_0^t \phi(s)\,d\hat M_i(s)\right|\Big/\sqrt n > r_o\right], \qquad P\left[\sup_{t\in[0,\tau]}\sqrt n\,\phi_2(t)\,|\hat S_T(t) - \tilde S_T(t)|/\{\hat\sigma_2(t,t)\}^{1/2} > c_o\right]$$

can be approximated by

$$P\left[\sup_{t\in[0,\tau]}|\hat U(t)| > r_o\right], \qquad P\left[\sup_{t\in[0,\tau]}\phi_2(t)\,|\hat V(t)|/\{\hat\sigma_2(t,t)\}^{1/2} > c_o\right],$$

respectively, which in turn can be approximated by simulating the conditional distributions given the data a large number of times. We have the following result for the consistency of the tests.
Theorem 4. Suppose conditions 1–4 are satisfied. Then the martingale residual-based test is consistent against a general departure from model (2) on $t \in [0, \tau]$. The contrast-based test with the standardized process is consistent against a general departure from model (2) on $t \in [0, \tau]$, except for the degenerate case where $\sigma_2(t,t)$ is zero for some $t \in [0, \tau]$.
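In practice, the approximation above amounts to redrawing the multipliers $\epsilon_i$ many times while every data-dependent quantity is held fixed. A minimal sketch (ours): the matrix A is assumed to collect, for each subject $i$ and grid time $t_k$, the fixed factor multiplying $\epsilon_i$ in (7), or in the standardized version of (8), so one multiplier draw yields one realization of the conditional process:

    import numpy as np

    def resampling_pvalue(A, observed_sup, c_n=1.0, n_rep=1000, seed=0):
        # A: (n, m) array with process(t_k) = sum_i eps_i * A[i, k] / sqrt(n).
        # Draw eps_i ~ N(0, c_n^2), rebuild the process, record its supremum,
        # and report the fraction of suprema exceeding the observed value.
        n = A.shape[0]
        rng = np.random.default_rng(seed)
        sups = np.empty(n_rep)
        for r in range(n_rep):
            eps = rng.normal(0.0, c_n, size=n)
            sups[r] = np.max(np.abs(eps @ A)) / np.sqrt(n)
        return float(np.mean(sups > observed_sup))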
3. Simulation studies and examples
3.1. Simulation studies
To fine-tune the tests and to evaluate their performance, we have conducted various simulation studies. As mentioned before, because of better performance for small samples, $\hat S_T$ is used in defining the contrast-based test instead of the Kaplan–Meier estimator. Note that the zero of $\nabla\tilde p$ provides an alternative estimator for $\beta$. It was found that the estimator $\hat\beta$ generally results in a better performance than this alternative estimator. For the $\epsilon_i$ variables in (7), we can take $c_n \equiv 1$. That is, no modification of the original normal approximation method is needed. For the $\epsilon_i$ variables in (8), the choice $c_n = 1 + 1/n$ works well.

Regarding the choice of $\tau$, we found that in computing $\hat\beta$, we can take $\tau = \max_i X_i$ to include all data. After $\hat\beta$ is obtained, in simulating the conditional distributions of $\hat U, \hat V$, we can take $\tau = (\max_i X_i Z_i) \wedge (\max_i X_i(1 - Z_i))$. Note that our presentation and proofs work with the situation where all processes are restricted to $[0, \tau]$ for $\tau$ in condition 2.
Next, we report the results from some representative simulation studies. All numerical computations were done in Matlab. To evaluate the empirical size of the tests, we first generate data under the model of Yang & Prentice (2005). Lifetime variables were generated with $R(t)$ being the identity function. The values of $\beta$ were $(\log(0.9), \log(1.2))^T$ and $(\log(1.2), \log(0.8))^T$, representing a one-third increase or decrease from the initial hazard ratio, respectively. We will refer to these two cases as model I and model II, respectively. For the empirical power evaluation, lifetime variables were generated with the standard exponential distribution for the control group, and with

$$\frac{\lambda_T(t)}{\lambda_C(t)} = \begin{cases} a, & t \in (0, 0.5)\cup(1.5, \infty),\\ 1/a, & t \in [0.5, 1.5], \end{cases}$$

for $a = 3$ and $a = 0.5$. These two cases will be referred to as model III and model IV, respectively. Notice that model III gives a U-shaped hazard ratio while model IV gives an upside-down U-shaped hazard ratio, as opposed to the monotone hazard ratio implied by the model of Yang & Prentice (2005). The censoring variables were independent and identically distributed with the log-normal distribution, where the normal distribution had mean $c$ and standard deviation 0.5, with $c$ chosen to achieve various censoring rates. The empirical size and power were obtained from 1000 repetitions, and for each repetition, the critical values of the tests were calculated empirically from 1000 realizations of relevant conditional distributions. The results of these simulations are summarized in Tables 1 and 2.
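The paper's computations were done in Matlab; for concreteness, here is a Python sketch (ours) of the inverse-transform step that generates treatment lifetimes under models III and IV from the piecewise-linear cumulative hazard implied by the display above; the censoring mean shown is an arbitrary illustration, not a value from the paper:

    import numpy as np

    rng = np.random.default_rng(1)

    def treatment_lifetimes(size, a):
        # lambda_C = 1, and lambda_T = a on (0, 0.5) and (1.5, inf), 1/a on [0.5, 1.5].
        # If U = Lambda_T(T), then U is standard exponential; invert the
        # piecewise-linear Lambda_T to recover T.
        u = rng.exponential(size=size)
        h1 = 0.5 * a                  # Lambda_T(0.5)
        h2 = 1.0 / a                  # Lambda_T(1.5) - Lambda_T(0.5)
        return np.where(u <= h1, u / a,
               np.where(u <= h1 + h2, 0.5 + (u - h1) * a,
                        1.5 + (u - h1 - h2) / a))

    control = rng.exponential(size=160)              # standard exponential controls
    treatment = treatment_lifetimes(160, a=3.0)      # model III; a = 0.5 gives model IV
    censor = np.exp(rng.normal(-0.5, 0.5, size=320)) # log-normal censoring; the mean
    # (-0.5 here) would be tuned to hit a target censoring rate.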
From Table 1, both tests in general have the correct size. When the censoring is light or moderate, the tests are generally conservative. On the other hand, when the censoring is heavy, both tests may have inflated size for small sample sizes. This may be expected as there is very little information available. For example, with $n_1 = n_2 = 80$ and the censoring at 75 per cent, only about 40 of the total 160 data points are not censored. As the sample size increases, the size generally improves. The multipliers $c_n$ used in the $\epsilon_i$ variables could possibly be chosen to control the size even for small sample sizes with heavy censoring, but that may come at the expense of making the tests more conservative at light or moderate censoring levels.

From Table 2, under light or moderate censoring, generally the martingale residual-based test has some advantage for model III while the contrast-based test has some advantage for model IV. However, we see almost a reversal in performance when the censoring is heavy. These behaviours indicate that the two tests may possibly be powerful under different classes of alternatives. Also, both tests have reduced power when there is heavy censoring.
Table 1. Empirical size of the lack-of-fit tests for model (2), at various sample sizes and censoring levels. $\beta = (\log(0.9), \log(1.2))^T$ for model I and $\beta = (\log(1.2), \log(0.8))^T$ for model II. The results were based on 1000 repetitions. For each repetition, the critical values of the tests were calculated from 1000 realizations of relevant conditional distributions.

                       Model I, censoring rate          Model II, censoring rate
Test               10%     30%     50%     75%      10%     30%     50%     75%
n1 = n2 = 40
  Residual       0.0210  0.0340  0.0310  0.0460   0.0290  0.0380  0.0330  0.0690
  Contrast       0.0220  0.0270  0.0410  0.0470   0.0300  0.0310  0.0370  0.0710
n1 = n2 = 80
  Residual       0.0170  0.0170  0.0300  0.0450   0.0260  0.0330  0.0290  0.0730
  Contrast       0.0190  0.0180  0.0320  0.0430   0.0220  0.0200  0.0220  0.0560
n1 = n2 = 160
  Residual       0.0310  0.0380  0.0360  0.0520   0.0510  0.0400  0.0470  0.0540
  Contrast       0.0260  0.0290  0.0420  0.0380   0.0290  0.0240  0.0300  0.0400
Table 2. Empirical power of the lack-of-fit tests for model (2), at various sample sizes and censoring levels, at model III with a U-shaped hazard ratio, and model IV with an upside-down U-shaped hazard ratio. The results were based on 1000 repetitions. For each repetition, the critical values of the tests were calculated from 1000 realizations of relevant conditional distributions.

                       Model III, censoring rate        Model IV, censoring rate
Test               10%     30%     50%     75%      10%     30%     50%     75%
n1 = n2 = 40
  Residual       0.2810  0.3520  0.2330  0.2290   0.2630  0.2610  0.2780  0.1680
  Contrast       0.1160  0.1590  0.2260  0.2880   0.5410  0.4530  0.3240  0.1580
n1 = n2 = 80
  Residual       0.4390  0.5090  0.2660  0.3440   0.6220  0.6230  0.6260  0.2090
  Contrast       0.2190  0.2720  0.2560  0.3450   0.9100  0.8820  0.7010  0.1600
n1 = n2 = 160
  Residual       0.7530  0.7920  0.3010  0.3830   0.9320  0.9530  0.9390  0.3650
  Contrast       0.7600  0.6410  0.2690  0.3900   0.9960  0.9940  0.9590  0.3210
One possible reason is that, due to heavy censoring, the hazard ratio in the data range is likely monotone instead of U-shaped or upside-down U-shaped, resulting in the power reduction. For example, for the last case with $n_1 = n_2 = 160$, the average of $\max_i X_i$ from the simulated data sets, at the four censoring levels, is 3.89, 2.31, 1.33 and 0.56 for model III, and 4.65, 2.87, 1.76 and 0.82 for model IV. Note that the U-shape or upside-down U-shape is realized over a range containing [0, 1.5]. With heavy censoring, often there may not be enough data in that range. This also gives a possible reason for the power being higher at 75 per cent censoring than at 50 per cent censoring under model III. At 75 per cent censoring, given the average 0.56 of $\max_i X_i$, the available data are mostly in the [0, 0.5] range where the hazard ratio is constant. This simple situation possibly yields a better result compared with the situation at 50 per cent censoring, where the available data are in a range over which the hazard ratio has a rapid descent, from 3 down to 1/3 at $t = 0.5$.
3.2. The Women’s Health Initiative trial

The Women’s Health Initiative (WHI) randomized controlled trial of combined postmenopausal hormone therapy reported an elevated coronary heart disease risk and overall unfavourable health benefits versus risks over a 5.6-year study period (Writing Group for the Women’s Health Initiative Investigators, 2002; Manson et al., 2003). Here we look at the time to coronary heart disease in the WHI clinical trial, which included 16,608 postmenopausal women, initially in the age range of 50–79, with intact uterus. The trial has two arms. The placebo arm has sample size 8102, and the estrogen plus progestin arm has sample size 8506. About 98 per cent of the data are censored, primarily by the trial monitoring time. For this data set, there was strong evidence that the hazards were non-proportional. In Prentice et al. (2005), a time axis partition was used and the relative risk estimate was obtained separately over the intervals 0–2, 2–5 and >5 years. Fitting model (2) to this data set with the placebo group being the control group, we get $(\hat\theta_1, \hat\theta_2)^T = (0.636, 3.601)^T$. The martingale residual-based test has a p-value of 0.438, and the contrast-based test has a p-value of 0.427. Thus both tests indicate a good fit of the model. Figure 1 gives the plots of the non-parametric survival curve $\hat S_T$ and the model-based survival curve $\tilde S_T$ for the treatment group, that is, the estrogen plus progestin group. We see that the model-based and non-parametric survival curves are very close to each other.

Now if we switch the group labels and use the estrogen plus progestin group as the control group, then both tests result in p-values less than 0.01, discrediting the model.
[Figure 1 here; axes: Days (0–3500) versus % Survival (0.965–1.005).]
Fig. 1. Estimated survival curves for the WHI data when fitting model (2) using the placebo group as the control group: solid line, Kaplan–Meier curve for the estrogen plus progestin group; dashed line, model-based estimator for the estrogen plus progestin group.
[Figure 2 here; axes: Days (0–3500) versus % Survival (0.965–1.005).]
Fig. 2. Estimated survival curves for the WHI data when fitting model (2) using the estrogen plus progestin group as the control group: solid line, Kaplan–Meier curve for the placebo group; dashed line, model-based estimator for the placebo group.
Figure 2 gives the plots of $\hat S_T$ and $\tilde S_T$ for the placebo arm, which is the ‘treatment group’ under the assumed model. We see that in this way, compared with the model using the placebo arm as the control group, there is some noticeable gap between the model-based survival curve and the non-parametric survival curve in the middle of the data range. One possible reason may be that the data yield a Kaplan–Meier curve for the placebo group which behaves very differently near the end of the data range, with jumps considerably larger than at early or middle time points. In comparison, for the estrogen plus progestin group, as displayed in Fig. 1, the Kaplan–Meier curve behaves more or less linearly throughout, with jumps less dramatic near the end of the data range.
3.3. The gastrointestinal tumour study

Next, we look at an example where the hazards are highly non-proportional, in that both the hazard functions and the survivor functions cross. The Gastrointestinal Tumour Study Group (1982) reported the results of a trial that compared chemotherapy with combined chemotherapy and radiation therapy in the treatment of locally unresectable gastric cancer. There were 45 patients on each treatment. Two observations in the chemotherapy group and six in the combination group were censored. Kaplan–Meier plots of the two estimated survival curves cross at around 1000 days (see p. 10 of Yang & Prentice, 2005). If we let the chemotherapy group be the control group and fit model (2) to the data, then we have $(\hat\theta_1, \hat\theta_2)^T = (1.714, 0.981)^T$. The martingale residual-based test has a p-value of 0.104, and the contrast-based test has a p-value of 0.595. Thus the martingale residual-based test signals some degree of lack of fit, while the contrast-based test indicates a good fit. Figure 3 gives the plots of $\hat S_T$ and $\tilde S_T$ for the treatment group, with combined chemotherapy and radiation therapy. We see that the model-based survival curve and the non-parametric survival curve are mostly close to each other, except for a very small region around 230 days, where both the model-based survival curve and the non-parametric survival curve descend rapidly and have their largest discrepancy. This discrepancy affects the martingale residual-based test more than the contrast-based test. Plots of $\sum_{i>n_1}\int_0^t \phi(s)\,d\hat M_i(s)/\sqrt n$ and a few realizations of the process $\hat U(t)$ given the data, shown in appendix S2, suggest that the low p-value of the martingale residual-based test is mainly caused by behaviours of the processes on that small region.

Now if we let the combined chemotherapy and radiation therapy group be the control group and fit model (2) to the data, then the martingale residual-based test has a p-value of 0.634, while the contrast-based test has a p-value of 0.354. Figure 4 gives the plots of $\hat S_T$ and $\tilde S_T$ for the chemotherapy group, which is the ‘treatment group’ under the assumed model. We see that, compared with the scenario in Fig. 3, the difference between the model-based survival curve and the non-parametric survival curve is more pronounced over most of the data range.
[Figure 3 here; axes: Days (0–3000) versus % Survival (0.1–1).]
Fig. 3. Estimated survival curves for the gastric data when fitting model (2) using the chemo group as the control group: solid line, Kaplan–Meier curve for the combination group; dashed line, model-based estimator for the combination group.
[Figure 4 here; axes: Days (0–3000) versus % Survival (0–1).]
Fig. 4. Estimated survival curves for the gastric data when fitting model (2) using the combination group as the control group: solid line, Kaplan–Meier curve for the chemo group; dashed line, model-based estimator for the chemo group.
Here the model fit seems worse compared with that in Fig. 3, but the p-value of the martingale residual-based test is less extreme. Plots of $\sum_{i>n_1}\int_0^t \phi(s)\,d\hat M_i(s)/\sqrt n$ and a few realizations of the process $\hat U(t)$ given the data suggest that, for $t$ up to about 700 days, the variance process of $\hat U(t)$ is generally larger than for the case in Fig. 3, helping to avoid a low p-value for the martingale residual-based test. These behaviours may plausibly be due to the high non-proportionality of the two groups combined with a moderate sample size.
4. Discussion

We have investigated tests based on the supremum norm of the relevant processes, which are of Kolmogorov–Smirnov type. Alternatively, tests can be considered using the integrated absolute value of the processes; such integral-type test statistics are related to the Cramér–von Mises tests. Also, the contrast between the Nelson–Aalen estimator of the cumulative hazard and the model-based estimator can be used instead of the contrast between the Kaplan–Meier estimator and the model-based estimator. In some preliminary simulation studies, those tests did not provide improvement over the tests we focused on here, and thus are omitted.

The model of Yang & Prentice (2005) implies that the hazard ratio is monotone. While this covers many practical situations, other patterns of the hazard ratio are possible. When the tests indicate a lack of fit for the model of Yang & Prentice (2005), it is possible to remedy the situation by considering larger classes of semiparametric models that incorporate an even wider range of hazard ratio patterns. Also, in addition to the two-sample case considered here, adjustment for covariates may be considered.
Acknowledgements

The authors thank the editors and reviewers for their constructive comments and suggestions that have led to an improved manuscript. The research of Yichuan Zhao was partially supported by NSA grant #H98230-12-1-0209.
Supporting Information
Additional Supporting Information may be found in the online version of this article:
Appendix S1. Proofs of asymptotic results.
Appendix S2. Additional plots for the gastrointestinal tumor study.
Please note: Wiley-Blackwell are not responsible for the content or functionality of any
supporting materials supplied by the authors. Any queries (other than missing material) should
be directed to the corresponding author for the article.
References
Aalen, O. O. (1975). Statistical inference for a family of counting processes. PhD thesis, University of
California, Berkeley.
Bennett, S. (1983). Analysis of survival data by the proportional odds model. Statist. Med. 2,
273–277.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. & Wellner, J. A. (1993). Efficient and adaptive estimation for
semiparametric models. Johns Hopkins University Press, Baltimore.
Chen, Y. Q. & Cheng, S. (2006). Linear life expectancy regression with censored data. Biometrika 93,
303–313.
Chen, K., Jin, Z. & Ying, Z. (2002). Semiparametric analysis of transformation models with censored
data. Biometrika 89, 659–668.
Cheng, S. C., Wei, L. J. & Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika 82, 835–845.
Cox, D. R. (1972). Regression models and life-tables (with discussion). J. Roy. Statist. Soc. B 34,
187–220.
Gastrointestinal Tumor Study Group: Schein, P. D., Bruckner, H. W., Douglass, H. O., Mayer, R. et al.
(1982). A comparison of combination chemotherapy and combined modality therapy for locally
advanced gastric carcinoma. Cancer 49, 1771–1777.
Kalbfleisch, J. D. & Prentice, R. L. (2002). The statistical analysis of failure time data, 2nd edn. Wiley, Hoboken, NJ.
Kaplan, E. & Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Amer. Statist.
Assoc. 53, 457–481.
Lin, D. Y., Wei, L. J. & Ying, Z. (1993). Checking the Cox model with cumulative sums of martingale-
based residuals. Biometrika 80, 557–572.
Manson, J. E., Hsia, J., Johnson, K. C., Rossouw, J. E., Assaf, A. R., Lasser, N. L., Trevisan, M., Black, H. R., Heckbert, S. R., Detrano, R., Strickland, O. L., Wong, N. D., Crouse, J. R., Stein, E. & Cushman, M., for the Women’s Health Initiative Investigators (2003). Estrogen plus progestin and the risk of coronary heart disease. New Engl. J. Med. 349, 523–534.
Murphy, S. A., Rossini, A. J. & Van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. J. Amer. Statist. Assoc. 92, 968–976.
Nelson, W. (1969). Hazard plotting for incomplete failure data. J. Qual. Technol. 1, 27–52.
Prentice, R. L., Langer, R., Stefanick, M. L., Howard, B. V., Pettinger, M., Anderson, G., Barad, D., Curb, J. D., Kotchen, J., Kuller, L., Limacher, M. & Wactawski-Wende, J., for the Women’s Health Initiative Investigators (2005). Combined postmenopausal hormone therapy and cardiovascular disease: toward resolving the discrepancy between observational studies and the Women’s Health Initiative clinical trial. Am. J. Epidemiol. 162, 404–414.
Therneau, T. M., Grambsch, P. M. & Fleming, T. R. (1990). Martingale-based residuals for survival
models. Biometrika 77, 147–160.
Tsodikov, A. (2002). Semi-parametric models of long- and short-term survival: an application to the analysis of breast cancer survival in Utah by age and stage. Statist. Med. 21, 895–920.
Writing Group for the Women’s Health Initiative Investigators (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: principal results from the Women’s Health Initiative randomized controlled trial. J. Amer. Med. Assoc. 288, 321–333.
Yang, S. & Prentice, R. L. (1999). Semiparametric inference in the proportional odds regression model.
J. Amer. Statist. Assoc. 94, 125–136.
Yang, S. & Prentice, R. L. (2005). Semiparametric analysis of short-term and long-term hazard ratios
with two-sample survival data. Biometrika 92, 1–17.
Ying, Z. (1993). A large sample study of rank estimation for censored regression data. Ann. Statist. 21,
76–99.
Zeng, D. & Lin, D. Y. (2007). Maximum likelihood estimation in semiparametric regression models with
censored data. J. Roy. Statist. Soc. B 69, 507–564.
Received January 2010, in final form March 2012
Song Yang, PhD, Office of Biostatistics Research, Division of Cardiovascular Sciences, National Heart,
Lung and Blood Institute, NIH, DHHS, 6701 Rockledge Dr. MSC 7913, Bethesda, MD 20892, USA.
E-mail: yangso@nhlbi.nih.gov