Journal of Biopharmaceutical Statistics, 16: 429–441, 2006
Copyright © Taylor & Francis Group, LLC
ISSN: 1054-3406 print/1520-5711 online
DOI: 10.1080/10543400600719251
STATISTICAL CONSIDERATIONS
FOR NONINFERIORITY/EQUIVALENCE TRIALS
IN VACCINE DEVELOPMENT
W. W. B. Wang, D. V. Mehrotra, I. S. F. Chan, and J. F. Heyse
Clinical Biostatistics, Merck Research Laboratories, North Wales,
Pennsylvania, USA
Noninferiority/equivalence designs are often used in vaccine clinical trials. The goal of
these designs is to demonstrate that a new vaccine, or new formulation or regimen of
an existing vaccine, is similar in terms of effectiveness to the existing vaccine, while
offering such advantages as easier manufacturing, easier administration, lower cost, or
improved safety profile. These noninferiority/equivalence designs are particularly useful
in four common types of immunogenicity trials: vaccine bridging trials, combination
vaccine trials, vaccine concomitant use trials, and vaccine consistency lot trials. In this
paper, we give an overview of the key statistical issues and recent developments for
noninferiority/equivalence vaccine trials. Specifically, we cover the following topics: (i)
selection of study endpoints; (ii) formulation of the null and alternative hypotheses;
(iii) determination of the noninferiority/equivalence margin; (iv) selection of efficient
statistical methods for the statistical analysis of noninferiority/equivalence vaccine
trials, with particular emphasis on adjustment for stratification factors and missing pre-
or post-vaccination data; and (v) the calculation of sample size and power.
Key Words: Equivalence; Minimum risk weights; Missing data; Noninferiority; Stratified analysis;
Vaccine clinical trial.
1. INTRODUCTION
The usual goal of vaccination is to simulate an in vivo pathogen-specific
exposure that triggers the host’s immune system to generate a pool of effector
and memory B or T cells that will protect against potential real exposures in the
future. The simulation is accomplished via inoculation of the host by a vaccine that
contains, for example, a live, attenuated version of the pathogen, or a DNA plasmid or viral vector that encodes relevant genes of the pathogen to help elicit a cell-mediated immune response.
The immunogenicity of a new vaccine is studied in the early stages of
development to assess whether the vaccine can induce quantifiable levels of
pathogen-specific immune responses. In later stages, the goal is to assess whether
the immune marker used to quantify vaccine immunogenicity qualifies as a correlate
Received and Accepted March 13, 2006
Address correspondence to William W.B. Wang, Clinical Biostatistics, Merck Research
Laboratories, UG-1CD, P.O. Box 1000, North Wales, Pennsylvania 19454 USA; E-mail:
William_Wang@Merck.Com
of (or surrogate for) disease protection. Once a correlate for disease protection has
been established, immunogenicity trials with noninferiority/equivalence designs are
widely used as economic and time-efficient alternatives to large efficacy trials in
evaluating the effectiveness of a new or reformulated vaccine as compared to already
licensed vaccines (Chan et al., 2003; Horne, 1995; Mehrotra, in press).
Noninferiority/equivalence designs are particularly useful in four common
types of vaccine immunogenicity trials: vaccine bridging trials, combination vaccine
trials, vaccine concomitant use trials, and vaccine consistency lot trials. Each of
these trials serves a different purpose. For example, vaccine bridging trials are
used because manufacturing processes or storage conditions may be changed after
vaccine licensure to improve production yields and vaccine stability/shelf life.
Vaccine bridging trials are often required to demonstrate that such changes do
not have an adverse impact on vaccine effectiveness, by ruling out a clinically
significant difference in immune responses between the modified vaccine and the
current vaccine.
Combination vaccine trials are typically used to rule out clinically significant
differences in immune responses between a combined vaccine and separate but
simultaneously administered vaccines (Blackwelder, 1995; FDA, 1997; Horne et al.,
2001). A combination vaccine is intended to prevent multiple diseases or to prevent
one disease caused by different strains or serotypes of the same organism while
reducing the number of injections required (Chan et al., 2003). Similarly, since
the concomitant administration of multiple vaccines can reduce the number of
vaccination visits, vaccine concomitant use trials are used to rule out clinically
significant differences in immune responses between the concomitant administration
of two or more vaccines and the separate administration of each vaccine.
Finally, since vaccines are biological products that are not as stable and
well characterized as chemical drug products, vaccine consistency lot trials are
required to study multiple (typically, three) lots of vaccines made from the same
manufacturing process (called consistency lots). The goal is to rule out a clinically
significant difference in immunogenicity in either direction between any two pairs
of consistency lots.
The goal of this article is to provide an overview of key statistical
issues and recent developments for noninferiority/equivalence vaccine trials.
Some of the issues and methodologies that we discuss are also applicable to
drug noninferiority/equivalence trials. However, there are noteworthy differences
between drug and vaccine trials; where appropriate, we will draw attention to
such differences. The rest of the paper is organized as follows. In Section 2, we
describe typical immunogenicity endpoints and noninferiority/equivalence margins
used in practice, and discuss the formulation of the null and alternative hypotheses.
In Section 3, we discuss statistical analysis methods, including novel approaches
for dealing with stratification and missing data either at pre- or post-vaccination.
We discuss sample size and power considerations in Section 4, and offer some
concluding remarks in Section 5.
2. ENDPOINTS, HYPOTHESES, AND MARGIN SELECTION
The selection of immunogenicity endpoints, the formulation of study
hypotheses, and the choice of the noninferiority/equivalence margin should be
based on sound clinical, regulatory, and statistical judgments. A commonly used
immunologic endpoint is the immune response rate, defined as the percentage of
subjects who achieve a certain level of immune response after vaccination (yes/no).
If a particular level of immune response (either based on an absolute cutoff or a
fold rise from baseline) has been shown to be correlated with disease protection,
the percentage of participants achieving this “protective level” after vaccination is
usually considered the primary endpoint for immunogenicity analyses. However,
this definition is susceptible to potential issues in assay stability and consistency
across subjects, study sites, and assay-performing laboratories. Accordingly, another
commonly used immunologic endpoint is the geometric mean concentration (GMC)
of immune response after vaccination or the geometric mean fold rise (GMFR)
from pre-vaccination to post-vaccination. In the absence of a reasonable protective
level or a good correlate of protection, the GMC or GMFR is used as the primary
immunogenicity endpoint. In addition, as noted by Plikaytis and Carlone (2004),
the selection of endpoints can depend on the type of vaccine, such as T-cell
independent versus T-cell dependent, and the kinetic curve of immune responses
after vaccination.
The primary objective of a vaccine noninferiority/equivalence trial is to
demonstrate that a new or modified vaccine is noninferior or equivalent to the
current vaccine by ruling out a prespecified clinically relevant difference in the
immune response. Accordingly, in a noninferiority immunogenicity trial without
stratification, the null and alternative hypotheses are generally set up as
H0≤−0versus H1>0(1)
In this hypothesis setup, depending on the endpoint being used, =PTPC
or =logGMCTlogGMCC, where PTand PCare the population immune
response rates, and GMCTand GMCCare the population geometric means of the
immune response (either in terms of concentration or in terms of fold rise) after
vaccination, for the new vaccine and the control vaccine groups, respectively. 0is a
prespecified small positive quantity defining the noninferiority margin. The nominal
significance level for this one-sided test (usually one-sided =0025) is generally set
at half of the conventional significance level for a two-sided test for a difference
in proportions (Schuirmann, 1987). This approach has been adopted in regulatory
environments, as suggested in the International Conference on Harmonization E9
(ICH E9) guidelines (1999).
In vaccine equivalence trials, such as consistency-lot studies, the objective is to
show that the two vaccines are similar by ruling out a clinically significant difference
in either direction. Such studies are typically designed as a two-sided (at the $\alpha = 0.05$ level) equivalence trial with respect to both immune response rates and GMCs. For such trials, the hypothesis in Eq. (1) is replaced with
$$H_0\colon \delta \le -\delta_0 \;\text{or}\; \delta \ge \delta_0 \quad\text{versus}\quad H_1\colon -\delta_0 < \delta < \delta_0,$$
and the approach of two one-sided tests (Schuirmann, 1987) is recommended by the ICH E9 guidelines (1999).
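The two one-sided tests procedure can be sketched in a few lines. The following is an illustrative implementation (the counts and margin are hypothetical, not from any trial in this paper) that declares equivalence of two response rates when the $(1-2\alpha)$ Wald confidence interval lies entirely inside $(-\delta_0, \delta_0)$; the constrained-MLE refinements discussed later in this paper are omitted here.

```python
from math import sqrt
from statistics import NormalDist

def tost_proportions(x_t, n_t, x_c, n_c, delta0, alpha=0.025):
    """Two one-sided tests (Schuirmann) for equivalence of two response
    rates: declare equivalence if the (1 - 2*alpha) Wald confidence
    interval for p_t - p_c lies entirely inside (-delta0, delta0)."""
    p_t, p_c = x_t / n_t, x_c / n_c
    diff = p_t - p_c
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = NormalDist().inv_cdf(1 - alpha)
    lower, upper = diff - z * se, diff + z * se
    # Both one-sided tests reject exactly when the CI sits inside the margins
    return (lower, upper), (lower > -delta0 and upper < delta0)

# Hypothetical counts: 182/200 vs. 186/200 responders, 10-point margin
ci, equivalent = tost_proportions(182, 200, 186, 200, delta0=0.10)
```

Declaring equivalence via the CI-inclusion check is operationally identical to running both one-sided level-$\alpha$ tests, which is why the procedure is usually reported as a confidence interval.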
With respect to the noninferiority margin, the choice of $\delta_0$ should ensure that the new vaccine preserves a large proportion of the effectiveness of the control vaccine relative to a placebo. Although a noninferiority margin that preserves 50%
of the treatment effect has been proposed for the evaluation of drug treatments
(Ebbutt and Frith, 1999; Temple, 1996), there is a general perception that a
narrower margin should be used in preventive vaccine trials because vaccines
are given to potentially millions of healthy individuals for prophylaxis. The
noninferiority or equivalence margin for vaccine immunogenicity should depend
on the level of the correlation between immune responses and the vaccine efficacy,
the variability of immunogenicity responses, the class of vaccine being tested, and
the relative importance of the immunogenicity endpoints in the given trial. For
example, in studies of the hepatitis A vaccine VAQTA®, the immune response rates in terms of seroconversion in the VAQTA® vaccine groups are usually greater than 90% (with 0% in the placebo group) and are highly correlated with protection. A margin that preserves only half of that effect is obviously inadequate; instead, a $\delta_0$ of 10 percentage points has been used as the noninferiority margin (Frey et al., 1999). For comparing GMCs, a margin of log(0.67) or log(0.5) (i.e., $\delta_0 = \log(1.5)$ or $\log(2)$, corresponding to ruling out a 1.5- or 2-fold decrease) has been used. The noninferiority margin should be discussed proactively
between the trial sponsor and the regulatory agencies. Temple (1996), Ebbutt and
Frith (1999), and both the ICH E9 (1999) and ICH E10 guidelines (2000) and the
EMEA guideline (2005) include some more general discussions on the choice of
noninferiority and equivalence margins.
3. STATISTICAL METHODS FOR VACCINE
NONINFERIORITY/EQUIVALENCE TRIALS
The analysis of noninferiority/equivalence trials is generally based on the dual use of hypothesis testing and test-based confidence intervals. Both p-values and confidence intervals are reported in most trials. Operationally, rejection of the null hypothesis in Eq. (1) at the one-sided $\alpha$ level is equivalent to the lower bound of the two-sided $(1-2\alpha)$ confidence interval for $\delta$ being greater than $-\delta_0$.
Asymptotic statistical tests of hypothesis in Eq. (1) and corresponding test-
based confidence intervals for two treatment groups with a dichotomous endpoint
have been extensively discussed in the literature in assessing a vaccine effect based
on the difference of two immune response rates. Many authors have proposed
Z-type test statistics with different standard error estimates (Blackwelder, 1982;
Miettinen and Nurminen, 1985). The more commonly used and better performing of these is the Z-type statistic, shown below, proposed by Miettinen and Nurminen (1985), Farrington and Manning (1990), and Chan and Zhang (1999):
$$Z_D = \frac{\hat{P}_T - \hat{P}_C + \delta_0}{\tilde{\sigma}_0}, \qquad (2)$$
where
$$\tilde{\sigma}_0 = \left[\frac{1}{N_T}\tilde{P}_T(1-\tilde{P}_T) + \frac{1}{N_C}\tilde{P}_C(1-\tilde{P}_C)\right]^{1/2}, \qquad (3)$$
and $\tilde{P}_T, \tilde{P}_C$ are the constrained maximum likelihood estimates (MLEs) of $(P_T, P_C)$ under the null hypothesis given in Eq. (1), based on the observed responses $\hat{P}_T, \hat{P}_C$. (Detailed expressions for $\tilde{P}_T, \tilde{P}_C$ are presented by Miettinen and Nurminen (1985), as well as by Farrington and Manning (1990).) Noninferiority is established ($H_0$ is rejected) at the one-sided significance level $\alpha$ if $Z_D \ge Z_\alpha$, where $Z_\alpha$ is the $100(1-\alpha)$ percentile of the standard normal distribution.
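As a concrete illustration of the statistic in Eq. (2), the sketch below computes the constrained MLEs numerically by a golden-section search on the binomial log-likelihood; Miettinen and Nurminen (1985) and Farrington and Manning (1990) give closed forms, and the numerical route is used here only for transparency. The counts are hypothetical.

```python
from math import log, sqrt

def constrained_mle(x_t, n_t, x_c, n_c, delta0):
    """MLE of (P_T, P_C) under the constraint P_T - P_C = -delta0, found by
    golden-section search on the (concave) binomial log-likelihood."""
    def loglik(p_c):
        eps = 1e-12
        p_t = min(max(p_c - delta0, eps), 1 - eps)
        p_c = min(max(p_c, eps), 1 - eps)
        return (x_t * log(p_t) + (n_t - x_t) * log(1 - p_t)
                + x_c * log(p_c) + (n_c - x_c) * log(1 - p_c))
    lo, hi = delta0, 1.0            # p_c must lie in [delta0, 1]
    for _ in range(100):            # golden-section search
        m1 = lo + 0.382 * (hi - lo)
        m2 = lo + 0.618 * (hi - lo)
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    p_c = (lo + hi) / 2
    return p_c - delta0, p_c

def z_mn(x_t, n_t, x_c, n_c, delta0):
    """Z_D of Eq. (2) with the null-restricted standard error of Eq. (3)."""
    pt_tilde, pc_tilde = constrained_mle(x_t, n_t, x_c, n_c, delta0)
    se0 = sqrt(pt_tilde * (1 - pt_tilde) / n_t
               + pc_tilde * (1 - pc_tilde) / n_c)
    return (x_t / n_t - x_c / n_c + delta0) / se0

# Hypothetical counts: 92/100 responders on test, 94/100 on control
z = z_mn(92, 100, 94, 100, delta0=0.10)   # compare with Z_0.025 = 1.96
```

Because the null variance is evaluated at the boundary $\tilde{P}_T - \tilde{P}_C = -\delta_0$ rather than at the observed rates, it tends to exceed the observed variance when response rates are high, a point revisited in the stratified-analysis discussion below.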
The method in the preceding paragraph works exceptionally well with large
sample sizes. With “small” sample sizes (say n<50 per group), exact (rather than
asymptotic theory-based) tests and corresponding confidence intervals (Agresti and
Min, 2001; Chan, 1998, 2002, 2003; Chan and Zhang, 1999; Mehrotra et al., 2003)
are preferable. However, as noted by Mehrotra et al. (2003), the test statistic for
exact inference should be carefully chosen to avoid a loss in power. For example,
the only option in the StatXact-4 software package for calculating exact confidence
intervals used the numerator of ZDas the test statistic; the power based on this
unstandardized statistic can be substantially lower than the power based on ZD.
Fortunately, StatXact-5 includes other options for calculating exact unconditional
test-based confidence intervals, including Chan and Zhang's (1999) method based on inverting two one-sided tests using the $Z_D$ statistic, and Agresti and Min's (2001) method of inverting one two-sided test. It should be noted that with relatively
large sample sizes (say, n>200 per group), the exact and corresponding asymptotic
methods usually yield very similar results, so use of the former is not necessary.
Of note, since noninferiority trials focus on one-sided ($\alpha$ level) hypothesis tests, we recommend that the corresponding exact confidence intervals (usually at the two-sided $1-2\alpha$ level) be obtained by inverting two one-sided tests to ensure consistency of inference, in the sense that rejection of the noninferiority null hypothesis in Eq. (1) (p-value $\le \alpha$) is equivalent to the lower bound of the two-sided $(1-2\alpha)$ CI on the difference being greater than $-\delta_0$ (Chan, 2003; Chan and Mehrotra, 2003). The confidence interval obtained by inverting one two-sided test (Agresti and Min, 2001), although generally narrower than the confidence interval obtained by inverting two one-sided tests, does not guarantee control of the error rate on each side at the $\alpha$ level; hence, it may produce results that are inconsistent with the one-sided noninferiority hypothesis test (Chan, 2003).
In assessing a vaccine effect based on a fold difference in GMCs between
the vaccine and control groups, the statistical testing for hypothesis in Eq. (1)
and the corresponding confidence interval estimation are often performed by using
an analysis of variance (ANOVA), an analysis of covariance (ANCOVA), or a
linear mixed-effects model that includes the natural log of immune responses as the
dependent variable, and the treatment group, baseline, and stratification factors (if
any) as the explanatory variables.
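As an illustration of the log-scale ANCOVA route, the sketch below regresses log post-vaccination titers on a treatment indicator and log baseline titer, and checks the lower confidence bound of the treatment coefficient (the log-GMC ratio) against $-\delta_0$. The simulated data, the margin $\log(1.5)$, and the function name are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import t as t_dist

def ancova_log_gmc(log_post, log_base, group, delta0, alpha=0.025):
    """OLS ANCOVA on the log scale: log_post ~ intercept + group + log_base.
    Noninferiority is concluded if the lower (1 - 2*alpha) confidence bound
    for the treatment coefficient exceeds -delta0."""
    X = np.column_stack([np.ones_like(log_post), group, log_base])
    beta, *_ = np.linalg.lstsq(X, log_post, rcond=None)
    resid = log_post - X @ beta
    df = len(log_post) - X.shape[1]
    s2 = resid @ resid / df                    # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)          # covariance of coefficients
    est, se = beta[1], np.sqrt(cov[1, 1])
    lower = est - t_dist.ppf(1 - alpha, df) * se
    return est, lower, bool(lower > -delta0)

# Hypothetical data: 50 subjects per arm, no true treatment effect
rng = np.random.default_rng(2006)
n = 50
group = np.r_[np.ones(n), np.zeros(n)]         # 1 = new vaccine, 0 = control
log_base = rng.normal(2.0, 0.5, 2 * n)         # log pre-vaccination titers
log_post = 1.0 + 0.5 * log_base + rng.normal(0.0, 0.4, 2 * n)
est, lower, noninferior = ancova_log_gmc(log_post, log_base, group,
                                         delta0=np.log(1.5))
```

Exponentiating the estimate and its lower bound converts the conclusion back to the GMC-ratio scale, which is how such results are usually reported.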
Statistical Methods for Stratified Trials
If it is known a priori that certain prognostic factors, such as the subject’s age,
gender, or pre-vaccination immune status, will influence the response to vaccination,
then the strategy of pre-stratification by such factors is often used in clinical trials
to facilitate unbiased and more efficient comparisons of treatment groups. For a
stratified trial, as noted by Mehrotra (2002), it is important to be clear about what
hypothesis is being tested; see also Gail et al. (1996) and Ganju and Mehrotra
(2003). For a noninferiority vaccine trial, the hypothesis that is usually of interest is
H0=
s
i=1
fii≤−0versus H1=
s
i=1
fii>0(4)
434 WANG ET AL.
where iis the true difference in response rates (or means) for stratum i, and fiis the
fraction of subjects in the target population that belong to stratum i
s
i=1fi=1.
A test of the hypothesis in Eq. (4) is typically conducted using the following statistic:
$$Z_w = \frac{\hat{\delta}_w + \delta_0}{\sqrt{\hat{V}_0(\hat{\delta}_w)}}. \qquad (5)$$
In Eq. (5), $\hat{\delta}_w = \sum_{i=1}^{s} w_i \hat{\delta}_i = \sum_{i=1}^{s} w_i(\hat{P}_{iT} - \hat{P}_{iC})$, and $w_i$ is the weight assigned to the $i$th stratum $\left(\sum_{i=1}^{s} w_i = 1\right)$. There are two options for the denominator in Eq. (5). The first option is to use the null variance $\hat{V}_0(\hat{\delta}_w) = \sum_{i=1}^{s} w_i^2 \hat{V}_0(\hat{\delta}_i)$, where $\hat{V}_0(\hat{\delta}_i) = \frac{1}{N_{iT}}\tilde{P}_{iT}(1-\tilde{P}_{iT}) + \frac{1}{N_{iC}}\tilde{P}_{iC}(1-\tilde{P}_{iC})$, with $\tilde{P}_{iT}, \tilde{P}_{iC}$ as before. The second option is to replace $\hat{V}_0(\hat{\delta}_i)$ with the observed variance (Blackwelder, 1995; Dunnett and Gent, 1977) $\frac{\hat{p}_{iC}(1-\hat{p}_{iC})}{n_{iC}} + \frac{\hat{p}_{iT}(1-\hat{p}_{iT})}{n_{iT}}$. In either case, the null hypothesis is rejected at the one-sided $\alpha$ level if $Z_w > Z_\alpha$.
For the weights, $w_i$, two popular choices are the Cochran-Mantel-Haenszel (CMH) weights, and the precision or inverse-variance (INVAR) weights (Mehrotra and Railkar, 2000). The CMH weights are proportional to the harmonic means of the observed stratum-specific sample sizes, while the INVAR weights are proportional to the reciprocals of the observed variances of the stratum-specific differences. The CMH weights are optimal (in terms of power) if the true odds ratios, $\frac{p_{iT}(1-p_{iC})}{p_{iC}(1-p_{iT})}$, are constant across strata, while the INVAR weights are optimal if the $\delta_i$ are constant. Unfortunately, in practice, we rarely know whether it is the true stratum-specific odds ratios or the $\delta_i$ that are "closer" to being constant across strata, so the choice between CMH and INVAR is essentially a gamble. To help minimize the potential loss in power that might be incurred by gambling unfavorably, Mehrotra and Railkar (2000) proposed a "minimum risk" (MR) weighting strategy which yields an estimate of $\delta$ that has the smallest asymptotic mean squared error, and offers power advantages as well.
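The two standard weighting schemes can be sketched as follows; the per-stratum data layout is hypothetical, and the MR weights, which require the expressions in Mehrotra and Railkar (2000), are not reproduced here.

```python
from math import sqrt

def stratified_z(strata, delta0, weights="cmh"):
    """Z_w of Eq. (5) using the observed variance and either CMH
    (harmonic-mean sample size) or INVAR (inverse-variance) weights.
    `strata` is a list of (x_t, n_t, x_c, n_c) tuples, one per stratum."""
    diffs, variances, w_raw = [], [], []
    for x_t, n_t, x_c, n_c in strata:
        p_t, p_c = x_t / n_t, x_c / n_c
        v = p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c
        diffs.append(p_t - p_c)
        variances.append(v)
        w_raw.append(n_t * n_c / (n_t + n_c) if weights == "cmh" else 1.0 / v)
    w = [wi / sum(w_raw) for wi in w_raw]            # normalize to sum to 1
    est = sum(wi * di for wi, di in zip(w, diffs))   # weighted difference
    var = sum(wi ** 2 * vi for wi, vi in zip(w, variances))
    return (est + delta0) / sqrt(var)

# Hypothetical two-stratum data
strata = [(45, 50, 47, 50), (88, 100, 92, 100)]
z_cmh = stratified_z(strata, delta0=0.10, weights="cmh")
z_invar = stratified_z(strata, delta0=0.10, weights="invar")
```

Note that the INVAR option fails when a stratum has a response rate of exactly 0 or 1 (zero observed variance); in practice a continuity correction or the null variance is used in that case.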
The MR weighting strategy can be particularly useful for stratified noninferiority vaccine trials. To illustrate, Table 1 contains simulation results comparing the null variance (MN) method with CMH weights, and the observed variance method with CMH or MR weights, for a stratified analysis of a difference in two proportions. In more general settings, it has been shown that using the null variance is typically better than using the observed variance in terms of power, Type I error rate control, and confidence interval coverage (Dunnett and Gent, 1977; Miettinen and Nurminen, 1985). However, for stratified noninferiority trials involving vaccines with high response rates (>90%), the results in Table 1 suggest that using the observed variance along with MR rather than CMH weights can yield notable gains in power. Intuitively, this is caused by the relatively large separation of $\tilde{P}_{iT}$ and $\tilde{P}_{iC}$ under the constraint $\tilde{P}_{iT} - \tilde{P}_{iC} = -\delta_0$, which in turn leads to a relatively large estimated null variance as compared with the observed variance. Of note, the same result holds for unstratified trials, or for stratified trials in which stratification is ignored at the time of analysis and an unstratified analysis is used; the latter is not recommended because it can result in a loss of power (Mehrotra and Railkar, 2000; Mehrotra, 2001).
Table 1 Comparison of using null variances with CMH weights versus using observed variances with either CMH or MR weights

          Stratum 1      Stratum 2
          p1T    p1C     p2T    p2C     δ0     N/trt   CMH_OBS   CMH_MN   MR_OBS

Empirical Type I error rates (nominal = 2.5%, 1-sided)
          .35    .50     .50    .65     .15    130     2.5       2.7      2.4
          .60    .70     .80    .90     .10    200     2.1       2.3      2.4
          .85    .90     .90    .95     .05    350     2.2       2.2      2.3

Empirical power (%)
          .50    .50     .80    .80     .15    130     79.1      80.1     80.2
          .70    .70     .90    .90     .10    200     79.0      78.8     83.3
          .90    .90     .95    .95     .05    350     75.2      74.0     77.9

(f1 = .3, f2 = .7); 10,000 simulations;
CMH_OBS = CMH weights with observed variance;
CMH_MN = CMH weights with null variance (MN method);
MR_OBS = minimum risk weights with observed variance.
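The Table 1 setup can be probed with a small Monte Carlo sketch. The version below uses CMH weights with the observed variance, and its simulation size, seed, and critical value are arbitrary choices, so its estimates will only roughly track the table.

```python
import numpy as np

def simulate_rejection_rate(p_t, p_c, f, n_per_arm, delta0,
                            n_sim=4000, seed=1, z_crit=1.959964):
    """Monte Carlo sketch of the Table 1 setup: strata with population
    fractions f, equal per-arm allocation, CMH weights, observed variance.
    Returns the empirical rejection rate of H0: delta <= -delta0."""
    rng = np.random.default_rng(seed)
    n = np.round(np.asarray(f) * n_per_arm).astype(int)  # per-stratum n
    w = np.array([ni * ni / (ni + ni) for ni in n])      # CMH, equal arms
    w = w / w.sum()
    rejections = 0
    for _ in range(n_sim):
        est, var = 0.0, 0.0
        for i, ni in enumerate(n):
            pt_hat = rng.binomial(ni, p_t[i]) / ni
            pc_hat = rng.binomial(ni, p_c[i]) / ni
            est += w[i] * (pt_hat - pc_hat)
            var += w[i] ** 2 * (pt_hat * (1 - pt_hat) / ni
                                + pc_hat * (1 - pc_hat) / ni)
        if var > 0 and (est + delta0) / np.sqrt(var) > z_crit:
            rejections += 1
    return rejections / n_sim

# Null configuration from the first Type I error row of Table 1
alpha_hat = simulate_rejection_rate([.35, .50], [.50, .65], [.3, .7],
                                    130, delta0=.15)
```

Under the first null row the empirical rejection rate should land near the nominal 2.5%; swapping in the power-row rates gives an estimate near the tabulated 79%.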
Another important issue in stratified data analysis for noninferiority trials is
the test of treatment by stratum interaction. Gail and Simon (1985) used the terms
quantitative and qualitative interactions and proposed likelihood-based testing
procedures. Pan and Wolfe (1997) generalized their methodology to interactions
with clinical significance, that is, clinically meaningful interaction. For quantitative
interaction with binary responses, Mehrotra (2002) proposed a new test which, in a comprehensive simulation study done by Mehrotra and Chan (2000), controlled the Type I error rate and was generally at least as powerful as several other published
methods. Wiens and Heyse (2003) presented and compared five different analysis
strategies to test for qualitative treatment-by-stratum interaction in noninferiority
trials.
Statistical Methods in Handling Missing Immunogenicity Data
Most vaccine regimens include a sequence of one or more “priming”
inoculations followed by a “booster” shot later, if necessary. Blood samples
are collected at one or more time points after each inoculation and assayed
for immune activity. Missing data and losses to follow-up do occur in vaccine
noninferiority/equivalence (and superiority) trials. This situation is similar to the
incomplete longitudinal data problem for drug trials. However, there are two key
differences. First, while the missing data resulting from dropouts in vaccine trials
are typically missing completely at random (MCAR), they are more likely to be
either missing at random (MAR) or non-ignorably missing (NM) for drug trials.
The reason is that patients often drop out from drug trials because they are not
responding favorably to their assigned treatment (e.g., high blood pressure not
declining); this concept is generally not applicable for vaccine trials! The second
key difference is that the ability to predict or impute the missing data at, say,
the post-boost visit of interest may be better for vaccine trials compared with
drug trials. This happens because subjects in vaccine trials are inherently less
heterogeneous than patients in drug trials. Moreover, for some (but not all) vaccines,
the post-prime responses are highly correlated with (and hence predictive of) the
post-boost responses.
A common source of missing data in vaccine immunogenicity trials is
mishaps or errors in blood sample storage, sample handling, or assay testing for
immunogenicity measurements. In such trials, missing data can occur at baseline (pre-vaccination) as well as at post-vaccination. This type of missing data can complicate
the data analysis if the baseline immune status is to be used as a covariate, as, for
example, in an ANCOVA model for continuous responses with the baseline value
as the covariate. A simple and common way to tackle this missing immunogenicity
data problem is to use a “complete case” (CC) analysis which excludes subjects with
missing pre- or post-vaccination data. Although this approach is unbiased under
MCAR, it is inefficient because it fails to utilize the information of the excluded
subjects. A better alternative is to use principled methods for longitudinal data
analysis such as “restricted maximum likelihood” (REML), “generalized estimating
equations” (GEE), or “multiple imputation” (MI), all of which are readily available
in standard software. The gains in efficiency of the latter approaches over the
complete case analysis can be significant when the amount of missing data is
large. For example, there could be a substantial amount of missing post-boost
immunogenicity data in the case of an interim analysis of an ongoing trial when the
later enrolled subjects have received priming inoculations but not yet been boosted.
If the post-prime information from these subjects is not included in the interim
analysis, the analysis could be missing a large amount of available data. As a result,
it would be preferable to use an analytical approach that could actually incorporate
all the data.
Li et al. (in press) proposed a propensity score-based multiple imputation (MI)
method to tackle missing data in longitudinal clinical trials with binary responses,
with particular emphasis on vaccine immunogenicity trials. They noted three key
results. First, if data are missing completely at random, MI can be notably more
efficient than the CC and GEE methods. Second, with small samples, GEE often
fails because of “convergence problems,” but MI is free of those problems. Finally,
if the data are missing at random, MI generally yields results with negligible bias,
while the CC and GEE methods yield results with moderate to large bias.
Wang et al. (2003) applied a longitudinal regression approach with a
“restricted” linear model structure (Liang and Zeger, 2000) to analyze vaccine
immunogenicity data; the restriction is that the population means for all treatment
groups are identical at baseline. This model allows for the comparison of
postvaccination immune responses between two vaccination groups, adjusting for
pre-vaccination immunogenicity levels in the presence of incomplete data (missing
either at pre- or post-vaccination). Under the assumption of equal population
means at baseline, which is justified in randomized trials, this longitudinal model
provides an unbiased estimate of the treatment effect while increasing the power for
the noninferiority/equivalence comparison as compared with the “complete case”
analysis. In the case of no missing data, estimates of treatment differences from this
longitudinal model are identical to those from the traditional analysis of covariance
(ANCOVA) model.
4. POWER CALCULATION FOR VACCINE
NONINFERIORITY/EQUIVALENCE TRIALS
There are many methods for calculating sample size and power for vaccine
noninferiority/equivalence trials. Here, we briefly describe the methods based on the
commonly used asymptotic Z-test for noninferiority immunogenicity trials.
For bridging studies that are designed to test the noninferiority hypothesis
in Eq. (1) with respect to immune response rates, a commonly used method is the
sample size formula proposed by Farrington and Manning (1990), which is based
on the Z-type test statistic in Eq. (2). To have $1-\beta$ power to claim noninferiority at the one-sided $\alpha$ level, the approximate sample size for testing the hypothesis in Eq. (1) should be
$$N_T = \frac{(Z_\alpha \bar{\sigma}_0 + Z_\beta \sigma_1)^2}{(P_T - P_C + \delta_0)^2} \quad\text{and}\quad N_C = u N_T, \qquad (6)$$
where $u$ is the prespecified sample size ratio between the control and the test vaccine groups, $P_C$ and $P_T$ are the expected immune response rates for the control and test vaccine groups in the planned study, respectively, $\sigma_1^2$ is the value of the variance expression in Eq. (3) (with the factors $1/N_T$ and $1/N_C$ replaced by $1$ and $1/u$, respectively) when the constrained maximum likelihood estimates $\tilde{P}_T$ and $\tilde{P}_C$ are replaced with $P_T$ and $P_C$, and $\bar{\sigma}_0^2$ is the value of $\tilde{\sigma}_0^2$ (similarly scaled) obtained when $(\tilde{P}_T, \tilde{P}_C)$ are calculated taking $(\hat{P}_T, \hat{P}_C) = (P_T, P_C)$. Similarly, the power of a one-sided $\alpha$ level $Z_D$ test based on the normal approximation is given by
$$1 - \beta = 1 - \Phi\left(\frac{Z_\alpha \bar{\sigma}_0 - \sqrt{N_T}\,(P_T - P_C + \delta_0)}{\sigma_1}\right), \qquad (7)$$
where $\Phi$ is the standard normal cumulative distribution function.
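A planning sketch for Eq. (6) follows. The constrained MLE entering the null standard deviation is found here by a crude grid search rather than the Farrington and Manning closed form, and the per-subject variance scaling is an assumption of this sketch, so treat the output as indicative rather than authoritative.

```python
from math import ceil, log, sqrt
from statistics import NormalDist

def fm_sample_size(p_t, p_c, delta0, u=1.0, alpha=0.025, power=0.90):
    """Sample-size sketch in the spirit of Eq. (6): null SD evaluated at
    the MLE of (P_T, P_C) constrained to P_T - P_C = -delta0."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)

    def loglik(pc):
        # expected per-test-subject log-likelihood at the planning rates,
        # with u control subjects per test subject
        pt = pc - delta0
        if not (0.0 < pt < 1.0 and 0.0 < pc < 1.0):
            return float("-inf")
        return (p_t * log(pt) + (1 - p_t) * log(1 - pt)
                + u * (p_c * log(pc) + (1 - p_c) * log(1 - pc)))

    grid = [delta0 + k * (1.0 - delta0) / 20000 for k in range(1, 20000)]
    pc0 = max(grid, key=loglik)                       # constrained MLE of P_C
    pt0 = pc0 - delta0
    s0 = sqrt(pt0 * (1 - pt0) + pc0 * (1 - pc0) / u)  # null SD per test subject
    s1 = sqrt(p_t * (1 - p_t) + p_c * (1 - p_c) / u)  # alternative SD
    n_t = (z_a * s0 + z_b * s1) ** 2 / (p_t - p_c + delta0) ** 2
    return ceil(n_t), ceil(u * ceil(n_t))

# Illustrative planning inputs: 90% response in both groups, 10-point margin
n_t, n_c = fm_sample_size(0.90, 0.90, 0.10)
```

Because the null SD is evaluated at the constrained boundary, the resulting $N_T$ is somewhat larger than the familiar observed-variance formula would give for the same inputs.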
For bridging studies that are designed to test the noninferiority hypothesis in Eq. (1) with respect to GMCs, the sample size calculation is similar to that for testing the conventional hypothesis of no difference under the log-normal assumption. Let $\sigma$ be the standard deviation of the log-transformed immune responses. In order to achieve $1-\beta$ power for testing the hypothesis in Eq. (1) at the one-sided $\alpha$ level, the sample size for the test vaccine group is
$$N_T = (1 + 1/u)(Z_\alpha + Z_\beta)^2 \sigma^2 / \left[\log(R_{GMC}) + \delta_0\right]^2 \quad\text{and}\quad N_C = u N_T, \qquad (8)$$
where $u$ is as specified earlier, and $R_{GMC} = GMC_T/GMC_C$ is the expected ratio of GMCs between the new and control vaccine groups under the alternative hypothesis. The power of a one-sided $\alpha$ level test based on the normal approximation is given by
$$1 - \beta = \Phi\left(-Z_\alpha + \frac{1}{\sigma}\sqrt{\frac{N_T}{1 + 1/u}}\left[\log(R_{GMC}) + \delta_0\right]\right). \qquad (9)$$
In order to demonstrate the stability of the vaccine manufacturing process,
consistency lot studies are often required by regulatory authorities on vaccine
products to show equivalence among three clinical lots. In such trials, there are
multiple pairwise comparisons and no closed-form formula exists for sample size
calculation, but simulations can be used to adequately plan the sample size. For power estimation, for example, for hypothesis testing with respect to immune response rates, Wiens et al. (1996) recommended performing simulations under the setup that the true immune response rates for the (typically) three lots are $P_1$, $P_1$, and $P_1 + \delta_0/2$, instead of all being equal. This setup gives a conservative estimate of power when the alternative hypothesis is true (Wiens et al., 1996).
In a combination or multivalent vaccine study, the hypotheses regarding each
component or each serotype are often considered to be co-primary. In planning
such a study, the sample size for each primary hypothesis can be calculated using
the formulas in Eq. (6), Eq. (8), or any other related sample size formulas for
the particular hypothesis in question. To address multiplicity concerns, a popular
approach is to plan the power for each primary hypothesis large enough such that
the overall power is acceptable under the assumption of independence among the
multiple hypotheses. For example, for a multivalent vaccine with $k$ components, one can plan the power for the hypothesis testing on each component to be at least $1 - \beta/k$ so that the overall power is guaranteed to be no less than $1 - \beta$. This
approach controls the overall Type I error very stringently (much less than the
nominal level if each null is true), but it is rather conservative in that insignificance
for just one component implies a failure to reject the overall null hypothesis. This
approach is also conservative (and provides a lower bound for power) because
the correlation between the components is usually positive. Based on extensive
simulations, Kong et al. (2004) pointed out that trial designs with high power
(>80%) under the assumption of independence have only a modest increase in
power when the correlations between the components are taken into consideration.
At this point, there is no commonly accepted testing strategy to overcome the
conservatism of this popular approach, and further research is needed.
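The power-allocation arithmetic in the paragraph above is a one-liner; the sketch below shows, for an assumed $k = 4$ components and 90% overall power, that planning each hypothesis at power $1 - \beta/k$ keeps the overall power above $1 - \beta$ under independence.

```python
def per_component_power(overall_beta, k):
    """Plan each of k independent co-primary hypotheses at power
    1 - beta/k; the overall power is then at least (1 - beta/k)**k,
    which is >= 1 - beta (Bonferroni applied to the Type II errors)."""
    per = 1.0 - overall_beta / k
    return per, per ** k

# e.g., a 4-component vaccine planned for 90% overall power
per, overall = per_component_power(0.10, 4)
```

With positive correlation among components, the true overall power is higher still, which is the conservatism noted by Kong et al. (2004).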
5. CONCLUDING REMARKS
Vaccine noninferiority/equivalence trials involve unique statistical issues when
compared with drug trials or other types of vaccine trials (e.g., vaccine efficacy
trials). Unlike drug trials, these vaccine noninferiority trials generally aim to assess the biological effect of a vaccine on a healthy and inherently less heterogeneous study
population. Adherence in these trials is usually high (completion of the vaccination
series if more than one dose is administered) because there tend to be few drop-outs
due to adverse experiences. A per-protocol approach is often considered to be the
primary approach for vaccine noninferiority trials, rather than the intention-to-treat
approach that is used in drug studies.
In contrast with other types of vaccine trials, the conclusions drawn from
noninferiority/equivalence vaccine trials are highly dependent on the identification
of appropriate correlates of protection and on the measurement of immune
responses by properly validated immunogenicity assays. In this paper, we have
focused on parametric inferential methods and related issues for establishing the
equivalence/noninferiority of immune response rates and geometric means of
immune responses. In addition to these parametric analyses, the comparability of
immune responses to vaccinations is commonly illustrated by graphical displays
of the reverse cumulative distribution curves of Pr(X ≥ x). These curves give
the percentages of participants with immune responses greater than or equal to
varying fixed levels x. Nonparametric methods have also been proposed by Stine
and Heyse (2001) to estimate the overlap, or proportion of similar responses, in two
distributions, which can be used to measure the similarity between two distributions
of immune responses. In addition, it should be noted that, even though this paper
has focused on the most common types of vaccine noninferiority/equivalence trials
where the primary endpoints are immunogenicity related, the statistical techniques
described here can be used for efficacy and safety related endpoints as well
(Chan et al., 2003; Chan, 2003).
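The reverse cumulative distribution values underlying such displays are straightforward to compute. The sketch below uses hypothetical titre data; the function name and the numbers are illustrative, not from the paper:

```python
# Reverse cumulative distribution (RCD): for each fixed level x, the
# percentage of participants with an immune response >= x. Hypothetical
# titre data for illustration only.

def rcd_percentages(titres, levels):
    """Percentage of titres greater than or equal to each level."""
    n = len(titres)
    return [100.0 * sum(t >= x for t in titres) / n for x in levels]

new_vaccine = [8, 16, 16, 32, 64, 64, 128, 256]   # hypothetical titres
control     = [8, 8, 16, 32, 32, 64, 128, 128]
levels = [8, 16, 32, 64, 128, 256]

# Plotting these percentages against log-scaled levels, one curve per
# group, gives the reverse cumulative distribution display.
print(rcd_percentages(new_vaccine, levels))  # [100.0, 87.5, 62.5, 50.0, 25.0, 12.5]
print(rcd_percentages(control, levels))      # [100.0, 75.0, 62.5, 37.5, 25.0, 0.0]
```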
From a statistical research perspective, vaccine noninferiority/equivalence
trials are a fertile area of investigation. Each of the four aforementioned common
types of noninferiority/equivalence immunogenicity trials offers unique statistical
challenges. Further statistical research is needed in establishing immunogenicity
markers, selecting immunogenicity endpoints, choosing noninferiority margins,
and finding optimal statistical strategies (e.g., possible applications of Bayesian
techniques) to establish noninferiority/equivalence. Due to the emerging need for
multivalent combination vaccines to simplify vaccination schedules, the multiplicity
issue should also be a priority area for future research. The use of composite
endpoints or gate-keeping strategies may be useful in this area. The ultimate goal is
to have a readily available statistical toolkit that can be used to minimize the risk
of abandoning potentially beneficial vaccines or licensing inferior ones.
ACKNOWLEDGMENTS
We thank the referees for their thoughtful comments that led to an improved
manuscript.
REFERENCES
Agresti, A., Min, Y. (2001). On small-sample confidence intervals for parameters in discrete
distributions. Biometrics 57:963–971.
Blackwelder, W. C. (1982). Proving the null hypothesis in clinical trials. Controlled Clinical
Trials 3:345–353.
Blackwelder, W. C. (1995). Similarity/equivalence trials for combination vaccine. In: Williams,
J. C., Goldenthal, K. L., Burns, D. L., Lewis, B. P., eds. Combined Vaccines and
Simultaneous Administration. Annals of the New York Academy of Sciences 754:321–328.
Chan, I. S. F. (1998). Exact tests of equivalence and efficacy with a non-zero lower bound
for comparative studies. Statistics in Medicine 17:1403–1413.
Chan, I. S. F. (2002). Power and sample size determination for noninferiority trials using an
exact method. Journal of Biopharmaceutical Statistics 12(4):457–469.
Chan, I. S. F. (2003). Proving noninferiority or equivalence of two treatments with
dichotomous endpoints using exact methods, Statistical Methods in Medical Research
12:37–58.
Chan, I. S. F., Mehrotra, D. V. (2003). Confidence intervals and hypothesis testing.
Encyclopedia of Biopharmaceutical Statistics, 2nd ed. New York: Marcel Dekker,
pp. 231–234.
Chan, I. S. F., Wang, W. W. B., Heyse, J. F. (2003). Vaccine clinical trials. Encyclopedia of
Biopharmaceutical Statistics. Chow, S. C. eds. 2nd ed. New York: Marcel Dekker, Inc.
pp. 1005–1022.
Chan, I. S. F., Zhang, Z. X. (1999). Test-based exact confidence intervals for the difference
of two binomial proportions. Biometrics 55:1202–1209.
Dunnett, C. W., Gent, M. (1977). Significance testing to establish equivalence between
treatments with special reference to data in the form of 2 ×2 tables. Biometrics
33:593–602.
Ebbutt, A. F., Frith, L. (1999). Practical issues in equivalence trials. Statistics in Medicine 1691–1701.
European Medicines Agency (EMEA). (2005). Committee for Medicinal Products for
Human Uses (CHMP), Guideline on the choice of noninferiority margin. July.
Farrington, C. P., Manning, G. (1990). Test statistics and sample size formulae for
comparative binomial trials with null hypothesis of non-zero risk difference or non-
unity relative risk. Statistics in Medicine 9:1447–1454.
FDA. (1997). Guidance for industry for the evaluation of combination vaccines for
preventable diseases: production, testing and clinical studies. Center for Biologics
Evaluation and Research, Food and Drug Administration.
Frey et al. (1999). Interference of antibody production to hepatitis B surface antigen in a
combination hepatitis A/hepatitis B vaccine. Journal of Infectious Diseases 180(6):2018–2022.
Gail, M., Simon, R. (1985). Testing for qualitative interactions between treatment effects
and patient subsets. Biometrics 41:361–372.
Gail, M. H., Mark, S. D., Carroll, R. J., Green, S. B., Pee, D. (1996). On design
considerations and randomization-based inference for community intervention trials.
Statistics in Medicine 15:1069–1092.
Ganju, J., Mehrotra, D. V. (2003). Stratified experiments re-examined with emphasis on
multicenter clinical trials. Controlled Clinical Trials 24:167–181.
Horne, A. D. (1995). The statistical analysis of immunogenicity data in vaccine trials: a
review of methodologies and issues. Annals of the New York Academy of Sciences 754:329–346.
Horne, A. D., Lachenbruch, P. A., Getson, P. R., Hsu, H. (2001). Analysis of studies
to evaluate immune response to combination vaccines. Clinical Infectious Diseases
33(Suppl 4):S306–11.
ICH E9 Expert Working Group. (1999). Statistical principles for clinical trials: ICH
harmonized tripartite guidelines. Statistics in Medicine 18:1905–1942.
ICH E10 Guideline. (2000). Choice of control group and related issues in clinical trials.
International Conference on Harmonization (ICH), July.
Kong, L., Kohberger, R. C., Koch, G. G. (2004). Type I error and power in
noninferiority/equivalence trials with correlated endpoints: an example from vaccine
development trials. Journal of Biopharmaceutical Statistics 14:893–907.
Li, X., Mehrotra, D. V., Barnard, J. (in press). Analysis of incomplete longitudinal binary
data using multiple imputation. Statistics in Medicine.
Liang, K., Zeger, S. (2000). Longitudinal data analysis of continuous and discrete response
for pre-post designs. Sankhya 62:134–138.
Mehrotra, D. V. (2001). Stratification issues with binary endpoints. Drug Information Journal
35(4):1343–1350.
Mehrotra, D. V. (2002). Stratified comparative clinical trials—analysis and interpretation
issues. Proceedings of the International Biometric Conference.
Mehrotra, D. V. Vaccine clinical trials – A statistical primer. Journal of Biopharmaceutical
Statistics (in press).
Mehrotra, D. V., Chan, I. S. F. (2000). Testing for treatment by stratum interaction in
stratified comparative binomial trial. Joint Statistical Meetings. August.
Mehrotra, D. V., Chan, I. S. F., Berger, R. L. (2003). A cautionary note on
exact unconditional inference for a difference between two independent binomial
proportions. Biometrics 59:441–450.
Mehrotra, D. V., Railkar, R. (2000). Minimum risk weights for comparing treatments in
stratified binomial trials. Statistics in Medicine 19:811–825.
Miettinen, O., Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine
4:213–226.
Pan, G. H., Wolfe, D. A. (1997). Test for qualitative interaction of clinical significance.
Statistics in Medicine 16:1645–1652.
Plikaytis, B. D., Carlone, P. M. (2004). Statistical considerations for vaccine immunogenicity
trials: Part 1 and Part 2, Vaccine 23(13):1596–1614.
Schuirmann, D. J. (1987). A comparison of the two one-sided tests procedure and the
power approach for assessing the equivalence of average bioavailability. Journal of
Pharmacokinetics and Biopharmaceutics 15:657–680.
Stine, R. A., Heyse, J. F. (2001). Nonparametric measures of overlap. Statistics in Medicine 20:215–236.
Temple, R. (1996). Problems in interpreting active control equivalence trials. Accountability
in Research 4:267–275.
Wang, W. W. B, Li, D., Liu, F., Chan, I. S. F. (2003). Analysis of immune responses in
vaccine clinical trials with a pre-post design. Joint Statistical Meeting. San Francisco,
August.
Wiens, B. L., Heyse, J. F. (2003). Testing for interaction in studies of noninferiority. Journal
of Biopharmaceutical Statistics 13:103–115.
Wiens, B. L., Heyse, J. F., Matthews, H. (1996). Similarity of three treatments, with
application to vaccine development. Proceedings of the Biopharmaceutical Section, Joint
Statistical Meetings, 203–206.