RESEARCH ARTICLE

A comparison of methods for meta-analysis of a small number of studies with binary outcomes

Tim Mathes (1,2) | Oliver Kuss (3,4)

(1) Institute for Research in Operative Medicine, Witten/Herdecke University, Ostmerheimer Str. 200, Building 38, 51109 Cologne, Germany
(2) Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Germany
(3) Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany
(4) Institute of Medical Statistics, Düsseldorf University Hospital and Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

Correspondence: Tim Mathes, Institute for Research in Operative Medicine, Witten/Herdecke University, Ostmerheimer Str. 200, Building 38, 51109 Cologne, Germany. Email: tim.mathes@uni-wh.de
Meta-analyses often include only a small number of studies (≤5). Estimating between-study heterogeneity is difficult in this situation. An inaccurate estimation of heterogeneity can result in biased effect estimates and too narrow confidence intervals. The beta-binomial model has shown good statistical properties for meta-analysis of sparse data. We compare the beta-binomial model with different inverse variance random effects methods (e.g., DerSimonian-Laird, modified Hartung-Knapp, and Paule-Mandel) and fixed effects methods (Mantel-Haenszel and Peto) in a simulation study. The underlying true parameters were obtained from empirical data of actually performed meta-analyses to mirror real-life situations as closely as possible. We show that valid methods for meta-analysis of a small number of studies are available. In fixed effects situations, the Mantel-Haenszel and Peto methods performed best. In random effects situations, the beta-binomial model performed best for meta-analysis of few studies, considering the balance between coverage probability and power. We recommend the beta-binomial model for practical application. If very strong evidence is needed, using the Paule-Mandel heterogeneity variance estimator combined with modified Hartung-Knapp confidence intervals might be useful to confirm the results. Notably, most inverse variance random effects models showed unsatisfactory statistical properties even when more studies (10-50) were included in the meta-analysis.
KEYWORDS
few studies, heterogeneity variance estimators, meta-analysis, simulation study
Abbreviations: BB1N, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to the number of studies; BB2N, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies; BBFM, estimation of the beta-binomial model using SAS PROC FMM; BBIN, beta-binomial regression (maximum likelihood) with Wald-type confidence intervals; BBMP, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters; BBQD, beta-binomial regression (quasi-likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters; BBQL, beta-binomial regression (quasi-likelihood) with Wald-type confidence intervals; DLRE, random effects model with DerSimonian-Laird between-study variance estimator and Wald-type confidence intervals; HKSJ, random effects model with DerSimonian-Laird between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals; MaHa, Mantel-Haenszel fixed effects model; MKH, random effects model with DerSimonian-Laird between-study variance estimator and modified Hartung-Knapp confidence intervals; PMHK, random effects model with Paule-Mandel between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals; PMRE, random effects model with Paule-Mandel between-study variance estimator and Wald-type confidence intervals; YPET, Peto odds ratio method.
Received: 28 September 2017 | Revised: 13 December 2017 | Accepted: 12 February 2018
DOI: 10.1002/jrsm.1296
Res Syn Meth. 2018;1-16. wileyonlinelibrary.com/journal/jrsm. Copyright © 2018 John Wiley & Sons, Ltd.
1 | INTRODUCTION

Meta-analyses frequently include only a small number of studies. An analysis of 14,886 meta-analyses from the Cochrane Library found that the median number of studies per meta-analysis was as low as 3. [1] About 50% of meta-analyses include 2 or 3 studies, and less than 10% include 10 or more studies. [2] In some research areas, few studies in meta-analyses are the rule rather than the exception. This particularly applies to the assessment of new interventions in health technology assessments, orphan diseases, and meta-analysis of subgroups (e.g., in stratified medicine). [3,4] In addition to the problem of few studies for inclusion in the meta-analysis, the studies can be small and the number of events can be low (e.g., when considering adverse events). [1] For such application areas of sparse data, meta-analytic techniques are probably most valuable because they enable collecting the complete existing evidence. However, the use of inadequate meta-analytic methods for sparse data can result in invalid effect estimates and even in wrong conclusions. [5,6]

Meta-analysis of only a handful of studies poses a number of challenges. [7] The reason for this is that the central limit theorem does not apply when only a few studies are included in a meta-analysis. The standard inverse variance random effects meta-analysis incorporates the between-study variation (heterogeneity, τ²) to estimate the overall effect. In case of sparse data, the estimation of τ², and consequently of θ, can be highly imprecise. [8] Simulation studies have shown invalid results if the standard DerSimonian-Laird method is used for low event rates and few studies in the meta-analysis. [2,5,9] Other random effects methods for constructing confidence intervals based on τ² also often perform poorly when only a few studies are included in the meta-analysis. [10]

Previous work of our group has shown that for meta-analysis of rare or even zero events, random effects models are more accurate than fixed effects models, especially in a truly heterogeneous situation. [11] In particular, the beta-binomial regression model provided valid pooled effect measures and confidence intervals in this study. [11] Another simulation study that compared the statistical properties of different random effects models suggested that the Hartung-Knapp-Sidik-Jonkman method outperforms the DerSimonian-Laird random effects approach for meta-analysis of few studies. [12] However, the Hartung-Knapp-Sidik-Jonkman method can be overconservative when fewer than 5 studies are included in the meta-analysis. [13] In sum, choosing the right meta-analytic approach for meta-analysis of 2 to 5 studies is difficult. [9,10]

Our aim was thus to compare different frequentist methods for meta-analysis including only a small number of studies (≤5), with a focus on the beta-binomial regression and the modified Hartung-Knapp-Sidik-Jonkman approach. [11,14]
2 | METHODS

2.1 | Statistical methods for meta-analysis

We consider situations where 2 interventions are compared in a series of studies i (i = 1, …, I) with binary outcomes. We are interested in the estimation of the overall intervention effect θ and use the log odds ratio to quantify the difference between the intervention groups. The data for each study consist of the intervention effect θ_i, the sample size in the intervention group n_iT, and the sample size in the control group n_iC (overall sample size N_i). In each study, there are some (or zero) events in the intervention group y_iT and in the control group y_iC. Each study has a specific sampling error ε_i and a within-study variance σ_i².

In the first section, we describe the statistical models for meta-analysis that are included in the comparison. In the second section, we describe the design of the simulation study. The third section provides an overview of the measures to assess the statistical properties, and in the fourth section, we introduce our empirical example dataset.

We consider 2 different types of models: fixed effects models and random effects models.
2.1.1 | Fixed effects models

The fixed effects model is based on the assumption that all studies in the meta-analysis have a common effect (θ). The fixed effects model can be written as [15]

θ̂_i = θ + σ_i ε_i,   ε_i ~ N(0, 1).

The estimated study effects θ̂_i in study i are distributed about the common effect with the study-specific variance (σ_i²) and the sampling error (ε_i).
Mantel-Haenszel method

The Mantel-Haenszel (MaHa) method is a weighted average of the study-specific odds ratios, risk ratios, or risk differences. [16] The MaHa odds ratio for the overall effect is given by [17]

OR_MaHa = [Σ_{i=1}^{I} θ̂_i w_i(MaHa)] / [Σ_{i=1}^{I} w_i(MaHa)].

The weights are given by w_i(MaHa) = z_iT y_iC / N_i, where z_iT is the number of nonevents in the intervention group.
2MATHES AND KUSS
We included the MaHa method (SAS PROC FREQ) in our analysis because it performs better than the standard fixed effects model in case of sparse data and because it is the standard fixed effects model in Cochrane Reviews. [17-19]
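For illustration, the pooled MaHa odds ratio above can be computed directly from the 2 × 2 tables. The following Python sketch is ours (the paper itself used SAS); the function name and the tuple layout of `studies` are hypothetical:

```python
def mantel_haenszel_or(studies):
    """Mantel-Haenszel pooled odds ratio.

    studies: iterable of (events_t, n_t, events_c, n_c) per study.
    Weighted average of study odds ratios with weights z_iT * y_iC / N_i.
    """
    numerator = denominator = 0.0
    for y_t, n_t, y_c, n_c in studies:
        z_t, z_c = n_t - y_t, n_c - y_c   # non-events per arm
        big_n = n_t + n_c                 # overall sample size N_i
        numerator += y_t * z_c / big_n    # study OR times its weight
        denominator += y_c * z_t / big_n  # weight w_i(MaHa)
    return numerator / denominator
```

For a single study, this reduces to the ordinary odds ratio of the 2 × 2 table.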
Peto odds ratio method

The Peto odds ratio (YPET) is an inverse variance approach; i.e., the studies are weighted by the inverse of the study-specific variance (w_i(FIX) = 1/σ_i²). [20] The pooled Peto log odds ratio is estimated as [21]

log(OR_YPET) = [Σ_{i=1}^{I} (O_i − E_i)] / [Σ_{i=1}^{I} V_i],

where O_i is the observed number of events, E_i is the expected number of events, and V_i is the variance of their difference.

We considered the YPET in our analysis because it is the standard method for meta-analysis of small intervention effects or very rare events. [19,21]
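As a concrete sketch, the Python function below (illustrative and ours; the text defines only O_i, E_i, and V_i, so the standard hypergeometric expressions for the expectation and variance under the null hypothesis are assumed here) computes the pooled Peto log odds ratio:

```python
def peto_log_or(studies):
    """Peto pooled log odds ratio: sum(O_i - E_i) / sum(V_i).

    studies: iterable of (events_t, n_t, events_c, n_c). O_i is the
    observed number of events in the intervention arm; E_i and V_i are
    the hypergeometric mean and variance under the null hypothesis
    (a standard choice, assumed here).
    """
    diff_sum = var_sum = 0.0
    for y_t, n_t, y_c, n_c in studies:
        big_n, m = n_t + n_c, y_t + y_c     # totals: sample size, events
        expected = n_t * m / big_n          # E_i
        variance = (n_t * n_c * m * (big_n - m)) / (big_n ** 2 * (big_n - 1))
        diff_sum += y_t - expected
        var_sum += variance
    return diff_sum / var_sum
```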
2.1.2 | Inverse variance random effects models

The random effects models are based on the assumption that there is no common effect in all studies; instead, the true study effects randomly follow a distribution (usually a normal distribution), and the overall effect is the mean of this distribution. In general, the intervention effect in study i under the random effects model can be expressed as

θ̂_i = θ_i + σ_i ε_i,   ε_i ~ N(0, 1),   θ_i ~ N(θ, τ²).

The pooled effect of an inverse variance random effects meta-analysis can be estimated by

θ̂_R = [Σ_{i=1}^{I} w_i(REM) θ̂_i] / [Σ_{i=1}^{I} w_i(REM)],

where the w_i(REM) are the study-specific weights. The study weights are adjusted according to the between-study variation. The between-study variance (τ²) has to be estimated for random effects models in addition to the within-study variance σ_i². Specifically, the study weights are the inverse of the sum of the within-study variance and the between-study variance, w_i(REM) = 1/(σ_i² + τ²).

Various methods to estimate the between-study variance τ² for an inverse variance random effects model exist (e.g., ordinary least squares and maximum likelihood). [22] In this analysis, we consider the DerSimonian-Laird estimator [23] and the Paule-Mandel estimator [24] for the between-study variance τ².
Between-study variance estimators (DerSimonian-Laird and Paule-Mandel)

The standard DerSimonian-Laird estimator is a noniterative between-study variance estimator motivated by the method-of-moments principle. [23,25] To be concrete, τ²_DL is estimated by the following equation:

τ²_DL = max{0, [Q − (I − 1)] / [Σ_{i=1}^{I} w_i(FIX) − Σ_{i=1}^{I} w_i(FIX)² / Σ_{i=1}^{I} w_i(FIX)]},

where Q is the heterogeneity statistic (Q statistic) given by

Q(τ²_DL) = Σ_{i=1}^{I} w_i(FIX) (θ̂_i − θ̂_F)²,

where θ̂_F is the pooled effect of a fixed effects model. The weights w_i(FIX) in this equation are given by the inverse of the within-study variance, w_i(FIX) = 1/σ_i², which is, in general, unknown and has to be replaced by an estimate. We estimated Q using PROC GLM. [26]
The between-study variance estimator τ²_DL is the most widely used for random effects meta-analysis and is also the standard random effects model in Cochrane Reviews. [19] Therefore, we include the DerSimonian-Laird method in the analysis as a reference method.
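A compact sketch of the estimator, in Python for illustration (the paper computed Q with SAS PROC GLM); function and variable names are ours:

```python
def dersimonian_laird_tau2(theta, var):
    """Method-of-moments (DerSimonian-Laird) between-study variance.

    theta: study log odds ratios; var: within-study variances sigma_i^2.
    Returns tau^2_DL, truncated at zero.
    """
    w = [1.0 / v for v in var]                      # fixed effects weights
    w_sum = sum(w)
    theta_f = sum(wi * ti for wi, ti in zip(w, theta)) / w_sum
    q = sum(wi * (ti - theta_f) ** 2 for wi, ti in zip(w, theta))
    n_studies = len(theta)
    denom = w_sum - sum(wi ** 2 for wi in w) / w_sum
    return max(0.0, (q - (n_studies - 1)) / denom)
```

With perfectly homogeneous studies, Q = 0 and the estimate is truncated to zero.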
The Paule-Mandel method is an iterative between-study variance estimator. [24] The estimation of τ²_PM is based on the generalized Q statistic. [25,27] This can be expressed as [25,28]

Q(τ²_PM) = Σ_{i=1}^{I} w_i(PM) (θ̂_i − θ̂(τ²_PM))²,

where θ̂(τ²_PM) is given by

θ̂(τ²_PM) = [Σ_{i=1}^{I} w_i(PM) θ̂_i] / [Σ_{i=1}^{I} w_i(PM)]

and w_i(PM) = 1/(σ_i² + τ²_PM). Q(τ²_PM) has an expectation of I − 1, and the estimating equation is solved by iterating τ²_PM until convergence is reached. [28]
The Paule-Mandel estimator was incorporated in the analysis because it has recently been recommended in a review paper on the performance of between-study variance estimators. [22] Our calculation of τ²_PM was based on the algorithm proposed by DerSimonian and Kacker (SAS macro; see Supporting Information I). [25] We validated our estimations of τ² against the respective results of the R metafor package. [29]
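The iteration can be implemented in several ways. The sketch below is our own illustration (not the paper's SAS macro, and not the DerSimonian-Kacker updating scheme): it solves Q(τ²) = I − 1 by bisection, exploiting that the generalized Q statistic decreases monotonically in τ²:

```python
def paule_mandel_tau2(theta, var, tol=1e-10, max_iter=200):
    """Paule-Mandel between-study variance, solved by bisection.

    theta: study log odds ratios; var: within-study variances.
    Solves Q(tau^2) = I - 1; returns 0 when no positive solution exists.
    """
    n_studies = len(theta)

    def generalized_q(tau2):
        w = [1.0 / (v + tau2) for v in var]
        pooled = sum(wi * ti for wi, ti in zip(w, theta)) / sum(w)
        return sum(wi * (ti - pooled) ** 2 for wi, ti in zip(w, theta))

    if generalized_q(0.0) <= n_studies - 1:
        return 0.0                      # no positive solution: truncate at zero
    lo, hi = 0.0, 1.0
    while generalized_q(hi) > n_studies - 1:
        hi *= 2.0                       # bracket the root
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if generalized_q(mid) > n_studies - 1:
            lo = mid                    # Q decreases in tau^2: root is above mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For 2 studies with equal within-study variances, the Paule-Mandel and DerSimonian-Laird estimates coincide, which makes a convenient check.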
Confidence intervals for inverse variance random effects models (Wald-type, Hartung-Knapp-Sidik-Jonkman, and Hartung-Knapp modification)

Confidence intervals using the DerSimonian-Laird method are constructed as

θ̂ ± z_{1−α/2} × σ̂,

where z_{1−α/2} is the (1 − α/2) quantile of the standard normal distribution (Wald-type confidence intervals) and the standard error is given by

σ̂ = √(1 / Σ_{i=1}^{I} w_i(DL)).

Hartung and Knapp and, independently, Sidik and Jonkman suggested an adjustment factor for the standard error, used together with the quantile of the Student t-distribution with I − 1 degrees of freedom instead of the standard normal quantile, to calculate confidence intervals. [30-32] The adjustment factor is calculated as

q = [1/(I − 1)] Σ_{i=1}^{I} w_i(DL) (θ̂_i − θ̂_R)²,

where θ̂_R is the estimated pooled effect from a DerSimonian-Laird random effects meta-analysis. This leads to the adjusted confidence interval [12]

θ̂ ± t_{(I−1), 1−α/2} × √q σ̂.

Although this confidence interval in general tends to be wider than the Wald-type confidence interval, it can be narrower when q is very small. For this reason, Hartung and Knapp proposed the following ad hoc modification of q [12,33]:

q* = max(1, q).

If τ²_PM is iterated via the Q statistic above, then q* always equals 1, because q always equals 1, or is less than 1 if no solution exists. [12] This means that combining the Paule-Mandel estimator with Hartung-Knapp-Sidik-Jonkman (and also modified Hartung-Knapp) confidence intervals is similar to using a Paule-Mandel-derived pooled effect estimator with confidence intervals based on the quantiles of the Student t-distribution with I − 1 degrees of freedom:

θ̂(τ²_PM) ± t_{(I−1), 1−α/2} × σ̂(τ²_PM).

The Hartung-Knapp confidence intervals and their modification were validated against the respective results of the R metafor package. [29]
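For illustration, the (modified) Hartung-Knapp interval can be combined into one small function. This Python sketch is ours (the paper used SAS); the t quantile is supplied by the caller, e.g., from a table:

```python
import math

def hksj_interval(theta, var, tau2, t_quantile, modified=False):
    """(Modified) Hartung-Knapp CI for the pooled log odds ratio.

    theta: study log odds ratios; var: within-study variances;
    tau2: between-study variance estimate (e.g., DerSimonian-Laird);
    t_quantile: t_{(I-1), 1-alpha/2}, supplied by the caller.
    Returns (pooled estimate, CI lower bound, CI upper bound).
    """
    w = [1.0 / (v + tau2) for v in var]        # random effects weights
    w_sum = sum(w)
    pooled = sum(wi * ti for wi, ti in zip(w, theta)) / w_sum
    n_studies = len(theta)
    q = sum(wi * (ti - pooled) ** 2
            for wi, ti in zip(w, theta)) / (n_studies - 1)
    if modified:
        q = max(1.0, q)                        # ad hoc modification q* = max(1, q)
    half_width = t_quantile * math.sqrt(q / w_sum)  # sqrt(q) times Wald SE
    return pooled, pooled - half_width, pooled + half_width
```

With `modified=True`, the interval can never be narrower than the t-based Wald interval, which is exactly the point of the modification.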
We considered the Hartung-Knapp method and its modification because of the promising results in various recent simulation studies, also in the case of few studies. [12,13,34]

We combined both between-study variance estimators with 3 different methods to estimate the confidence intervals (Wald-type, Hartung-Knapp-Sidik-Jonkman, and modified Hartung-Knapp). [12,14,25,31]
The combination of the different between-study variance estimators and confidence intervals results in the following 5 inverse variance random effects models:

(1) DerSimonian-Laird between-study variance estimator and Wald-type confidence intervals (standard random effects model, DLRE). [23]
(2) DerSimonian-Laird between-study variance estimator and Hartung-Knapp-Sidik-Jonkman adjusted confidence intervals (HKSJ). [23,31,32]
(3) DerSimonian-Laird between-study variance estimator and modified Hartung-Knapp adjusted confidence intervals (MKH). [14,23]
(4) Paule-Mandel between-study variance estimator and Wald-type confidence intervals (PMRE). [24]
(5) Paule-Mandel between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals (PMHK). [24,31]
2.1.3 | Beta-binomial regression

The models described above are 2-stage models. This means that in the first step, aggregated measures are calculated for each study separately, and in the second, subsequent step, these measures are combined. [35] Opposed to this, the beta-binomial regression is a 1-stage model. This means that the analysis is performed in 1 step, similar to an individual patient data regression analysis. [35] Moreover, the beta-binomial model is a true (study-specific) random effects regression model. [11]
Beta-binomial (random effects) regression model

In the beta-binomial model, we assume that the observed proportions (p_iC = y_iC/n_iC) in the control group follow a binomial distribution B(p, n_iC). Further, we assume the success probability p to be beta-distributed with parameters a and b. The mean and variance of p are given by E(p) = μ = a/(a + b) and Var(p) = μ(1 − μ)ϑ/(1 + ϑ), respectively, where ϑ = 1/(a + b). Consequently, the response y_iC is beta-binomially distributed with mean E(y_iC) = n_iC μ and variance Var(y_iC) = n_iC μ(1 − μ)[1 + (n_iC − 1)ϑ/(1 + ϑ)]. The outcomes of 2 observations from the same study are then correlated with corr(y_iT, y_iC) = ρ = 1/(a + b + 1).

The intervention effect is modelled via a link function g by g(μ) = b_0 + b_T x_T, where x_T = 1 for the intervention group and x_T = 0 for the control group. In our study, we use the logit link for g to arrive at log odds ratios to measure the intervention effect.

The beta-binomial model performed well in a prior model comparison of meta-analysis methods for rare events. [11] This raises the question whether the beta-binomial regression model also performs well for meta-analysis with a small number of studies.
Estimation methods and confidence intervals for the beta-binomial regression model

We used different parameter estimation methods (maximum likelihood and quasi-likelihood) and confidence interval estimation methods (Wald-type and t-distribution-based with different numbers of degrees of freedom) for the beta-binomial model, resulting in the following 6 implementations of the beta-binomial model in the comparison:

(1) Estimation of the beta-binomial model via maximum likelihood (SAS PROC NLMIXED),
(a) combined with Wald-type confidence intervals (z_{1−α/2}, the (1 − α/2) quantile of the standard normal distribution; denoted BBIN in the following);
(b) combined with confidence intervals using a t-distribution with degrees of freedom equal to the number of studies (t_{(I), 1−α/2}; BB1N);
(c) combined with confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies (t_{(I×2), 1−α/2}; BB2N);
(d) combined with confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters (t_{(I×2−3), 1−α/2}; BBMP). [36,37]
(2) Estimation of the beta-binomial model via quasi-likelihood (SAS PROC GLIMMIX). When using the quasi-likelihood principle, one specifies only the mean and variance but not the complete beta-binomial distribution. We assumed that this would increase the robustness of the results with a small loss in efficiency. We combined this effect estimate with the following:
(a) Wald-type confidence intervals (BBQL);
(b) confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters (BBQD).
(3) Estimation of the beta-binomial model via maximum likelihood using SAS PROC FMM with Wald-type confidence intervals (BBFM). The advantage of SAS PROC FMM is that the starting values for the beta-distribution are directly estimated by the procedure. Consequently, it is not necessary to estimate starting values beforehand, which facilitates the implementation of the beta-binomial model.

Starting values for the beta-binomial models using SAS PROC NLMIXED (BBIN, BB1N, BB2N, and BBMP) were computed from raw proportions, their variances, and correlations.
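For illustration, the log-likelihood of the model just described can be written down directly with the log-gamma function. The Python sketch below is ours (the paper fitted the model with SAS PROC NLMIXED, GLIMMIX, and FMM) and, for simplicity, treats the two arms as independent beta-binomial observations; the names and the parameterization via ϑ = 1/(a + b) follow the text. Maximizing it over (b0, bT, disp) would yield ML estimates, with bT the pooled log odds ratio:

```python
import math

def betaln(x, y):
    """Log of the beta function via lgamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def betabin_loglik(b0, bT, disp, studies):
    """Beta-binomial log-likelihood with logit link g(mu) = b0 + bT * x.

    studies: iterable of (events_t, n_t, events_c, n_c); x = 1 codes the
    intervention arm and x = 0 the control arm. disp is the
    overdispersion parameter (vartheta = 1/(a + b) in the text).
    """
    loglik = 0.0
    for y_t, n_t, y_c, n_c in studies:
        for y, n, x in ((y_t, n_t, 1), (y_c, n_c, 0)):
            mu = 1.0 / (1.0 + math.exp(-(b0 + bT * x)))  # inverse logit
            a, b = mu / disp, (1.0 - mu) / disp          # beta parameters
            loglik += (math.lgamma(n + 1) - math.lgamma(y + 1)
                       - math.lgamma(n - y + 1)
                       + betaln(y + a, n - y + b) - betaln(a, b))
    return loglik
```

A quick plausibility check: data with fewer events in the intervention arm should be better explained by a protective effect (bT < 0) than by its mirrored harmful counterpart.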
2.2 | Simulations

We performed a simulation study to compare the statistical properties of the different meta-analytic methods. Where feasible, all true values for the design factors of the simulation study were chosen from actually performed meta-analyses to reflect real-life conditions as closely as possible. The most suitable source for true values for the simulation was the review of Turner et al, which analysed 1991 systematic reviews from the Cochrane Database of Systematic Reviews. [1] Overall, this review included 14,886 meta-analyses (each including at least 2 studies) of dichotomous outcomes on 77,237 single studies.
2.2.1 | Design of simulation

We generated 10,000 meta-analyses for each simulation scenario.

The focus of our study was to assess the performance of the meta-analytic methods for pooling the results of ≤5 studies. For the main analysis, we compared the methods by generating meta-analyses with 2, 3, 4, and 5 included studies. [3] In a supplemental analysis, we considered scenarios with 10, 15, 20, 30, and 50 studies to get an impression of how the compared methods behave if the number of studies included in the meta-analysis increases.

We generated event probabilities, sample sizes, and heterogeneity estimates for each study included in the meta-analysis based on the Turner data. [1] Furthermore, we explicitly varied some parameters to check the robustness of the results.
Table 1 shows the input parameters for the simulation, including the information whether these were varied implicitly based on distributional assumptions or explicitly based on fixed values. In addition, the table provides the source/rationale of the parameter choice and indicates whether the scenarios belong to the base case analysis or to a sensitivity analysis for assessing the robustness of the results.
The combination of the different scenarios (random effects/fixed effects, H0/H1) and of the numbers of studies led to 36 base case simulation scenarios (360,000 meta-analyses) in total. We performed the sensitivity analyses (one randomly selected study 10 times larger, small intervention effect, large intervention effect, low baseline event probability, and high baseline event probability in the control group) only for the setting with 2 to 5 studies in the random effects scenario under H1 (medium effect), leading to 20 additional simulation scenarios.

TABLE 1 Description of the simulation

Base case analyses:
- Sample size of single study (a): generated from a lognormal distribution with mean = 4.615 and SD = 1.1; resulting data: median = 103, Q1 = 50, Q3 = 204.
- Allocation of sample size to control and intervention group (balance of study size between groups): random allocation with probability 0.5.
- Event probabilities in the control group (a): generated from a beta-distribution with α = 0.423 and β = 1.433; resulting data: mean = 0.223, median = 0.126, SD = 0.256.
- Effect (event probabilities in the intervention group) (a): generated from a standard inverse variance random effects model (for τ², see the fixed and random effects scenarios); fixed value: medium effect, OR = 0.684.
- Events in the control/intervention group: binomial draw.
- Fixed effects scenario: τ² = 0.
- Random effects scenario (a): τ² generated from a lognormal distribution with mean = 1.47, SD = 1.65, and skewness = 0.55 (b); resulting data: median τ² = 0.274, 25% percentile τ² = 0.079, 75% percentile τ² = 0.806 (c); mean I² = 17%, SD = 30, range 0-99%.

Sensitivity analyses (d):
- Balance of study size [13]: one randomly selected study 10 times larger.
- Event probabilities in the control group: low = 0.1, high = 0.5.
- Effect (event probabilities in the intervention group) (a): large effect, OR = 0.466; small effect, OR = 0.855.

Abbreviations: OR, odds ratio; Q1, first quartile (25% percentile); Q3, third quartile (75% percentile); SD, standard deviation.
(a) Informed by Turner et al. [1]
(b) Based on the Fleishman power transformation.
(c) Under H0.
(d) Only simulated in the random effects scenario for 2 to 5 studies.
The simulation code is available in Supporting Information I.
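The generation steps of Table 1 can be sketched as follows. This Python translation is illustrative only (the paper's simulation code is SAS; see Supporting Information I); the floor of 4 on the total sample size is our guard against degenerate studies, not a parameter from the paper:

```python
import math
import random

def simulate_meta_analysis(n_studies, tau2, log_or=math.log(0.684), seed=None):
    """Generate one meta-analysis following Table 1 (base case).

    Returns a list of (events_t, n_t, events_c, n_c); tau2 = 0 gives the
    fixed effects scenario; log_or defaults to the medium effect OR = 0.684.
    """
    rng = random.Random(seed)
    studies = []
    for _ in range(n_studies):
        # Total sample size: lognormal with mean 4.615 and SD 1.1 (log scale);
        # the floor of 4 is our guard, not from the paper.
        total_n = max(4, round(rng.lognormvariate(4.615, 1.1)))
        # Random allocation to the intervention arm with probability 0.5.
        n_t = sum(rng.random() < 0.5 for _ in range(total_n))
        n_c = total_n - n_t
        # Control group event probability: Beta(0.423, 1.433).
        p_c = rng.betavariate(0.423, 1.433)
        # Study-specific log odds ratio: N(log_or, tau2).
        theta_i = rng.gauss(log_or, math.sqrt(tau2))
        odds_t = p_c / (1.0 - p_c) * math.exp(theta_i)
        p_t = odds_t / (1.0 + odds_t)
        # Binomial draws for the events in each arm.
        y_t = sum(rng.random() < p_t for _ in range(n_t))
        y_c = sum(rng.random() < p_c for _ in range(n_c))
        studies.append((y_t, n_t, y_c, n_c))
    return studies
```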
2.2.2 | Measures to assess performance of methods

We performed all comparisons of the effect estimates on the log odds ratio scale. We estimated the median bias and the empirical coverage of the 95% confidence interval to assess the statistical properties. [11] In the medium effect scenario, we also calculated the empirical power for all methods. Moreover, we counted the number of completely missing pooled effect estimates to judge the numerical robustness of the compared methods. For this analysis, we present the number of converged runs. The analysis code is available in Supporting Information II.
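These measures are simple functions of the converged runs. A minimal Python sketch (names are ours, not from the paper's SAS code); `results` holds one (estimate, CI lower bound, CI upper bound) triple per converged run, on the log odds ratio scale:

```python
import statistics

def performance(results, true_log_or):
    """Median bias, empirical coverage, and empirical power.

    results: list of (estimate, ci_low, ci_high) from converged runs.
    Coverage is the share of CIs containing the true log odds ratio;
    power is the share of CIs excluding zero.
    """
    bias = statistics.median(est - true_log_or for est, _, _ in results)
    coverage = sum(lo <= true_log_or <= hi for _, lo, hi in results) / len(results)
    power = sum(lo > 0 or hi < 0 for _, lo, hi in results) / len(results)
    return bias, coverage, power
```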
3 | RESULTS

3.1 | Base case analysis

We present the results separately for the fixed effects and the random effects situations.

First, we describe the statistical properties in the no effect scenario. Second, we describe the results of the simulation in the medium effect scenario. Here, we focus on describing the differences in performance compared with H0 for the meta-analytic methods that performed well regarding coverage under H0, and on the power analysis.
3.1.1 | No effect scenario

As was to be expected, for almost all methods, convergence increased with an increasing number of studies in the meta-analysis (Figures 1 and 2; see data in Table S1 in Supporting Information III). The main reason is that the number of meta-analyses without any events decreased. The beta-binomial models that were estimated with quasi-likelihood (BBQD and BBQL) showed a tendency towards decreasing convergence with an increasing number of studies. This is because the convergence criterion of the procedure (SAS PROC GLIMMIX) was more often not satisfied, and thus the procedure stopped without providing parameter estimates. The reason is probably that with an increasing number of studies, the estimation of parameters becomes more challenging when the model has to be estimated based only on means and variances.

The lines of BBIN, BB1N, BB2N, and BBMP, as well as of BBQD and BBQL, follow the same pattern because the log odds ratio is estimated with the same procedure. Also, the lines for DLRE, HKSJ, MKH, PMRE, and PMHK show the same shape because all meta-analyses without any event in one arm are not estimable.

For meta-analysis of few studies, the beta-binomial models with a priori estimated starting values (BBIN, BB1N, BB2N, and BBMP) were most robust, followed by YPET. All other methods performed quite similarly, with about 15% to 20% nonconverged meta-analyses in the few study situations.
All methods showed a small median bias even in the few study scenarios (Figures 3 and 4; see data in Table S1 in Supporting Information III). The strongest negative median bias was −0.047 (BBIN, BB1N, BB2N, and BBMP), which corresponds to an odds ratio of 0.95. The strongest positive bias, 0.022 (BBQL and BBQD), was also observed in the random effects scenario and corresponds to an odds ratio of 1.02.

The shape of the lines was exactly the same for BBIN, BB1N, BB2N, and BBMP; for BBQD and BBQL; for DLRE, HKSJ, and MKH; and for PMRE and PMHK, because these methods are based on the same point estimators for the effect. The bias of the effect estimators based on the DerSimonian-Laird heterogeneity variance estimator (DLRE, HKSJ, and MKH) was very similar to the bias of the effect estimators based on the Paule-Mandel heterogeneity variance estimator (PMRE and PMHK).

FIGURE 1 Converged runs (H0 FEM)
In the fixed effects scenario, almost all methods satisfied the empirical mean coverage at the 95% level or were only marginally below this level (Figure 5; see data in Table S1 in Supporting Information III). Only the HKSJ method fell clearly below the 95% coverage level.

In the random effects situation, only BBMP, BB1N, BBQD, MKH, and PMHK satisfied or nearly satisfied the 95% empirical coverage level in the scenarios with few studies (Figure 6). In particular, the fixed effects models (MaHa and YPET) as well as DLRE and HKSJ showed unacceptably low coverage probability in the random effects situations. BBQD, BBQL, DLRE, HKSJ, and MKH also showed very low coverage probability in situations with more than 5 studies included in the meta-analysis.
3.1.2 | Medium effect scenario

In general, the results for converged runs under H1 were quite similar to the results under H0 (Table S1 in Supporting Information IV).

Overall, bias was also small under the alternative (Figures S1 and S2 in Supporting Information V). Bias was strongest in the random effects scenario. The maximum negative bias occurred using BBIN, BB1N, BB2N, and BBMP, with a log odds ratio of −0.06 (odds ratio, 0.94). The effect estimates based on the DerSimonian-Laird heterogeneity variance estimator and the Paule-Mandel heterogeneity variance estimator were the most strongly positively biased (log odds ratios, 0.074 and 0.075, respectively; odds ratio, 1.08 for both methods).

FIGURE 2 Median bias (H0 FEM)
FIGURE 3 Mean empirical coverage (H0 FEM)
FIGURE 4 Converged runs (H0 REM)

The results for empirical coverage probability for meta-analysis including few studies were also similar to the results of the no effect scenario. As in the zero effect scenario, all methods but HKSJ satisfied the mean empirical coverage at the 95% level in the fixed effects, few studies scenarios (Figure S3 in Supporting Information V). BBMP, BB1N, and BBQD as well as MKH and PMHK satisfied the coverage probability throughout all random effects situations (Figure S4 in Supporting Information V). BB1N was only marginally below the empirical 95% coverage level in the scenarios with 2 to 5 studies (worst case: empirical coverage probability = 0.926). In the random effects scenarios, in meta-analyses including more than 20 studies, not only DLRE and HKSJ but also MKH had a very low coverage probability. The coverage probability in the random effects, large study scenarios was below 0.7 throughout for the fixed effects models (MaHa and YPET).

Power depended strongly on the number of studies in the meta-analysis (Figures 7 and 8). The fixed effects models and HKSJ showed the highest median power for meta-analysis of ≤5 studies. Also with BBFM, BBIN, BBQL, and DLRE, there was a small probability of detecting a statistically significant difference in the few study meta-analyses, in the fixed effects as well as the random effects situations. All other methods had a power close to zero in the meta-analyses of 2 studies. It should be regarded that the power of the methods showing favourable results might be overestimated, because these methods were the less robust methods (BBFM, BBQL, BBQD, and MaHa); i.e., challenging situations with low event rates are not estimable and consequently are not included in the median power analysis. In meta-analysis of 10 studies, the power of all methods was quite similar.

FIGURE 5 Median bias (H0 REM)
3.2 | Sensitivity analysis
We performed the sensitivity analysis only for the random effects, few-study scenario because this is the most challenging and the most relevant for practice. In the sensitivity analysis, we included only BB1N and BBMP because these beta-binomial models performed best in the base case analysis.
3.2.1 | One large study
The empirical coverage probability of DLRE, HKSJ, and the fixed effects models (YPET and MaHa) decreased further compared with the results of the random
FIGURE 6 Mean empirical coverage (H0 REM)
effects situation under H0 with similar study sizes. The empirical coverage probability of BB1N fell slightly below the nominal 95% level.
Figures for the sensitivity analysis on unbalanced study size are given in Supporting Information VI (Figures S1-S3).
FIGURE 7 Mean power (H1 FEM)
FIGURE 8 Mean power (H1 REM)
3.2.2 | Different intervention effect sizes
Using a small as well as a large intervention effect did not alter bias and empirical coverage compared with the medium effect scenario. As expected, power increased in the large intervention effect scenario and decreased in the small intervention effect scenario.
Figures S4 to S6 in Supporting Information VI show the results for the small intervention effect and Figures S7 to S9 the results for the large intervention effect.
3.2.3 | Different baseline risks in the control group
Using a low as well as a high baseline event probability in the control group also did not change bias and empirical coverage compared with the medium baseline risk scenario. As expected, power increased in the high baseline event probability scenario and decreased in the low baseline event probability scenario.
Figures for the sensitivity analysis on the baseline risk in the control group are available in Supporting Information VI: Figures S10 to S12 are for the low baseline risk and Figures S13 to S15 for the high baseline risk.
4 | DISCUSSION
This simulation shows that valid statistical methods for meta-analysis of few studies are available. Our simulation was designed based on empirical data to reproduce real-world situations. Considering the frequency of meta-analyses including only few studies, our results are highly relevant for applied research.1 This is in particular true in areas where few-study meta-analysis is the normal case, such as rare diseases, health technology assessment, and subgroup or sensitivity analyses.
The standard approach for meta-analysis is still the DerSimonian-Laird method. Our simulation shows that the DerSimonian-Laird method was heavily above the 5% type I error rate in all random effects situations. Consequently, it can be supposed that a large number of false positive meta-analyses exist.5,13 False positive meta-analysis results might have serious consequences for clinical and health policy decision-making because the results of meta-analyses build the basis for statements in clinical practice guidelines and health technology assessments.38,39
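For orientation, the two-stage inverse variance machinery discussed here can be sketched in a few lines of Python. This is our own minimal illustration of the DerSimonian-Laird estimator with a normal-approximation confidence interval, using hypothetical data and names; it is not the simulation code of the article:

```python
import numpy as np
from scipy import stats

def dersimonian_laird(yi, vi, alpha=0.05):
    """Two-stage random effects meta-analysis: DerSimonian-Laird tau^2
    plus a Wald-type (normal approximation) confidence interval."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    wi = 1.0 / vi                                  # fixed effect weights
    mu_fe = np.sum(wi * yi) / np.sum(wi)           # fixed effect estimate
    q = np.sum(wi * (yi - mu_fe) ** 2)             # Cochran's Q
    c = np.sum(wi) - np.sum(wi ** 2) / np.sum(wi)
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)       # DL estimator, truncated at 0
    wr = 1.0 / (vi + tau2)                         # random effects weights
    mu = np.sum(wr * yi) / np.sum(wr)
    se = np.sqrt(1.0 / np.sum(wr))
    zc = stats.norm.ppf(1 - alpha / 2)
    return mu, tau2, (mu - zc * se, mu + zc * se)

# Hypothetical log odds ratios and within-study variances of 3 studies
mu, tau2, ci = dersimonian_laird([0.41, -0.12, 0.55], [0.10, 0.08, 0.12])
```

The truncation of tau^2 at zero and the normal-approximation interval are exactly the points criticized in this article for few-study settings.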
On the one hand, using adequate methods is very important to avoid incorrect conclusions.2,11 On the other hand, power has a central role because meta-analyses of few studies often agree with the future results of meta-analyses including more studies.40 Very conservative methods might lead to unnecessary research or to the rejection of useful new interventions by health technology assessment agencies. The bias of all methods was very small in all simulation scenarios. Therefore, the main issue in judging the validity of the methods, and in arriving at recommendations on the right meta-analysis method, is the critical balance between empirical coverage probability and power.
In the fixed effects scenario, the classical fixed effects models (MaHa and YPET) had the best statistical properties. If the assumption of a fixed effects model can be reliably justified, ie, the included studies are clinically homogeneous, then fixed effects models seem preferable. On the one hand, this suggestion is supported by the fact that a sufficient estimation of heterogeneity is difficult in meta-analyses of few studies.2 On the other hand, caution is needed in the case of small studies because these are often more heterogeneous than large studies.41 The beta-binomial model was the most robust and can be recommended as the first choice in challenging situations (few studies, rare events), in particular for the case that the other models cannot estimate effects.11
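The Mantel-Haenszel pooled odds ratio underlying MaHa can be written down in a few lines. The following Python sketch (our own illustration with hypothetical data, not the authors' code) computes the pooled odds ratio from a set of 2x2 tables:

```python
import numpy as np

def mantel_haenszel_or(events_t, n_t, events_c, n_c):
    """Mantel-Haenszel pooled odds ratio over a set of 2x2 tables.
    a = events in treatment, b = non-events in treatment,
    c = events in control,  d = non-events in control."""
    a = np.asarray(events_t, float)
    c = np.asarray(events_c, float)
    b = np.asarray(n_t, float) - a
    d = np.asarray(n_c, float) - c
    n = a + b + c + d                      # total sample size per study
    return np.sum(a * d / n) / np.sum(b * c / n)

# Three hypothetical studies (events and sample size per arm)
or_mh = mantel_haenszel_or([5, 2, 8], [50, 40, 60], [9, 4, 12], [50, 40, 60])
```

Because each study contributes the products ad/n and bc/n rather than a per-study log odds ratio, single-zero studies still contribute information without a continuity correction, which is one reason the method remains stable with sparse data.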
In the random effects situation, only MKH, PMHK, and BBMP satisfied the type I error rate. As in prior studies, MKH and PMHK in particular were very conservative for 2 to 3 studies in the meta-analysis, also in the random effects situations.12,15,34 These overly conservative confidence intervals resulted in a power of almost zero. The beta-binomial model (BB1N) and PMRE seem a good alternative for meta-analyses including only a few studies because both have higher power and fell only slightly below the 95% coverage probability.
In general, using Student t-distributed confidence intervals accounts better for the uncertainty that is induced by estimating the between-study variance. The determination of the optimal number of degrees of freedom for the Student t-distribution is difficult because it depends on the between- and within-study variances and on the number of included studies in the meta-analysis.8 Our simulation indicates that, regarding the number of degrees of freedom, BB1N (number of studies minus 1 degree of freedom) is the best choice, whereas BBMP (2 times the number of studies minus 3 degrees of freedom) is more conservative but still valid.
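As a purely numerical illustration of the two degrees-of-freedom rules (k - 1 for BB1N, 2k - 3 for BBMP), one can compare the resulting t multipliers with the normal critical value. Note that the coverage of the two models also depends on their standard error estimates, not on the multiplier alone; the snippet below is our own illustration:

```python
from scipy import stats

k = 5                                       # number of studies in the meta-analysis
z = stats.norm.ppf(0.975)                   # normal critical value (about 1.96)
t_bb1n = stats.t.ppf(0.975, df=k - 1)       # BB1N rule: k - 1 degrees of freedom
t_bbmp = stats.t.ppf(0.975, df=2 * k - 3)   # BBMP rule: 2k - 3 degrees of freedom
# For small k, both t multipliers clearly exceed 1.96, widening the intervals
```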
Prior studies have recommended HKSJ.9,13,15,42 In our simulation, however, HKSJ fell heavily below the nominal coverage level (ie, exceeded the 5% type I error rate) in all situations. This occurs when the correction factor q is arbitrarily small.12 The risk is higher in the case of small event rates, because results are then more homogeneous than expected under a random effects model, and in the case that the standard errors of the included studies are heterogeneous.18 Considering that our simulation is based on empirical data, there seems to be a very high risk of anticonservative results in practice when HKSJ is used.12 Thus, we also discourage the use of the original HKSJ for few as well as many studies in meta-analysis.43
DLRE and HKSJ showed very low empirical coverage levels, also in meta-analyses of more than 20 studies, suggesting that heterogeneity is underestimated by these methods. Also MKH, which has recently been suggested as a new standard method for random effects meta-analysis, showed unsatisfactory empirical coverage.43 The reason might be that if true heterogeneity exists, these methods might fail to detect it (the value of tau2 being estimated as zero).2 The impact becomes stronger with an increasing number of studies in the meta-analysis, because the missing tau2 is then absent from the weight of each study, so the error enters the calculation of the total variance multiple times.
We could show that also in the case of meta-analyses including more studies (≥20), the statistical properties of 2-stage random effects meta-analyses can be substantially improved compared with the standard DerSimonian-Laird method by combining the Paule-Mandel heterogeneity variance estimator with Hartung-Knapp confidence intervals.
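The combination recommended here, the Paule-Mandel tau^2 estimator with Hartung-Knapp t-based confidence intervals, can be sketched as follows (again our own illustrative Python with hypothetical data, not the article's implementation):

```python
import numpy as np
from scipy import optimize, stats

def paule_mandel_tau2(yi, vi):
    """Paule-Mandel tau^2: choose tau^2 so that the generalized
    Q statistic equals its expectation, k - 1."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    k = len(yi)

    def q_gen(tau2):
        w = 1.0 / (vi + tau2)
        mu = np.sum(w * yi) / np.sum(w)
        return np.sum(w * (yi - mu) ** 2) - (k - 1)

    if q_gen(0.0) <= 0:                    # no excess heterogeneity detected
        return 0.0
    return optimize.brentq(q_gen, 0.0, 1e4 * max(vi))

def pm_hk_meta(yi, vi, alpha=0.05):
    """Random effects pooled estimate with PM tau^2 and a
    Hartung-Knapp t-based confidence interval (k - 1 df)."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    k = len(yi)
    tau2 = paule_mandel_tau2(yi, vi)
    w = 1.0 / (vi + tau2)
    mu = np.sum(w * yi) / np.sum(w)
    # Hartung-Knapp variance: weighted mean squared deviation / (k - 1)
    var_hk = np.sum(w * (yi - mu) ** 2) / ((k - 1) * np.sum(w))
    half = stats.t.ppf(1 - alpha / 2, df=k - 1) * np.sqrt(var_hk)
    return mu, tau2, (mu - half, mu + half)

# Hypothetical log odds ratios and within-study variances of 3 studies
mu, tau2, ci = pm_hk_meta([0.41, -0.12, 0.55], [0.10, 0.08, 0.12])
```

With only 3 studies, the t multiplier (2 degrees of freedom) makes the interval markedly wider than a normal-approximation interval, which is how this combination protects the type I error rate.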
Our finding on the Paule-Mandel estimator disagrees with a study that compared 20 heterogeneity variance estimators, which concluded that the Paule-Mandel estimator provides good estimation behaviour but is not markedly better than the other alternatives.44 In our study, the Paule-Mandel estimator outperformed the DerSimonian-Laird method in all scenarios, in particular when combined with t-distributed confidence intervals. The main reasons are probably that, in contrast to our simulation, only scenarios with 5 to 30 studies were simulated and only confidence intervals based on the normal approximation were used in the comparison. Moreover, that simulation was not informed by real-life meta-analyses and is consequently very different from ours (eg, in heterogeneity and effect sizes).
Our findings were quite robust across all sensitivity analyses. Neither the size of the intervention effect nor the baseline risk in the control group had a considerable influence on the results. We varied the distribution of study sizes in our simulation because unbalanced study sizes are common and other studies have shown that they can influence statistical performance.12,13 In our simulation, we could not confirm the finding by IntHout et al that different study sizes have a strong influence on the empirical coverage probability.13 We found only a very slight negative effect on the coverage probability of the 2-stage random effects models in the random effects situations.
Our study has some limitations.
First, we did not explicitly vary the distribution of group sizes in our simulation because most studies apply equal allocation ratios. An unbalanced size of the study arms can influence the performance of the statistical methods.7 Our results might therefore not pertain fully to meta-analyses of studies with unbalanced sample sizes (eg, different allocation ratios and nonrandomized studies).
Second, heterogeneity is often hard to detect in practice.2 As we used real-world heterogeneity estimates in the simulation, we might have tended to underestimate heterogeneity, ie, we might also have overestimated the number of meta-analyses with zero heterogeneity.
Third, we did not consider Bayesian methods, which have also shown promising results for meta-analyses of few studies in recently published work.24,34,45,46 In particular, if appropriate (weakly) informative prior information exists, such Bayesian methods could be considered.22,47 Simulations have shown that Bayesian methods might increase power in few-study meta-analyses while holding the type I error rate compared with other methods (eg, PMHK).34 Moreover, Bayesian methods are becoming more attractive for applied scientists because approximate methods have recently been developed that facilitate their application compared with full Bayesian/Markov chain Monte Carlo approaches.47
Future studies should directly compare the well-performing frequentist methods with well-performing Bayesian methods for meta-analyses including few studies. Future research should also assess extensions of the beta-binomial models. In meta-analyses including more than 5 studies, beta-binomial models that model the beta-binomial distributions for the control and intervention groups separately, and that link the intervention and control groups from the same study by a random effect (eg, estimated from DLRE), also showed satisfactory statistical properties.48 An advantage of such beta-binomial models is that the randomization is not broken.
5 | CONCLUSION AND RECOMMENDATIONS FOR PRACTICE
If only a few studies are available for inclusion in a meta-analysis, the choice of the right method is challenging because of the narrow path between correct empirical coverage of the 95% confidence interval and power.
In fixed effects situations, we recommend the classical fixed effects models (MaHa and YPET). A premise is that the assumption of a fixed effects model (a common effect of all studies) must be duly justified. If the standard fixed effects models do not converge (eg, because of many double-zero studies), then the more robust BB1N or BBMP is an alternative.
In random effects situations, only MKH, PMHK, and BBMP kept the type I error rate. Results of these models can be judged reliable for estimating intervention effects. We discourage the use of inverse variance random effects models other than MKH and PMHK in situations where strictly keeping the type I error rate is important. There is, however, a high risk of not detecting a truly existing difference with these methods because of their very low power, in particular for meta-analyses of 2 to 3 studies. Thus, if one is willing to accept a slightly higher type I error rate, we recommend using BB1N and PMRE.
An advantage of the beta-binomial models (BB1N and BBMP) is their robustness and their ability to include the information from single-zero and double-zero studies.11 We recommend these models for pooling rare events.
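To give a flavour of the beta-binomial idea, the following is a simplified one-stage sketch of our own, not the exact model specifications compared in this article: both arms are assumed to follow beta-binomial distributions with a common intra-class correlation rho, and the arm means are linked on the logit scale so that one parameter is the log odds ratio. All function names, the parameterization, and the data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import betaln, expit, gammaln

def betabin_logpmf(y, n, p, rho):
    """Log pmf of a beta-binomial with mean p and intra-class correlation rho."""
    a = p * (1.0 - rho) / rho
    b = (1.0 - p) * (1.0 - rho) / rho
    logcomb = gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
    return logcomb + betaln(y + a, n - y + b) - betaln(a, b)

def fit_betabin_logor(y, n, treat):
    """ML fit of logit(mean) = b0 + b1 * treat with a common rho.
    Returns the estimated log odds ratio b1 (illustrative sketch)."""
    y, n, treat = (np.asarray(v) for v in (y, n, treat))
    rho_min, rho_max = 1e-4, 0.9       # keep rho away from the boundaries

    def negll(theta):
        b0, b1, u = theta
        p = expit(b0 + b1 * treat)
        rho = rho_min + (rho_max - rho_min) * expit(u)
        return -np.sum(betabin_logpmf(y, n, p, rho))

    res = minimize(negll, x0=[-1.0, 0.5, -3.0], method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-8, "fatol": 1e-10})
    return res.x[1]

# Hypothetical, homogeneous data: 4 studies with 200 patients per arm,
# control risk 10%, treatment risk 30% (true log odds ratio about 1.35)
y = [20, 60, 20, 60, 20, 60, 20, 60]
n = [200] * 8
treat = [0, 1, 0, 1, 0, 1, 0, 1]
log_or = fit_betabin_logor(y, n, treat)
```

Because the likelihood is evaluated directly on the event counts, single-zero and double-zero studies enter the fit without continuity corrections, which reflects the robustness property noted above.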
ACKNOWLEDGEMENT
This work was not funded.
ORCID
Tim Mathes http://orcid.org/0000-0002-5304-1717
REFERENCES
1. Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JPT. Predicting the extent of heterogeneity in meta-analysis, using empirical data from the Cochrane Database of Systematic Reviews. Int J Epidemiol. 2012;41(3):818-827.
2. Kontopantelis E, Springate DA, Reeves D. A re-analysis of the Cochrane Library data: the dangers of unobserved heterogeneity in meta-analyses. PLoS One. 2013;8(7):e69930.
3. Davey J, Turner RM, Clarke MJ, Higgins JP. Characteristics of meta-analyses and their component studies in the Cochrane Database of Systematic Reviews: a cross-sectional, descriptive analysis. BMC Med Res Methodol. 2011;11(1):1-11.
4. Borenstein M, Higgins JP. Meta-analysis and subgroups. Prev Sci. 2013;14(2):134-143.
5. Shuster JJ, Walker MA. Low-event-rate meta-analyses of clinical trials: implementing good practices. Stat Med. 2016;35(14):2467-2478.
6. Friedrich JO, Adhikari NK, Beyene J. Inclusion of zero total event trials in meta-analyses maintains analytic consistency and incorporates all available data. BMC Med Res Methodol. 2007;7(1):1-6.
7. Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado about nothing: a comparison of the performance of meta-analytical methods with rare events. Stat Med. 2007;26(1):53-77.
8. Higgins JPT, Thompson SG, Spiegelhalter DJ. A re-evaluation of random-effects meta-analysis. J Royal Stat Soc Series A (Statistics in Society). 2009;172(1):137-159.
9. Guolo A, Varin C. Random-effects meta-analysis: the number of studies matters. Stat Methods Med Res. 2015.
10. Jackson D, Bowden J, Baker R. How does the DerSimonian and Laird procedure for random effects meta-analysis compare with its more efficient but harder to compute counterparts? J Stat Plann Infer. 2010;140(4):961-970.
11. Kuss O. Statistical methods for meta-analyses including information from studies without any events - add nothing to nothing and succeed nevertheless. Stat Med. 2015;34(7):1097-1116.
12. Röver C, Knapp G, Friede T. Hartung-Knapp-Sidik-Jonkman approach and its modification for random-effects meta-analysis with few studies. BMC Med Res Methodol. 2015;15(1):1-7.
13. IntHout J, Ioannidis JP, Borm GF. The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Med Res Methodol. 2014;14(1):1-12.
14. Hartung J, Knapp G. A refined method for the meta-analysis of controlled clinical trials with binary outcome. Stat Med. 2001;20(24):3875-3889.
15. Wiksten A, Rucker G, Schwarzer G. Hartung-Knapp method is not always conservative compared with fixed-effect meta-analysis. Stat Med. 2016;35(15):2503-2515.
16. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22(4):719-748.
17. Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for examining heterogeneity and combining results from several studies in meta-analysis. In: Systematic Reviews in Health Care. BMJ Publishing Group; 2008:285-312.
18. Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Stat Med. 2004;23(9):1351-1375.
19. Deeks JJ, Higgins J, Altman DG. Analysing data and undertaking meta-analyses. In: Cochrane Handbook for Systematic Reviews of Interventions: Cochrane Book Series; 2008:243-296.
20. Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis. 1985;27(5):335-371.
21. Brockhaus AC, Bender R, Skipka G. The Peto odds ratio viewed as a new effect measure. Stat Med. 2014;33(28):4861-4874.
22. Veroniki AA, Jackson D, Viechtbauer W, et al. Methods to estimate the between-study variance and its uncertainty in meta-analysis. Res Synth Meth. 2016;7(1):55-79.
23. DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7(3):177-188.
24. Paule RC, Mandel J. Consensus values and weighting factors. J Res Natl Bur Stand. 1982;87(5):377-385.
25. DerSimonian R, Kacker R. Random-effects model for meta-analysis of clinical trials: an update. Contemp Clin Trials. 2007;28(2):105-114.
26. Whitehead A. Estimating the treatment difference in an individual trial. In: Meta-Analysis of Controlled Clinical Trials. John Wiley & Sons, Ltd; 2003:23-55.
27. Rukhin AL, Biggerstaff BJ, Vangel MG. Restricted maximum likelihood estimation of a common mean and the Mandel-Paule algorithm. J Stat Plann Infer. 2000;83(2):319-330.
28. Bowden J, Tierney JF, Copas AJ, Burdett S. Quantifying, displaying and accounting for heterogeneity in the meta-analysis of RCTs using standard and generalised Q statistics. BMC Med Res Methodol. 2011;11(1):41.
29. Viechtbauer W. R metafor package. 2016.
30. Hartung J. An alternative method for meta-analysis. Biom J. 1999;41(8):901-916.
31. Hartung J, Knapp G. On tests of the overall treatment effect in meta-analysis with normally distributed responses. Stat Med. 2001;20(12):1771-1782.
32. Sidik K, Jonkman JN. A simple confidence interval for meta-analysis. Stat Med. 2002;21(21):3153-3159.
33. Knapp G, Hartung J. Improved tests for a random effects meta-regression with a single covariate. Stat Med. 2003;22(17):2693-2710.
34. Friede T, Röver C, Wandel S, Neuenschwander B. Meta-analysis of few small studies in orphan diseases. Res Synth Meth. 2016.
35. Burke DL, Ensor J, Riley RD. Meta-analysis using individual participant data: one-stage and two-stage approaches, and why they may differ. Stat Med. 2017;36(5):855-875.
36. Berkey CS, Hoaglin DC, Mosteller F, Colditz GA. A random-effects regression model for meta-analysis. Stat Med. 1995;14(4):395-411.
37. Raghunathan TE, Ii Y. Analysis of binary data from a multicentre clinical trial. Biometrika. 1993;80(1):127-139.
38. Stephens JM, Handke B, Doshi JA. International survey of methods used in health technology assessment (HTA): does practice meet the principles proposed for good research? 2012;2:29-44.
39. Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1. Introduction - GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64(4):383-394.
40. Herbison P, Hay-Smith J, Gillespie WJ. Meta-analyses of small numbers of trials often agree with longer-term results. J Clin Epidemiol. 2011;64(2):145-153.
41. IntHout J, Ioannidis JPA, Borm GF, Goeman JJ. Small studies are more heterogeneous than large ones: a meta-meta-analysis. J Clin Epidemiol. 2015;68(8):860-869.
42. Partlett C, Riley RD. Random effects meta-analysis: coverage performance of 95% confidence and prediction intervals following REML estimation. Stat Med. 2017;36(2):301-317.
43. Jackson D, Law M, Rücker G, Schwarzer G. The Hartung-Knapp modification for random-effects meta-analysis: a useful refinement but are there any residual concerns? Stat Med. 2017;36(25):3923-3934.
44. Petropoulou M, Mavridis D. A comparison of 20 heterogeneity variance estimators in statistical synthesis of results from studies: a simulation study. Stat Med. 2017;36(27):4266-4280.
45. Bodnar O, Link A, Arendacká B, Possolo A, Elster C. Bayesian estimation in random effects meta-analysis using a non-informative prior. Stat Med. 2017;36(2):378-399.
46. Friede T, Röver C, Wandel S, Neuenschwander B. Meta-analysis of two studies in the presence of heterogeneity with applications in rare diseases. Biom J. 2016.
47. Rhodes KM, Turner RM, White IR, Jackson D, Spiegelhalter DJ, Higgins JPT. Implementing informative priors for heterogeneity in meta-analysis using meta-regression and pseudo data. Stat Med. 2016;35(29):5495-5511.
48. Bakbergenuly I, Kulinskaya E. Beta-binomial model for meta-analysis of odds ratios. Stat Med. 2017.
SUPPORTING INFORMATION
Additional Supporting Information may be found online
in the supporting information tab for this article.
How to cite this article: Mathes T, Kuss O. A comparison of methods for meta-analysis of a small number of studies with binary outcomes. Res Syn Meth. 2018;1-16. https://doi.org/10.1002/jrsm.1296
... One limitation of our proposed method could be that MAGEC requires a reasonable number of eligible studies to be included in meta-analysis of drug safety. In cases with very few studies, it becomes important to apply proper adjustments to enable stable estimation for random effects [35,61]. In the Bayesian modeling setting, a robust prior for the standard deviation parameter, such as the half-Cauchy prior with heavy tails [14], is preferred for the between-study random effect to address this limitation. ...
Article
Full-text available
Meta-analysis is a powerful tool for assessing drug safety by combining treatment-related toxicological findings across multiple studies, as clinical trials are typically underpowered for detecting adverse drug effects. However, incomplete reporting of adverse events (AEs) in published clinical studies is frequently encountered, especially if the observed number of AEs is below a pre-specified study-dependent threshold. Ignoring the censored AE information, often found in lower frequency, can significantly bias the estimated incidence rate of AEs. Despite its importance, this prevalent issue in meta-analysis has received little statistical or analytic attention in the literature. To address this challenge, we propose a Bayesian approach to accommodating the censored and possibly rare AEs for meta-analysis of safety data. Through simulation studies, we demonstrate that the proposed method can improve accuracy in point and interval estimation of incidence probabilities, particularly in the presence of censored data. Overall, the proposed method provides a practical solution that can facilitate better-informed decisions regarding drug safety.
... For each dependent variable, at least five effect sizes were desirable to conduct a meta-analysis (Mathes & Kuss, 2018). It was found that after the study quality assessment, less than five effect sizes were retrievable for educational aspirations, SAT scores, state assessment, and college persistence. ...
Article
Full-text available
There has been much effort to minimize the educational disparities among students with diverse linguistic, cultural, and socioeconomic backgrounds. Gaining Early Awareness and Readiness for Undergraduate Programs (GEAR UP) is one of the national efforts to combat the educational inequity that started in the late 1990s. Since then, evidence of how GEAR UP contributes to diminishing the educational gap in students’ educational outcomes accumulated, including academic and behavioral outcomes. Yet, no study comprehensively evaluated the overall effects of such empirical studies. Thus, the goal of the current meta-analysis is to quantitatively synthesize the studies that investigated the effects of GEAR UP on historically underrepresented students’ college readiness. Across various educational outcomes, eight studies were identified. A random-effects model was employed to account for heterogeneity across the studies, followed by moderator analyses. Findings from four separate meta-analyses revealed that the magnitude of the overall effects of GEAR UP on educational outcomes varied from small (e.g., American College Test scores) to large (e.g., attendance rate), according to Kraft (2020). Results of two proportional meta-analyses revealed that both college enrollment rate and graduation rate were higher for GEAR UP participants compared to non-GEAR UP participants. The program length was found to not moderate the effect of GEAR UP. Implications and future research directions are suggested.
... Rights reserved. confidence interval were computed using DerSimonian and Laird's generic inverse variance method [68]. A randomeffect model was employed instead of fixed-effect model given the diverse background populations and protocols across the included studies. ...
Article
Full-text available
This study aimed to evaluate the prevalence and risk of malignant neoplasm in primary hyperparathyroidism (PHPT) patients. Potentially eligible studies were retrieved from PubMed and Embase databases from inception to November 2023 using search strategy consisting of terms for “Primary hyperparathyroidism” and “Malignant neoplasm”. Eligible study must report prevalence of malignant neoplasm among patients with PHPT or compare the risk of malignant neoplasm between patients with PHPT and comparators. Point estimates with standard errors were extracted from each study and combined using the generic inverse variance method.A total of 11,926 articles were identified. After two rounds of systematic review, 50 studies were included. The meta-analysis revealed that pooled prevalence rates of overall cancer was 0.19 (95%CI: 0.13–0.25; I² 94%). The two most prevalent types of malignancy among patients with PHPT ware papillary thyroid cancer (pooled prevalence: 0.07; 95%CI: 0.06–0.08; I² 85%) and breast cancer (pooled prevalence: 0.05; 95%CI: 0.03–0.07; I² 87%). Subgroup analysis of studies focusing on patients undergoing parathyroidectomy reported a fourfold higher prevalence of papillary thyroid cancer than the remaining studies (0.08 versus 0.02). The meta-analysis of cohort studies found a significant association between PHPT and overall cancer with the pooled risk ratio of 1.28 (95%CI: 1.23–1.33; I² 66.9%).We found that the pooled prevalence of malignant neoplasm in PHPT was 19%, with papillary thyroid cancer and breast cancer being the most prevalent types. The meta-analysis of cohort studies showed that patient with PHPT carried an approximately 28% increased risk of malignancy.
... It is an established scientific fact that estimating between-study heterogeneity is difficult in the situation with small datasets. An inaccurate estimation of heterogeneity (high homogeneity in the analyzed data) often results in incorrect or biased effect estimations and narrow confidence intervals (Mathes & Kuss, 2018). Besides, the studies on the plowing depth effects were mainly conducted on the soils of one type (dark-chestnut soil), thus, making them extremely homogenous and making it difficult to create a statistically valuable sample dataset. ...
Article
Full-text available
Tillage is one of the major factors affecting soil biological activity, resulting in changes in soil organic carbon (SOC) content, providing for carbon sequestration and shifts in carbon dioxide emission from soils. Current climate change and aggravation of global warming through the increased emission of carbon dioxide are main driving forces for global transformation of agricultural practices in the direction of climate-smart agriculture (CSA), which requires the implementation of such crop cultivation practices that result in the minimization of SOC losses and carbon dioxide emissions. The magnitude and direction of different tillage practices affecting soil biological activity are different, therefore, the best tillage options should be chosen for implementation in national CSA systems to ensure achieving the global sustainability goals. This nationwide meta-analysis, conducted for tillage practices utilized in Ukrainian agriculture examines scientifically recorded effects of moldboard tillage depth, flat cutter and no-till options on soil respiration rates and cellulose decomposition intensity in dark-chestnut and chernozem soils of Ukraine. This meta-analysis enrolled 45 studies, which met the stipulated scientific quality criteria. Statistical processing was conducted through the standardized mean difference (SMD) model without subgroups at 95% confidence interval (CI). As a result, it was determined that there is subtle impact of moldboard tillage depth on soil biological activity, which is inconclusive and unclear. The similar results were obtained for the comparison between the tillage and no-till groups, where high heterogeneity of the dataset (I2 = 82.8%) resulted in low quality of evidence for the benefits of no-till in SOC sequestration. Besides, zero fail-safe numbers support the suggestion of low-quality evidence in favor of shallow plowing advantage over deep plowing, as well as no-till against tillage. 
As for the difference between the groups of moldboard and flat cutter tillage, it was established that there is strong enough evidence for the advantage of flat cutter tillage in terms of soil respiration rates and cellulose decomposition intensity reduction. Further studies in this direction are required to fill the gaps in current meta-analysis, especially in terms of no-till options and their effect on biological activity of Ukrainian soils in different cropping systems.
... For studies that did not provide mean and standard deviation for the endpoint, we calculated these values from the corresponding median, range, and interquartile range using the algorithm described by Hozo et al. [32] Double-arm studies were analyzed using a random-effects model for direct pairwise comparisons in standard meta-analysis. [33,34] For indirect pairwise comparisons in double-arm studies, a random-effects NMA was performed under a frequentist model using Markov chain Monte Carlo simulations. [35][36][37] To visually compare network estimates, we utilized forest plots and rank-heat plots. ...
Article
Full-text available
Background Benign thyroid nodules (BTNs) represent a prevalent clinical challenge globally, with various ultrasound-guided ablation techniques developed for their management. Despite the availability of these methods, a comprehensive evaluation to identify the most effective technique remains absent. This study endeavors to bridge this knowledge gap through a network meta-analysis (NMA), aiming to enhance the understanding of the comparative effectiveness of different ultrasound-guided ablation methods in treating BTNs. Methods We comprehensively searched PubMed, Embase, Cochrane, Web of Science, Ovid, SCOPUS, and ProQuest for studies involving 16 ablation methods, control groups, and head-to-head trials. NMA was utilized to evaluate methods based on the percentage change in nodule volume, symptom score, and cosmetic score. This study is registered in INPLASY (registration number 202260061). Results Among 35 eligible studies involving 5655 patients, NMA indicated that RFA2 (radiofrequency ablation, 2 sessions) exhibited the best outcomes at 6 months for percentage change in BTN volume (SUCRA value 74.6), closely followed by RFA (SUCRA value 73.7). At 12 months, RFA was identified as the most effective (SUCRA value 81.3). Subgroup analysis showed RFA2 as the most effective for solid nodule volume reduction at 6 months (SUCRA value 75.6), and polidocanol ablation for cystic nodules (SUCRA value 66.5). Conclusion Various ablation methods are effective in treating BTNs, with RFA showing notable advantages. RFA with 2 sessions is particularly optimal for solid BTNs, while polidocanol ablation stands out for cystic nodules.
... Many alternative methods exist to overcome the limitation of the DL method; the Paule-Mandel (PM) method [15], the Restricted maximum-likelihood (REML) method, the Hartung and Knapp [16] and the Sidik and Jonkman [17] (HKSJ) method. Many simulation studies [13,[18][19][20] investigated the effect of different between-study variance estimators on the pooled estimate, but the findings conflict. A previous study recommended the REML and HKSJ over the DL methods [19], while PM was recommended in three simulation studies reported in a systematic review [21]. ...
Article
Full-text available
Background Orthodontic systematic reviews (SRs) use different methods to pool the individual studies in a meta-analysis when indicated. However, the number of studies included in orthodontic meta-analyses is relatively small. This study aimed to evaluate the direction of estimate changes of orthodontic meta-analyses (MAs) using different between-study variance methods considering the level of heterogeneity when few trials were pooled. Methods Search and study selection: Systematic reviews (SRs) published over the last three years, from the 1st of January 2020 to the 31st of December 2022, in six main orthodontic journals with at least one MA pooling five or fewer primary studies were identified. Data collection and analysis: Data were extracted from each eligible MA, which was replicated in a random effect model using DerSimonian and Laird (DL), Paule–Mandel (PM), Restricted maximum-likelihood (REML), and Hartung Knapp and Sidik Jonkman (HKSJ) methods. The results were reported using median and interquartile range (IQR) for continuous data and frequencies for categorical data and analyzed using non-parametric tests. The Boruta algorithm was used to assess the significant predictors for the significant change in the confidence interval between the different methods compared to the DL method, which was only feasible using the HKSJ method. Results 146 MAs were included, most applying the random effect model (n = 111; 76%) and pooling continuous data using mean difference (n = 121; 83%). The median number of studies was three (range 2, 4), and the overall statistical heterogeneity (I2) ranged from 0 to 99% with a median of 68%. Close to 60% of the significant findings became non-significant when HKSJ was applied compared to the DL method and when heterogeneity was present (I2 > 0%). On the other hand, 30.43% of the non-significant meta-analyses using the DL method became significant when HKSJ was used and heterogeneity was absent (I2 = 0%).
Conclusion Orthodontic MAs with few studies can produce different results depending on the between-study variance method and the level of statistical heterogeneity. Compared to DL, the HKSJ method is overconservative when I2 is greater than 0% and may result in false-positive findings when heterogeneity is absent.
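The HKSJ adjustment compared here rescales the random-effects variance using the weighted squared deviations and switches to a t distribution with k-1 degrees of freedom. A hedged Python sketch (the helper name `hksj_ci` is ours, and the τ² value is assumed to come from some estimator such as DL):

```python
import numpy as np
from scipy import stats

def hksj_ci(y, v, tau2, level=0.95):
    """Hartung-Knapp-Sidik-Jonkman confidence interval for the pooled effect.

    Random-effects weights w_i = 1/(v_i + tau2); the HKSJ variance rescales
    the weighted squared deviations, and the interval uses a t distribution
    with k-1 degrees of freedom instead of the normal.
    """
    y, v = np.asarray(y, dtype=float), np.asarray(v, dtype=float)
    k = len(y)
    w = 1.0 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)                          # pooled estimate
    var_hk = np.sum(w * (y - mu) ** 2) / ((k - 1) * np.sum(w))
    half = stats.t.ppf(0.5 + level / 2, df=k - 1) * np.sqrt(var_hk)
    return mu - half, mu + half

# illustrative (made-up) effects, variances, and tau-squared
lo, hi = hksj_ci([0.1, 0.5, 0.9], [0.04, 0.05, 0.06], tau2=0.11)
```

With only three studies the t quantile (df = 2) is large, which is exactly why HKSJ intervals can be much wider than Wald-type DL intervals when heterogeneity is present.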
... Meta-analysis was conducted if both of the following two eligibility criteria were met to avoid low power [58]: 1) heterogeneity is acceptable (I2 < 50%) and 2) at least seven studies can be included. In meta-analyses that did not meet the criteria for homogeneity, we conducted subgroup analyses in an effort to account for heterogeneity due to observed effect parameters or a lack of confounder adjustment. ...
Article
Full-text available
Background Molecular pathways found to be important in pulmonary fibrosis are also involved in cancer pathogenesis, suggesting common pathways in the development of pulmonary fibrosis and lung cancer. Research question Is pulmonary fibrosis from exposure to occupational carcinogens an independent risk factor for lung cancer? Study design and methods A comprehensive search of PubMed, Embase, Web of Science and Cochrane databases with over 100 search terms regarding occupational hazards causing pulmonary fibrosis was conducted. After screening and extraction, quality of evidence and eligibility criteria for meta-analysis were assessed. Meta-analysis was performed using a random-effects model. Results 52 studies were identified for systematic review. Meta-analysis of subgroups identified silicosis as a risk factor for lung cancer when investigating odds ratios for silicosis in autopsy studies (OR 1.47, 95% CI 1.13–1.90) and for lung cancer mortality in patients with silicosis (OR 3.21, 95% CI 2.67–3.87). Only considering studies with an adjustment for smoking as a confounder identified a significant increase in lung cancer risk (OR 1.58, 95% CI 1.34–1.87). However, due to a lack of studies including cumulative exposure, no adjustments could be included. In a qualitative review, no definitive conclusion could be reached for asbestosis and silicosis as independent risk factors for lung cancer, partly because the studies did not take cumulative exposure into account. Interpretation This systematic review confirms the current knowledge regarding asbestosis and silicosis, indicating a higher risk of lung cancer in exposed individuals compared to exposed workers without fibrosis. These individuals should be monitored for lung cancer, especially when asbestosis or silicosis is present.
Article
Psychotic experiences (PE) are prevalent in general and clinical populations and can increase the risk for mental disorders in young people. The Community Assessment of Psychic Experiences (CAPE) is a widely used measure to assess PE in different populations and settings. However, the current knowledge on their overall reliability is limited. We examined the reliability of the CAPE-42 and later versions, testing the role of age, sex, test scores, and clinical status as moderators. A systematic search was conducted on the Scopus, Web of Science, PubMed, EBSCOhost, ProQuest, and Google Scholar databases. Internal consistency and temporal stability indices were examined through reliability generalization meta-analysis (RGMA). Moderators were tested through meta-regression analysis. From a pool of 1,015 records, 90 independent samples were extracted from 71 studies. Four versions showed quantitative evidence for inclusion: CAPE-42, CAPE-20, CAPE-P15, and CAPE-P8. Internal consistency indices were good (α/ω ≈ .725–.917). Temporal stability was only analyzed for the CAPE-P15, yielding a moderate but non-significant effect (r = 0.672). The evidence for temporal stability is scant due to the limited literature, and definitive conclusions cannot be drawn. Further evidence on other potential moderators such as adverse experiences or psychosocial functioning is required.
Article
The meta‐analysis of rare events presents unique methodological challenges owing to the small number of events. Bayesian methods are often used to combine rare events data to inform decision‐making, as they can incorporate prior information and handle studies with zero events without the need for continuity corrections. However, the comparative performances of different Bayesian models in pooling rare events data are not well understood. We conducted a simulation to compare the statistical properties of four parameterizations based on the binomial‐normal hierarchical model, using two different priors for the treatment effect: weakly informative prior (WIP) and non‐informative prior (NIP), pooling randomized controlled trials with rare events using the odds ratio metric. We also considered the beta‐binomial model proposed by Kuss and the random intercept and slope generalized linear mixed models. The simulation scenarios varied based on the treatment effect, sample size ratio between the treatment and control arms, and level of heterogeneity. Performance was evaluated using median bias, root mean square error, median width of 95% credible or confidence intervals, coverage, Type I error, and empirical power. Two reviews are used to illustrate these methods. The results demonstrate that the WIP outperforms the NIP within the same model structure. Among the compared models, the model that included the treatment effect parameter in the risk model for the control arm did not perform well. Our findings confirm that rare events meta‐analysis faces the challenge of being underpowered, highlighting the importance of reporting the power of results in empirical studies.
Article
Context Health education using videos has been promoted for its potential to enhance community health by improving social and behavior change communication. Objective To provide stakeholders in maternal and child health with evidence that can inform policies and strategies integrating video education to improve maternal, newborn, and child health. Data sources Five databases (MEDLINE, Embase, Scopus, Web of Science, and CENTRAL) were searched on January 28, 2022, and November 10, 2022 (updated search). Quantitative and qualitative studies conducted in low- and middle-income countries on the effects of video-based interventions on nutrition, health, and health service use were eligible. There was no restriction on time or language. Study selection was done in 2 stages and in duplicate. Data extraction A total of 13 710 records were imported to EndNote. Of these, 8226 records were screened by title and abstract using Rayyan, and 76 records were included for full-text evaluation. Results Twenty-nine articles (n = 12 084 participants) were included in this systematic review, and 7 were included in the meta-analysis. Video interventions improved knowledge about newborn care (n = 234; odds ratio [OR], 1.20; 95% confidence interval [CI], 1.04–1.40), colostrum feeding (n = 990; OR, 60.38; 95% CI, 18.25–199.78), continued breastfeeding (BF; n = 1914; OR, 3.79; 95% CI, 1.14–12.64), intention to use family planning (FP) (n = 814; OR, 1.57; 95% CI, 1.10–2.23), and use of FP (n = 864; OR, 6.55; 95% CI, 2.30–18.70). Video interventions did not result in reduced prelacteal feeding or improvement in early initiation of BF. The qualitative studies showed that video interventions were acceptable and feasible, with perceived impacts on communities. Conclusion This systematic review and meta-analysis indicated that video interventions improved knowledge of newborn care, colostrum feeding, and continuing BF, and the intention to use FP.
Given the high levels of heterogeneity and inconsistency in reporting, more research with stronger designs is recommended. Systematic review registration PROSPERO registration no. CRD42022292190.
Article
Full-text available
The modified method for random-effects meta-analysis, usually attributed to Hartung and Knapp and also proposed by Sidik and Jonkman, is easy to implement and is becoming advocated for general use. Here, we examine a range of potential concerns about the widespread adoption of this method. Motivated by these issues, a variety of different conventions can be adopted when using the modified method in practice. We describe and investigate the use of a variety of these conventions using a new taxonomy of meta-analysis datasets. We conclude that the Hartung and Knapp modification may be a suitable replacement for the standard method. Despite this, analysts who advocate the modified method should be ready to defend its use against the possible objections to it that we present. We further recommend that the results from more conventional approaches should be used as sensitivity analyses when using the modified method. It has previously been suggested that a common-effect analysis should be used for this purpose but we suggest amending this recommendation and argue that a standard random-effects analysis should be used instead.
Article
Full-text available
In meta-analysis of odds ratios (ORs), heterogeneity between the studies is usually modelled via the additive random effects model (REM). An alternative, multiplicative REM for ORs uses overdispersion. The multiplicative factor in this overdispersion model (ODM) can be interpreted as an intra-class correlation (ICC) parameter. This model naturally arises when the probabilities of an event in one or both arms of a comparative study are themselves beta-distributed, resulting in beta-binomial distributions. We propose two new estimators of the ICC for meta-analysis in this setting. One is based on the inverted Breslow-Day test, and the other on the improved gamma approximation by Kulinskaya and Dollinger (2015, p. 26) to the distribution of Cochran's Q. The performance of these and several other estimators of ICC on bias and coverage is studied by simulation. Additionally, the Mantel-Haenszel approach to estimation of ORs is extended to the beta-binomial model, and we study performance of various ICC estimators when used in the Mantel-Haenszel or the inverse-variance method to combine ORs in meta-analysis. The results of the simulations show that the improved gamma-based estimator of ICC is superior for small sample sizes, and the Breslow-Day-based estimator is the best for n ≥ 100. The Mantel-Haenszel-based estimator of OR is very biased and is not recommended. The inverse-variance approach is also somewhat biased for ORs ≠ 1, but this bias is not very large in practical settings. Developed methods and R programs, provided in the Web Appendix, make the beta-binomial model a feasible alternative to the standard REM for meta-analysis of ORs. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
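The beta-binomial distribution underlying this model arises by mixing a binomial with a beta-distributed event probability, as the abstract describes. A small pure-Python sketch of its probability mass function (the function names are ours, written via log-gamma for numerical stability):

```python
from math import lgamma, exp

def log_binom(n, k):
    # log of the binomial coefficient C(n, k)
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def log_beta(a, b):
    # log of the beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def betabinom_pmf(k, n, a, b):
    """P(X = k) when p ~ Beta(a, b) and X | p ~ Binomial(n, p):
    C(n, k) * B(k + a, n - k + b) / B(a, b)."""
    return exp(log_binom(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b))
```

With a = b = 1 the beta mixing distribution is uniform, and the beta-binomial reduces to a discrete uniform on 0..n, a convenient sanity check.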
Article
Full-text available
Meta-analysis using individual participant data (IPD) obtains and synthesises the raw, participant-level data from a set of relevant studies. The IPD approach is becoming an increasingly popular tool as an alternative to traditional aggregate data meta-analysis, especially as it avoids reliance on published results and provides an opportunity to investigate individual-level interactions, such as treatment-effect modifiers. There are two statistical approaches for conducting an IPD meta-analysis: one-stage and two-stage. The one-stage approach analyses the IPD from all studies simultaneously, for example, in a hierarchical regression model with random effects. The two-stage approach derives aggregate data (such as effect estimates) in each study separately and then combines these in a traditional meta-analysis model. There have been numerous comparisons of the one-stage and two-stage approaches via theoretical consideration, simulation and empirical examples, yet there remains confusion regarding when each approach should be adopted, and indeed why they may differ. In this tutorial paper, we outline the key statistical methods for one-stage and two-stage IPD meta-analyses, and provide 10 key reasons why they may produce different summary results. We explain that most differences arise because of different modelling assumptions, rather than the choice of one-stage or two-stage itself. We illustrate the concepts with recently published IPD meta-analyses, summarise key statistical software and provide recommendations for future IPD meta-analyses. © 2016 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
Article
Full-text available
A random effects meta-analysis combines the results of several independent studies to summarise the evidence about a particular measure of interest, such as a treatment effect. The approach allows for unexplained between-study heterogeneity in the true treatment effect by incorporating random study effects about the overall mean. The variance of the mean effect estimate is conventionally calculated by assuming that the between study variance is known; however, it has been demonstrated that this approach may be inappropriate, especially when there are few studies. Alternative methods that aim to account for this uncertainty, such as Hartung-Knapp, Sidik-Jonkman and Kenward-Roger, have been proposed and shown to improve upon the conventional approach in some situations. In this paper, we use a simulation study to examine the performance of several of these methods in terms of the coverage of the 95% confidence and prediction intervals derived from a random effects meta-analysis estimated using restricted maximum likelihood. We show that, in terms of the confidence intervals, the Hartung-Knapp correction performs well across a wide range of scenarios and outperforms other methods when heterogeneity was large and/or study sizes were similar. However, the coverage of the Hartung-Knapp method is slightly too low when the heterogeneity is low (I2 < 30%) and the study sizes are quite varied. In terms of prediction intervals, the conventional approach is only valid when heterogeneity is large (I2 > 30%) and study sizes are similar. In other situations, especially when heterogeneity is small and the study sizes are quite varied, the coverage is far too low and could not be consistently improved by either increasing the number of studies, altering the degrees of freedom or using variance inflation methods. Therefore, researchers should be cautious in deriving 95% prediction intervals following a frequentist random-effects meta-analysis until a more reliable solution is identified. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
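The prediction intervals evaluated in this abstract are conventionally computed as mu-hat ± t_{k-2} * sqrt(tau² + SE(mu-hat)²), the Higgins-Thompson-Spiegelhalter form. A minimal sketch with illustrative (made-up) numbers:

```python
import numpy as np
from scipy import stats

def prediction_interval(mu, se_mu, tau2, k, level=0.95):
    """Approximate prediction interval for the true effect in a new study
    (Higgins, Thompson & Spiegelhalter): mu +/- t_{k-2} * sqrt(tau2 + se_mu^2).

    mu, se_mu : pooled random-effects estimate and its standard error
    tau2      : estimated between-study variance
    k         : number of studies (t distribution uses k - 2 df)
    """
    half = stats.t.ppf(0.5 + level / 2, df=k - 2) * np.sqrt(tau2 + se_mu ** 2)
    return mu - half, mu + half

# illustrative values only
lo, hi = prediction_interval(mu=0.4, se_mu=0.1, tau2=0.05, k=8)
```

Because the interval adds τ² to the squared standard error, it is always at least as wide as the corresponding confidence interval, which is the point of predicting a new study's effect rather than the mean.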
Chapter
This chapter describes the principles and methods used to carry out a meta-analysis for a comparison of two interventions for the main types of data encountered. A very common and simple version of the meta-analysis procedure is commonly referred to as the inverse-variance method. This approach is implemented in its most basic form in RevMan, and is used behind the scenes in many meta-analyses of both dichotomous and continuous data. Results may be expressed as count data when each participant may experience an event, and may experience it more than once. Count data may be analysed using methods for dichotomous data if the counts are dichotomized for each individual, continuous data and time-to-event data, as well as being analysed as rate data. Prediction intervals from random-effects meta-analyses are a useful device for presenting the extent of between-study variation. Sensitivity analyses should be used to examine whether overall findings are robust to potentially influential decisions.
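The inverse-variance method described in this chapter weights each study's estimate by the reciprocal of its variance, so more precise studies count for more. A minimal fixed-effect Python sketch (illustrative numbers only; the function name is ours):

```python
import numpy as np

def inverse_variance_pool(y, v):
    """Fixed-effect inverse-variance pooling.

    y : per-study effect estimates
    v : their variances; weights are w_i = 1 / v_i
    Returns the pooled estimate and its standard error 1/sqrt(sum of weights).
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(v, dtype=float)
    mu = np.sum(w * y) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return mu, se

# two made-up studies with equal precision: the pooled estimate is the mean
mu, se = inverse_variance_pool([0.2, 0.4], [0.1, 0.1])
```

The random-effects version replaces the weights with 1/(v_i + τ²), which is where the heterogeneity estimators discussed throughout this page enter.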
Article
When we synthesize research findings via meta-analysis, it is common to assume that the true underlying effect differs across studies. Total variability consists of the within-study and between-study variances (heterogeneity). There have been established measures, such as I2, to quantify the proportion of the total variation attributed to heterogeneity. There is a plethora of estimation methods available for estimating heterogeneity. The widely used DerSimonian and Laird estimation method has been challenged, but knowledge of the overall performance of heterogeneity estimators is incomplete. We identified 20 heterogeneity estimators in the literature and evaluated their performance in terms of mean absolute estimation error, coverage probability, and length of the confidence interval for the summary effect via a simulation study. Although previous simulation studies have suggested the Paule-Mandel estimator, it has not been compared with all the available estimators. For dichotomous outcomes, estimating heterogeneity through Markov chain Monte Carlo is a good choice if an informative prior distribution for heterogeneity is employed (eg, by published Cochrane reviews). Nonparametric bootstrap and positive DerSimonian and Laird perform well for all assessment criteria for both dichotomous and continuous outcomes. The Hartung-Makambi estimator can be the best choice when the heterogeneity values are close to 0.07 for dichotomous outcomes and for medium heterogeneity values (0.01, 0.05) for continuous outcomes. Hence, there are heterogeneity estimators (nonparametric bootstrap DerSimonian and Laird and positive DerSimonian and Laird) that perform better than the suggested Paule-Mandel. Maximum likelihood provides the best performance for both types of outcome in the absence of heterogeneity.
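The I2 measure mentioned in this abstract expresses the share of total variability attributable to heterogeneity and is derived from Cochran's Q. A small sketch (made-up data; the function name is ours):

```python
import numpy as np

def i_squared(y, v):
    """I^2: percentage of total variability attributed to heterogeneity,
    computed from Cochran's Q as max(0, (Q - df) / Q) * 100."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(v, dtype=float)
    mu = np.sum(w * y) / np.sum(w)        # fixed-effect pooled estimate
    Q = np.sum(w * (y - mu) ** 2)         # Cochran's Q
    df = len(y) - 1
    return 0.0 if Q <= 0 else max(0.0, (Q - df) / Q) * 100

# illustrative (made-up) effects and variances
i2 = i_squared([0.1, 0.5, 0.9], [0.04, 0.05, 0.06])
```

Note that I2 is a relative measure: it quantifies the proportion of variation due to heterogeneity, not the absolute amount of heterogeneity itself.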
Article
Pooling information from multiple, independent studies (meta-analysis) adds great value to medical research. Random effects models are widely used for this purpose. However, there are many different ways of estimating model parameters, and the choice of estimation procedure may be influential upon the conclusions of the meta-analysis. In this paper, we describe a recently proposed Bayesian estimation procedure and compare it with a profile likelihood method and with the DerSimonian-Laird and Mandel-Paule estimators including the Knapp-Hartung correction. The Bayesian procedure uses a non-informative prior for the overall mean and the between-study standard deviation that is determined by the Berger and Bernardo reference prior principle. The comparison of these procedures focuses on the frequentist properties of interval estimates for the overall mean. The results of our simulation study reveal that the Bayesian approach is a promising alternative producing more accurate interval estimates than those three conventional procedures for meta-analysis. The Bayesian procedure is also illustrated using three examples of meta-analysis involving real data. Copyright © 2016 John Wiley & Sons, Ltd.