RESEARCH ARTICLE

A comparison of methods for meta-analysis of a small number of studies with binary outcomes

Tim Mathes (1,2) | Oliver Kuss (3,4)
(1) Institute for Research in Operative Medicine, Witten/Herdecke University, Ostmerheimer Str 200, Building 38, 51109 Cologne, Germany
(2) Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Germany
(3) Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany
(4) Institute of Medical Statistics, Düsseldorf University Hospital and Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

Correspondence: Tim Mathes, Institute for Research in Operative Medicine, Witten/Herdecke University, Ostmerheimer Str 200, Building 38, 51109 Cologne, Germany. Email: tim.mathes@uni-wh.de
Meta-analyses often include only a small number of studies (≤5). Estimating between-study heterogeneity is difficult in this situation. An inaccurate estimate of heterogeneity can result in biased effect estimates and confidence intervals that are too narrow. The beta-binomial model has shown good statistical properties for meta-analysis of sparse data. We compare the beta-binomial model with different inverse variance random effects methods (eg, DerSimonian-Laird, modified Hartung-Knapp, and Paule-Mandel) and fixed effects methods (Mantel-Haenszel and Peto) in a simulation study. The underlying true parameters were obtained from empirical data of actually performed meta-analyses to best mirror real-life situations. We show that valid methods for meta-analysis of a small number of studies are available. In fixed effects situations, the Mantel-Haenszel and Peto methods performed best. In random effects situations, the beta-binomial model performed best for meta-analysis of few studies, considering the balance between coverage probability and power. We recommend the beta-binomial model for practical application. If very strong evidence is needed, using the Paule-Mandel heterogeneity variance estimator combined with modified Hartung-Knapp confidence intervals might be useful to confirm the results. Notably, most inverse variance random effects models showed unsatisfactory statistical properties even when more studies (10-50) were included in the meta-analysis.
KEYWORDS
few studies, heterogeneity variance estimators, meta‐analysis, simulation study
Abbreviations: BB1N, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to the number of studies; BB2N, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies; BBFM, estimation of the beta-binomial model using SAS PROC FMM; BBIN, beta-binomial regression (maximum likelihood) with Wald-type confidence intervals; BBMP, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters; BBQD, beta-binomial regression (quasi-likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters; BBQL, beta-binomial regression (quasi-likelihood) with Wald-type confidence intervals; DLRE, random effects model with DerSimonian-Laird between-study variance estimator and Wald-type confidence intervals; HKSJ, random effects model with DerSimonian-Laird between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals; MaHa, Mantel-Haenszel fixed effects model; MKH, random effects model with DerSimonian-Laird between-study variance estimator and modified Hartung-Knapp confidence intervals; PMHK, random effects model with Paule-Mandel between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals; PMRE, random effects model with Paule-Mandel between-study variance estimator and Wald-type confidence intervals; YPET, Peto odds ratio method.
Received: 28 September 2017 Revised: 13 December 2017 Accepted: 12 February 2018
DOI: 10.1002/jrsm.1296
Res Syn Meth. 2018;1-16. Copyright © 2018 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/jrsm
1 | INTRODUCTION

Meta-analyses frequently include only a small number of studies. An analysis of 14,886 meta-analyses from the Cochrane Library found that the median number of studies in the meta-analyses[1] was as low as 3. About 50% of meta-analyses include 2 or 3 studies, and less than 10% include 10 or more studies.[2] In some research areas, few studies in meta-analyses are the rule rather than the exception. This particularly applies to the assessment of new interventions in health technology assessments, orphan diseases, and meta-analysis of subgroups (eg, in stratified medicine).[3,4] In addition to the problem of few studies for inclusion in the meta-analysis, the studies can be small and the number of events can be low (eg, when considering adverse events).[1] For such application areas with sparse data, meta-analytic techniques are probably most valuable because they enable collecting the complete existing evidence. However, the use of inadequate meta-analytic methods for sparse data can result in invalid effect estimates and even in wrong conclusions.[5,6]

Meta-analysis of only a handful of studies poses a number of challenges.[7] The reason is that the central limit theorem does not apply when only a few studies are included in a meta-analysis. The standard inverse variance random effects meta-analysis incorporates the between-study variation (heterogeneity, τ²) to estimate the overall effect. In case of sparse data, the estimation of τ², and consequently of θ, can be highly imprecise.[8] Simulation studies have shown invalid results if the standard DerSimonian-Laird method is used for low event rates and few studies in the meta-analysis.[2,5,9] Other random effects methods for constructing confidence intervals based on τ² also often perform poorly when only a few studies are included in the meta-analysis.[10]

Previous work of our group has shown that for meta-analysis of rare or even zero events, random effects models are more accurate than fixed effects models, especially in a truly heterogeneous situation.[11] In particular, the beta-binomial regression model provided valid pooled effect measures and confidence intervals in this study.[11] Another simulation study that compared the statistical properties of different random effects models suggested that the Hartung-Knapp-Sidik-Jonkman method outperforms the DerSimonian-Laird random effects approach for meta-analysis of few studies.[12] However, the Hartung-Knapp-Sidik-Jonkman method can be overconservative when less than 5 studies are included in the meta-analysis.[13] In sum, choosing the right approach for meta-analysis of 2 to 5 studies is difficult.[9,10]

Our aim was thus to compare different frequentist methods for meta-analysis including only a small number of studies (≤5), with a focus on the beta-binomial regression and the modified Hartung-Knapp-Sidik-Jonkman approach.[11,14]
2 | METHODS

2.1 | Statistical methods for meta-analysis

We consider situations where 2 interventions are compared in a series of studies i (i = 1, …, I) with binary outcomes. We are interested in the estimation of the overall intervention effect θ and use the log odds ratio to quantify the difference between the intervention groups. The data for each study consist of the intervention effect θ_i, the sample size in the intervention group n_iT, and the sample size in the control group n_iC (overall sample size N_i). In each study, there are some (or zero) events in the intervention group y_iT and in the control group y_iC. Each study has a specific sampling error ε_i and a within-study variance σ_i².

In the first section, we describe the statistical models for meta-analysis that are included in the comparison. In the second section, we describe the design of the simulation study. The third section provides an overview of the measures to assess the statistical properties, and in the fourth section, we introduce our empirical example dataset.

We consider 2 different types of models: fixed effects models and random effects models.
2.1.1 | Fixed effects models

The fixed effects model is based on the assumption that all studies in the meta-analysis have a common effect (θ). The fixed effects model can be written as[15]

θ̂_i = θ + σ_i ε_i,   ε_i ~ N(0, 1).

The study effects θ̂_i in study i are distributed about the common effect with the study-specific variance (σ_i²) and the sampling error (ε_i).
Mantel-Haenszel method

The Mantel-Haenszel (MaHa) method is a weighted average of the study-specific odds ratios, risk ratios, or risk differences.[16] The MaHa odds ratio for the overall effect is given by[17]

OR_MaHa = Σ_{i=1}^{I} θ̂_i w_i(MaHa) / Σ_{i=1}^{I} w_i(MaHa).

The weights are given by w_i(MaHa) = z_iT y_iC / N_i, where z_iT is the number of nonevents in the intervention group.

We included the MaHa method (SAS PROC FREQ) in our analysis because it performs better than the standard fixed effects model in case of sparse data and because it is the standard fixed effects model in Cochrane Reviews.[17-19]
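The pooled MaHa odds ratio can be computed directly from the 2×2 tables. The following minimal Python sketch (not the authors' SAS PROC FREQ implementation; the function name is illustrative) applies the weight definition above, noting that θ̂_i w_i(MaHa) simplifies to y_iT z_iC / N_i:

```python
def mantel_haenszel_or(y_t, n_t, y_c, n_c):
    """Mantel-Haenszel pooled odds ratio for a series of 2x2 tables.

    y_t, y_c: events in the intervention/control groups;
    n_t, n_c: the corresponding group sizes.
    """
    num = den = 0.0
    for yt, nt, yc, nc in zip(y_t, n_t, y_c, n_c):
        N = nt + nc
        zt, zc = nt - yt, nc - yc   # nonevents in each group
        num += yt * zc / N          # OR_i * w_i with w_i = zt * yc / N
        den += zt * yc / N          # w_i
    return num / den
```

For a single study, the result equals the crude odds ratio of that study's 2×2 table.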
Peto odds ratio method

The Peto odds ratio (YPET) is an inverse variance approach; ie, the studies are weighted by the inverse of the study-specific variance (w_i(FIX) = 1/σ_i²).[20] The pooled Peto log odds ratio is estimated as[21]

log(OR_YPET) = Σ_{i=1}^{I} (O_i − E_i) / Σ_{i=1}^{I} V_i,

where O_i is the observed number of events, E_i is the expected number of events, and V_i is the variance of their difference.

We considered the YPET in our analysis because it is the standard method for meta-analysis of small intervention effects or very rare events.[19,21]
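A minimal Python sketch of the one-step Peto estimator (illustrative, not from the paper), using the conventional hypergeometric forms for E_i and V_i:

```python
def peto_log_or(y_t, n_t, y_c, n_c):
    """Peto one-step pooled log odds ratio for a series of 2x2 tables."""
    oe = v = 0.0
    for yt, nt, yc, nc in zip(y_t, n_t, y_c, n_c):
        N, m = nt + nc, yt + yc                        # total size, total events
        E = nt * m / N                                 # expected events, intervention arm
        V = nt * nc * m * (N - m) / (N**2 * (N - 1))   # hypergeometric variance
        oe += yt - E
        v += V
    return oe / v
```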
2.1.2 | Inverse variance random effects models

Random effects models are based on the assumption that there is no common effect across all studies; instead, the study-specific true effects follow a distribution, and the parameter of interest is the mean of this distribution of true effects. The study effects (usually) follow a normal distribution. In general, the intervention effect in study i under the random effects model can be expressed as

θ̂_i = θ_i + σ_i ε_i,   ε_i ~ N(0, 1),   θ_i ~ N(θ, τ²).

The pooled effect of an inverse variance random effects meta-analysis can be estimated by

θ̂_R = Σ_{i=1}^{I} w_i(REM) θ̂_i / Σ_{i=1}^{I} w_i(REM),

where the w_i(REM) are the study-specific weights. The study weights are adjusted according to the between-study variation. The between-study variance (τ²) has to be estimated for random effects models in addition to the within-study variance σ_i². Specifically, the study weights are the inverse of the sum of the within-study variance and the between-study variance, w_i(REM) = 1/(σ_i² + τ²).
Various methods to estimate the between-study variance τ² in an inverse variance random effects model exist (eg, ordinary least squares and maximum likelihood).[22] In this analysis, we consider the DerSimonian-Laird estimator and the Paule-Mandel estimator[23,24] for the between-study variance τ².
Between-study variance estimators (DerSimonian-Laird and Paule-Mandel)

The standard DerSimonian-Laird estimator is a noniterative between-study variance estimator motivated by the method-of-moments principle.[23,25] To be concrete, τ²_DL is estimated by the following equation:

τ²_DL = max{ 0, (Q − (I − 1)) / ( Σ_{i=1}^{I} w_i(FIX) − Σ_{i=1}^{I} w_i(FIX)² / Σ_{i=1}^{I} w_i(FIX) ) },

where Q is the heterogeneity statistic (Q-statistic) given by

Q(τ²_DL) = Σ_{i=1}^{I} w_i(FIX) (θ̂_i − θ̂_F)²,

where θ̂_F is the pooled effect of a fixed effects model. The weights w_i(FIX) in this equation are given by the inverse of the within-study variance, w_i(FIX) = 1/σ_i², which is, in general, unknown and has to be replaced by an estimate. We estimated Q using PROC GLM.[26]

The between-study variance estimator τ²_DL is the most widely used for random effects meta-analysis and is also the standard random effects model in Cochrane Reviews.[19] Therefore, we include the DerSimonian-Laird method in the analysis as the reference method.
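The DerSimonian-Laird formula above takes only a few lines. A minimal Python sketch (the authors computed Q with SAS PROC GLM; the function and argument names here are illustrative), given study log odds ratios and their estimated within-study variances:

```python
def dl_tau2(theta, var):
    """DerSimonian-Laird method-of-moments estimate of tau^2.

    theta: study effect estimates (log odds ratios);
    var: estimated within-study variances sigma_i^2.
    """
    w = [1.0 / v for v in var]                              # fixed effects weights 1/sigma_i^2
    sw = sum(w)
    theta_f = sum(wi * t for wi, t in zip(w, theta)) / sw   # fixed effects pooled estimate
    Q = sum(wi * (t - theta_f) ** 2 for wi, t in zip(w, theta))
    I = len(theta)
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (Q - (I - 1)) / c)                      # truncated at zero
```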
The Paule-Mandel method is an iterative between-study variance estimator.[24] The estimation of τ²_PM is based on the generalized Q-statistic.[25,27] This can be expressed as[25,28]

Q(τ²_PM) = Σ_{i=1}^{I} w_i(PM) (θ̂_i − θ̂_{τ²PM})²,

where θ̂_{τ²PM} is given by

θ̂_{τ²PM} = Σ_{i=1}^{I} w_i(PM) θ̂_i / Σ_{i=1}^{I} w_i(PM)

and w_i(PM) = 1/(σ_i² + τ²_PM). Q(τ²_PM) has an expectation of I − 1, and the estimator is obtained by iterating τ²_PM until convergence is reached.[28]

The Paule-Mandel estimator was incorporated in the analysis because it has recently been recommended in a review paper on the performance of between-study variance estimators.[22] Our calculation of τ²_PM was based on the algorithm proposed by DerSimonian and Kacker (SAS macro; see Supporting Information I).[25] We validated our estimates of τ² against the respective results of the R metafor package.[29]
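To illustrate the defining condition Q(τ²_PM) = I − 1, the following Python sketch solves it by simple bisection rather than the DerSimonian-Kacker iteration used in the paper (Q is monotonically decreasing in τ², so any root finder works; all names are illustrative):

```python
def pm_tau2(theta, var, tol=1e-10, max_iter=200):
    """Paule-Mandel estimate of tau^2: solve Q(tau^2) = I - 1 by bisection."""
    I = len(theta)

    def Q(tau2):
        w = [1.0 / (v + tau2) for v in var]
        pooled = sum(wi * t for wi, t in zip(w, theta)) / sum(w)
        return sum(wi * (t - pooled) ** 2 for wi, t in zip(w, theta))

    if Q(0.0) <= I - 1:          # no positive solution: estimate truncated at zero
        return 0.0
    lo, hi = 0.0, 1.0
    while Q(hi) > I - 1:         # bracket the root (Q decreases in tau^2)
        hi *= 2.0
    for _ in range(max_iter):    # bisection
        mid = (lo + hi) / 2.0
        if Q(mid) > I - 1:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0
```

On equal within-study variances, the result coincides with the DerSimonian-Laird estimate.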
Confidence intervals for inverse variance random effects models (Wald-type, Hartung-Knapp-Sidik-Jonkman, and Hartung-Knapp modification)

Confidence intervals using the DerSimonian-Laird method are constructed as

θ̂ ± z_{1−α/2} × σ̂,

where z_{1−α/2} is the 1 − α/2 quantile of the standard normal distribution (Wald-type confidence intervals) and the standard error is given by

σ̂ = sqrt( 1 / Σ_{i=1}^{I} w_i(DL) ).

Hartung-Knapp and Sidik-Jonkman suggested an adjustment factor for the standard error and the use of the quantile of the Student t-distribution with I − 1 degrees of freedom instead of the Wald-type interval to calculate confidence intervals.[30-32] The adjustment factor is calculated as

q = (1/(I − 1)) Σ_{i=1}^{I} w_i(DL) (θ̂_i − θ̂_R)²,

where θ̂_R is the estimated pooled effect from a DerSimonian-Laird random effects meta-analysis. This leads to the adjusted confidence interval[12]

θ̂ ± t_{(I−1), 1−α/2} × √q × σ̂.

Although this confidence interval in general tends to be wider than the Wald-type confidence interval, it can be narrower when q is very small. For this reason, Hartung and Knapp proposed the following ad hoc modification of q[12,33]:

q* = max(1, q).

If τ²_PM is iterated via the Q-statistic above, then q* always equals 1, because q equals 1 (or is less than 1 if no positive solution exists).[12] This means that combining the Paule-Mandel estimator with Hartung-Knapp-Sidik-Jonkman (and also modified Hartung-Knapp) confidence intervals amounts to using a Paule-Mandel-derived pooled effect estimator with confidence intervals based on the quantiles of the Student t-distribution with I − 1 degrees of freedom:

θ̂_{τ²PM} ± t_{(I−1), 1−α/2} × σ̂_{τ²PM}.

The Hartung-Knapp confidence intervals and their modification were validated against the respective results of the R metafor package.[29]
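The HKSJ components are easy to compute once τ² is fixed. This Python sketch (illustrative, not the paper's SAS/R code) returns the pooled effect, the adjusted standard error, and the degrees of freedom; the confidence interval is then pooled ± t_{df, 1−α/2} × se, with the t quantile taken from any statistics library:

```python
def hksj(theta, var, tau2, modified=False):
    """Pooled effect, HKSJ-adjusted standard error, and t degrees of freedom.

    modified=True applies the Hartung-Knapp ad hoc modification q* = max(1, q).
    """
    w = [1.0 / (v + tau2) for v in var]      # random effects weights
    sw = sum(w)
    pooled = sum(wi * th for wi, th in zip(w, theta)) / sw
    I = len(theta)
    q = sum(wi * (th - pooled) ** 2 for wi, th in zip(w, theta)) / (I - 1)
    if modified:
        q = max(1.0, q)                      # q* = max(1, q)
    se = (q / sw) ** 0.5                     # sqrt(q) * sigma_hat
    return pooled, se, I - 1
```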
We considered the Hartung-Knapp method and its modification because of the promising results in various recent simulation studies, also in the case of few studies.[12,13,34]

We combined both between-study variance estimators with 3 different methods to estimate the confidence intervals (Wald-type, Hartung-Knapp-Sidik-Jonkman, and Hartung-Knapp modification).[12,14,25,31] The combination of the different between-study variance estimators and confidence intervals results in the following 5 inverse variance random effects models:

(1) DerSimonian-Laird between-study variance estimator and Wald-type confidence intervals (standard random effects model, DLRE).[23]
(2) DerSimonian-Laird between-study variance estimator and Hartung-Knapp-Sidik-Jonkman adjusted confidence intervals (HKSJ).[23,31,32]
(3) DerSimonian-Laird between-study variance estimator and modified Hartung-Knapp adjusted confidence intervals (MKH).[14,23]
(4) Paule-Mandel between-study variance estimator and Wald-type confidence intervals (PMRE).[24]
(5) Paule-Mandel between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals (PMHK).[24,31]
2.1.3 | Beta-binomial regression

The models described above are 2-stage models: in the first step, aggregated measures are calculated for each study separately, and in the second, subsequent step these measures are combined.[35] Opposed to this, the beta-binomial regression is a 1-stage model; the analysis is performed in 1 step, similar to an individual patient data regression analysis.[35] Moreover, the beta-binomial model is a true (study-specific) random effects regression model.[11]
Beta-binomial (random effects) regression model

In the beta-binomial model, we assume that the observed proportions (p_iC = y_iC/n_iC) in the control group follow a binomial distribution B(p, n_iC). Further, we assume the success probability p to be beta-distributed with parameters a and b. The mean and variance of p are given by E(p) = μ = a/(a + b) and Var(p) = μ(1 − μ)ϑ/(1 + ϑ), respectively, where ϑ = 1/(a + b). Consequently, the response y_iC is beta-binomially distributed with mean E(y_iC) = n_iC μ and variance Var(y_iC) = n_iC μ(1 − μ)[1 + (n_iC − 1)ϑ/(1 + ϑ)]. The outcomes of 2 observations from the same study are then correlated with corr(y_iT, y_iC) = ρ = 1/(a + b + 1).

The intervention effect is modelled via a link function g by g(μ) = β_0 + β_T x_T, where x_T = 1 for the intervention group and x_T = 0 for the control group. In our study, we use the logit link for g to arrive at log odds ratios as the measure of the intervention effect.

The beta-binomial model performed well in a prior model comparison of meta-analysis methods for rare events.[11] This raises the question whether the beta-binomial regression model also performs well for meta-analysis with a small number of studies.
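The beta-binomial likelihood with logit link can be sketched in a few lines of Python (the authors fitted the model with SAS PROC NLMIXED/GLIMMIX/FMM; this is only an illustrative parameterization with μ and ϑ as above, where β_T is the log odds ratio, and maximization with a general-purpose optimizer is not shown):

```python
from math import lgamma, exp, log, comb

def betaln(a, b):
    """log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bb_logpmf(y, n, a, b):
    """Log pmf of the beta-binomial distribution with shape parameters a, b."""
    return log(comb(n, y)) + betaln(y + a, n - y + b) - betaln(a, b)

def bb_loglik(params, data):
    """Meta-analytic log-likelihood with logit link g(mu) = b0 + bT * x_T.

    params = (b0, bT, log_theta) with theta = 1/(a + b) > 0;
    data = list of (events, group size, x) with x = 1 intervention, 0 control.
    """
    b0, bT, log_theta = params
    theta = exp(log_theta)
    ll = 0.0
    for y, n, x in data:
        mu = 1.0 / (1.0 + exp(-(b0 + bT * x)))   # inverse logit
        a, b = mu / theta, (1.0 - mu) / theta    # so E(p) = mu, 1/(a+b) = theta
        ll += bb_logpmf(y, n, a, b)
    return ll
```

The pmf sums to 1 and has mean nμ, which makes the parameterization easy to check numerically.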
Estimation methods and confidence intervals for the beta-binomial regression model

We used different parameter estimation methods (maximum likelihood and quasi-likelihood) and confidence interval estimation methods (Wald-type and t-distribution-based with different numbers of degrees of freedom) for the beta-binomial model, resulting in the following 6 implementations of the beta-binomial model in the comparison:

(1) Estimation of the beta-binomial model via maximum likelihood (SAS PROC NLMIXED),
(a) combined with Wald-type confidence intervals (z_{1−α/2}, the 1 − α/2 quantile of the standard normal distribution; denoted BBIN in the following);
(b) combined with confidence intervals using a t-distribution with degrees of freedom equal to the number of studies (t_{I, 1−α/2}; BB1N);
(c) combined with confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies (t_{2I, 1−α/2}; BB2N);
(d) combined with confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters (t_{2I−3, 1−α/2}; BBMP).[36,37]
(2) Estimation of the beta-binomial model via quasi-likelihood (SAS PROC GLIMMIX). When using the quasi-likelihood principle, one specifies only the mean and variance but not the complete beta-binomial distribution. We assumed that this would increase the robustness of the results with a small loss in efficiency. We combined this effect estimate with the following:
(a) Wald-type confidence intervals (BBQL);
(b) confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters (BBQD).
(3) Estimation of the beta-binomial model via maximum likelihood using SAS PROC FMM and Wald-type confidence intervals (BBFM). The advantage of SAS PROC FMM is that the starting values for the beta-distribution are directly estimated by the procedure. Consequently, it is not necessary to estimate starting values beforehand, which facilitates the implementation of the beta-binomial model.

Starting values for the beta-binomial models using SAS PROC NLMIXED (BBIN, BB1N, BB2N, and BBMP) were computed from raw proportions, their variances, and correlations.
2.2 | Simulations

We performed a simulation study to compare the statistical properties of the different meta-analytic methods. Where feasible, all true values for the design factors of the simulation study were chosen from actually performed meta-analyses to reflect real-life conditions as closely as possible. The most suitable source of true values for the simulation was the review of Turner et al, which analysed 1991 systematic reviews from the Cochrane Database of Systematic Reviews.[1] Overall, this review included 14,886 meta-analyses (each including at least 2 studies) of dichotomous outcomes based on 77,237 single studies.
2.2.1 | Design of simulation

We generated 10,000 meta-analyses for each simulation scenario.

The focus of our study was to assess the performance of the meta-analytic methods for pooling results of ≤5 studies. For the main analysis, we compared the methods by generating meta-analyses with 2, 3, 4, and 5 included studies.[3] In a supplemental analysis, we considered scenarios with 10, 15, 20, 30, and 50 studies to get an impression of how the compared methods behave if the number of studies included in the meta-analysis increases.

We generated event probabilities, sample sizes, and heterogeneity estimates for each study included in the meta-analysis based on the Turner data.[1] Furthermore, we explicitly varied some parameters to check the robustness of the results.

Table 1 shows the input parameters for the simulation, including the information whether they were varied implicitly based on distributional assumptions or varied explicitly based on fixed values. In addition, the table provides the source/rationale of the parameter choice and whether the scenarios belong to the base case analysis or to a sensitivity analysis for assessing the robustness of the results.
The combination of the different (random effects/fixed effects, H0/H1) scenarios and of the number of studies led to 36 base case simulation scenarios (360,000 meta-analyses) in total. We performed the sensitivity analyses (one randomly selected study 10 times larger, small intervention effect, large intervention effect, low baseline event probability, and high baseline event probability in the control group) only for the setting with 2 to 5 studies in the random effects scenario under H1 (medium effect), leading to 20 additional simulation scenarios.

TABLE 1 Description of the simulation

Base case analyses:
- Sample size of single study (a): generated from a log-normal distribution with mean = 4.615 and SD = 1.1 (resulting data: median = 103, Q1 = 50, Q3 = 204).
- Allocation of sample size to control and intervention group (balance of study size between groups): random allocation with probability 0.5.
- Event probabilities in the control group (a): generated from a beta-distribution with α = 0.4230 and β = 1.433 (resulting data: mean = 0.223, median = 0.126, SD = 0.256).
- Effect (event probabilities in the intervention group) (a): generated from a standard inverse variance random effects model (for τ², see the fixed and random effects scenarios below); fixed value: medium effect, OR = 0.684.
- Events in the control/intervention group: binomial draw.
- Fixed effects scenario: τ² = 0.
- Random effects scenario (a): τ² generated from a log-normal distribution with mean = −1.47, SD = 1.65, and skewness = −0.55 (b) (resulting data: median τ² = 0.274; 25% percentile τ² = 0.079; 75% percentile τ² = 0.806 (c); mean I² = 17%, SD = 30, range 0-99%).

Sensitivity analyses (d):
- Balance of study size[13]: one randomly selected study 10 times larger.
- Event probabilities in the control group: low = 0.1, high = 0.5.
- Effect (event probabilities in the intervention group): large effect, OR = 0.466 (a); small effect, OR = 0.855 (a).

Abbreviations: OR, odds ratio; Q1, first quartile (25% percentile); Q3, third quartile (75% percentile); SD, standard deviation.
(a) Informed by Turner et al.[1]
(b) Based on the Fleishman power transformation.
(c) Under H0.
(d) Only simulated in the random effects scenario for 2 to 5 studies.

The simulation code is available in Supporting Information I.
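A single simulated meta-analysis under the base case design of Table 1 can be sketched as follows in Python (the authors' actual simulation code is in Supporting Information I; details such as rounding the study size and the per-subject Bernoulli draws are assumptions of this sketch):

```python
import random
import math

def simulate_meta(I, tau2, or_true=0.684, rng=random):
    """Draw one simulated meta-analysis following the base case design:
    log-normal study sizes, beta-distributed control risks,
    normally distributed study-specific log odds ratios, binomial events."""
    theta = math.log(or_true)
    studies = []
    for _ in range(I):
        N = max(4, round(rng.lognormvariate(4.615, 1.1)))   # total study size
        n_t = sum(rng.random() < 0.5 for _ in range(N))     # random allocation, p = 0.5
        n_c = N - n_t
        p_c = rng.betavariate(0.4230, 1.433)                # control event probability
        theta_i = rng.gauss(theta, math.sqrt(tau2))         # study-specific log OR
        odds_t = p_c / (1.0 - p_c) * math.exp(theta_i)
        p_t = odds_t / (1.0 + odds_t)
        y_t = sum(rng.random() < p_t for _ in range(n_t))   # binomial draws
        y_c = sum(rng.random() < p_c for _ in range(n_c))
        studies.append((y_t, n_t, y_c, n_c))
    return studies
```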
2.2.2 | Measures to assess performance of methods

We performed all comparisons of the effect estimates on the log odds ratio scale. We estimated the median bias and the empirical coverage of the 95% confidence interval to assess the statistical properties.[11] In the medium effect scenario, we also calculated the empirical power for all methods. Moreover, we counted the number of completely missing pooled effect estimates to judge the numerical robustness of the compared methods. For this analysis, we present the number of converged runs. The analysis code is available in Supporting Information II.
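These performance measures can be summarized as in the following Python sketch (illustrative, not the paper's analysis code), where each simulation run yields either a triple (estimate, CI lower, CI upper) on the log odds ratio scale or None for a nonconverged run:

```python
def performance(results, theta_true):
    """Summarize simulation runs for one method and scenario.

    results: list of (estimate, ci_lo, ci_hi) tuples, or None for nonconverged runs.
    Returns (median bias, empirical coverage, empirical power, convergence rate).
    """
    conv = [r for r in results if r is not None]
    biases = sorted(est - theta_true for est, _, _ in conv)
    m = len(biases)
    median_bias = (biases[m // 2] if m % 2
                   else (biases[m // 2 - 1] + biases[m // 2]) / 2.0)
    coverage = sum(lo <= theta_true <= hi for _, lo, hi in conv) / m
    power = sum(lo > 0 or hi < 0 for _, lo, hi in conv) / m   # CI excludes the null (0)
    return median_bias, coverage, power, m / len(results)
```

Note that, as discussed in the results, conditioning on convergence can make the less robust methods look better than they are, because the hardest runs drop out of the summary.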
3 | RESULTS

3.1 | Base case analysis

We present the results separately for the fixed effects and the random effects situations.

First, we describe the statistical properties in the no effect scenario. Second, we describe the results of the simulation in the medium effect scenario. Here, we focus on the power analysis and on the differences in performance, compared with H0, of those meta-analytic methods that performed well regarding coverage under H0.
3.1.1 | No effect scenario

As was to be expected, for almost all methods the convergence increased with an increasing number of studies in the meta-analysis (Figures 1 and 2; see data in Table S1 in Supporting Information III). The main reason is that the number of meta-analyses without any events decreased. The beta-binomial models that were estimated with quasi-likelihood (BBQD and BBQL) showed a tendency towards decreasing convergence with an increasing number of studies. This is because the convergence criterion of the procedure (SAS PROC GLIMMIX) was more often not satisfied, and thus the procedure stopped without providing parameter estimates. The reason is probably that, with an increasing number of studies, the estimation of parameters becomes more challenging when the model has to be estimated based only on means and variances.

The lines of BBIN, BB1N, BB2N, and BBMP, as well as of BBQD and BBQL, follow the same pattern because the log odds ratio is estimated with the same procedure. Also, the lines for DLRE, HKSJ, MKH, PMRE, and PMHK show the same shape because all meta-analyses without any event in one arm are not estimable.

For meta-analysis of few studies, the beta-binomial models with a priori estimated starting values were most robust (BBIN, BB1N, BB2N, and BBMP), followed by YPET. All other methods performed quite similarly, with about 15% to 20% nonconverged meta-analyses in the few study situations.

All methods showed small median bias even in the few study scenarios (Figures 3 and 4; see data in Table S1 in Supporting Information III). The strongest negative median bias was −0.047 (BBIN, BB1N, BB2N, and BBMP), which corresponds to an odds ratio of 0.95. The strongest positive bias, 0.022 (BBQL and BBQD), was also observed in the random effects scenario and corresponds to an odds ratio of 1.02.

The shapes of the lines were exactly the same for BBIN, BB1N, BB2N, and BBMP; for BBQD and BBQL; for DLRE, HKSJ, and MKH; and for PMRE and PMHK, because these methods are based on the same point estimators for the effect.
Bias of the effect estimators based on the DerSimonian-Laird heterogeneity variance estimator (DLRE, HKSJ, and MKH) was very similar to the bias of the effect estimators based on the Paule-Mandel heterogeneity variance estimator (PMRE and PMHK).

FIGURE 1 Converged runs (H0 FEM)
In the fixed effects scenario, almost all methods satisfied the empirical mean coverage at the 95% level or were only marginally below this level (Figure 5; see data in Table S1 in Supporting Information III). Only the HKSJ method fell heavily below the 95% coverage level.

In the random effects situation, again only BBMP, BB1N, BBQD, MKH, and PMHK satisfied or nearly satisfied the 95% empirical coverage level in the scenarios with few studies (Figure 6). In particular, the fixed effects models (MaHa and YPET) as well as DLRE and HKSJ showed unacceptably low coverage probability in the random effects situations. BBQD, BBQL, DLRE, HKSJ, and MKH also showed very low coverage probability in situations with more than 5 studies included in the meta-analysis.
3.1.2 | Medium effect scenario

In general, the results for converged runs under H1 were quite similar to the results under H0 (Table S1 in Supporting Information IV).

Overall, bias was also small under the alternative (Figures S1 and S2 in Supporting Information V). Bias was strongest in the random effects scenario. The maximum negative bias occurred using BBIN, BB1N, BB2N, and BBMP, with a log odds ratio of −0.06 (odds ratio, 0.94). The effect estimates based on the DerSimonian-Laird and the Paule-Mandel heterogeneity variance estimators were the most strongly positively biased (log odds ratio, 0.074 and 0.075, respectively; odds ratio, 1.08 for both methods).

FIGURE 2 Median bias (H0 FEM)

FIGURE 3 Mean empirical coverage (H0 FEM)

FIGURE 4 Converged runs (H0 REM)
The results for the empirical coverage probability for meta-analysis including few studies were also similar to the results of the no effect scenario. As in the zero effect scenario, all methods but HKSJ satisfied the mean empirical coverage at the 95% level in the fixed effects, few studies scenarios (Figure S3 in Supporting Information V). BBMP, BB1N, and BBQD, as well as MKH and PMHK, satisfied the coverage probability throughout all random effects situations (Figure S4 in Supporting Information V). BB1N was only marginally below the empirical 95% coverage level in the scenarios with 2 to 5 studies (worst case: empirical coverage probability = 0.926). In the random effects scenarios with ≥20 studies in the meta-analysis, not only DLRE and HKSJ but also MKH had a very low coverage probability. The coverage probability in the random effects, large study scenarios was below 0.7 throughout for the fixed effects models (MaHa and YPET).
Power depended strongly on the number of studies in the meta-analysis (Figures 7 and 8). The fixed effects models and HKSJ showed the highest median power for meta-analysis of ≤5 studies. Also with BBFM, BBIN, BBQL, and DLRE, there was a small probability of detecting a statistically significant difference in the few study meta-analyses, in the fixed effects as well as in the random effects situations. All other methods had a power close to zero in the meta-analyses of 2 studies. It should be noted that the power of the methods showing favourable results might be overestimated, because these methods were the less robust ones (BBFM, BBQL, BBQD, and MaHa); ie, challenging situations with low event rates are not estimable and consequently are not included in the median power analysis. In meta-analyses of ≥10 studies, the power of all methods was quite similar.

FIGURE 5 Median bias (H0 REM)
3.2 | Sensitivity analysis

We performed the sensitivity analysis only for the random effects, few study scenario because this is the most challenging and the most relevant for practice. In the sensitivity analysis, we included only BB1N and BBMP because these beta-binomial models performed best in the base case analysis.
3.2.1 |One large study
The empirical coverage probability of DLRE, HKSJ, and
the fixed effects models (YPET and MaHa) further
decreased compared with the results of the random
FIGURE 6 Mean empirical coverage (H0 REM)
MATHES AND KUSS 11
effects situation under H0 with similar study sizes. The
empirical coverage probability of BB1N fell slightly below
the 95% confidence interval.
Figures for the sensitivity analysis on unbalanced
study size are given in Supporting Information VI
(Figures S1‐S3).
FIGURE 7 Mean power (H1 FEM)
FIGURE 8 Mean power (H1 REM)
3.2.2 | Different intervention effect sizes
Using a small as well as a large intervention effect did not
alter bias or empirical coverage as compared with the
medium effect scenario. As expected, power increased in
the large intervention effect scenario and decreased in
the small intervention effect scenario.
Figures S4 to S6 in Supporting Information VI show
the results for the small intervention effect and Figures
S7 to S9 the results for the large intervention effect.
3.2.3 | Different baseline risks in the control group
Using a low as well as a high baseline event probability
in the control group also did not change bias or empirical
coverage as compared with the medium baseline risk
scenario. Also as expected, power increased in the high
baseline event probability scenario and decreased in the
low baseline event probability scenario.
Figures for the sensitivity analysis on the baseline risk
in the control group are available in Supporting
Information VI: Figures S10 to S12 are for the low
baseline risk and Figures S13 to S15 are for the high
baseline risk.
4 | DISCUSSION
This simulation shows that valid statistical methods for
meta‐analysis of few studies are available. Our simulation
was designed based on empirical data to reproduce real‐
world situations. Considering the frequency of meta‐
analyses including only few studies, our results are highly
relevant for applied research.1 This is particularly true
in areas where few study meta‐analysis is the normal
case, such as rare diseases, health technology assessment,
subgroup analyses, or sensitivity analyses.
The standard approach for meta‐analysis is still the
DerSimonian‐Laird method. Our simulation shows that
the DerSimonian‐Laird method was heavily above the
5% type I error rate in all random effects situations.
Consequently, it can be supposed that a large number of
false positive meta‐analyses exist.5,13 False positive meta‐
analysis results might have serious consequences for
clinical and health policy decision‐making because
results of meta‐analyses form the basis for statements in
clinical practice guidelines and health technology
assessments.38,39
On the one hand, using adequate methods is very
important to avoid incorrect conclusions.2,11 On the other
hand, power has a central role because meta‐analyses
of few studies often agree with future results of meta‐
analyses including more studies.40 Very conservative
methods might lead to unnecessary research or rejection
of useful new interventions by health technology
assessment agencies. The bias of all methods was very
small in all simulation scenarios. Therefore, the main
criterion for judging the validity of methods here, and for
arriving at recommendations on the right meta‐analysis
method, is the critical balance between the empirical
coverage probability and power.
In the fixed effects scenario, the classical fixed effects
models (MaHa and YPET) had the best statistical
properties. If the assumption of a fixed effects model
can be reliably justified, ie, the included studies are
clinically homogeneous, then fixed effects models seem
preferable. On the one hand, this suggestion is supported
by the fact that a sufficiently accurate estimation of
heterogeneity is difficult in meta‐analysis of few studies.2
On the other hand, caution is needed in the case of small
studies because these are often more heterogeneous than
large studies.41 The beta‐binomial model was most robust
and can be recommended as the first choice in
challenging situations (few studies, rare events), in
particular for the case that the other models cannot
estimate effects.11
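As a minimal sketch of the classical fixed effects pooling recommended here, the Mantel‐Haenszel odds ratio can be computed in a few lines. The study counts below are invented for illustration; this is textbook code, not the implementation used in the simulation.

```python
import numpy as np

def mantel_haenszel_or(a, b, c, d):
    """Mantel-Haenszel pooled odds ratio for 2x2 tables.

    a, b: events / non-events in the intervention group
    c, d: events / non-events in the control group
    (one entry per study)
    """
    a, b, c, d = map(np.asarray, (a, b, c, d))
    n = a + b + c + d                          # total sample size per study
    return np.sum(a * d / n) / np.sum(b * c / n)

# three invented studies with a consistent protective effect
or_mh = mantel_haenszel_or(a=[4, 2, 6], b=[96, 48, 144],
                           c=[8, 5, 12], d=[92, 45, 138])
print(or_mh)   # ≈ 0.46
```

Because each study contributes `a*d/n` and `b*c/n` terms rather than a per‐study variance estimate, the estimator remains stable with sparse cells, which is one reason it performs well with few studies.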
In the random effects situation, only MKH, PMHK,
and BBMP held the type I error rate. As in prior studies,
MKH and PMHK in particular were very conservative for
2 to 3 studies in the meta‐analysis, also in the random
effects situations.12,15,34 These overly conservative
confidence intervals resulted in a power of almost zero.
The beta‐binomial model (BB1N) and PMRE seem a good
alternative for meta‐analyses including only a few studies
because both have a higher power and fell only slightly
below the 95% coverage probability.
In general, using Student t‐distributed confidence
intervals accounts better for the uncertainty that is
induced by estimating the between‐study variance. The
determination of the optimal number of degrees of
freedom for the Student t‐distribution is difficult because
it depends on the between‐ and within‐study variances
and the number of included studies in the meta‐analysis.8
Our simulation indicates that for the number of degrees
of freedom, BB1N (number of studies minus 1 degree of
freedom) is the best choice, whereas BBMP (2 times the
number of studies minus 3 degrees of freedom) is more
conservative but still valid.
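The role of the t‐distribution can be made concrete by comparing its 97.5% quantiles with the normal quantile (a sketch using scipy; note that the df choice only changes the quantile, while the overall conservatism of a method also depends on its variance estimate):

```python
from scipy.stats import norm, t

z = norm.ppf(0.975)                          # normal multiplier, about 1.96
for k in (2, 3, 5, 10, 20):                  # number of studies
    crit = t.ppf(0.975, k - 1)               # t multiplier with k - 1 df
    print(f"k={k}: t multiplier {crit:.2f} ({crit / z:.1f}x the normal one)")
```

For k = 2 the t multiplier with 1 degree of freedom is roughly 6.5 times the normal one, which illustrates why t‐based intervals of very small meta‐analyses widen so strongly.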
Prior studies have recommended HKSJ.9,13,15,42 In our
simulation, however, the HKSJ clearly exceeded the
specified type I error level in all situations. This occurs
when the correction factor q is arbitrarily small.12 The
risk is higher in the case of small event rates, because
results are more homogeneous than expected under a
random effects model, and in the case that the standard
errors of the included studies are heterogeneous.18
Considering that our simulation is based on empirical
data, there seems to be a very high risk of anticonservative
results in practice when HKSJ is used.12 Thus, we also
discourage the use of the original HKSJ for few as well as
many studies in meta‐analysis.43
DLRE and HKSJ showed very low empirical coverage
levels, also in the meta‐analyses of more than 20 studies,
suggesting that heterogeneity is underestimated by these
methods. Also MKH, which has recently been suggested
as a new standard method for random effects meta‐
analysis, showed unsatisfactory empirical coverage.43 The
reason might be that if true heterogeneity exists, these
methods might fail to detect it (the value of τ² being
zero).2 The impact is stronger with an increasing number
of studies in the meta‐analysis because the influence of τ²
is then missing from the weight of each study and is thus
missed multiple times in the calculation of the total
variance.
We could show that, also in the case of meta‐analyses
including more studies (≥20), the statistical properties of
2 stage random effects meta‐analyses can be substantially
improved as compared with the standard DerSimonian‐
Laird method by combining the Paule‐Mandel
heterogeneity variance estimator with Hartung‐Knapp
confidence intervals.
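The combination just described can be sketched with the standard textbook formulas (this is generic illustrative code, not the authors' simulation program; `y` and `v` below are invented log odds ratios and within‐study variances):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import t

def dl_tau2(y, v):
    """DerSimonian-Laird moment estimator of tau^2."""
    w = 1 / v
    mu_fe = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - mu_fe) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (Q - (len(y) - 1)) / c)

def pm_tau2(y, v):
    """Paule-Mandel estimator: tau^2 at which the generalized Q equals k - 1."""
    k = len(y)
    def gen_q(t2):
        w = 1 / (v + t2)
        mu = np.sum(w * y) / np.sum(w)
        return np.sum(w * (y - mu) ** 2) - (k - 1)
    if gen_q(0.0) <= 0:
        return 0.0                       # no excess heterogeneity detected
    return brentq(gen_q, 0.0, 100.0)     # root-finding on [0, 100]

def hartung_knapp_ci(y, v, tau2, level=0.95):
    """Hartung-Knapp t-interval around the random effects mean."""
    k = len(y)
    w = 1 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu) ** 2) / (k - 1)   # HK correction factor
    se = np.sqrt(q / np.sum(w))
    crit = t.ppf(0.5 + level / 2, k - 1)
    return mu, mu - crit * se, mu + crit * se

# invented log odds ratios and their within-study variances
y = np.array([0.2, -0.1, 0.5, 0.0])
v = np.array([0.04, 0.05, 0.06, 0.05])
mu, lo, hi = hartung_knapp_ci(y, v, pm_tau2(y, v))   # the PMHK interval
```

Substituting `dl_tau2` for `pm_tau2` in the last line yields the corresponding DerSimonian‐Laird variant, so the two combinations discussed in the text differ only in the heterogeneity estimate fed into the Hartung‐Knapp interval.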
Our finding on the Paule‐Mandel estimator disagrees
with a study that compared 20 heterogeneity variance
estimators. That study concluded that the Paule‐Mandel
estimator “provides good estimation behaviour but not
markedly better than the other alternatives.”44 In our
study, the Paule‐Mandel estimator outperformed the
DerSimonian‐Laird method in all scenarios, in particular
if combined with t‐distributed confidence intervals. The
main reasons are probably that, in contrast to our
simulation, only scenarios with 5 to 30 studies were
simulated and only confidence intervals based on the
normal approximation were used in the comparison.
Moreover, that simulation was not informed by real‐life
meta‐analyses and consequently is very different from
our simulation (eg, in heterogeneity and effect sizes).
Our findings were quite robust across all sensitivity
analyses. Neither the size of the intervention effect nor
the baseline risk in the control group had considerable
influence on the results. We varied the distribution of
study sizes in our simulation because these are common
scenarios and other studies have shown that this can
influence the statistical performance.12,13 In our
simulation, we could not confirm the finding by IntHout
et al that different study sizes had a strong influence on
the empirical coverage probability.13 We found only a
very slight negative effect on the coverage probability of
the 2 stage random effects models in the random effects
situations.
Our study has some limitations.
First, we did not explicitly vary the distribution of
group sizes in our simulation because most studies apply
equal allocation ratios. An unbalanced size of study arms
can influence the performance of the statistical methods.7
Our results might therefore not pertain fully to meta‐
analyses of studies with unbalanced sample sizes (eg,
different allocation ratios and nonrandomized studies).
Second, heterogeneity is often hard to detect in
practice.2 As we used real‐world heterogeneity estimates
in the simulation, we might have tended to underestimate
heterogeneity, ie, overestimated the number of meta‐
analyses with zero heterogeneity as well.
Third, we did not consider Bayesian methods, which
have also shown promising results for meta‐analysis of
few studies in recently published studies.24,34,45,46 In
particular, if appropriate (weakly) informative prior
information exists, such Bayesian methods could be
considered.22,47 Simulations have shown that Bayesian
methods might increase power in few study meta‐analysis
while holding the type I error rate compared with other
methods (eg, PMHK).34 Moreover, Bayesian methods are
becoming more interesting for applied scientists because
approximate methods were recently developed that
facilitate their application compared with full Bayesian/
Markov chain Monte Carlo approaches.47
Future studies should directly compare the well
performing frequentist methods with well performing
Bayesian methods for meta‐analysis including few
studies. Future research should also assess extensions of
the beta‐binomial models. In meta‐analyses including
more than 5 studies, beta‐binomial models that model
the beta‐binomial distributions for control and
intervention groups separately and link intervention and
control groups from the same study by a random effect
(eg, estimated from DLRE) also showed satisfactory
statistical properties.48 An advantage of such beta‐
binomial models is that the randomization is not broken.
5 | CONCLUSION AND RECOMMENDATIONS FOR PRACTICE
If only a few studies are available for inclusion in a
meta‐analysis, then the choice of the right method is
challenging because of the narrow path regarding the
balance between correct empirical coverage of the 95%
confidence interval and power.
In fixed effects situations, we recommend the classical
fixed effects models (MaHa and YPET). A premise is that
the assumption of a fixed effects model (a common effect
of all studies) must be duly justified. If the standard fixed
effects models do not converge (eg, because of many
double zero studies), then the more robust BB1N or
BBMP are an alternative.
In random effects situations, only MKH, PMHK, and
BBMP kept the type I error rate. Results of these models
can be judged reliable for estimating intervention effects.
We discourage the use of inverse variance random effects
models other than MKH and PMHK in situations where
strictly keeping the type I error rate is important.
However, there is a high risk of not detecting a truly
existing difference with these methods because of their
very low power, in particular for meta‐analyses of 2 to 3
studies. Thus, if one is willing to accept a slightly higher
type I error rate, we recommend using BB1N and PMRE.
An advantage of the beta‐binomial models (BB1N
and BBMP) is their robustness and their ability to include
the information from single and double zero studies.11
We recommend these models for pooling rare events.
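As a rough illustration of the beta‐binomial approach, the pooled log odds ratio can be obtained by maximum likelihood from an arm‐level beta‐binomial regression. This is a simplified sketch in the spirit of the model, not the exact specification used here; the function names and the toy data are our own.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def betabin_logpmf(y, n, mu, rho):
    """Log pmf of a beta-binomial with mean mu and intra-class correlation rho."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    return (gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
            + gammaln(y + a) + gammaln(n - y + b) - gammaln(n + a + b)
            + gammaln(a + b) - gammaln(a) - gammaln(b))

def fit_betabin_or(events, n, treat):
    """Pooled log odds ratio from an arm-level beta-binomial regression."""
    events, n, treat = map(np.asarray, (events, n, treat))
    def negll(theta):
        alpha, beta, lrho = theta            # intercept, log OR, logit of rho
        eta = np.clip(alpha + beta * treat, -30, 30)
        mu = 1 / (1 + np.exp(-eta))
        rho = 1 / (1 + np.exp(-np.clip(lrho, -30, 30)))
        return -np.sum(betabin_logpmf(events, n, mu, rho))
    res = minimize(negll, x0=[-1.0, 0.0, -2.0], method="Nelder-Mead",
                   options={"maxiter": 2000, "xatol": 1e-6, "fatol": 1e-8})
    return res.x[1]                          # estimated log odds ratio

# three invented two-arm studies (treat = 1 marks the intervention arm);
# every study shows roughly a halving of the odds under treatment
log_or = fit_betabin_or(events=[20, 10, 30, 16, 12, 5],
                        n=[100] * 6,
                        treat=[0, 1, 0, 1, 0, 1])
```

Because the likelihood is defined for any event count, single and double zero studies enter the estimation without continuity corrections, which is the robustness property emphasized above.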
ACKNOWLEDGEMENT
This work was not funded.
ORCID
Tim Mathes http://orcid.org/0000-0002-5304-1717
REFERENCES
1. Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JPT.
Predicting the extent of heterogeneity in meta‐analysis, using
empirical data from the Cochrane Database of Systematic
Reviews. Int J Epidemiol. 2012;41(3):818‐827.
2. Kontopantelis E, Springate DA, Reeves D. A re‐analysis of the
Cochrane Library data: the dangers of unobserved heterogeneity
in meta‐analyses. PLoS One. 2013;8(7):e69930.
3. Davey J, Turner RM, Clarke MJ, Higgins JP. Characteristics of
meta‐analyses and their component studies in the Cochrane
Database of Systematic Reviews: a cross‐sectional, descriptive
analysis. BMC Med Res Methodol. 2011;11(1):1‐11.
4. Borenstein M, Higgins JP. Meta‐analysis and subgroups. Prev
Sci. 2013;14(2):134‐143.
5. Shuster JJ, Walker MA. Low‐event‐rate meta‐analyses of
clinical trials: implementing good practices. Stat Med.
2016;35(14):2467‐2478.
6. Friedrich JO, Adhikari NK, Beyene J. Inclusion of zero total
event trials in meta‐analyses maintains analytic consistency
and incorporates all available data. BMC Med Res Methodol.
2007;7(1):1‐6.
7. Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado
about nothing: a comparison of the performance of meta‐analyt-
ical methods with rare events. Stat Med. 2007;26(1):53‐77.
8. Higgins JPT, Thompson SG, Spiegelhalter DJ. A re‐evaluation
of random‐effects meta‐analysis. J Royal Stat Soc Series A,
(Statistics in Society). 2009;172(1):137‐159.
9. Guolo A, Varin C. Random‐effects meta‐analysis: the number of
studies matters. Stat Methods Med Res. 2015.
10. Jackson D, Bowden J, Baker R. How does the DerSimonian and
Laird procedure for random effects meta‐analysis compare with
its more efficient but harder to compute counterparts? J Stat
Plann Infer. 2010;140(4):961‐970.
11. Kuss O. Statistical methods for meta‐analyses including informa-
tion from studies without any events—add nothing to nothing
and succeed nevertheless. Stat Med. 2015;34(7):1097‐1116.
12. Röver C, Knapp G, Friede T. Hartung‐Knapp‐Sidik‐Jonkman
approach and its modification for random‐effects meta‐analysis
with few studies. BMC Med Res Methodol. 2015;15(1):1‐7.
13. IntHout J, Ioannidis JP, Borm GF. The Hartung‐Knapp‐Sidik‐
Jonkman method for random effects meta‐analysis is
straightforward and considerably outperforms the standard
DerSimonian‐Laird method. BMC Med Res Methodol. 2014;
14(1):1‐12.
14. Hartung J, Knapp G. A refined method for the meta‐analysis of
controlled clinical trials with binary outcome. Stat Med.
2001;20(24):3875‐3889.
15. Wiksten A, Rucker G, Schwarzer G. Hartung‐Knapp method is
not always conservative compared with fixed‐effect meta‐
analysis. Stat Med. 2016;35(15):2503‐2515.
16. Mantel N, Haenszel W. Statistical aspects of the analysis of data
from retrospective studies of disease. J Natl Cancer Inst.
1959;22(4):719‐748.
17. Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for
examining heterogeneity and combining results from several
studies in meta‐analysis. In: Systematic Reviews in Health Care.
BMJ Publishing Group; 2008:285‐312.
18. Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing?
Use and avoidance of continuity corrections in meta‐analysis
of sparse data. Stat Med. 2004;23(9):1351‐1375.
19. Deeks JJ, Higgins J, Altman DG. Analysing data and
undertaking meta‐analyses. In: Cochrane Handbook for Systematic
Reviews of Interventions: Cochrane Book Series; 2008:243‐296.
20. Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade
during and after myocardial infarction: an overview of the
randomized trials. Prog Cardiovasc Dis. 1985;27(5):335‐371.
21. Brockhaus AC, Bender R, Skipka G. The Peto odds ratio viewed
as a new effect measure. Stat Med. 2014;33(28):4861‐4874.
22. Veroniki AA, Jackson D, Viechtbauer W, et al. Methods to
estimate the between‐study variance and its uncertainty in
meta‐analysis. Res Synth Meth. 2016;7(1):55‐79.
23. DerSimonian R, Laird N. Meta‐analysis in clinical trials. Control
Clin Trials. 1986;7(3):177‐188.
24. Paule RC, Mandel J. Consensus values and weighting factors.
J Res Natl Bur Stand. 1982;87(5):377‐385.
25. DerSimonian R, Kacker R. Random‐effects model for meta‐
analysis of clinical trials: an update. Contemp Clin Trials.
2007;28(2):105‐114.
26. Whitehead A. Estimating the treatment difference in an
individual trial. In: Meta‐Analysis of Controlled Clinical Trials.
John Wiley & Sons, Ltd; 2003:23‐55.
27. Rukhin AL, Biggerstaff BJ, Vangel MG. Restricted maximum
likelihood estimation of a common mean and the Mandel–Paule
algorithm. J Stat Plann Infer. 2000;83(2):319‐330.
28. Bowden J, Tierney JF, Copas AJ, Burdett S. Quantifying,
displaying and accounting for heterogeneity in the meta‐
analysis of RCTs using standard and generalised Q statistics.
BMC Med Res Methodol. 2011;11(1):41.
29. Viechtbauer W. metafor: Meta‐Analysis Package for R. 2016.
30. Hartung J. An alternative method for meta‐analysis. Biom J.
1999;41(8):901‐916.
31. Hartung J, Knapp G. On tests of the overall treatment effect in
meta‐analysis with normally distributed responses. Stat Med.
2001;20(12):1771‐1782.
32. Sidik K, Jonkman JN. A simple confidence interval for meta‐
analysis. Stat Med. 2002;21(21):3153‐3159.
33. Knapp G, Hartung J. Improved tests for a random effects
meta‐regression with a single covariate. Stat Med. 2003;
22(17):2693‐2710.
34. Friede T, Röver C, Wandel S, Neuenschwander B. Meta‐analysis
of few small studies in orphan diseases. Res Synth Meth. 2016.
35. Burke DL, Ensor J, Riley RD. Meta‐analysis using individual
participant data: one‐stage and two‐stage approaches, and why
they may differ. Stat Med. 2017;36(5):855‐875.
36. Berkey CS, Hoaglin DC, Mosteller F, Colditz GA. A random‐
effects regression model for meta‐analysis. Stat Med.
1995;14(4):395‐411.
37. Raghunathan TE, Ii Y. Analysis of binary data from a
multicentre clinical trial. Biometrika. 1993;80(1):127‐139.
38. Stephens JM, Handke B, Doshi JA. International survey of
methods used in health technology assessment (HTA): does
practice meet the principles proposed for good research?
2012;2:29‐44.
39. Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1.
Introduction‐GRADE evidence profiles and summary of find-
ings tables. J Clin Epidemiol. 2011;64(4):383‐394.
40. Herbison P, Hay‐Smith J, Gillespie WJ. Meta‐analyses of small
numbers of trials often agree with longer‐term results. J Clin
Epidemiol. 2011;64(2):145‐153.
41. IntHout J, Ioannidis JPA, Borm GF, Goeman JJ. Small studies
are more heterogeneous than large ones: a meta‐meta‐analysis.
J Clin Epidemiol. 2015;68(8):860‐869.
42. Partlett C, Riley RD. Random effects meta‐analysis: coverage
performance of 95% confidence and prediction intervals
following REML estimation. Stat Med. 2017;36(2):301‐317.
43. Jackson D, Law M, Rücker G, Schwarzer G. The Hartung‐Knapp
modification for random‐effects meta‐analysis: a useful refine-
ment but are there any residual concerns? Stat Med.
2017;36(25):3923‐3934.
44. Petropoulou M, Mavridis D. A comparison of 20 heterogeneity
variance estimators in statistical synthesis of results from
studies: a simulation study. Stat Med. 2017;36(27):4266‐4280.
45. Bodnar O, Link A, Arendacká B, Possolo A, Elster C. Bayesian
estimation in random effects meta‐analysis using a non‐
informative prior. Stat Med. 2017;36(2):378‐399.
46. Friede T, Röver C, Wandel S, Neuenschwander B. Meta‐analysis
of two studies in the presence of heterogeneity with applications
in rare diseases. Biom J. 2016.
47. Rhodes KM, Turner RM, White IR, Jackson D, Spiegelhalter DJ,
Higgins JPT. Implementing informative priors for heterogeneity
in meta‐analysis using meta‐regression and pseudo data. Stat
Med. 2016;35(29):5495‐5511.
48. Bakbergenuly I, Kulinskaya E. Beta‐binomial model for meta‐
analysis of odds ratios. Stat Med. 2017.
SUPPORTING INFORMATION
Additional Supporting Information may be found online
in the supporting information tab for this article.
How to cite this article: Mathes T, Kuss O. A
comparison of methods for meta‐analysis of a small
number of studies with binary outcomes. Res Syn
Meth. 2018;1–16. https://doi.org/10.1002/jrsm.1296