RESEARCH ARTICLE

A comparison of methods for meta-analysis of a small number of studies with binary outcomes

Tim Mathes (1,2) | Oliver Kuss (3,4)
(1) Institute for Research in Operative Medicine, Witten/Herdecke University, Ostmerheimer Str 200, Building 38, 51109 Cologne, Germany
(2) Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Germany
(3) Institute for Biometrics and Epidemiology, German Diabetes Center, Leibniz Institute for Diabetes Research at Heinrich Heine University Düsseldorf, Düsseldorf, Germany
(4) Institute of Medical Statistics, Düsseldorf University Hospital and Medical Faculty, Heinrich Heine University Düsseldorf, Düsseldorf, Germany

Correspondence: Tim Mathes, Institute for Research in Operative Medicine, Witten/Herdecke University, Ostmerheimer Str 200, Building 38, 51109 Cologne, Germany. Email: tim.mathes@uni-wh.de
Meta-analyses often include only a small number of studies (≤5). Estimating between-study heterogeneity is difficult in this situation. An inaccurate estimate of heterogeneity can result in biased effect estimates and confidence intervals that are too narrow. The beta-binomial model has shown good statistical properties for meta-analysis of sparse data. We compare the beta-binomial model with different inverse variance random effects methods (eg, DerSimonian-Laird, modified Hartung-Knapp, and Paule-Mandel) and fixed effects methods (Mantel-Haenszel and Peto) in a simulation study. The underlying true parameters were obtained from empirical data of actually performed meta-analyses to best mirror real-life situations. We show that valid methods for meta-analysis of a small number of studies are available. In fixed effects situations, the Mantel-Haenszel and Peto methods performed best. In random effects situations, the beta-binomial model performed best for meta-analysis of few studies, considering the balance between coverage probability and power. We recommend the beta-binomial model for practical application. If very strong evidence is needed, using the Paule-Mandel heterogeneity variance estimator combined with modified Hartung-Knapp confidence intervals might be useful to confirm the results. Notably, most inverse variance random effects models showed unsatisfactory statistical properties even when more studies (10-50) were included in the meta-analysis.
KEYWORDS
few studies, heterogeneity variance estimators, meta‐analysis, simulation study
Abbreviations: BB1N, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to the number of studies; BB2N, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies; BBFM, estimation of the beta-binomial model using SAS PROC FMM; BBIN, beta-binomial regression (maximum likelihood) with Wald-type confidence intervals; BBMP, beta-binomial regression (maximum likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters; BBQD, beta-binomial regression (quasi-likelihood) with confidence intervals using a Student t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters; BBQL, beta-binomial regression (quasi-likelihood) with Wald-type confidence intervals; DLRE, random effects model with DerSimonian-Laird between-study variance estimator and Wald-type confidence intervals; HKSJ, random effects model with DerSimonian-Laird between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals; MaHa, Mantel-Haenszel fixed effects model; MKH, random effects model with DerSimonian-Laird between-study variance estimator and modified Hartung-Knapp confidence intervals; PMHK, random effects model with Paule-Mandel between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals; PMRE, random effects model with Paule-Mandel between-study variance estimator and Wald-type confidence intervals; YPET, Peto odds ratio method.
Received: 28 September 2017 Revised: 13 December 2017 Accepted: 12 February 2018
DOI: 10.1002/jrsm.1296
Res Syn Meth. 2018;1-16. Copyright © 2018 John Wiley & Sons, Ltd. wileyonlinelibrary.com/journal/jrsm
1 | INTRODUCTION

Meta-analyses frequently include only a small number of studies. An analysis of 14,886 meta-analyses from the Cochrane Library found that the median number of studies in the meta-analyses[1] was as low as 3. About 50% of meta-analyses include 2 or 3 studies, and less than 10% include 10 or more studies.[2] In some research areas, few studies in meta-analyses are the rule rather than the exception. This particularly applies to the assessment of new interventions in health technology assessments, orphan diseases, and meta-analysis of subgroups (eg, in stratified medicine).[3,4] In addition to the problem of few studies for inclusion in the meta-analysis, the studies can be small and the number of events can be low (eg, when considering adverse events).[1] For such application areas with sparse data, meta-analytic techniques are probably most valuable because they enable collecting the complete existing evidence. However, the use of inadequate meta-analytic methods for sparse data can result in invalid effect estimates and even in wrong conclusions.[5,6]

Meta-analysis of only a handful of studies poses a number of challenges.[7] The reason is that the central limit theorem does not apply when only a few studies are included in a meta-analysis. The standard inverse variance random effects meta-analysis incorporates the between-study variation (heterogeneity, τ²) to estimate the overall effect. In case of sparse data, the estimation of τ², and consequently of θ, can be highly imprecise.[8] Simulation studies have shown invalid results if the standard DerSimonian-Laird method is used for low event rates and few studies in the meta-analysis.[2,5,9] Other random effects methods for constructing confidence intervals based on τ² also often perform poorly when only a few studies are included in the meta-analysis.[10]

Previous work of our group has shown that for meta-analysis of rare or even zero events, random effects models are more accurate than fixed effects models, especially in a truly heterogeneous situation.[11] In particular, the beta-binomial regression model provided valid pooled effect measures and confidence intervals in this study.[11] Another simulation study that compared the statistical properties of different random effects models suggested that the Hartung-Knapp-Sidik-Jonkman method outperforms the DerSimonian-Laird random effects approach for meta-analysis of few studies.[12] However, the Hartung-Knapp-Sidik-Jonkman method can be overconservative when less than 5 studies are included in the meta-analysis.[13] In sum, choosing the right approach for meta-analysis of 2 to 5 studies is difficult.[9,10]

Our aim was thus to compare different frequentist methods for meta-analysis including only a small number of studies (≤5), with a focus on the beta-binomial regression and the modified Hartung-Knapp-Sidik-Jonkman approach.[11,14]
2 | METHODS

2.1 | Statistical methods for meta-analysis

We consider situations where 2 interventions are compared in a series of studies i (i = 1, …, I) with binary outcomes. We are interested in the estimation of the overall intervention effect θ and use the log odds ratio to quantify the difference between the intervention groups. The data for each study consist of the intervention effect θ_i, the sample size in the intervention group n_iT, and the sample size in the control group n_iC (overall sample size N_i). In each study, there are some (or zero) events in the intervention group y_iT and in the control group y_iC. Each study has a specific sampling error ε_i and a within-study variance σ_i².

In the first section, we describe the statistical models for meta-analysis that are included in the comparison. In the second section, we describe the design of the simulation study. The third section provides an overview of the measures to assess the statistical properties, and in the fourth section, we introduce our empirical example dataset.

We consider 2 different types of models: fixed effects models and random effects models.
2.1.1 | Fixed effects models

The fixed effects model is based on the assumption that all studies in the meta-analysis have a common effect (θ). The fixed effects model can be written as[15]

θ̂_i = θ + σ_i ε_i,   ε_i ~ N(0, 1).

The study effects θ̂_i in study i are distributed about the common effect with the study-specific variance (σ_i²) and the sampling error (ε_i).
Mantel-Haenszel method

The Mantel-Haenszel (MaHa) method is a weighted average of the study-specific odds ratios, risk ratios, or risk differences.[16] The MaHa odds ratio for the overall effect is given by[17]

OR_MaHa = Σ_{i=1}^{I} θ̂_i w_i(MaHa) / Σ_{i=1}^{I} w_i(MaHa).

The weights are given by w_i(MaHa) = z_iT y_iC / N_i, where z_iT is the number of nonevents in the intervention group.

We included the MaHa method (SAS PROC FREQ) in our analysis because it performs better than the standard fixed effects model in case of sparse data and because it is the standard fixed effects model in Cochrane Reviews.[17-19]
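The pooled MaHa odds ratio can be computed directly from the 2×2 tables. The following minimal Python sketch (not the authors' SAS PROC FREQ implementation; the function name is illustrative) applies the weight definition above, noting that θ̂_i w_i(MaHa) simplifies to y_iT z_iC / N_i:

```python
def mantel_haenszel_or(y_t, n_t, y_c, n_c):
    """Mantel-Haenszel pooled odds ratio for a series of 2x2 tables.

    y_t, y_c: events in the intervention/control groups;
    n_t, n_c: the corresponding group sizes.
    """
    num = den = 0.0
    for yt, nt, yc, nc in zip(y_t, n_t, y_c, n_c):
        N = nt + nc
        zt, zc = nt - yt, nc - yc   # nonevents in each group
        num += yt * zc / N          # OR_i * w_i with w_i = zt * yc / N
        den += zt * yc / N          # w_i
    return num / den
```

For a single study, the result equals the crude odds ratio of that study's 2×2 table.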
Peto odds ratio method

The Peto odds ratio (YPET) is an inverse variance approach; ie, the studies are weighted by the inverse of the study-specific variance (w_i(FIX) = 1/σ_i²).[20] The pooled Peto log odds ratio is estimated as[21]

log(OR_YPET) = Σ_{i=1}^{I} (O_i − E_i) / Σ_{i=1}^{I} V_i,

where O_i is the observed number of events, E_i is the expected number of events, and V_i is the variance of their difference.

We considered the YPET in our analysis because it is the standard method for meta-analysis of small intervention effects or very rare events.[19,21]
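A minimal Python sketch of the one-step Peto estimator (illustrative, not from the paper), using the conventional hypergeometric forms for E_i and V_i:

```python
def peto_log_or(y_t, n_t, y_c, n_c):
    """Peto one-step pooled log odds ratio for a series of 2x2 tables."""
    oe = v = 0.0
    for yt, nt, yc, nc in zip(y_t, n_t, y_c, n_c):
        N, m = nt + nc, yt + yc                        # total size, total events
        E = nt * m / N                                 # expected events, intervention arm
        V = nt * nc * m * (N - m) / (N**2 * (N - 1))   # hypergeometric variance
        oe += yt - E
        v += V
    return oe / v
```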
2.1.2 | Inverse variance random effects models

Random effects models are based on the assumption that there is no common effect across all studies; instead, the study-specific true effects follow a distribution, and the parameter of interest is the mean of this distribution of true effects. The study effects (usually) follow a normal distribution. In general, the intervention effect in study i under the random effects model can be expressed as

θ̂_i = θ_i + σ_i ε_i,   ε_i ~ N(0, 1),   θ_i ~ N(θ, τ²).

The pooled effect of an inverse variance random effects meta-analysis can be estimated by

θ̂_R = Σ_{i=1}^{I} w_i(REM) θ̂_i / Σ_{i=1}^{I} w_i(REM),

where the w_i(REM) are the study-specific weights. The study weights are adjusted according to the between-study variation. The between-study variance (τ²) has to be estimated for random effects models in addition to the within-study variance σ_i². Specifically, the study weights are the inverse of the sum of the within-study variance and the between-study variance, w_i(REM) = 1/(σ_i² + τ²).
Various methods to estimate the between-study variance τ² in an inverse variance random effects model exist (eg, ordinary least squares and maximum likelihood).[22] In this analysis, we consider the DerSimonian-Laird estimator and the Paule-Mandel estimator[23,24] for the between-study variance τ².
Between-study variance estimators (DerSimonian-Laird and Paule-Mandel)

The standard DerSimonian-Laird estimator is a noniterative between-study variance estimator motivated by the method-of-moments principle.[23,25] To be concrete, τ²_DL is estimated by the following equation:

τ²_DL = max{ 0, (Q − (I − 1)) / ( Σ_{i=1}^{I} w_i(FIX) − Σ_{i=1}^{I} w_i(FIX)² / Σ_{i=1}^{I} w_i(FIX) ) },

where Q is the heterogeneity statistic (Q-statistic) given by

Q(τ²_DL) = Σ_{i=1}^{I} w_i(FIX) (θ̂_i − θ̂_F)²,

where θ̂_F is the pooled effect of a fixed effects model. The weights w_i(FIX) in this equation are given by the inverse of the within-study variance, w_i(FIX) = 1/σ_i², which is, in general, unknown and has to be replaced by an estimate. We estimated Q using PROC GLM.[26]

The between-study variance estimator τ²_DL is the most widely used for random effects meta-analysis and is also the standard random effects model in Cochrane Reviews.[19] Therefore, we include the DerSimonian-Laird method in the analysis as the reference method.
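The DerSimonian-Laird formula above takes only a few lines. A minimal Python sketch (the authors computed Q with SAS PROC GLM; the function and argument names here are illustrative), given study log odds ratios and their estimated within-study variances:

```python
def dl_tau2(theta, var):
    """DerSimonian-Laird method-of-moments estimate of tau^2.

    theta: study effect estimates (log odds ratios);
    var: estimated within-study variances sigma_i^2.
    """
    w = [1.0 / v for v in var]                              # fixed effects weights 1/sigma_i^2
    sw = sum(w)
    theta_f = sum(wi * t for wi, t in zip(w, theta)) / sw   # fixed effects pooled estimate
    Q = sum(wi * (t - theta_f) ** 2 for wi, t in zip(w, theta))
    I = len(theta)
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (Q - (I - 1)) / c)                      # truncated at zero
```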
The Paule-Mandel method is an iterative between-study variance estimator.[24] The estimation of τ²_PM is based on the generalized Q-statistic.[25,27] This can be expressed as[25,28]

Q(τ²_PM) = Σ_{i=1}^{I} w_i(PM) (θ̂_i − θ̂_{τ²PM})²,

where θ̂_{τ²PM} is given by

θ̂_{τ²PM} = Σ_{i=1}^{I} w_i(PM) θ̂_i / Σ_{i=1}^{I} w_i(PM)

and w_i(PM) = 1/(σ_i² + τ²_PM). Q(τ²_PM) has an expectation of I − 1, and the estimator is obtained by iterating τ²_PM until convergence is reached.[28]

The Paule-Mandel estimator was incorporated in the analysis because it has recently been recommended in a review paper on the performance of between-study variance estimators.[22] Our calculation of τ²_PM was based on the algorithm proposed by DerSimonian and Kacker (SAS macro; see Supporting Information I).[25] We validated our estimates of τ² against the respective results of the R metafor package.[29]
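To illustrate the defining condition Q(τ²_PM) = I − 1, the following Python sketch solves it by simple bisection rather than the DerSimonian-Kacker iteration used in the paper (Q is monotonically decreasing in τ², so any root finder works; all names are illustrative):

```python
def pm_tau2(theta, var, tol=1e-10, max_iter=200):
    """Paule-Mandel estimate of tau^2: solve Q(tau^2) = I - 1 by bisection."""
    I = len(theta)

    def Q(tau2):
        w = [1.0 / (v + tau2) for v in var]
        pooled = sum(wi * t for wi, t in zip(w, theta)) / sum(w)
        return sum(wi * (t - pooled) ** 2 for wi, t in zip(w, theta))

    if Q(0.0) <= I - 1:          # no positive solution: estimate truncated at zero
        return 0.0
    lo, hi = 0.0, 1.0
    while Q(hi) > I - 1:         # bracket the root (Q decreases in tau^2)
        hi *= 2.0
    for _ in range(max_iter):    # bisection
        mid = (lo + hi) / 2.0
        if Q(mid) > I - 1:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0
```

On equal within-study variances, the result coincides with the DerSimonian-Laird estimate.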
Confidence intervals for inverse variance random effects models (Wald-type, Hartung-Knapp-Sidik-Jonkman, and Hartung-Knapp modification)

Confidence intervals using the DerSimonian-Laird method are constructed as

θ̂ ± z_{1−α/2} × σ̂,

where z_{1−α/2} is the 1 − α/2 quantile of the standard normal distribution (Wald-type confidence intervals) and the standard error is given by

σ̂ = sqrt( 1 / Σ_{i=1}^{I} w_i(DL) ).

Hartung-Knapp and Sidik-Jonkman suggested an adjustment factor for the standard error and the use of the quantile of the Student t-distribution with I − 1 degrees of freedom instead of the Wald-type interval to calculate confidence intervals.[30-32] The adjustment factor is calculated as

q = (1/(I − 1)) Σ_{i=1}^{I} w_i(DL) (θ̂_i − θ̂_R)²,

where θ̂_R is the estimated pooled effect from a DerSimonian-Laird random effects meta-analysis. This leads to the adjusted confidence interval[12]

θ̂ ± t_{(I−1), 1−α/2} × √q × σ̂.

Although this confidence interval in general tends to be wider than the Wald-type confidence interval, it can be narrower when q is very small. For this reason, Hartung and Knapp proposed the following ad hoc modification of q[12,33]:

q* = max(1, q).

If τ²_PM is iterated via the Q-statistic above, then q* always equals 1, because q equals 1 (or is less than 1 if no positive solution exists).[12] This means that combining the Paule-Mandel estimator with Hartung-Knapp-Sidik-Jonkman (and also modified Hartung-Knapp) confidence intervals amounts to using a Paule-Mandel-derived pooled effect estimator with confidence intervals based on the quantiles of the Student t-distribution with I − 1 degrees of freedom:

θ̂_{τ²PM} ± t_{(I−1), 1−α/2} × σ̂_{τ²PM}.

The Hartung-Knapp confidence intervals and their modification were validated against the respective results of the R metafor package.[29]
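The HKSJ components are easy to compute once τ² is fixed. This Python sketch (illustrative, not the paper's SAS/R code) returns the pooled effect, the adjusted standard error, and the degrees of freedom; the confidence interval is then pooled ± t_{df, 1−α/2} × se, with the t quantile taken from any statistics library:

```python
def hksj(theta, var, tau2, modified=False):
    """Pooled effect, HKSJ-adjusted standard error, and t degrees of freedom.

    modified=True applies the Hartung-Knapp ad hoc modification q* = max(1, q).
    """
    w = [1.0 / (v + tau2) for v in var]      # random effects weights
    sw = sum(w)
    pooled = sum(wi * th for wi, th in zip(w, theta)) / sw
    I = len(theta)
    q = sum(wi * (th - pooled) ** 2 for wi, th in zip(w, theta)) / (I - 1)
    if modified:
        q = max(1.0, q)                      # q* = max(1, q)
    se = (q / sw) ** 0.5                     # sqrt(q) * sigma_hat
    return pooled, se, I - 1
```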
We considered the Hartung-Knapp method and its modification because of the promising results in various recent simulation studies, also in the case of few studies.[12,13,34]

We combined both between-study variance estimators with 3 different methods to estimate the confidence intervals (Wald-type, Hartung-Knapp-Sidik-Jonkman, and Hartung-Knapp modification).[12,14,25,31] The combination of the different between-study variance estimators and confidence intervals results in the following 5 inverse variance random effects models:

(1) DerSimonian-Laird between-study variance estimator and Wald-type confidence intervals (standard random effects model, DLRE).[23]
(2) DerSimonian-Laird between-study variance estimator and Hartung-Knapp-Sidik-Jonkman adjusted confidence intervals (HKSJ).[23,31,32]
(3) DerSimonian-Laird between-study variance estimator and modified Hartung-Knapp adjusted confidence intervals (MKH).[14,23]
(4) Paule-Mandel between-study variance estimator and Wald-type confidence intervals (PMRE).[24]
(5) Paule-Mandel between-study variance estimator and Hartung-Knapp-Sidik-Jonkman confidence intervals (PMHK).[24,31]
2.1.3 | Beta-binomial regression

The models described above are 2-stage models: in the first step, aggregated measures are calculated for each study separately, and in the second, subsequent step these measures are combined.[35] Opposed to this, the beta-binomial regression is a 1-stage model; the analysis is performed in 1 step, similar to an individual patient data regression analysis.[35] Moreover, the beta-binomial model is a true (study-specific) random effects regression model.[11]
Beta-binomial (random effects) regression model

In the beta-binomial model, we assume that the observed proportions (p_iC = y_iC/n_iC) in the control group follow a binomial distribution B(p, n_iC). Further, we assume the success probability p to be beta-distributed with parameters a and b. The mean and variance of p are given by E(p) = μ = a/(a + b) and Var(p) = μ(1 − μ)ϑ/(1 + ϑ), respectively, where ϑ = 1/(a + b). Consequently, the response y_iC is beta-binomially distributed with mean E(y_iC) = n_iC μ and variance Var(y_iC) = n_iC μ(1 − μ)[1 + (n_iC − 1)ϑ/(1 + ϑ)]. The outcomes of 2 observations from the same study are then correlated with corr(y_iT, y_iC) = ρ = 1/(a + b + 1).

The intervention effect is modelled via a link function g by g(μ) = β_0 + β_T x_T, where x_T = 1 for the intervention group and x_T = 0 for the control group. In our study, we use the logit link for g to arrive at log odds ratios as the measure of the intervention effect.

The beta-binomial model performed well in a prior model comparison of meta-analysis methods for rare events.[11] This raises the question whether the beta-binomial regression model also performs well for meta-analysis with a small number of studies.
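The beta-binomial likelihood with logit link can be sketched in a few lines of Python (the authors fitted the model with SAS PROC NLMIXED/GLIMMIX/FMM; this is only an illustrative parameterization with μ and ϑ as above, where β_T is the log odds ratio, and maximization with a general-purpose optimizer is not shown):

```python
from math import lgamma, exp, log, comb

def betaln(a, b):
    """log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bb_logpmf(y, n, a, b):
    """Log pmf of the beta-binomial distribution with shape parameters a, b."""
    return log(comb(n, y)) + betaln(y + a, n - y + b) - betaln(a, b)

def bb_loglik(params, data):
    """Meta-analytic log-likelihood with logit link g(mu) = b0 + bT * x_T.

    params = (b0, bT, log_theta) with theta = 1/(a + b) > 0;
    data = list of (events, group size, x) with x = 1 intervention, 0 control.
    """
    b0, bT, log_theta = params
    theta = exp(log_theta)
    ll = 0.0
    for y, n, x in data:
        mu = 1.0 / (1.0 + exp(-(b0 + bT * x)))   # inverse logit
        a, b = mu / theta, (1.0 - mu) / theta    # so E(p) = mu, 1/(a+b) = theta
        ll += bb_logpmf(y, n, a, b)
    return ll
```

The pmf sums to 1 and has mean nμ, which makes the parameterization easy to check numerically.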
Estimation methods and confidence intervals for the beta-binomial regression model

We used different parameter estimation methods (maximum likelihood and quasi-likelihood) and confidence interval estimation methods (Wald-type and t-distribution-based with different numbers of degrees of freedom) for the beta-binomial model, resulting in the following 6 implementations of the beta-binomial model in the comparison:

(1) Estimation of the beta-binomial model via maximum likelihood (SAS PROC NLMIXED),
(a) combined with Wald-type confidence intervals (z_{1−α/2}, the 1 − α/2 quantile of the standard normal distribution; denoted BBIN in the following);
(b) combined with confidence intervals using a t-distribution with degrees of freedom equal to the number of studies (t_{I, 1−α/2}; BB1N);
(c) combined with confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies (t_{2I, 1−α/2}; BB2N);
(d) combined with confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters (t_{2I−3, 1−α/2}; BBMP).[36,37]
(2) Estimation of the beta-binomial model via quasi-likelihood (SAS PROC GLIMMIX). When using the quasi-likelihood principle, one specifies only the mean and variance but not the complete beta-binomial distribution. We assumed that this would increase the robustness of the results with a small loss in efficiency. We combined this effect estimate with the following:
(a) Wald-type confidence intervals (BBQL);
(b) confidence intervals using a t-distribution with degrees of freedom equal to twice the number of studies minus 3, to account for the estimation of the 3 distributional parameters (BBQD).
(3) Estimation of the beta-binomial model via maximum likelihood using SAS PROC FMM and Wald-type confidence intervals (BBFM). The advantage of SAS PROC FMM is that the starting values for the beta-distribution are directly estimated by the procedure. Consequently, it is not necessary to estimate starting values beforehand, which facilitates the implementation of the beta-binomial model.

Starting values for the beta-binomial models using SAS PROC NLMIXED (BBIN, BB1N, BB2N, and BBMP) were computed from raw proportions, their variances, and correlations.
2.2 | Simulations

We performed a simulation study to compare the statistical properties of the different meta-analytic methods. Where feasible, all true values for the design factors of the simulation study were chosen from actually performed meta-analyses to reflect real-life conditions as closely as possible. The most suitable source of true values for the simulation was the review of Turner et al, which analysed 1991 systematic reviews from the Cochrane Database of Systematic Reviews.[1] Overall, this review included 14,886 meta-analyses (each including at least 2 studies) of dichotomous outcomes based on 77,237 single studies.
2.2.1 | Design of simulation

We generated 10,000 meta-analyses for each simulation scenario.

The focus of our study was to assess the performance of the meta-analytic methods for pooling results of ≤5 studies. For the main analysis, we compared the methods by generating meta-analyses with 2, 3, 4, and 5 included studies.[3] In a supplemental analysis, we considered scenarios with 10, 15, 20, 30, and 50 studies to get an impression of how the compared methods behave if the number of studies included in the meta-analysis increases.

We generated event probabilities, sample sizes, and heterogeneity estimates for each study included in the meta-analysis based on the Turner data.[1] Furthermore, we explicitly varied some parameters to check the robustness of the results.

Table 1 shows the input parameters for the simulation, including the information whether they were varied implicitly based on distributional assumptions or varied explicitly based on fixed values. In addition, the table provides the source/rationale of the parameter choice and whether the scenarios belong to the base case analysis or to a sensitivity analysis for assessing the robustness of the results.
The combination of the different (random effects/fixed effects, H0/H1) scenarios and of the number of studies led to 36 base case simulation scenarios (360,000 meta-analyses) in total. We performed the sensitivity analyses (one randomly selected study 10 times larger, small intervention effect, large intervention effect, low baseline event probability, and high baseline event probability in the control group) only for the setting with 2 to 5 studies in the random effects scenario under H1 (medium effect), leading to 20 additional simulation scenarios.

TABLE 1 Description of the simulation

Base case analyses:
- Sample size of single study (a): generated from a log-normal distribution with mean = 4.615 and SD = 1.1 (resulting data: median = 103, Q1 = 50, Q3 = 204).
- Allocation of sample size to control and intervention group (balance of study size between groups): random allocation with probability 0.5.
- Event probabilities in the control group (a): generated from a beta-distribution with α = 0.4230 and β = 1.433 (resulting data: mean = 0.223, median = 0.126, SD = 0.256).
- Effect (event probabilities in the intervention group) (a): generated from a standard inverse variance random effects model (for τ², see the fixed and random effects scenarios below); fixed value: medium effect, OR = 0.684.
- Events in the control/intervention group: binomial draw.
- Fixed effects scenario: τ² = 0.
- Random effects scenario (a): τ² generated from a log-normal distribution with mean = −1.47, SD = 1.65, and skewness = −0.55 (b) (resulting data: median τ² = 0.274; 25% percentile τ² = 0.079; 75% percentile τ² = 0.806 (c); mean I² = 17%, SD = 30, range 0-99%).

Sensitivity analyses (d):
- Balance of study size[13]: one randomly selected study 10 times larger.
- Event probabilities in the control group: low = 0.1, high = 0.5.
- Effect (event probabilities in the intervention group): large effect, OR = 0.466 (a); small effect, OR = 0.855 (a).

Abbreviations: OR, odds ratio; Q1, first quartile (25% percentile); Q3, third quartile (75% percentile); SD, standard deviation.
(a) Informed by Turner et al.[1]
(b) Based on the Fleishman power transformation.
(c) Under H0.
(d) Only simulated in the random effects scenario for 2 to 5 studies.

The simulation code is available in Supporting Information I.
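A single simulated meta-analysis under the base case design of Table 1 can be sketched as follows in Python (the authors' actual simulation code is in Supporting Information I; details such as rounding the study size and the per-subject Bernoulli draws are assumptions of this sketch):

```python
import random
import math

def simulate_meta(I, tau2, or_true=0.684, rng=random):
    """Draw one simulated meta-analysis following the base case design:
    log-normal study sizes, beta-distributed control risks,
    normally distributed study-specific log odds ratios, binomial events."""
    theta = math.log(or_true)
    studies = []
    for _ in range(I):
        N = max(4, round(rng.lognormvariate(4.615, 1.1)))   # total study size
        n_t = sum(rng.random() < 0.5 for _ in range(N))     # random allocation, p = 0.5
        n_c = N - n_t
        p_c = rng.betavariate(0.4230, 1.433)                # control event probability
        theta_i = rng.gauss(theta, math.sqrt(tau2))         # study-specific log OR
        odds_t = p_c / (1.0 - p_c) * math.exp(theta_i)
        p_t = odds_t / (1.0 + odds_t)
        y_t = sum(rng.random() < p_t for _ in range(n_t))   # binomial draws
        y_c = sum(rng.random() < p_c for _ in range(n_c))
        studies.append((y_t, n_t, y_c, n_c))
    return studies
```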
2.2.2 | Measures to assess performance of methods

We performed all comparisons of the effect estimates on the log odds ratio scale. We estimated the median bias and the empirical coverage of the 95% confidence interval to assess the statistical properties.[11] In the medium effect scenario, we also calculated the empirical power for all methods. Moreover, we counted the number of completely missing pooled effect estimates to judge the numerical robustness of the compared methods. For this analysis, we present the number of converged runs. The analysis code is available in Supporting Information II.
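These performance measures can be summarized as in the following Python sketch (illustrative, not the paper's analysis code), where each simulation run yields either a triple (estimate, CI lower, CI upper) on the log odds ratio scale or None for a nonconverged run:

```python
def performance(results, theta_true):
    """Summarize simulation runs for one method and scenario.

    results: list of (estimate, ci_lo, ci_hi) tuples, or None for nonconverged runs.
    Returns (median bias, empirical coverage, empirical power, convergence rate).
    """
    conv = [r for r in results if r is not None]
    biases = sorted(est - theta_true for est, _, _ in conv)
    m = len(biases)
    median_bias = (biases[m // 2] if m % 2
                   else (biases[m // 2 - 1] + biases[m // 2]) / 2.0)
    coverage = sum(lo <= theta_true <= hi for _, lo, hi in conv) / m
    power = sum(lo > 0 or hi < 0 for _, lo, hi in conv) / m   # CI excludes the null (0)
    return median_bias, coverage, power, m / len(results)
```

Note that, as discussed in the results, conditioning on convergence can make the less robust methods look better than they are, because the hardest runs drop out of the summary.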
3 | RESULTS

3.1 | Base case analysis

We present the results separately for the fixed effects and the random effects situations.

First, we describe the statistical properties in the no effect scenario. Second, we describe the results of the simulation in the medium effect scenario. Here, we focus on the power analysis and on the differences in performance, compared with H0, of those meta-analytic methods that performed well regarding coverage under H0.
3.1.1 | No effect scenario

As was to be expected, for almost all methods the convergence increased with an increasing number of studies in the meta-analysis (Figures 1 and 2; see data in Table S1 in Supporting Information III). The main reason is that the number of meta-analyses without any events decreased. The beta-binomial models that were estimated with quasi-likelihood (BBQD and BBQL) showed a tendency towards decreasing convergence with an increasing number of studies. This is because the convergence criterion of the procedure (SAS PROC GLIMMIX) was more often not satisfied, and thus the procedure stopped without providing parameter estimates. The reason is probably that, with an increasing number of studies, the estimation of parameters becomes more challenging when the model has to be estimated based only on means and variances.

The lines of BBIN, BB1N, BB2N, and BBMP, as well as of BBQD and BBQL, follow the same pattern because the log odds ratio is estimated with the same procedure. Also, the lines for DLRE, HKSJ, MKH, PMRE, and PMHK show the same shape because all meta-analyses without any event in one arm are not estimable.

For meta-analysis of few studies, the beta-binomial models with a priori estimated starting values were most robust (BBIN, BB1N, BB2N, and BBMP), followed by YPET. All other methods performed quite similarly, with about 15% to 20% nonconverged meta-analyses in the few study situations.

All methods showed small median bias even in the few study scenarios (Figures 3 and 4; see data in Table S1 in Supporting Information III). The strongest negative median bias was −0.047 (BBIN, BB1N, BB2N, and BBMP), which corresponds to an odds ratio of 0.95. The strongest positive bias, 0.022 (BBQL and BBQD), was also observed in the random effects scenario and corresponds to an odds ratio of 1.02.

The shapes of the lines were exactly the same for BBIN, BB1N, BB2N, and BBMP; for BBQD and BBQL; for DLRE, HKSJ, and MKH; and for PMRE and PMHK, because these methods are based on the same point estimators for the effect.
Bias of the effect estimators based on the DerSimonian-Laird heterogeneity variance estimator (DLRE, HKSJ, and MKH) was very similar to the bias of the effect estimators based on the Paule-Mandel heterogeneity variance estimator (PMRE and PMHK).

FIGURE 1 Converged runs (H0 FEM)
In the fixed effects scenario, almost all methods satisfied the empirical mean coverage at the 95% level or were only marginally below this level (Figure 5; see data in Table S1 in Supporting Information III). Only the HKSJ method fell heavily below the 95% coverage level.

In the random effects situation, again only BBMP, BB1N, BBQD, MKH, and PMHK satisfied or nearly satisfied the 95% empirical coverage level in the scenarios with few studies (Figure 6). In particular, the fixed effects models (MaHa and YPET) as well as DLRE and HKSJ showed unacceptably low coverage probability in the random effects situations. BBQD, BBQL, DLRE, HKSJ, and MKH also showed very low coverage probability in situations with more than 5 studies included in the meta-analysis.
3.1.2 | Medium effect scenario

In general, the results for converged runs under H1 were quite similar to the results under H0 (Table S1 in Supporting Information IV).

Overall, bias was also small under the alternative (Figures S1 and S2 in Supporting Information V). Bias was strongest in the random effects scenario. The maximum negative bias occurred using BBIN, BB1N, BB2N, and BBMP, with a log odds ratio of −0.06 (odds ratio, 0.94). The effect estimates based on the DerSimonian-Laird and the Paule-Mandel heterogeneity variance estimators were the most strongly positively biased (log odds ratio, 0.074 and 0.075, respectively; odds ratio, 1.08 for both methods).

FIGURE 2 Median bias (H0 FEM)

FIGURE 3 Mean empirical coverage (H0 FEM)

FIGURE 4 Converged runs (H0 REM)
The results for the empirical coverage probability for meta-analysis including few studies were also similar to the results of the no effect scenario. As in the zero effect scenario, all methods but HKSJ satisfied the mean empirical coverage at the 95% level in the fixed effects, few studies scenarios (Figure S3 in Supporting Information V). BBMP, BB1N, and BBQD, as well as MKH and PMHK, satisfied the coverage probability throughout all random effects situations (Figure S4 in Supporting Information V). BB1N was only marginally below the empirical 95% coverage level in the scenarios with 2 to 5 studies (worst case: empirical coverage probability = 0.926). In the random effects scenarios with ≥20 studies in the meta-analysis, not only DLRE and HKSJ but also MKH had a very low coverage probability. The coverage probability in the random effects, large study scenarios was below 0.7 throughout for the fixed effects models (MaHa and YPET).
Power depended strongly on the number of studies in the meta-analysis (Figures 7 and 8). The fixed effects models and HKSJ showed the highest median power for meta-analysis of ≤5 studies. Also with BBFM, BBIN, BBQL, and DLRE, there was a small probability of detecting a statistically significant difference in the few study meta-analyses, in the fixed effects as well as in the random effects situations. All other methods had a power close to zero in the meta-analyses of 2 studies. It should be noted that the power of the methods showing favourable results might be overestimated, because these methods were the less robust ones (BBFM, BBQL, BBQD, and MaHa); ie, challenging situations with low event rates are not estimable and consequently are not included in the median power analysis. In meta-analyses of ≥10 studies, the power of all methods was quite similar.

FIGURE 5 Median bias (H0 REM)
3.2 | Sensitivity analysis

We performed the sensitivity analysis only for the random effects, few study scenario because this is the most challenging and the most relevant for practice. In the sensitivity analysis, we included only BB1N and BBMP because these beta-binomial models performed best in the base case analysis.
3.2.1 |One large study
The empirical coverage probability of DLRE, HKSJ, and
the fixed effects models (YPET and MaHa) further
decreased compared with the results of the random
FIGURE 6 Mean empirical coverage (H0 REM)
MATHES AND KUSS 11
effects situation under H0 with similar study sizes. The
empirical coverage probability of BB1N fell slightly below
the 95% confidence interval.
Figures for the sensitivity analysis on unbalanced
study size are given in Supporting Information VI
(Figures S1‐S3).
FIGURE 7 Mean power (H1 FEM)
FIGURE 8 Mean power (H1 REM)
3.2.2 | Different intervention effect sizes
Using a small as well as a large intervention effect did not
alter bias or empirical coverage as compared with the
medium effect scenario. As expected, power increased in
the large intervention effect scenario and decreased in
the small intervention effect scenario.
Figures S4 to S6 in Supporting Information VI show
the results for the small intervention effect and Figures
S7 to S9 the results for the large intervention effect.
3.2.3 | Different baseline risks in the control group
Using a low as well as a high baseline event probability
in the control group also did not change bias or empirical
coverage as compared with the medium baseline risk
scenario. Also as expected, power increased in the high
baseline event probability scenario and decreased in the
low baseline event probability scenario.
Figures for the sensitivity analysis on the baseline risk
in the control group are available in Supporting
Information VI: Figures S10 to S12 are for the low
baseline risk and Figures S13 to S15 are for the high
baseline risk.
4 | DISCUSSION
This simulation shows that valid statistical methods for
meta‐analysis of few studies are available. Our simulation
was designed based on empirical data to reproduce real‐
world situations. Considering the frequency of meta‐
analyses including only few studies, our results are highly
relevant for applied research.1 This is particularly true
in areas where few study meta‐analysis is the normal
case, such as rare diseases, health technology assessment,
subgroup analyses, or sensitivity analyses.
The standard approach for meta‐analysis is still the
DerSimonian‐Laird method. Our simulation shows that
the DerSimonian‐Laird method was heavily above the
5% type I error rate in all random effects situations.
Consequently, it can be supposed that a large number of
false positive meta‐analyses exist.5,13 False positive meta‐
analysis results might have serious consequences for
clinical and health policy decision‐making because
results of meta‐analyses form the basis for statements in
clinical practice guidelines and health technology
assessments.38,39
On the one hand, using adequate methods is very
important to avoid incorrect conclusions.2,11 On the other
hand, power has a central role because meta‐analyses
of few studies often agree with future results of meta‐
analyses including more studies.40 Very conservative
methods might lead to unnecessary research or rejection
of useful new interventions by health technology
assessment agencies. The bias of all methods was very
small in all simulation scenarios. Therefore, the main
criterion for judging the validity of methods here, and for
arriving at recommendations on the right meta‐analysis
method, is the critical balance between the empirical
coverage probability and power.
In the fixed effects scenario, the classical fixed effects
models (MaHa and YPET) had the best statistical
properties. If the assumption of a fixed effects model
can be reliably justified, ie, the included studies are
clinically homogeneous, then fixed effects models seem
preferable. On the one hand, this suggestion is supported
by the fact that a sufficiently accurate estimation of
heterogeneity is difficult in meta‐analysis of few studies.2
On the other hand, caution is needed in the case of small
studies because these are often more heterogeneous than
large studies.41 The beta‐binomial model was most robust
and can be recommended as the first choice in
challenging situations (few studies, rare events), in
particular for the case that the other models cannot
estimate effects.11
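As a minimal sketch of the classical fixed effects pooling recommended here, the Mantel‐Haenszel odds ratio can be computed in a few lines. The study counts below are invented for illustration; this is textbook code, not the implementation used in the simulation.

```python
import numpy as np

def mantel_haenszel_or(a, b, c, d):
    """Mantel-Haenszel pooled odds ratio for 2x2 tables.

    a, b: events / non-events in the intervention group
    c, d: events / non-events in the control group
    (one entry per study)
    """
    a, b, c, d = map(np.asarray, (a, b, c, d))
    n = a + b + c + d                          # total sample size per study
    return np.sum(a * d / n) / np.sum(b * c / n)

# three invented studies with a consistent protective effect
or_mh = mantel_haenszel_or(a=[4, 2, 6], b=[96, 48, 144],
                           c=[8, 5, 12], d=[92, 45, 138])
print(or_mh)   # ≈ 0.46
```

Because each study contributes `a*d/n` and `b*c/n` terms rather than a per‐study variance estimate, the estimator remains stable with sparse cells, which is one reason it performs well with few studies.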
In the random effects situation, only MKH, PMHK,
and BBMP held the type I error rate. As in prior studies,
MKH and PMHK in particular were very conservative for
2 to 3 studies in the meta‐analysis, also in the random
effects situations.12,15,34 These overly conservative
confidence intervals resulted in a power of almost zero.
The beta‐binomial model (BB1N) and PMRE seem a good
alternative for meta‐analyses including only a few studies
because both have a higher power and fell only slightly
below the 95% coverage probability.
In general, using Student t‐distributed confidence
intervals accounts better for the uncertainty that is
induced by estimating the between‐study variance. The
determination of the optimal number of degrees of
freedom for the Student t‐distribution is difficult because
it depends on the between‐ and within‐study variances
and the number of included studies in the meta‐analysis.8
Our simulation indicates that for the number of degrees
of freedom, BB1N (number of studies minus 1 degree of
freedom) is the best choice, whereas BBMP (2 times the
number of studies minus 3 degrees of freedom) is more
conservative but still valid.
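The role of the t‐distribution can be made concrete by comparing its 97.5% quantiles with the normal quantile (a sketch using scipy; note that the df choice only changes the quantile, while the overall conservatism of a method also depends on its variance estimate):

```python
from scipy.stats import norm, t

z = norm.ppf(0.975)                          # normal multiplier, about 1.96
for k in (2, 3, 5, 10, 20):                  # number of studies
    crit = t.ppf(0.975, k - 1)               # t multiplier with k - 1 df
    print(f"k={k}: t multiplier {crit:.2f} ({crit / z:.1f}x the normal one)")
```

For k = 2 the t multiplier with 1 degree of freedom is roughly 6.5 times the normal one, which illustrates why t‐based intervals of very small meta‐analyses widen so strongly.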
Prior studies have recommended HKSJ.9,13,15,42 In our
simulation, however, the HKSJ clearly exceeded the
specified type I error level in all situations. This occurs
when the correction factor q is arbitrarily small.12 The
risk is higher in the case of small event rates, because
results are more homogeneous than expected under a
random effects model, and in the case that the standard
errors of the included studies are heterogeneous.18
Considering that our simulation is based on empirical
data, there seems to be a very high risk of anticonservative
results in practice when HKSJ is used.12 Thus, we also
discourage the use of the original HKSJ for few as well as
many studies in meta‐analysis.43
DLRE and HKSJ showed very low empirical coverage
levels, also in the meta‐analyses of more than 20 studies,
suggesting that heterogeneity is underestimated by these
methods. Also MKH, which has recently been suggested
as a new standard method for random effects meta‐
analysis, showed unsatisfactory empirical coverage.43 The
reason might be that if true heterogeneity exists, these
methods might fail to detect it (the value of τ² being
zero).2 The impact is stronger with an increasing number
of studies in the meta‐analysis because the influence of τ²
is then missing from the weight of each study and is thus
missed multiple times in the calculation of the total
variance.
We could show that, also in the case of meta‐analyses
including more studies (≥20), the statistical properties of
2 stage random effects meta‐analyses can be substantially
improved as compared with the standard DerSimonian‐
Laird method by combining the Paule‐Mandel
heterogeneity variance estimator with Hartung‐Knapp
confidence intervals.
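The combination just described can be sketched with the standard textbook formulas (this is generic illustrative code, not the authors' simulation program; `y` and `v` below are invented log odds ratios and within‐study variances):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import t

def dl_tau2(y, v):
    """DerSimonian-Laird moment estimator of tau^2."""
    w = 1 / v
    mu_fe = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - mu_fe) ** 2)
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    return max(0.0, (Q - (len(y) - 1)) / c)

def pm_tau2(y, v):
    """Paule-Mandel estimator: tau^2 at which the generalized Q equals k - 1."""
    k = len(y)
    def gen_q(t2):
        w = 1 / (v + t2)
        mu = np.sum(w * y) / np.sum(w)
        return np.sum(w * (y - mu) ** 2) - (k - 1)
    if gen_q(0.0) <= 0:
        return 0.0                       # no excess heterogeneity detected
    return brentq(gen_q, 0.0, 100.0)     # root-finding on [0, 100]

def hartung_knapp_ci(y, v, tau2, level=0.95):
    """Hartung-Knapp t-interval around the random effects mean."""
    k = len(y)
    w = 1 / (v + tau2)
    mu = np.sum(w * y) / np.sum(w)
    q = np.sum(w * (y - mu) ** 2) / (k - 1)   # HK correction factor
    se = np.sqrt(q / np.sum(w))
    crit = t.ppf(0.5 + level / 2, k - 1)
    return mu, mu - crit * se, mu + crit * se

# invented log odds ratios and their within-study variances
y = np.array([0.2, -0.1, 0.5, 0.0])
v = np.array([0.04, 0.05, 0.06, 0.05])
mu, lo, hi = hartung_knapp_ci(y, v, pm_tau2(y, v))   # the PMHK interval
```

Substituting `dl_tau2` for `pm_tau2` in the last line yields the corresponding DerSimonian‐Laird variant, so the two combinations discussed in the text differ only in the heterogeneity estimate fed into the Hartung‐Knapp interval.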
Our finding on the Paule‐Mandel estimator disagrees
with a study that compared 20 heterogeneity variance
estimators. That study concluded that the Paule‐Mandel
estimator “provides good estimation behaviour but not
markedly better than the other alternatives.”44 In our
study, the Paule‐Mandel estimator outperformed the
DerSimonian‐Laird method in all scenarios, in particular
if combined with t‐distributed confidence intervals. The
main reasons are probably that, in contrast to our
simulation, only scenarios with 5 to 30 studies were
simulated and only confidence intervals based on the
normal approximation were used in the comparison.
Moreover, that simulation was not informed by real‐life
meta‐analyses and consequently is very different from
our simulation (eg, in heterogeneity and effect sizes).
Our findings were quite robust across all sensitivity
analyses. Neither the size of the intervention effect nor
the baseline risk in the control group had considerable
influence on the results. We varied the distribution of
study sizes in our simulation because these are common
scenarios and other studies have shown that this can
influence the statistical performance.12,13 In our
simulation, we could not confirm the finding by IntHout
et al that different study sizes had a strong influence on
the empirical coverage probability.13 We found only a
very slight negative effect on the coverage probability of
the 2 stage random effects models in the random effects
situations.
Our study has some limitations.
First, we did not explicitly vary the distribution of
group sizes in our simulation because most studies apply
equal allocation ratios. An unbalanced size of study arms
can influence the performance of the statistical methods.7
Our results might therefore not pertain fully to meta‐
analyses of studies with unbalanced sample sizes (eg,
different allocation ratios and nonrandomized studies).
Second, heterogeneity is often hard to detect in
practice.2 As we used real‐world heterogeneity estimates
in the simulation, we might have tended to underestimate
heterogeneity, ie, overestimated the number of meta‐
analyses with zero heterogeneity as well.
Third, we did not consider Bayesian methods, which
have also shown promising results for meta‐analysis of
few studies in recently published studies.24,34,45,46 In
particular, if appropriate (weakly) informative prior
information exists, such Bayesian methods could be
considered.22,47 Simulations have shown that Bayesian
methods might increase power in few study meta‐analysis
while holding the type I error rate compared with other
methods (eg, PMHK).34 Moreover, Bayesian methods are
becoming more interesting for applied scientists because
approximate methods were recently developed that
facilitate their application compared with full Bayesian/
Markov chain Monte Carlo approaches.47
Future studies should directly compare the well
performing frequentist methods with well performing
Bayesian methods for meta‐analysis including few
studies. Future research should also assess extensions of
the beta‐binomial models. In meta‐analyses including
more than 5 studies, beta‐binomial models that model
the beta‐binomial distributions for control and
intervention groups separately and link intervention and
control groups from the same study by a random effect
(eg, estimated from DLRE) also showed satisfactory
statistical properties.48 An advantage of such beta‐
binomial models is that the randomization is not broken.
5 | CONCLUSION AND RECOMMENDATIONS FOR PRACTICE
If only a few studies are available for inclusion in a
meta‐analysis, then the choice of the right method is
challenging because of the narrow path regarding the
balance between correct empirical coverage of the 95%
confidence interval and power.
In fixed effects situations, we recommend the classical
fixed effects models (MaHa and YPET). A premise is that
the assumption of a fixed effects model (a common effect
of all studies) must be duly justified. If the standard fixed
effects models do not converge (eg, because of many
double zero studies), then the more robust BB1N or
BBMP are an alternative.
In random effects situations, only MKH, PMHK, and
BBMP kept the type I error rate. Results of these models
can be judged reliable for estimating intervention effects.
We discourage the use of inverse variance random effects
models other than MKH and PMHK in situations where
strictly keeping the type I error rate is important.
However, there is a high risk of not detecting a truly
existing difference with these methods because of their
very low power, in particular for meta‐analyses of 2 to 3
studies. Thus, if one is willing to accept a slightly higher
type I error rate, we recommend using BB1N and PMRE.
An advantage of the beta‐binomial models (BB1N
and BBMP) is their robustness and their ability to include
the information from single and double zero studies.11
We recommend these models for pooling rare events.
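As a rough illustration of the beta‐binomial approach, the pooled log odds ratio can be obtained by maximum likelihood from an arm‐level beta‐binomial regression. This is a simplified sketch in the spirit of the model, not the exact specification used here; the function names and the toy data are our own.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def betabin_logpmf(y, n, mu, rho):
    """Log pmf of a beta-binomial with mean mu and intra-class correlation rho."""
    a = mu * (1 - rho) / rho
    b = (1 - mu) * (1 - rho) / rho
    return (gammaln(n + 1) - gammaln(y + 1) - gammaln(n - y + 1)
            + gammaln(y + a) + gammaln(n - y + b) - gammaln(n + a + b)
            + gammaln(a + b) - gammaln(a) - gammaln(b))

def fit_betabin_or(events, n, treat):
    """Pooled log odds ratio from an arm-level beta-binomial regression."""
    events, n, treat = map(np.asarray, (events, n, treat))
    def negll(theta):
        alpha, beta, lrho = theta            # intercept, log OR, logit of rho
        eta = np.clip(alpha + beta * treat, -30, 30)
        mu = 1 / (1 + np.exp(-eta))
        rho = 1 / (1 + np.exp(-np.clip(lrho, -30, 30)))
        return -np.sum(betabin_logpmf(events, n, mu, rho))
    res = minimize(negll, x0=[-1.0, 0.0, -2.0], method="Nelder-Mead",
                   options={"maxiter": 2000, "xatol": 1e-6, "fatol": 1e-8})
    return res.x[1]                          # estimated log odds ratio

# three invented two-arm studies (treat = 1 marks the intervention arm);
# every study shows roughly a halving of the odds under treatment
log_or = fit_betabin_or(events=[20, 10, 30, 16, 12, 5],
                        n=[100] * 6,
                        treat=[0, 1, 0, 1, 0, 1])
```

Because the likelihood is defined for any event count, single and double zero studies enter the estimation without continuity corrections, which is the robustness property emphasized above.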
ACKNOWLEDGEMENT
This work was not funded.
ORCID
Tim Mathes http://orcid.org/0000-0002-5304-1717
REFERENCES
1. Turner RM, Davey J, Clarke MJ, Thompson SG, Higgins JPT.
Predicting the extent of heterogeneity in meta‐analysis, using
empirical data from the Cochrane Database of Systematic
Reviews. Int J Epidemiol. 2012;41(3):818‐827.
2. Kontopantelis E, Springate DA, Reeves D. A re‐analysis of the
Cochrane Library data: the dangers of unobserved heterogeneity
in meta‐analyses. PLoS One. 2013;8(7):e69930.
3. Davey J, Turner RM, Clarke MJ, Higgins JP. Characteristics of
meta‐analyses and their component studies in the Cochrane
Database of Systematic Reviews: a cross‐sectional, descriptive
analysis. BMC Med Res Methodol. 2011;11(1):1‐11.
4. Borenstein M, Higgins JP. Meta‐analysis and subgroups. Prev
Sci. 2013;14(2):134‐143.
5. Shuster JJ, Walker MA. Low‐event‐rate meta‐analyses of
clinical trials: implementing good practices. Stat Med.
2016;35(14):2467‐2478.
6. Friedrich JO, Adhikari NK, Beyene J. Inclusion of zero total
event trials in meta‐analyses maintains analytic consistency
and incorporates all available data. BMC Med Res Methodol.
2007;7(1):1‐6.
7. Bradburn MJ, Deeks JJ, Berlin JA, Russell Localio A. Much ado
about nothing: a comparison of the performance of meta‐analyt-
ical methods with rare events. Stat Med. 2007;26(1):53‐77.
8. Higgins JPT, Thompson SG, Spiegelhalter DJ. A re‐evaluation
of random‐effects meta‐analysis. J Royal Stat Soc Series A,
(Statistics in Society). 2009;172(1):137‐159.
9. Guolo A, Varin C. Random‐effects meta‐analysis: the number of
studies matters. Stat Methods Med Res. 2015.
10. Jackson D, Bowden J, Baker R. How does the DerSimonian and
Laird procedure for random effects meta‐analysis compare with
its more efficient but harder to compute counterparts? J Stat
Plann Infer. 2010;140(4):961‐970.
11. Kuss O. Statistical methods for meta‐analyses including informa-
tion from studies without any events—add nothing to nothing
and succeed nevertheless. Stat Med. 2015;34(7):1097‐1116.
12. Röver C, Knapp G, Friede T. Hartung‐Knapp‐Sidik‐Jonkman
approach and its modification for random‐effects meta‐analysis
with few studies. BMC Med Res Methodol. 2015;15(1):1‐7.
13. IntHout J, Ioannidis JP, Borm GF. The Hartung‐Knapp‐Sidik‐
Jonkman method for random effects meta‐analysis is
straightforward and considerably outperforms the standard
DerSimonian‐Laird method. BMC Med Res Methodol. 2014;
14(1):1‐12.
14. Hartung J, Knapp G. A refined method for the meta‐analysis of
controlled clinical trials with binary outcome. Stat Med.
2001;20(24):3875‐3889.
15. Wiksten A, Rucker G, Schwarzer G. Hartung‐Knapp method is
not always conservative compared with fixed‐effect meta‐
analysis. Stat Med. 2016;35(15):2503‐2515.
16. Mantel N, Haenszel W. Statistical aspects of the analysis of data
from retrospective studies of disease. J Natl Cancer Inst.
1959;22(4):719‐748.
17. Deeks JJ, Altman DG, Bradburn MJ. Statistical methods for
examining heterogeneity and combining results from several
studies in meta‐analysis. In: Systematic Reviews in Health Care.
BMJ Publishing Group; 2008:285‐312.
18. Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing?
Use and avoidance of continuity corrections in meta‐analysis
of sparse data. Stat Med. 2004;23(9):1351‐1375.
19. Deeks JJ, Higgins J, Altman DG. Analysing data and
undertaking meta‐analyses. In: Cochrane Handbook for Systematic
Reviews of Interventions: Cochrane Book Series; 2008:243‐296.
20. Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade
during and after myocardial infarction: an overview of the
randomized trials. Prog Cardiovasc Dis. 1985;27(5):335‐371.
21. Brockhaus AC, Bender R, Skipka G. The Peto odds ratio viewed
as a new effect measure. Stat Med. 2014;33(28):4861‐4874.
22. Veroniki AA, Jackson D, Viechtbauer W, et al. Methods to
estimate the between‐study variance and its uncertainty in
meta‐analysis. Res Synth Meth. 2016;7(1):55‐79.
23. DerSimonian R, Laird N. Meta‐analysis in clinical trials. Control
Clin Trials. 1986;7(3):177‐188.
24. Paule RC, Mandel J. Consensus values and weighting factors.
J Res Natl Bur Stand. 1982;87(5):377‐385.
25. DerSimonian R, Kacker R. Random‐effects model for meta‐
analysis of clinical trials: an update. Contemp Clin Trials.
2007;28(2):105‐114.
26. Whitehead A. Estimating the treatment difference in an
individual trial. In: Meta‐Analysis of Controlled Clinical Trials.
John Wiley & Sons, Ltd; 2003:23‐55.
27. Rukhin AL, Biggerstaff BJ, Vangel MG. Restricted maximum
likelihood estimation of a common mean and the Mandel–Paule
algorithm. J Stat Plann Infer. 2000;83(2):319‐330.
28. Bowden J, Tierney JF, Copas AJ, Burdett S. Quantifying,
displaying and accounting for heterogeneity in the meta‐
analysis of RCTs using standard and generalised Q statistics.
BMC Med Res Methodol. 2011;11(1):41.
29. Viechtbauer W. metafor: Meta‐Analysis Package for R. 2016.
30. Hartung J. An alternative method for meta‐analysis. Biom J.
1999;41(8):901‐916.
31. Hartung J, Knapp G. On tests of the overall treatment effect in
meta‐analysis with normally distributed responses. Stat Med.
2001;20(12):1771‐1782.
32. Sidik K, Jonkman JN. A simple confidence interval for meta‐
analysis. Stat Med. 2002;21(21):3153‐3159.
33. Knapp G, Hartung J. Improved tests for a random effects
meta‐regression with a single covariate. Stat Med. 2003;
22(17):2693‐2710.
34. Friede T, Röver C, Wandel S, Neuenschwander B. Meta‐analysis
of few small studies in orphan diseases. Res Synth Meth. 2016.
35. Burke DL, Ensor J, Riley RD. Meta‐analysis using individual
participant data: one‐stage and two‐stage approaches, and why
they may differ. Stat Med. 2017;36(5):855‐875.
36. Berkey CS, Hoaglin DC, Mosteller F, Colditz GA. A random‐
effects regression model for meta‐analysis. Stat Med.
1995;14(4):395‐411.
37. Raghunathan TE, Ii Y. Analysis of binary data from a
multicentre clinical trial. Biometrika. 1993;80(1):127‐139.
38. Stephens JM, Handke B, Doshi JA. International survey of
methods used in health technology assessment (HTA): does
practice meet the principles proposed for good research?
2012;2:29‐44.
39. Guyatt G, Oxman AD, Akl EA, et al. GRADE guidelines: 1.
Introduction‐GRADE evidence profiles and summary of find-
ings tables. J Clin Epidemiol. 2011;64(4):383‐394.
40. Herbison P, Hay‐Smith J, Gillespie WJ. Meta‐analyses of small
numbers of trials often agree with longer‐term results. J Clin
Epidemiol. 2011;64(2):145‐153.
41. IntHout J, Ioannidis JPA, Borm GF, Goeman JJ. Small studies
are more heterogeneous than large ones: a meta‐meta‐analysis.
J Clin Epidemiol. 2015;68(8):860‐869.
42. Partlett C, Riley RD. Random effects meta‐analysis: coverage
performance of 95% confidence and prediction intervals
following REML estimation. Stat Med. 2017;36(2):301‐317.
43. Jackson D, Law M, Rücker G, Schwarzer G. The Hartung‐Knapp
modification for random‐effects meta‐analysis: a useful refine-
ment but are there any residual concerns? Stat Med.
2017;36(25):3923‐3934.
44. Petropoulou M, Mavridis D. A comparison of 20 heterogeneity
variance estimators in statistical synthesis of results from
studies: a simulation study. Stat Med. 2017;36(27):4266‐4280.
45. Bodnar O, Link A, Arendacká B, Possolo A, Elster C. Bayesian
estimation in random effects meta‐analysis using a non‐
informative prior. Stat Med. 2017;36(2):378‐399.
46. Friede T, Röver C, Wandel S, Neuenschwander B. Meta‐analysis
of two studies in the presence of heterogeneity with applications
in rare diseases. Biom J. 2016.
47. Rhodes KM, Turner RM, White IR, Jackson D, Spiegelhalter DJ,
Higgins JPT. Implementing informative priors for heterogeneity
in meta‐analysis using meta‐regression and pseudo data. Stat
Med. 2016;35(29):5495‐5511.
48. Bakbergenuly I, Kulinskaya E. Beta‐binomial model for meta‐
analysis of odds ratios. Stat Med. 2017.
SUPPORTING INFORMATION
Additional Supporting Information may be found online
in the supporting information tab for this article.
How to cite this article: Mathes T, Kuss O. A
comparison of methods for meta‐analysis of a small
number of studies with binary outcomes. Res Syn
Meth. 2018;1–16. https://doi.org/10.1002/jrsm.1296