Content uploaded by Philip M. Sedgwick
Author content
All content in this area was uploaded by Philip M. Sedgwick on Aug 28, 2015
Content may be subject to copyright.
STATISTICAL QUESTION
Meta-analyses: what is heterogeneity?
Philip Sedgwick reader in medical statistics and medical education
Institute for Medical and Biomedical Education, St George’s, University of London, London, UK
Researchers undertook a meta-analysis to evaluate the
effectiveness of multifactorial assessment and intervention
programmes in preventing falls and injuries among older people.
Randomised or quasi-randomised trials that evaluated
interventions to prevent falls and injuries were included. The
intervention had to be delivered to individual patients, not at a
community or population level. It also had to be service based
in an emergency department, primary care, or the community.
Control groups could receive standard care or no fall prevention.
The outcomes included the number of fallers and fall related
injuries.1
In total 19 trials were identified. Of these, eight reported fall
related injuries. When combined across trials, the risk for fall
related injuries was reduced after the intervention compared
with the control, but not significantly (relative risk 0.90, 95%
confidence interval 0.68 to 1.20). Tests of statistical
heterogeneity for the meta-analysis of fall related injuries gave
the following results: χ²=15.77, degrees of freedom=7, P=0.03
(Cochran’s Q test), I²=55.6% (Higgins’s I² test statistic).
Subgroup analyses using a test of interaction based on Cochran’s
Q test were subsequently performed. The resulting P values
were: P=0.75 for site of delivery (hospital v community); P=0.75
for whether a doctor was included in the team (yes v no); and
P=0.52 for whether trial participants had been selected because
they were at high risk of falls (yes v no).
The study concluded that there was limited evidence that
multifactorial fall prevention programmes in primary care,
community, or emergency care settings were effective in
reducing the number of fallers or fall related injuries.
Which of the following statements, if any, are true for the
meta-analysis of fall related injuries?
a) The presence of statistical heterogeneity would be
indicative of variation between trials in the magnitude or
direction of the sample estimates of the relative risk of fall
related injuries
b) The result of Cochran’s Q test indicated that heterogeneity
existed between the sample estimates
c) Higgins’s I² test statistic indicated that homogeneity
existed between the sample estimates
d) Any statistical heterogeneity in the overall meta-analysis
of fall related injuries was not explained by the subgroup
analyses
Answers
Statements a, b, and d are true, whereas c is false.
A total of eight trials reported fall related injuries. For each trial
a sample estimate of the population parameter of the relative
risk of fall related injuries after multifactorial assessment and
intervention programmes compared with control was obtained.
The aim of the meta-analysis was to combine the sample
estimates from the eight trials and provide a single estimate of
the population parameter. By combining the sample estimates,
the meta-analysis reduced the evidence to a manageable
quantity. The figure shows the forest plot for the meta-analysis.
The interpretation of a forest plot has been described in a
previous question.2
Figure: Forest plot of the sample estimates of the relative risk of fall related injuries after multifactorial assessment and intervention programmes compared with the control
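As a rough illustration of how sample estimates from several trials can be combined into a single estimate, the sketch below pools relative risks on the log scale using inverse-variance weights, which is one common fixed effects approach. It is not necessarily the method used in the review. The trial values are invented for illustration (the eight trials' data are not reproduced here), and the function name `pool_fixed_effect` is hypothetical.

```python
import math

def pool_fixed_effect(rrs, ses, z=1.96):
    """Inverse-variance fixed effects pooling of relative risks.

    rrs: relative risk from each trial
    ses: standard error of each log relative risk
    Returns (pooled RR, lower 95% limit, upper 95% limit).
    """
    log_rrs = [math.log(rr) for rr in rrs]
    weights = [1.0 / se ** 2 for se in ses]  # weight = 1 / variance
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))  # SE of the pooled log RR
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se))

# Hypothetical data for three trials (NOT the review's eight trials)
rrs = [0.80, 1.10, 0.95]
ses = [0.20, 0.25, 0.15]
rr, lower, upper = pool_fixed_effect(rrs, ses)
print(f"Pooled RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Pooling is done on the log scale because ratio measures are approximately normally distributed after a log transformation. Trials with smaller standard errors, usually the larger trials, receive more weight.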
The combined relative risk of fall related injuries after the
intervention versus the control was 0.90 (95% confidence
interval 0.68 to 1.20). Therefore, although the risk of fall related
injuries was reduced after the intervention, the difference was
not significant, because the confidence interval included 1 (no
difference in risk). It was essential that the meta-analysis
incorporated a statistical test of heterogeneity. The purpose of
this test was to assess the extent of variation between the sample
estimates. Heterogeneity would exist if the sample estimates
for the population relative risk were of different magnitudes or
had the opposite direction of effect (a is true). Conversely, if
p.sedgwick@sgul.ac.uk
For personal use only: See rights and reprints http://www.bmj.com/permissions Subscribe: http://www.bmj.com/subscribe
BMJ 2015;350:h1435 doi: 10.1136/bmj.h1435 (Published 16 March 2015) Page 1 of 3
Endgames
ENDGAMES
homogeneity existed the estimates would be of a similar
magnitude and direction. If heterogeneity existed, it would
influence how the combined overall estimate was calculated, as
described below. Furthermore, it would be sensible to explore
the potential sources of heterogeneity. For example,
heterogeneity would occur if the effect of the intervention
differed between the sites where it was delivered (hospital v
community). If so, it would be useful to estimate the effect of
the intervention separately for the hospital and community
subgroups. Otherwise, the results of the meta-analysis might be
misleading regarding the effectiveness of the intervention and
might be detrimental to future patient care.
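The statement that the combined relative risk (0.90, 95% confidence interval 0.68 to 1.20) was not significant can be checked with a short calculation. This sketch assumes that the confidence interval was computed on the log scale as estimate ± 1.96 standard errors. That is usual for ratio measures but is an assumption here. Under that assumption, the standard error and an approximate two-sided P value can be back-calculated from the reported limits. The P value is an approximation and not a figure reported by the review. The function name `summarise_from_ci` is hypothetical.

```python
import math

def summarise_from_ci(lower, upper, z=1.96):
    """Back-calculate a ratio estimate, its log-scale SE, and an approximate
    two-sided P value from a 95% CI, ASSUMING the CI is symmetric on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    se = (log_hi - log_lo) / (2 * z)
    log_est = (log_lo + log_hi) / 2            # midpoint on the log scale
    z_stat = log_est / se                       # null hypothesis: log RR = 0 (RR = 1)
    p = math.erfc(abs(z_stat) / math.sqrt(2))   # two-sided normal P value
    return math.exp(log_est), se, p

lower, upper = 0.68, 1.20  # reported 95% CI for the combined relative risk
rr, se, p = summarise_from_ci(lower, upper)
includes_one = lower <= 1.0 <= upper  # CI containing 1 means not significant at 5%
print(f"RR about {rr:.2f}, SE(log RR) {se:.3f}, approximate P {p:.2f}")
print("Not significant at 5%" if includes_one else "Significant at 5%")
```

The back-calculated estimate differs very slightly from 0.90 because the reported limits are rounded to two decimal places.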
The most routinely used tests for statistical heterogeneity are
Cochran’s Q test and Higgins’s I² test statistic. Cochran’s Q is
the traditional test of heterogeneity and is based on the χ² test,
which has been described in a previous question.3 The statistical
test of heterogeneity using Cochran’s Q test is carried out in a
similar way to traditional statistical hypothesis testing, with a
null hypothesis and an alternative hypothesis. The null
hypothesis states that homogeneity exists between the sample
estimates of the population parameter across the trials, and any
variation between them is no more than would be expected when
taking samples from the same population—that is, any variation
between them is a result of sampling error. The alternative
hypothesis states that heterogeneity exists between the sample
estimates. The results for the test of heterogeneity for the
meta-analysis of fall related injuries are displayed towards the
bottom of the forest plot in the line “Test for heterogeneity:
χ²=15.77, df=7, P=0.03, I²=55.6%.” The first three statistics are
the χ² test statistic, degrees of freedom (df), and P value resulting
from Cochran’s Q test of the statistical hypotheses described
above. The I² statistic refers to Higgins’s I² test described below.
The resulting P value for Cochran’s Q test was 0.03, and because
it was smaller than the traditional critical level of significance
of 0.05 (5%), the null hypothesis was rejected in favour of the
alternative, with the conclusion that heterogeneity existed
between the sample estimates (b is true).
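In its usual inverse-variance form, Cochran’s Q is Q = sum of w_i (y_i − y_pooled)², where y_i is the log relative risk in trial i, w_i = 1/SE_i² is its weight, and y_pooled is the fixed effects combined estimate. Under the null hypothesis of homogeneity, Q approximately follows a χ² distribution with k − 1 degrees of freedom for k trials, so 8 trials give 7 df. The minimal sketch below uses only Python’s standard library, and the helper names are hypothetical. It computes Q from log relative risks and converts the reported χ²=15.77 on 7 df into a P value.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-squared distribution with a
    positive integer number of degrees of freedom, using closed-form sums."""
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum over i < df/2 of y^i / i!
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: erfc term plus half-integer gamma terms
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total

def cochran_q(log_rrs, ses):
    """Cochran's Q (inverse-variance form) and its degrees of freedom."""
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, log_rrs)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, log_rrs))
    return q, len(log_rrs) - 1

# Reported test for fall related injuries: chi-squared = 15.77 on 7 df
p = chi2_sf(15.77, 7)
print(f"P = {p:.2f}")
for alpha in (0.05, 0.10):
    print(f"alpha = {alpha}: reject homogeneity? {p < alpha}")
```

The loop shows the decision at both the traditional 0.05 level and the more lenient 0.10 level discussed below. Here homogeneity is rejected at either level.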
Cochran’s Q test has low statistical power, particularly when a
meta-analysis includes only a few trials; it is conservative and
often fails to detect heterogeneity in the sample estimates.
Therefore, a critical level of significance of 0.10 (10%) is often
chosen rather than the traditional one of 0.05 (5%). Because of
this lack of power, Higgins’s I² test statistic is often used as an
additional test of heterogeneity.
Higgins’s I² test statistic represents the proportion of variation
between the sample estimates that is due to heterogeneity rather
than to sampling error. Values can range from 0% to 100%, with
0% indicating that none of the variation is due to heterogeneity
(statistical homogeneity) and higher values indicating that a
greater proportion of it is. It has been suggested that the
adjectives low, moderate, and high (heterogeneity) be assigned to
I² values of 25%, 50%, and 75%, respectively. In general,
significant heterogeneity is considered to be present if I² is 50%
or more. In the above meta-analysis, I² was reported as 55.6%,
indicating the presence of significant heterogeneity (c is false)
and confirming the result of Cochran’s Q test.
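Higgins’s I² is derived directly from Cochran’s Q and its degrees of freedom as I² = (Q − df)/Q × 100%, with negative values set to 0%. The short sketch below, in which the function name `higgins_i2` is hypothetical, reproduces the reported value from χ²=15.77 and df=7 and applies the 50% rule of thumb mentioned above.

```python
def higgins_i2(q, df):
    """Higgins's I^2 as a percentage: (Q - df) / Q, truncated at 0%."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

i2 = higgins_i2(15.77, 7)
print(f"I^2 = {i2:.1f}%")  # prints I^2 = 55.6%
print("Heterogeneity present (I^2 >= 50%)" if i2 >= 50 else "I^2 below 50%")
```

The truncation at 0% matters when Q happens to be smaller than its degrees of freedom, which can occur by chance when the trials are homogeneous.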
The test of statistical heterogeneity influenced how the combined
overall estimate of the relative risk for fall related injuries after
the intervention compared with the control was obtained. The
presence of heterogeneity between the sample estimates
indicated that “random effects” methods should be used to derive
the combined overall estimate of the treatment effect. If
homogeneity had existed, “fixed effects” methods would have
been used. Random effects methods allow for genuine variation
in the effect of the intervention between trials in addition to
sampling error within trials. As a result, a meta-analysis
incorporating random effects methodology produces a wider
confidence interval for the combined overall effect than one in
which fixed effects methodology is used, giving a less precise
estimate of the effect of the intervention. The reduced precision
of the combined overall effect of the intervention reflected the
heterogeneity between the sample estimates.
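The sketch below illustrates why random effects methods give a wider confidence interval. It compares fixed effects pooling with the DerSimonian–Laird random effects method, which is one common choice. The text does not state which random effects estimator the review used. The log relative risks and standard errors are invented, and the function name is hypothetical. The estimated between-trial variance (tau²) is added to each trial’s own variance before weighting, so the weights shrink and the standard error of the combined estimate grows.

```python
import math

def fixed_and_random(log_rrs, ses, z=1.96):
    """Pool log relative risks with fixed effects (inverse variance) and with
    DerSimonian-Laird random effects. Returns two (RR, lower, upper) tuples
    and the estimated between-trial variance tau^2."""
    v = [se ** 2 for se in ses]
    w = [1.0 / vi for vi in v]
    pooled_fe = sum(wi * yi for wi, yi in zip(w, log_rrs)) / sum(w)
    q = sum(wi * (yi - pooled_fe) ** 2 for wi, yi in zip(w, log_rrs))
    df = len(log_rrs) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-trial variance, truncated at 0
    w_re = [1.0 / (vi + tau2) for vi in v]  # random effects weights
    pooled_re = sum(wi * yi for wi, yi in zip(w_re, log_rrs)) / sum(w_re)

    def to_ci(est, weights):
        se = math.sqrt(1.0 / sum(weights))
        return math.exp(est), math.exp(est - z * se), math.exp(est + z * se)

    return to_ci(pooled_fe, w), to_ci(pooled_re, w_re), tau2

# Hypothetical, deliberately heterogeneous trials (NOT the review's data)
rrs = [0.50, 1.30, 0.70, 1.10]
ses = [0.15, 0.20, 0.15, 0.20]
fe_ci, re_ci, tau2 = fixed_and_random([math.log(r) for r in rrs], ses)
print(f"tau^2 = {tau2:.3f}")
print("Fixed effects:  RR {:.2f} (95% CI {:.2f} to {:.2f})".format(*fe_ci))
print("Random effects: RR {:.2f} (95% CI {:.2f} to {:.2f})".format(*re_ci))
```

When tau² is estimated as 0, the two methods give identical results. The random effects interval is wider only to the extent that between-trial variation is detected.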
The researchers undertook a subgroup analysis to investigate
the heterogeneity between the sample estimates. The aim was
to establish whether the heterogeneity between the eight trials
in the sample estimates of the relative risk of fall related injuries
could be explained by differences between the trials—for
example, in how the intervention was delivered or participants
recruited. If so, the effects of the intervention might vary
according to how the intervention is delivered or between
subgroups of patients, which would have implications for future
care. The researchers chose the factors that were thought to be
important in explaining the observed heterogeneity between the
sample estimates. These subgroups were based on three
factors: site of delivery (hospital v community), whether a
doctor was included in the team (yes v no), and whether trial
participants had been selected because they were at high risk
of falls (yes v no).
For each factor subgroup, a separate meta-analysis of fall
related injuries was performed and a subtotal estimate for the effect
of the intervention derived by combining the sample estimates
across trials within the subgroup. A test of heterogeneity was
undertaken for the meta-analysis within each subgroup, although
the results were not presented. The subtotal estimates for the
effects of the intervention and tests of heterogeneity can be
compared between subgroups for a factor, although if done
visually this should be done informally only. In particular, the
lack of statistical significance for the test of heterogeneity within
each subgroup does not indicate that the subgroups within the
factor explain the statistical heterogeneity observed overall. It
may be misleading to compare the results of the tests of
statistical heterogeneity between factor subgroups because the
subgroups may not have sufficient statistical power with respect
to the numbers of trials and participants to detect heterogeneity.
To formally investigate heterogeneity across the subgroups of
a factor, the subtotal estimates for the effects of the intervention
in the subgroups were compared by a test of interaction. For
example, the test of interaction for the factor site of delivery of
intervention (hospital v community) investigated whether the
effect of the intervention on the risk of fall related injuries varied
between the subgroups. Interaction is sometimes referred to as
effect modification. In a meta-analysis, interaction is
investigated using Cochran’s Q test or Higgins’s I², or both.
These tests involve comparing the subtotal estimates between
the subgroups. Cochran’s Q test provides a test of the null
hypothesis that homogeneity exists between the subgroups in
the subtotal estimates of the population parameter—that is, any
variation between the subgroups is no more than would be
expected as a result of sampling error. Higgins’s I² test statistic
measures the proportion of total variation between the subgroups
in the subtotal estimates that is due to heterogeneity rather than
to sampling error. The test of interaction contrasts with the test
of heterogeneity in the meta-analysis that combines all the trials,
where Cochran’s Q and Higgins’s I² were used to compare the
sample estimates of the treatment effect across all of the trials.
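In a test of interaction, each subgroup’s subtotal estimate is treated like the estimate from a single trial, and Cochran’s Q and I² are computed across the subtotals. For two subgroups, this reduces to Q = (y_1 − y_2)²/(SE_1² + SE_2²) on 1 degree of freedom. The minimal sketch below uses hypothetical subtotal relative risks and standard errors, and the function name is also hypothetical.

```python
import math

def interaction_test(subtotals, ses):
    """Test of interaction: Cochran's Q and I^2 across subgroup subtotal
    estimates (log scale), treating each subtotal like a single trial."""
    w = [1.0 / se ** 2 for se in ses]
    overall = sum(wi * yi for wi, yi in zip(w, subtotals)) / sum(w)
    q = sum(wi * (yi - overall) ** 2 for wi, yi in zip(w, subtotals))
    df = len(subtotals) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2

# Hypothetical subtotal relative risks (hospital v community) and SEs of log RR
hospital, community = math.log(0.85), math.log(0.95)
se_h, se_c = 0.20, 0.18
q, df, i2 = interaction_test([hospital, community], [se_h, se_c])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```

The same function works for factors with more than two subgroups, where df is the number of subgroups minus one.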
A test of interaction was undertaken for each factor—site of
delivery, whether a doctor was included in the team, and whether
participants were selected because they were at high risk of
falls. For each factor, the researchers performed Cochran’s Q
test to test for interaction between the subgroups. The results
of the test of interaction were χ²=0.1, P=0.75 for the site of
delivery (hospital v community); χ²=0.1, P=0.75 for whether a
doctor was included in the team (yes v no); and χ²=0.42, P=0.52
for whether trial participants had been selected because they
were at high risk of falls (yes v no). In none of these tests of
interaction was the P value less than the traditional critical level
of significance of 0.05 (5%). Therefore, the statistical
heterogeneity in the overall meta-analysis of fall related injuries
was not explained by the subgroup analyses (d is true). The
researchers commented that because the studies were carried
out in several countries, differences between the populations or
healthcare systems might have contributed to the heterogeneity.
Methodological differences between trials may also contribute
to statistical heterogeneity. In particular, the meta-analysis above
included trials that were randomised or quasi-randomised and
therefore of variable methodological quality. A
quasi-randomised trial uses methods of allocating participants
to treatment groups that are not truly random—for example,
alternate allocation.
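The reported P values for the three tests of interaction can be reproduced from their χ² values. Each factor had two subgroups, so each test has 2 − 1 = 1 degree of freedom. This is inferred from the design and is not stated in the text. For 1 df, the upper-tail χ² probability equals erfc(√(χ²/2)). The sketch below uses only Python’s standard library, and the function name is hypothetical.

```python
import math

def p_value_chi2_1df(chi2):
    """Upper-tail P value for a chi-squared statistic on 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Reported chi-squared statistics for the three tests of interaction
reported = {
    "site of delivery (hospital v community)": 0.1,
    "doctor included in the team (yes v no)": 0.1,
    "selected as high risk of falls (yes v no)": 0.42,
}
p_values = {factor: p_value_chi2_1df(chi2) for factor, chi2 in reported.items()}
for factor, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{factor}: P = {p:.2f} ({verdict} at 0.05)")
```

The recovered values of 0.75, 0.75, and 0.52 agree with those reported, and none falls below 0.05.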
Caution is needed when interpreting findings from subgroup
analyses that use tests of interaction. The results may be
misleading because the analyses are observational and not based
on comparisons between randomised groups of patients, and
therefore prone to confounding.
Competing interests: None declared.
1 Gates S, Fisher JD, Cooke MW, et al. Multifactorial assessment and targeted intervention
for preventing falls and injuries among older people in community and emergency care
settings: systematic review and meta-analysis. BMJ 2008;336:130.
2 Sedgwick P. How to read a forest plot. BMJ 2012;345:e8335.
3 Sedgwick P. Statistical tests for independent groups: categorical data. BMJ 2012;344:e344.
Cite this as: BMJ 2015;350:h1435
© BMJ Publishing Group Ltd 2015