COMMENTARY Open Access
Caveat emptor: the combined effects of
multiplicity and selective reporting
Tianjing Li1*, Evan Mayo-Wilson1, Nicole Fusco1, Hwanhee Hong2 and Kay Dickersin1
Abstract
Clinical trials and systematic reviews of clinical trials inform healthcare decisions. There is growing concern,
however, about results from clinical trials that cannot be reproduced. Reasons for nonreproducibility include that
outcomes are defined in multiple ways, results can be obtained using multiple methods of analysis, and trial
findings are reported in multiple sources (multiplicity). Multiplicity combined with selective reporting can influence
dissemination of trial findings and decision-making. In particular, users of evidence might be misled by exposure to
selected sources and overly optimistic representations of intervention effects. In this commentary, drawing from our
experience in the Multiple Data Sources in Systematic Reviews (MUDS) study and evidence from previous research,
we offer practical recommendations to enhance the reproducibility of clinical trials and systematic reviews.
Keywords: Multiplicity, Selective reporting, Reproducibility
Background
Clinical trials and systematic reviews of clinical trials in-
form healthcare decisions, but there is growing concern
that the methods and results of some clinical trials are not
reproducible [1]. Poor design, careless execution, and vari-
ation in reporting contribute to nonreproducibility [2,3].
In addition, trials may not be reproducible because trialists
have reported their studies selectively [4]. Although steps
now being taken toward "open science," such as trial registration and mandatory results reporting [7-13], are designed to enhance reproducibility [5,6], making trial protocols and results public may lead to a glut of data and sources that few scientists have the resources to explore. This much-needed approach will thus not serve as a panacea for the problem of nonreproducibility.
Goodman and colleagues argue that "multiplicity, combined with incomplete reporting, might be the single largest contributor to the phenomenon of nonreproducibility, or falsity, of published claims [in clinical research]" ([14], p. 4). We define multiplicity in clinical
research to include assessing multiple outcomes, using
multiple statistical models, and reporting in multiple
sources. When multiplicity is used by investigators to
selectively report trial design and findings, misleading
information is transmitted to evidence users.
Multiplicity was evident in a study we recently con-
ducted, the Multiple Data Sources in Systematic Reviews
(MUDS) project [15-19]. In this paper, drawing from
our experience in the MUDS study and evidence from
previous research, we offer practical recommendations
to enhance the reproducibility of clinical trials and sys-
tematic reviews.
Multiplicity of outcomes
Choosing appropriate outcomes is a critical step in design-
ing valid and useful clinical trials. An outcome is an event
following an intervention that is used to assess its safety
and/or efficacy [20]. For randomized controlled trials
(RCTs), outcomes should be clinically relevant and import-
ant to patients, and they should capture the causal effects
of interventions; core outcome sets aim to do this [21].
A clear outcome definition includes the domain (e.g.,
pain), the specific measurement tool or instrument (e.g.,
short form of the McGill Pain Questionnaire), the time
point of assessment (e.g., 8 weeks), the specific metric
used to characterize each participant's results (e.g., change from baseline to a specific time point), and the method of aggregating data within each group (e.g., mean) (Table 1) [22,23]. Multiplicity in outcomes occurs when, for one outcome domain, there are variations in
the other four elements [17]. For example, a trial can
collect data on many outcomes under the rubric of "pain," introducing multiplicity and the possibility of selectively reporting a pain outcome associated with the
most favorable results. Likewise, a systematic review
may specify only the outcome domain, allowing for vari-
ations in all other elements [24].
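To make the five-element definition concrete, the sketch below (Python; the field names and values are illustrative, not from the paper or the MUDS data) encodes an outcome as a record. Two records that differ in any one element are two distinct outcomes, which is exactly how multiplicity arises.

```python
from dataclasses import dataclass

# A minimal sketch of the five-element outcome definition in Table 1.
# Changing any single element yields a different outcome.
@dataclass(frozen=True)
class OutcomeDefinition:
    domain: str       # e.g., "pain"
    measure: str      # e.g., "McGill Pain Questionnaire, short form"
    time_point: str   # e.g., "8 weeks"
    metric: str       # e.g., "change from baseline"
    aggregation: str  # e.g., "mean"

a = OutcomeDefinition("pain", "SF-MPQ", "8 weeks", "change from baseline", "mean")
b = OutcomeDefinition("pain", "SF-MPQ", "8 weeks", "change from baseline", "median")
print(a == b)  # False: same domain, but two distinct outcomes
```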
To illustrate how "cherry-picking" an outcome can work, in a Pfizer study that compared celecoxib with placebo in osteoarthritis of the knee, the investigators noted, "The WOMAC [Western Ontario and McMaster Universities] pain subscale was the most responsive of all five pain measures. Pain-activity composites resulted in a statistically significant difference between celecoxib and placebo but were not more responsive than pain measures alone. However, a composite responder defined as having 20% improvement in pain or 10% improvement in activity yielded much larger differences between celecoxib and placebo than with pain scores alone" ([25], p. 247).
Multiplicity of analyses and results
The goal of the analysis in an RCT is to draw inferences
regarding the intervention effect by contrasting
group-level quantities. Numerical contrasts between
groups, which are typically ratios (e.g., relative risk) or dif-
ferences in values (e.g., difference in means), are the re-
sults of the trial. There are numerous ways to analyze data
for a defined outcome; thus, multiple methods of analysis
introduce another dimension of multiplicity in clinical tri-
als [17]. For example, one could analyze data on all or a
subset of the participants, use multiple methods for hand-
ling missing data, and adjust for different covariates.
Although it makes sense that a range of analyses may
be conducted to ensure that the findings are robust to
different assumptions made about the data, performing the analysis in multiple ways and obtaining different results can lead to selective reporting of the results deemed favorable by the study investigators [26,27].
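As a toy illustration of how analytic choices multiply results, the sketch below (hypothetical data, not from any trial) computes the same treatment contrast under two common missing-data strategies and gets two different answers.

```python
import numpy as np

# Hypothetical change-from-baseline pain scores; np.nan marks
# participants who missed the final visit.
treat = np.array([-2.1, -1.8, np.nan, -2.5, np.nan, -1.0, -2.2, np.nan])
ctrl = np.array([-0.9, np.nan, -1.1, -0.7, -1.3, np.nan, -0.8, -1.0])

# Analysis 1: complete cases (drop participants with missing data).
cc = np.nanmean(treat) - np.nanmean(ctrl)

# Analysis 2: single imputation, treating a missing value as zero
# change from baseline ("no improvement").
zi = np.nan_to_num(treat).mean() - np.nan_to_num(ctrl).mean()

print(cc, zi)  # two different estimates for the same defined outcome
```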
Multiplicity of sources
Trial results can be reported in multiple places. This cre-
ates problems for users because many sources present
incomplete or unclear information, and by reporting in
multiple sources, investigators may present conflicting
results. When we compared all data sources for trials in-
cluded in the MUDS project, we found that information
about trial characteristics and risk of bias often differed
across reports, and conflicting information was difficult
to disentangle [18]. In addition, important information
about certain outcomes was available only in nonpublic
sources [17]. Additionally, information within databases
may change over time. In trial registries, outcomes may
be changed, deleted, or added; although changes are
documented in the archives of ClinicalTrials.gov, they
are easily overlooked.
The consequences of multiplicity in RCTs
Compared with the number of "domains" in a trial, multi-
plicity in outcome definitions and methods of analysis
may lead to an exponentially larger number of RCT re-
sults. This, combined with multiple sources of RCT infor-
mation, leads to challenges for subsequent evidence
synthesis [17].
There are many ways that multiplicity leads to re-
search waste. Arguably, the most prominent example is
that when one uses inconsistent outcome definitions
across RCTs, trial findings cannot be combined in sys-
tematic reviews and meta-analyses even when the indi-
vidual trials studied the same question [28,29].
Aggregating results from trials depends on consistency
in both outcome domains and the other four elements.
Failure to synthesize the quantitative evidence means that
health policy, practice guidelines, and healthcare
decision-making are not informed by RCT evidence, even
though RCTs exist [2,30,31]. For example, in a Cochrane
eyes and vision systematic review and meta-analysis of
RCTs examining methods to control inflammation after
cataract surgery, 48 trials were eligible for inclusion in the
review. However, no trial contributed to the meta-analysis,
because the outcome domain "inflammation" was assessed
and aggregated inconsistently [32,33].
Multiplicity combined with selective reporting can
mislead decision-making. There is ample evidence that
outcomes associated with positive or statistically signifi-
cant results are more likely to be reported than out-
comes associated with negative or null results [4,34].
Selective reporting can have three types of consequence
for a systematic review: (1) a systematic review may fail
to locate an entire trial because it remains unpublished
(potential for publication bias); (2) a systematic review
may locate the trial but fail to locate all outcomes
assessed in the trial (potential for bias in selective
reporting of outcomes); and (3) a systematic review may
locate all outcomes but fail to locate all numerical results (potential for bias in selective reporting of results) [35]. All three types of selective reporting threaten the reproducibility of clinical trials and the validity of systematic reviews because they lead to overly optimistic representations of intervention effects. To improve the reproducibility of clinical trials and systematic reviews, we offer the recommendations outlined below for trialists and systematic reviewers.

Table 1 Elements needed to define an outcome
1. Domain: Title or concept that describes the outcome.
2. Specific measure: Tool or instrument that assesses the outcome domain, including the name of the tool or instrument and/or specific diagnostic criteria and ascertainment procedures.
3. Time point: When the outcome will be assessed.
4. Specific metric: Way to characterize the measurement on each individual (e.g., change in a measurement from baseline to a specific time point).
5. Method of aggregation: Way to summarize individual-level measurements into group-level statistics for estimating the treatment effect, including whether the outcome will be treated as a continuous, categorical, or time-to-event variable and, if relevant, the specific cutoff or categories.
Recommendation 1: Trialists should define outcomes using
the five-element framework and use core outcome sets
whenever possible
Many trials do not define their outcomes completely [23];
yet, simply naming an outcome domain for a trial is insuf-
ficient to limit multiplicity, and it invites selective report-
ing. When outcomes are defined solely in terms of their
domains, there is much room for making up multiple out-
comes post hoc and cherry-picking favorable results.
In MUDS, we collected data from 21 trials of gabapen-
tin for neuropathic pain. By searching for all sources of
information about the trials, we identified 74 reports
that described the trial results, including journal articles,
conference abstracts, trial registrations, approval pack-
ages from the U.S. Food and Drug Administration, and
clinical study reports. We also acquired six databases
containing individual participant data. For the single
outcome domain "pain intensity," we identified 8 specific
measurements (e.g., short form of the McGill Pain Ques-
tionnaire, visual analogue scale), 2 specific metrics, and
39 methods of aggregation for an 8-week time window.
This resulted in 119 defined outcomes.
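A note on the arithmetic: the full cross product of 8 measures, 2 metrics, and 39 aggregation methods would allow up to 8 x 2 x 39 = 624 combinations, so a count of 119 implies that only some combinations actually appeared across sources. The sketch below (toy entries, not the MUDS records) shows the counting: distinct outcomes are distinct (measure, metric, aggregation) triples observed in the reports.

```python
# Count distinct outcome definitions actually observed across reports
# (illustrative triples, not the MUDS data).
reports = [
    ("SF-MPQ", "change from baseline", "mean"),
    ("VAS", "change from baseline", "mean"),
    ("VAS", "change from baseline", "median"),
    ("SF-MPQ", "change from baseline", "mean"),  # repeated across reports
]
print(len(set(reports)))  # 3 distinct (measure, metric, aggregation) triples
```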
Recommendation 2: Trialists should produce and update, as
needed, a dated statistical analysis plan (SAP) and
communicate the plan to the public
It is possible to obtain multiple results for a single out-
come by using different methods of analysis [17,36]. In
MUDS, using gabapentin for neuropathic pain as an ex-
ample, we identified 4 analysis populations and 5 ways
of handling missing data from 21 trials, leading to 287
results for pain intensity at an 8-week time window.
We recommend that trialists produce a SAP before the first participant is randomized, following the recommended guidelines [37]. The International Conference on Harmonisation defines a SAP as "a document that contains a more technical and detailed elaboration of the principal features of the analysis than those described in the protocol, and includes detailed procedures for executing the statistical analysis of the primary and secondary variables and other data" ([38], p. 35). Currently, SAPs are
usually prepared for industry-sponsored trials; however, in
our opinion, SAPs may not be prepared with the same level
of detail for non-industry-sponsored trials [39]. Others
have shown a diverse practice with regard to SAP con-
tent [37]. The National Institutes of Health has a less
specific but similar policy, which became effective Janu-
ary 25, 2018 [40].
It is entirely possible that additional analyses might be
conducted after the SAP is drafted, such as at the behest
of peer reviewers or a data monitoring committee. In
cases such as this, investigators should document and
date any amendments to the SAP or protocol and com-
municate post hoc analyses clearly. SAPs should be
made publicly available and linked to other trial infor-
mation (see Recommendation 3).
Recommendation 3: Trialists should make information
about trial methods and results public, provide references
to the information sources in a trial registry, and keep the
list of sources up-to-date
Trial methods and results should be made public so that
users can assess the validity of the trial findings. Users of
trial information should anticipate that there may be
multiple sources associated with a trial. A central index-
ing system, such as a trial registry, for listing all trial in-
formation sources should be available so that systematic
reviewers can find multiple sources without unnecessary
expenditure of resources.
Recommendation 4: Systematic reviewers should anticipate
a multiplicity of outcomes, results, and sources for trials
included in systematic reviews and should describe how
they will handle such issues before initiating their research
Systematic reviewers sometimes use explicit rules for
data extraction and analysis. For example, some system-
atic reviewers extract outcomes that were measured
using the most common scale or instrument for a par-
ticular domain. Although such approaches may be re-
producible and efficient, they may exclude data that
users consider informative. When rules for selecting
from among multiple outcomes and results are not pre-
specified, the choice of data for meta-analysis may be ar-
bitrary or data-driven. In the MUDS example, if we were
to pick all possible combinations of the three elements
(specific measure, specific metric, and method of aggre-
gation) for a single outcome domain, pain intensity at an
8-week window (i.e., holding domain and time point
constant), we could conduct 34 trillion different
meta-analyses [18].
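The count grows multiplicatively: one meta-analysis corresponds to a choice of one result per trial, so the number of possible meta-analyses is the product, over trials, of the number of eligible results. A short sketch with hypothetical per-trial counts (not the actual MUDS counts) shows how quickly the product explodes.

```python
import math

# Hypothetical numbers of eligible pain-intensity results in each of
# 21 trials. Choosing one result per trial gives prod(n_i) possible
# meta-analyses; even modest per-trial counts reach the trillions.
results_per_trial = [5, 4, 3, 6, 4, 5, 3, 4, 5, 6, 4, 3, 5, 4, 6, 3, 4, 5, 4, 3, 5]
print(math.prod(results_per_trial))  # ~1.3e13 for these illustrative counts
```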
Many authoritative sources recommend looking for all
sources of information about each trial identified for a
systematic review [41,42]. To investigate whether multi-
plicity in results and multiplicity in data sources might
influence the conclusions of meta-analysis on pain at
8 weeks, we performed a resampling meta-analysis [43]
using MUDS data from the 21 trials and 74 sources as
follows:

1. In each resampling iteration, we randomly selected one possible result from each trial within a prespecified 8-week time window.
2. We combined the sampled results using a random-effects meta-analysis.
3. We iterated the first two steps 10,000 times.
4. We generated a histogram showing the distribution of the estimates from the meta-analyses.

[Fig. 1. Results of the resampling meta-analyses for pain intensity at 8 weeks [18]. CSR, clinical study report; FDA, U.S. Food and Drug Administration; IPD, individual patient data; SMD, standardized mean difference.]
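For readers who want to reproduce the idea, here is a minimal sketch of the resampling procedure (Python; the per-trial effect sizes and variances are placeholders, not the MUDS data, and pooling uses the DerSimonian-Laird random-effects estimator as one reasonable choice).

```python
import numpy as np

rng = np.random.default_rng(0)

def dersimonian_laird(yi, vi):
    """Random-effects pooled estimate (DerSimonian-Laird) from
    per-trial effect sizes yi and within-trial variances vi."""
    wi = 1.0 / vi                              # fixed-effect weights
    y_fe = np.sum(wi * yi) / np.sum(wi)        # fixed-effect pooled mean
    q = np.sum(wi * (yi - y_fe) ** 2)          # Cochran's Q
    c = np.sum(wi) - np.sum(wi ** 2) / np.sum(wi)
    tau2 = max(0.0, (q - (len(yi) - 1)) / c)   # between-trial variance
    w_re = 1.0 / (vi + tau2)                   # random-effects weights
    return np.sum(w_re * yi) / np.sum(w_re)

# Each inner list holds all eligible (effect, variance) pairs for one
# trial within the prespecified 8-week window (hypothetical values).
trial_results = [
    [(-0.45, 0.02), (-0.10, 0.03)],
    [(-0.30, 0.04)],
    [(-0.60, 0.05), (0.05, 0.05), (-0.25, 0.04)],
]

estimates = []
for _ in range(10_000):
    # Step 1: randomly select one eligible result from each trial.
    picks = [results[rng.integers(len(results))] for results in trial_results]
    yi = np.array([y for y, _ in picks])
    vi = np.array([v for _, v in picks])
    # Step 2: pool the sampled results with a random-effects model.
    estimates.append(dersimonian_laird(yi, vi))

# Steps 3-4: the spread of the 10,000 pooled estimates (cf. the
# histograms in Fig. 1) shows how much the meta-analytic answer
# depends on which results were selected.
print(np.percentile(estimates, [2.5, 50, 97.5]))
```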
As shown in the top histogram of Fig. 1, when all
sources of data were used, meta-analyses that included
the largest and smallest estimates from each trial could
lead to different conclusions about the effectiveness of gabapentin, with nonoverlapping 95% CIs. When the re-
sampling meta-analyses were repeated using only one
data source at a time, we found that there was variation
in the results by data source.
Conclusions
Multiplicity of outcomes, analyses, results, and sources,
coupled with selective reporting, can affect the findings
of individual trials as well as the systematic reviews and
meta-analyses based on them. We encourage trialists
and systematic reviewers to consider our recommenda-
tions aimed at minimizing the effects of multiplicity on
what we know about intervention effectiveness.
Abbreviations
MUDS: Multiple Data Sources in Systematic Reviews; RCT: Randomized
controlled trial; SAP: Statistical analysis plan
Funding
This work was supported by contract ME 1303 5785 from the Patient-Centered
Outcomes Research Institute (PCORI) and a fund established at The Johns
Hopkins University for scholarly research on reporting biases by Greene LLP. The
funders were not involved in the design or conduct of the study, manuscript
preparation, or the decision to submit the manuscript for publication.
Authors' contributions
All authors served as key investigators of the MUDS study, contributing to its
design, execution, analysis, and reporting. Specifically, EMW served as the
project director and managed the daily operation of the study. NF contributed
to the data collection, management, and analysis. HH led the data analysis. KD
obtained and secured the funding and oversaw all aspects of the study. For this
commentary, TL wrote the initial draft with input from EMW, NF, and KD. HH
generated the figure. All authors reviewed and critically revised the manuscript,
and all authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable, because this is a commentary.
Competing interests
Up to 2008, KD served as an unpaid expert witness for the plaintiffs' lawyers
(Greene LLP) in litigation against Pfizer that provided several gabapentin
documents used for several papers we referenced and commented on.
Thomas Greene has donated funding to The Johns Hopkins University for
scholarship related to reporting that has been used by various doctoral
students, including NF.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Author details
1 Department of Epidemiology, Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA. 2 Department of Biostatistics and Bioinformatics, Duke University School of Medicine, 2424 Erwin Road, Suite 1105, 11041 Hock Plaza, Durham, NC 27705, USA.
*Correspondence: tli19@jhu.edu
Received: 7 March 2018 Accepted: 30 August 2018
References
1. Ebrahim S, Sohani ZN, Montoya L, Agarwal A, Thorlund K, Mills EJ, Ioannidis
JP. Reanalyses of randomized clinical trial data. JAMA. 2014;312(10):1024-32.
2. Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S, Michie S,
Moher D, Wager E. Reducing waste from incomplete or unusable reports of
biomedical research. Lancet. 2014;383(9913):267-76.
3. Turner L, Shamseer L, Altman DG, Schulz KF, Moher D. Does use of the
CONSORT Statement impact the completeness of reporting of randomised
controlled trials published in medical journals? A Cochrane review. Syst Rev.
2012;1:60. https://doi.org/10.1186/2046-4053-1-60.
4. Dwan K, Gamble C, Williamson PR, Kirkham JJ. Reporting Bias Group.
Systematic review of the empirical evidence of study publication bias and
outcome reporting bias - an updated review. PLoS One. 2013;8(7):e66844.
5. Institute of Medicine. Sharing clinical trial data: maximizing benefits,
minimizing risk. Washington, DC: National Academies Press; 2015.
6. Nosek BA, Alter G, Banks GC, et al. Scientific standards: promoting an open
research culture. Science. 2015;348(6242):1422-5.
7. DeAngelis CD, Drazen JM, Frizelle FA, Haug C, Hoey J, Horton R, Kotzin S,
Laine C, Marusic A, Overbeke AJ, Schroeder TV, Sox HC, Van Der Weyden
MB. International Committee of Medical Journal Editors. Clinical trial
registration: a statement from the International Committee of Medical
Journal Editors. JAMA. 2004;292(11):1363-4.
8. Zarin DA, Tse T, Williams RJ, Rajakannan T. Update on trial registration 11 years after the ICMJE policy was established. N Engl J Med. 2017;376(4):383-91.
9. Food and Drug Administration Amendments Act (FDAAA), 42 USC §801. 2007.
10. National Institutes of Health. NIH policy on dissemination of NIH-funded
clinical trial information. Fed Regist. 2016;81(183):64922.
11. Clinical trials registration and results information submission: final rule. 42
CFR 11. 2016.
12. Hudson KL, Lauer MS, Collins FS. Toward a new era of trust and
transparency in clinical trials. JAMA. 2016;316(13):1353-4.
13. Zarin DA, Tse T, Williams RJ, Carr S. Trial reporting in ClinicalTrials.gov - the final rule. N Engl J Med. 2016;375(20):1998-2004.
14. Goodman SN, Fanelli D, Ioannidis JP. What does research reproducibility mean? Sci Transl Med. 2016;8(341):341ps12. https://doi.org/10.1126/scitranslmed.aaf5027.
15. Mayo-Wilson E, Hutfless S, Li T, Gresham G, Fusco N, Ehmsen J, Heyward J,
Vedula S, Lock D, Haythornthwaite J, Payne JL, Cowley T, Tolbert E, Rosman
L, Twose C, Stuart EA, Hong H, Doshi P, Suarez-Cuervo C, Singh S, Dickersin
K. Integrating multiple data sources (MUDS) for meta-analysis to improve
patient-centered outcomes research. Syst Rev. 2015;4:143.
16. Mayo-Wilson E, Doshi P, Dickersin K. Are manufacturers sharing data as
promised? BMJ. 2015;351:h4169.
17. Mayo-Wilson E, Fusco N, Li T, Hong H, Canner J, Dickersin K. MUDS
Investigators. Multiple outcomes and analyses in clinical trials create challenges
for interpretation and research synthesis. J Clin Epidemiol. 2017;86:39-50.
18. Mayo-Wilson E, Li T, Fusco N, Bertizzolo L, Canner JK, Cowley T, Doshi P,
Ehmsen J, Gresham G, Guo N, Haythornthwaite JA, Heyward J, Hong H, Pham
D, Payne JL, Rosman L, Stuart EA, Suarez-Cuervo C, Tolbert E, Twose C, Vedula
S, Dickersin K. Cherry-picking by trialists and meta-analysts can drive
conclusions about intervention efficacy. J Clin Epidemiol. 2017;91:95-110.
19. Mayo-Wilson E, Li T, Fusco N, Dickersin K. MUDS investigators. Practical
guidance for using multiple data sources in systematic reviews and meta-
analyses (with examples from the MUDS study). Res Synth Methods. 2018;
9(1):2-12.
20. Meinert CL. Clinical trials dictionary: terminology and usage
recommendations. 2nd ed. Hoboken, NJ: Wiley; 2012.
21. Williamson PR, Altman DG, Blazeby JM, Clarke M, Devane D, Gargon E,
Tugwell P. Developing core outcome sets for clinical trials: issues to
consider. Trials. 2012;13:132.
22. Saldanha IJ, Dickersin K, Wang X, Li T. Outcomes in Cochrane systematic
reviews addressing four common eye conditions: an evaluation of
completeness and comparability. PLoS One. 2014;9(10):e109400.
23. Zarin DA, Tse T, Williams RJ, Califf RM, Ide NC. The ClinicalTrials.gov results database - update and key issues. N Engl J Med. 2011;364(9):852-60.
24. Dosenovic S, Jelicic Kadic A, Jeric M, Boric M, Markovic D, Vucic K, Puljak L.
Efficacy and safety outcome domains and outcome measures in systematic
reviews of neuropathic pain conditions. Clin J Pain. 2018;34(7):674-84.
25. Trudeau J, Van Inwegen R, Eaton T, Bhat G, Paillard F, Ng D, Tan K, Katz NP.
Assessment of pain and activity using an electronic pain diary and
actigraphy device in a randomized, placebo-controlled crossover trial of
celecoxib in osteoarthritis of the knee. Pain Pract. 2015;15(3):247-55.
26. Page MJ, McKenzie JE, Kirkham J, Dwan K, Kramer S, Green S, Forbes A. Bias
due to selective inclusion and reporting of outcomes and analyses in
systematic reviews of randomised trials of healthcare interventions.
Cochrane Database Syst Rev. 2014;10:MR000035.
27. Page MJ, McKenzie JE, Chau M, Green SE, Forbes A. Methods to select
results to include in meta-analyses deserve more consideration in
systematic reviews. J Clin Epidemiol. 2015;68(11):1282-91.
28. Saldanha IJ, Li T, Yang C, Owczarzak J, Williamson PR, Dickersin K. Clinical
trials and systematic reviews addressing similar interventions for the same
condition do not consider similar outcomes to be important: a case study
in HIV/AIDS. J Clin Epidemiol. 2017;84:85-94.
29. Saldanha IJ, Lindsley K, Do DV, Chuck RS, Meyerle C, Jones L, Coleman AL,
Jampel HJ, Dickersin K, Virgili G. Comparison of clinical trials and systematic
review outcomes for the 4 most prevalent eye diseases. JAMA Ophthalmol.
2017;135(9):933-40.
30. Chalmers I, Glasziou P. Avoidable waste in the production and reporting of
research evidence. Lancet. 2009;374(9683):86-9.
31. Chan AW, Song F, Vickers A, Jefferson T, Dickersin K, Gøtzsche PC, Krumholz
HM, Ghersi D, van der Worp HB. Increasing value and reducing waste:
addressing inaccessible research. Lancet. 2014;383(9913):257-66. https://doi.org/10.1016/S0140-6736(13)62296-5.
32. Juthani VV, Clearfield E, Chuck RS. Non-steroidal anti-inflammatory drugs
versus corticosteroids for controlling inflammation after uncomplicated
cataract surgery. Cochrane Database Syst Rev. 2017;7:CD010516. https://doi.org/10.1002/14651858.CD010516.pub2.
33. Clearfield E, Money S, Saldanha I, Chuck R, Lindsley K. Outcome choice and
potential loss of valuable information - an example from a Cochrane Eyes
and Vision systematic review. Abstracts of the Global Evidence Summit,
Cape Town, South Africa. Cochrane Database Syst Rev. 2017;9(Suppl 1).
https://doi.org/10.1002/14651858.CD201702.
34. Song F, Parekh S, Hooper L, Loke YK, Ryder J, Sutton AJ, Hing C, Kwok CS,
Pang C, Harvey I. Dissemination and publication of research findings: an
updated review of related biases. Health Technol Assess. 2010;14(8). https://doi.org/10.3310/hta14080.
35. Sterne JA, Sutton AJ, Ioannidis JP, Terrin N, Jones DR, Lau J, Carpenter J,
Rücker G, Harbord RM, Schmid CH, Tetzlaff J, Deeks JJ, Peters J, Macaskill P,
Schwarzer G, Duval S, Altman DG, Moher D, Higgins JP. Recommendations
for examining and interpreting funnel plot asymmetry in meta-analyses of
randomised controlled trials. BMJ. 2011;343:d4002.
36. Vedula SS, Li T, Dickersin K. Differences in reporting of analyses in internal company documents versus published trial reports: comparisons in industry-sponsored trials in off-label uses of gabapentin. PLoS Med. 2013;10(1):e1001378.
37. Gamble C, Krishan A, Stocken D, Lewis S, Juszczak E, Doré C, Williamson PR,
Altman DG, Montgomery A, Lim P, Berlin J, Senn S, Day S, Barbachano Y,
Loder E. Guidelines for the content of statistical analysis plans in clinical
trials. JAMA. 2017;318(23):2337-43.
38. International Conference on Harmonisation of technical requirements for registration of pharmaceuticals for human use. Statistical Principles for Clinical Trials E9. Step 4 version. 5 February 1998. http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E9/Step4/E9_Guideline.pdf. Accessed 11 Sept 2018.
39. Finfer S, Bellomo R. Why publish statistical analysis plans? Crit Care Resusc.
2009;11(1):5-6.
40. General instructions for NIH and other PHS agencies. SF424 (R&R) Application Packages. 25 September 2017. https://grants.nih.gov/grants/how-to-apply-application-guide/forms-e/general-forms-e.pdf. Accessed 11 Sept 2018.
41. Higgins JPT, Deeks JJ. Chapter 7: Selecting studies and collecting data. In: Higgins JPT, Green S, editors. Cochrane handbook for systematic reviews of interventions. Version 5.1.0 (updated March 2011). The Cochrane Collaboration; 2011. Available from https://training.cochrane.org/handbook. Accessed 11 Sept 2018.
42. Institute of Medicine. Finding what works in health care: standards for
systematic reviews. Washington, DC: National Academies Press; 2011.
43. Tendal B, Nüesch E, Higgins JP, Jüni P, Gøtzsche PC. Multiplicity of data in trial reports and the reliability of meta-analyses: empirical study. BMJ. 2011;343:d4829.
... By contrast, reviewers are advised to look for other data sources about harms such as clinical study reports (CSRs), case report forms, trial registers, and individual participant data (IPD) [31][32][33][34][35][36][37][38][39]. Locating and reviewing multiple data sources requires more time and resources than reviewing journal articles alone [40,41], and sometimes, trial investigators do not provide requested data, so many reviews are limited to published reports [42][43][44]. This is an important limitation because conclusions can change when different data sources are used, as in reviews examining the relative harms and benefits of antidepressants for young people [45,46]. ...
Article
Full-text available
Guidance for systematic reviews of interventions recommends both benefits and harms be included. Systematic reviews may reach conclusions about harms (or lack of harms) that are not true when reviews include only some relevant studies, rely on incomplete data from eligible studies, use inappropriate methods for synthesizing data, and report results selectively. Separate reviews about harms could address some of these problems, and we argue that conducting separate reviews of harms is a feasible alternative to current standards and practices. Systematic reviews of potential benefits could be organized around the use of interventions for specific health problems. Systematic reviews of potential harms could be broader, including more diverse study designs and including all people at risk of harms (who might use the same intervention to treat different health problems). Multiple reviews about benefits could refer to a single review of harms. This approach could improve the reliability, completeness, and efficiency of systematic reviews.
... Therefore, when using multiple outcomes, these should be in different domains as much as feasible. 12 The outcome measures must be comprehensively described or else the results of the trial cannot be interpreted. In an assessment of reporting in trials in livestock species, the measurement of all outcomes was described in 79% of trials, meaning that information with respect to all outcomes was not provided in approximately one-fifth of trials. ...
Article
Researchers planning clinical trials should identify the primary trial outcome and adequately power the trial to detect clinically meaningful differences in this outcome. All primary and secondary outcomes and their measurement should be comprehensively described, and their results reported. There is evidence that trials on the same subject use different outcomes or measure the same outcome in different ways, making it difficult to compare intervention effectiveness across clinical trials. Consensus development of core outcome sets could improve consistency in outcome measures used across trials and aid in development of an evidence-based body of literature on intervention effectiveness in swine populations.
... It was notable that many of the most common movements from the larger clinical literature (eg, reaching, sit-to-stand, tracing, and pointing) appeared so infrequently in this literature. This lack of consistency in the literature could have affected the validity estimates [135][136][137][138][139], and the lack of harmonization across studies limits any inference about methodological or analytic decisions [140]. ...
Article
Full-text available
Background With the advent of smart sensing technology, mobile and wearable devices can provide continuous and objective monitoring and assessment of motor function outcomes. Objective We aimed to describe the existing scientific literature on wearable and mobile technologies that are being used or tested for assessing motor functions in mobility-impaired and healthy adults and to evaluate the degree to which these devices provide clinically valid measures of motor function in these populations. Methods A systematic literature review was conducted by searching Embase, MEDLINE, CENTRAL (January 1, 2015, to June 24, 2020), the United States and European Union clinical trial registries, and the United States Food and Drug Administration website using predefined study selection criteria. Study selection, data extraction, and quality assessment were performed by 2 independent reviewers. Results A total of 91 publications representing 87 unique studies were included. The most represented clinical conditions were Parkinson disease (n=51 studies), followed by stroke (n=5), Huntington disease (n=5), and multiple sclerosis (n=2). A total of 42 motion-detecting devices were identified, and the majority (n=27, 64%) were created for the purpose of health care–related data collection, although approximately 25% were personal electronic devices (eg, smartphones and watches) and 11% were entertainment consoles (eg, Microsoft Kinect or Xbox and Nintendo Wii). The primary motion outcomes were related to gait (n=30), gross motor movements (n=25), and fine motor movements (n=23). As a group, sensor-derived motion data showed a mean sensitivity of 0.83 (SD 7.27), a mean specificity of 0.84 (SD 15.40), a mean accuracy of 0.90 (SD 5.87) in discriminating between diseased individuals and healthy controls, and a mean Pearson r validity coefficient of 0.52 (SD 0.22) relative to clinical measures. We did not find significant differences in the degree of validity between in-laboratory and at-home sensor-based assessments nor between device class (ie, health care–related device, personal electronic devices, and entertainment consoles). Conclusions Sensor-derived motion data can be leveraged to classify and quantify disease status for a variety of neurological conditions. However, most of the recent research on digital clinical measures is derived from proof-of-concept studies with considerable variation in methodological approaches, and much of the reviewed literature has focused on clinical validation, with less than one-quarter of the studies performing analytical validation. Overall, future research is crucially needed to further consolidate that sensor-derived motion data may lead to the development of robust and transformative digital measurements intended to predict, diagnose, and quantify neurological disease state and its longitudinal change.
... The second set of articles discusses the relevance of the open science movement to intervention trials in prevention science. Axford et al. (2022) examine the important issue of publication bias in prevention, focusing on the harm and waste that results from the misrepresentation of study results, or "spin" (Lazarus et al., 2015), and from non-reporting of null and negative trial results (Li et al., 2018). The field would benefit from considering the strategies that the authors advance to enable and learn from the transparent, comprehensive dissemination of null and negative results. ...
... Although statistical methods can be used to overcome the problem of outcome heterogeneity (eg, by converting one metric or aggregation score into another), multiple outcome definitions give rise to reporting bias and to methodological heterogeneity. [74][75][76] For example, the seven most common depression scales encompass 52 symptoms of depression and differ considerably in content. 21 (40%) of the 52 symptoms appear in only one scale, whereas 6 (12%) of 52 symptoms are included in all. ...
Article
The clinical guidelines that underpin the use of drugs for mental disorders are informed by evidence from randomised controlled trials (RCTs). RCTs are performed to obtain marketing authorisation from regulators. The methods used in these RCTs could be appropriate for early phases of drug development because they identify drugs with important harms and drugs that are efficacious for specific health problems and populations. RCTs done before marketing authorisation do not tend to address clinical questions that concern the effectiveness of a drug in heterogeneous and comorbid populations, the optimisation of drug sequencing and discontinuation, or the comparative benefits and harms of different drugs that could be used for the same health problem. This Review proposes an overview of some shortcomings of RCTs, at an individual level and at the whole portfolio level, and identifies some methods in planning, conducting, and carrying out analyses in RCTs that could enhance their ability to support therapeutic decisions. These suggestions include: identifying patient-important questions to be investigated by psychopharmacological RCTs; embedding pragmatic RCTs within clinical practice to improve generalisability to target populations; collecting evidence about drugs in overlooked populations; developing methods to facilitate the recruitment of patients with mental disorders and to reduce the number of patients who drop out, using specific methods; using core outcome sets to standardise the assessment of benefits and harms; and recording systematically serious objective outcomes, such as suicide or hospitalisation, to be evaluated in meta-analyses. This work is a call to address questions relevant to patients using diverse design of RCTs, thus contributing to the development of a patient-centred, evidence-based psychiatry.
... Systematic reviews should include all relevant reports (e.g., design papers, primary and secondary results papers, conference abstracts, trial registration) for included studies because different reports might present different and complementary information. 6,7,10,11 For overviews and studies that include systematic reviews, it is important to assess the overlap in citations so that supporting evidence is not double-counted towards a summary effect estimate. [12][13][14][15][16] Across a set of reviews for an intervention, we would hope to see similar results for harms, especially if those reviews include the same sources of evidence. ...
Article
Full-text available
Objective : In this methodologic study (Part 2 of 2), we examined the overlap in sources of evidence and the corresponding results for harms in systematic reviews for gabapentin. Study Design & Setting : We extracted all citations referenced as sources of evidence for harms of gabapentin from 70 systematic reviews, as well as the harms assessed and numerical results. We assessed consistency of harms between pairs of reviews with a high degree of overlap in sources of evidence (>50%) as determined by corrected covered area (CCA). Results : We found 514 reports cited across 70 included reviews. Most reports (244/514, 48%) were not cited in more than one review. Among 18 pairs of reviews, we found reviews had differences in which harms were assessed and their choice to meta-analyze estimates or present descriptive summaries. When a specific harm was meta-analyzed in a pair of reviews, we found similar effect estimates. Conclusion : Differences in harms results across reviews can occur because the choice of harms is driven by reviewer preferences, rather than standardized approaches to selecting harms for assessment. A paradigm shift is needed in the current approach to synthesizing harms.
... Registered trials are also less likely to report significant results than non-registered ones (Kaplan & Irvin, 2015). favourable results (Zarin et al., 2011;Mayo-Wilson et al., 2017;Li, Mayo-Wilson, et al., 2018). In the included studies, 17.9% of the primary outcomes in the protocols and 37.9% of the primary outcomes in the publication described the domain, but not the specific measurement. ...
Article
Background Outcome discrepancies between protocols and respective publications represent a concerning bias. The purpose of this study was to assess the prevalence of selective outcome reporting (SOR) in root coverage randomized clinical trials (RCTs). Methods Published root coverage RCTs (July 2005 to March 2020) were included if a corresponding protocol could be identified in a public registry. Discrepancies between protocol and its correspondent publication(s) were compared regarding primary and secondary outcomes and other study characteristics. Associations between trial characteristics and SOR were evaluated. Results Forty four studies (54 publications) were included. The majority of studies (77.3%) were retrospectively registered. SOR was frequent (40.9% of trials) and consisted of primary outcome downgrade (22.7%); secondary outcome upgrade (11.4%); new primary outcome introduced in publication (25%); protocol primary outcome omitted from publication (13.6%) and discrepancy in primary outcome timing (18.2%). SOR was unclear in 20.5% of studies and favoured statistical significance in 12 studies (27.3%). SOR was significantly associated with study significance (p < 0.001) and unclear outcome definition in the publication (p < 0.001). Only a third (32.8%) of primary outcomes were completely defined. Conclusions The present study identified high prevalence of SOR in root coverage RCTs.
Chapter
Clinical trials are experiments in human beings. Findings from these experiments, either by themselves or within research syntheses, are often meant to evidence-based clinical decision-making. These decisions can be misled when clinical trials are reported in a biased manner. For clinical trials to inform healthcare decisions without bias, their reporting should be complete, timely, transparent, and accessible. Reporting of clinical trials is biased when it is influenced by the nature and direction of its results. Reporting biases in clinical trials may manifest in different ways, including results not being reported at all, reported in part, with delay, or in sources of scientific literature that are harder to access. Biased reporting of clinical trials in turn can introduce bias into research syntheses, with the eventual consequence being misinformed healthcare decisions. Clinical trial registration, access to protocols and statistical analysis plans, and guidelines for transparent and complete reporting are critical to prevent reporting biases.
Article
Objective : Most systematic reviews of interventions focus on potential benefits. Common methods and assumptions that are appropriate for assessing benefits can be inappropriate for harms. This paper provides a primer on researching harms, particularly in systematic reviews. Study Design and Setting : Commentary describing challenges with assessing harm. Results : Investigators should be familiar with various terminologies used to describe, classify, and group harms. Published reports of clinical trials include limited information about harms, so systematic reviewers should not depend on these studies and journal articles to reach conclusions about harms. Visualizations might improve communication of multiple dimensions of harms such as severity, relatedness, and timing. Conclusion : The terminology, classification, detection, collection, and reporting of harms create unique challenges that take time, expertise, and resources to navigate in both primary studies and evidence syntheses. Systematic reviewers might reach incorrect conclusions if they focus on evidence about harms found in published reports of randomized trials of a particular health problem. Systematic reviews could be improved through better identification and reporting of harms in primary studies and through better training and uptake of appropriate methods for synthesizing evidence about harms.
Article
Full-text available
The correct title of the article [1] should be "Integrating multiple data sources (MUDS) for meta-analysis to improve patient-centered outcomes research: a protocol".
Article
Full-text available
Importance While guidance on statistical principles for clinical trials exists, there is an absence of guidance covering the required content of statistical analysis plans (SAPs) to support transparency and reproducibility. Objective To develop recommendations for a minimum set of items that should be addressed in SAPs for clinical trials, developed with input from statisticians, previous guideline authors, journal editors, regulators, and funders. Design Funders and regulators (n = 39) of randomized trials were contacted and the literature was searched to identify existing guidance; a survey of current practice was conducted across the network of UK Clinical Research Collaboration–registered trial units (n = 46, 1 unit had 2 responders) and a Delphi survey (n = 73 invited participants) was conducted to establish consensus on SAPs. The Delphi survey was sent to statisticians in trial units who completed the survey of current practice (n = 46), CONSORT (Consolidated Standards of Reporting Trials) and SPIRIT (Standard Protocol Items: Recommendations for Interventional Trials) guideline authors (n = 16), pharmaceutical industry statisticians (n = 3), journal editors (n = 9), and regulators (n = 2) (3 participants were included in 2 groups each), culminating in a consensus meeting attended by experts (N = 12) with representatives from each group. The guidance subsequently underwent critical review by statisticians from the surveyed trial units and members of the expert panel of the consensus meeting (N = 51), followed by piloting of the guidance document in the SAPs of 5 trials. Findings No existing guidance was identified. The registered trials unit survey (46 responses) highlighted diversity in current practice and confirmed support for developing guidance. The Delphi survey (54 of 73, 74% participants completing both rounds) reached consensus on 42% (n = 46) of 110 items. The expert panel (N = 12) agreed that 63 items should be included in the guidance, with an additional 17 items identified as important but may be referenced elsewhere. Following critical review and piloting, some overlapping items were combined, leaving 55 items. Conclusions and Relevance Recommendations are provided for a minimum set of items that should be addressed and included in SAPs for clinical trials. Trial registration, protocols, and statistical analysis plans are critically important in ensuring appropriate reporting of clinical trials.
Article
Full-text available
Data for individual trials included in systematic reviews may be available in multiple sources. For example, a single trial might be reported in 2 journal articles and 3 conference abstracts. Because of differences across sources, source selection can influence the results of systematic reviews. We used our experience in the Multiple Data Sources in Systematic Reviews (MUDS) study, and evidence from previous studies, to develop practical guidance for using multiple data sources in systematic reviews. We recommend the following: (1) Specify which sources you will use. Before beginning a systematic review, consider which sources are likely to contain the most useful data. Try to identify all relevant reports and to extract information from the most reliable sources. (2) Link individual trials with multiple sources. Write to authors to determine which sources are likely related to the same trials. Use a modified Preferred Reporting Items for Systematic Reviews and Meta‐analyses (PRISMA) flowchart to document both the selection of trials and the selection of sources. (3) Follow a prespecified protocol for extracting trial characteristics from multiple sources. Identify differences among sources, and contact study authors to resolve differences if possible. (4) Prespecify outcomes and results to examine in the review and meta‐analysis. In your protocol, describe how you will handle multiple outcomes within each domain of interest. Look for outcomes using all eligible sources. (5) Identify which data sources were included in the review. Consider whether the results might have been influenced by data sources used. (6) To reduce bias, and to reduce research waste, share the data used in your review.
Article
Full-text available
Objective: To determine whether disagreements among multiple data sources affect systematic reviews of randomized clinical trials (RCTs). Study design and setting: Eligible RCTs examined gabapentin for neuropathic pain and quetiapine for bipolar depression, reported in public (e.g., journal articles) and non-public sources (clinical study reports [CSRs] and individual participant data [IPD]). Results: We found 21 gabapentin RCTs (74 reports, six IPD) and seven quetiapine RCTs (50 reports, one IPD); most were reported in journal articles (18/21 [86%] and 6/7 [86%], respectively). When available, CSRs contained the most trial design and risk of bias information. CSRs and IPD contained the most results. For the outcome domains "pain intensity" (gabapentin) and "depression" (quetiapine), we found single trials with 68 and 98 different meta-analyzable results, respectively; by purposefully selecting one meta-analyzable result for each RCT, we could change the overall result for pain intensity from effective (standardized mean difference [SMD]=-0.45; 95%CI -0.63 to -0.27) to ineffective (SMD=-0.06; 95%CI -0.24 to 0.12). We could change the effect for depression from a medium effect (SMD=-0.55; 95%CI -0.85 to -0.25) to a small effect (SMD=-0.26; 95%CI -0.41 to -0.1). Conclusions: Disagreements across data sources affect the effect size, statistical significance, and interpretation of trials and meta-analyses.
Article
Full-text available
Objective: To identify variations in outcomes and results across public and non-public reports of randomized clinical trials (RCTs). Study design and setting: Eligible RCTs examined gabapentin for neuropathic pain and quetiapine for bipolar depression, reported in public (e.g., journal articles) and non-public sources (e.g., clinical study reports) available by 2015. We recorded pre-specified outcome domains. We considered outcomes "defined" if they included the domain, measure, metric, method of aggregation, and time-point. We recorded "treatment effect" definitions in each report (i.e., outcome definition and methods of analysis). We assessed whether results were meta-analyzable. Results: We found 21 gabapentin RCTs (68 public, 6 non-public reports) and seven quetiapine RCTs (46 public, 4 non-public reports). RCTs assessed four and seven pre-specified outcome domains, and reported 214 and 81 outcome definitions, respectively. Using multiple outcome definitions and methods of analysis, RCTs assessed 605 and 188 treatment effects, associated with 1,230 and 661 meta-analyzable results. Public reports included 305 (25%) and 109 (16%) meta-analyzable results, respectively. Conclusion: Eligible RCTs included hundreds of outcomes and results. Only a small proportion of outcomes and results were in public reports. Both trial authors and meta-analysts may cherry-pick where there are multiple results and multiple sources of RCTs.
Article
Full-text available
Background: The usefulness of clinical trials and systematic reviews is compromised when they report different outcomes. We compared outcomes in reviews of HIV/AIDS and the trials included in the reviews. Study design and setting: We examined all Cochrane reviews of HIV/AIDS (as of June 2013) that included ≥1 trial, and the trials that the reviews included. We compared outcomes within subgroups defined by type of intervention: clinical management, biomedical prevention, behavioral prevention, and health services. Results: We included 84 reviews that encompassed 524 trials. Although the median number of outcomes per trial (8) and per review (7.5) was similar, the trials reported a considerably greater number of unique outcomes than the reviews (779 vs. 218), ranging from 2.3 times greater (clinical management) to 5.4 times greater (behavioral prevention). High proportions of trial outcomes were not in any review: 68% (clinical management) to 83% (behavioral prevention). Lower proportions of review outcomes were not in any trial: 11% (clinical management) to 39% (health services). Conclusion: Outcomes in trials and reviews are not well aligned for appropriate inclusion of trial results in reviews and meta-analyses. Differences in perspectives, goals, and constraints between trialists and reviewers may explain differences in outcomes they consider important.
Article
Objectives: Heterogeneity of outcome domains, used in interventional trials and systematic reviews (SRs) for neuropathic pain (NeuP), makes decisions on the comparative effectiveness of available treatments difficult. This study analyzed outcome domains and measures used in SRs of randomized controlled trials on efficacy and safety of interventions for NeuP and compared them with the core outcome set (COS) and core outcome measures (COMs) for chronic pain recommended by the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT). Methods: Five electronic databases were searched to find SRs of interventions for NeuP. Outcome domains and measures were independently extracted by 2 authors, and compared against the IMMPACT-recommended COS and COMs. Outcome domains specified in the methods and reported in the results were also compared. Results: Ninety-seven SRs were analyzed. The 2 core domains most frequently specified in the methods and reported in the results of SRs were pain and symptoms and adverse events. Pain intensity was mostly assessed with VAS (n=59) and NRS (n=29) scales. The incidence (n=70) and severity (n=60) were most commonly reported for adverse events. There were 240 different outcome measures used for the assessment of treatment efficacy and safety. Conclusions: Authors of SRs in the field of NeuP insufficiently use relevant recommended COS and COMs for chronic pain. More effort should be put into the implementation of COS to ensure that the study results can be compared and combined. There is a need for defining core outcome domains and measures specific for NeuP.
Article
Importance Suboptimal overlap in outcomes reported in clinical trials and systematic reviews compromises efforts to compare and summarize results across these studies. Objectives To examine the most frequent outcomes used in trials and reviews of the 4 most prevalent eye diseases (age-related macular degeneration [AMD], cataract, diabetic retinopathy [DR], and glaucoma) and the overlap between outcomes in the reviews and the trials included in the reviews. Design, Setting, and Participants This cross-sectional study examined all Cochrane reviews that addressed AMD, cataract, DR, and glaucoma; were published as of July 20, 2016; and included at least 1 trial and the trials included in the reviews. For each disease, a pair of clinical experts independently classified all outcomes and resolved discrepancies. Outcomes (outcome domains) were then compared separately for each disease. Main Outcomes and Measures Proportion of review outcomes also reported in trials and vice versa. Results This study included 56 reviews that comprised 414 trials. Although the median number of outcomes per trial and per review was the same (n = 5) for each disease, the trials included a greater number of outcomes overall than did the reviews, ranging from 2.9 times greater (89 vs 30 outcomes for glaucoma) to 4.9 times greater (107 vs 22 outcomes for AMD). Most review outcomes, ranging from 14 of 19 outcomes (73.7%) (for DR) to 27 of 29 outcomes (93.1%) (for cataract), were also reported in the trials. For trial outcomes, however, the proportion also named in reviews was low, ranging from 19 of 107 outcomes (17.8%) (for AMD) to 24 of 89 outcomes (27.0%) (for glaucoma). Only 1 outcome (visual acuity) was consistently reported in greater than half the trials and greater than half the reviews. Conclusions and Relevance Although most review outcomes were reported in the trials, most trial outcomes were not reported in the reviews. The current analysis focused on outcome domains, which might underestimate the problem of inconsistent outcomes. Other important elements of an outcome (ie, specific measurement, specific metric, method of aggregation, and time points) might have differed even though the domains overlapped. Inconsistency in trial outcomes may impede research synthesis and indicates the need for disease-specific core outcome sets in ophthalmology.
Article
Background: Cataract is a leading cause of blindness worldwide. Cataract surgery is commonly performed but can result in postoperative inflammation of the eye. Inadequately controlled inflammation increases the risk of complications. Non-steroidal anti-inflammatory drugs (NSAIDs) and corticosteroids are used to prevent and reduce inflammation following cataract surgery, but these two drug classes work by different mechanisms. Corticosteroids are effective, but NSAIDs may provide an additional benefit in reducing inflammation when given in combination with corticosteroids. Comparing NSAIDs with corticosteroids alone, or with combination therapy using both anti-inflammatory agents, will help determine the role of NSAIDs in controlling inflammation after routine cataract surgery.
Objectives: To evaluate the comparative effectiveness of topical NSAIDs (alone or in combination with topical corticosteroids) versus topical corticosteroids alone in controlling intraocular inflammation after uncomplicated phacoemulsification. To assess postoperative best-corrected visual acuity (BCVA); patient-reported discomfort, symptoms, or complications (such as elevation of intraocular pressure (IOP)); and cost-effectiveness of postoperative NSAIDs or corticosteroids.
Search methods: To identify studies relevant to this review, we searched the Cochrane Central Register of Controlled Trials (CENTRAL), which contains the Cochrane Eyes and Vision Trials Register (2016, Issue 12), MEDLINE Ovid (1946 to December 2016), Embase Ovid (1947 to 16 December 2016), PubMed (1948 to December 2016), LILACS (Latin American and Caribbean Health Sciences Literature Database) (1982 to 16 December 2016), the metaRegister of Controlled Trials (mRCT) (www.controlled-trials.com; last searched 17 June 2013), ClinicalTrials.gov (www.clinicaltrials.gov; searched December 2016), and the WHO International Clinical Trials Registry Platform (ICTRP) (www.who.int/ictrp/search/en; searched December 2016).
Selection criteria: We included randomized controlled trials (RCTs) in which participants were undergoing phacoemulsification for uncomplicated cataract extraction. We included both trials in which topical NSAIDs were compared with topical corticosteroids and trials in which combination therapy (topical NSAIDs plus corticosteroids) was compared with topical corticosteroids alone. The primary outcomes for this review were inflammation and BCVA.
Data collection and analysis: Two review authors independently screened the full-text articles, extracted data from included trials, and assessed the included trials for risk of bias according to Cochrane standards. The two review authors resolved any disagreements by discussion. We graded the certainty of the evidence using GRADE.
Main results: This review included 48 RCTs conducted in 17 different countries, plus two ongoing studies. Ten included studies had a trial registry record. Fifteen studies compared an NSAID with a corticosteroid alone, and 19 studies compared a combination of an NSAID plus a corticosteroid with a corticosteroid alone. The other 14 studies had more than two study arms. Overall, we judged the studies to be at unclear risk of bias.
NSAIDs alone versus corticosteroids alone: None of the included studies reported postoperative intraocular inflammation in terms of cells and flare as a dichotomous variable. Inflammation was reported as a continuous variable in seven studies.
There was moderate-certainty evidence of no difference in mean cell value between participants receiving an NSAID and those receiving a corticosteroid (mean difference (MD) -0.60, 95% confidence interval (CI) -2.19 to 0.99), and there was low-certainty evidence that the mean flare value was lower in the group receiving NSAIDs (MD -13.74, 95% CI -21.45 to -6.04). Only one study reported on corneal edema at one week postoperatively, and it was uncertain whether the risk of edema was higher or lower in the group that received NSAIDs (risk ratio (RR) 0.77, 95% CI 0.26 to 2.29). No included study reported BCVA as a dichotomous outcome, and no study reported time to cessation of treatment. None of the included studies reported the proportion of eyes with cystoid macular edema (CME) at one week postoperatively. Based on four RCTs that reported CME at one month, we found low-certainty evidence that participants treated with an NSAID alone had a lower risk of developing CME compared with those treated with a corticosteroid alone (RR 0.26, 95% CI 0.17 to 0.41). No studies reported on other adverse events or economic outcomes.
NSAIDs plus corticosteroids versus corticosteroids alone: No study described intraocular inflammation in terms of cells and flare as a dichotomous variable, and there were not enough continuous data for anterior chamber cells and flare to perform a meta-analysis. One study reported the presence of corneal edema at various time points; neither postoperative treatment with a combination of an NSAID plus a corticosteroid nor treatment with a corticosteroid alone was favored (RR 1.07, 95% CI 0.98 to 1.16). We judged this study to be at high risk of reporting bias, and the certainty of the evidence was downgraded to moderate. No included study reported the proportion of participants with BCVA better than 20/40 at one week postoperatively or reported time to cessation of treatment. Only one included study reported the presence of CME at one week after surgery, and one study reported CME at two weeks after surgery. After combining findings from these two studies, we estimated with low-certainty evidence that there was a lower risk of CME in the group that received NSAIDs plus corticosteroids (RR 0.17, 95% CI 0.03 to 0.97). Seven RCTs reported the proportion of participants with CME at one month postoperatively; however, there was only low-certainty evidence of a lower risk of CME in participants receiving an NSAID plus a corticosteroid compared with those receiving a corticosteroid alone (RR 0.50, 95% CI 0.23 to 1.06). The few adverse events reported were attributable to phacoemulsification rather than to the eye drops.
Authors' conclusions: We found insufficient evidence from this review to inform practice for the treatment of postoperative inflammation after uncomplicated phacoemulsification. Based on the RCTs included in this review, we could not conclude equivalence or superiority of NSAIDs, with or without corticosteroids, versus corticosteroids alone. There may be some reduction in the risk of CME with NSAIDs alone and with the combination of an NSAID plus a corticosteroid. Future RCTs of these interventions should standardize the type of medication used, dosing, and treatment regimen; data should be collected and presented using the Standardization of Uveitis Nomenclature (SUN) outcome measures so that dichotomous outcomes can be analyzed.
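The pooled risk ratios quoted in this abstract come from standard pairwise meta-analysis. Cochrane reviews typically use Mantel-Haenszel methods for dichotomous outcomes; the sketch below instead shows the simpler fixed-effect inverse-variance pooling of log risk ratios, with invented 2x2 counts rather than the review's data, purely to illustrate how a pooled RR and its 95% CI arise:

```python
import math

# Fixed-effect inverse-variance pooling of log risk ratios (illustrative only).
# Each tuple: (events_treatment, n_treatment, events_control, n_control).
# These counts are invented; they are NOT data from the review.
studies = [(3, 50, 9, 50), (2, 40, 7, 38)]

weights, log_rrs = [], []
for a, n1, c, n2 in studies:
    log_rr = math.log((a / n1) / (c / n2))
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2  # approximate variance of log RR
    weights.append(1 / var)
    log_rrs.append(log_rr)

pooled = sum(w * lr for w, lr in zip(weights, log_rrs)) / sum(weights)
se = math.sqrt(1 / sum(weights))
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
print(f"Pooled RR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```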