Article

Replication in Empirical Economics: The Journal of Money, Credit and Banking Project

Authors: William G. Dewald, Jerry G. Thursby, and Richard G. Anderson

Abstract

This paper examines the role of replication in empirical economic research. It presents the findings of a two-year study that collected programs and data from authors and attempted to replicate their published results. The research provides new and important information about the extent and causes of failures to replicate published results in economics. The findings suggest that inadvertent errors in published empirical articles are a commonplace, rather than a rare, occurrence. Copyright 1986 by American Economic Association.




... When is a replication even conceivable? 30 years ago, Dewald, Thursby, and Anderson (1986) were only able to obtain something resembling replication data for just over half of the economics papers for which they sought data; that figure has risen to between 60 and 80 percent; in other words, the rate of data unavailability has nearly been cut in half. 4 Getting usable data and code in hand to make a viable replication attempt was very hard in those days: of 54 received data sets, only 8 satisfied a basic usability criterion (yielding the 15 percent number from the 1986 study), driving the chances of even being able to try a replication below 10 percent. ...
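The percentages quoted in this snippet follow from simple arithmetic; a minimal reconstruction is sketched below in LaTeX, where the 55 percent rate of obtaining data is an assumption standing in for the snippet's "just over half".

% Usability rate among received data sets (the "15 percent number"):
\[
  \frac{8}{54} \approx 0.148 \approx 15\% .
\]
% Rough chance of even being able to attempt a replication, combining the
% (assumed) ~55% rate of obtaining data with the usability rate among received sets:
\[
  0.55 \times 0.148 \approx 0.08 < 10\% .
\]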
... 4. Dewald, Thursby, and Anderson (1986) found that replication datasets were twice as likely to be available if the concerned papers were in the process of being published at the time of the replication attempt, compared to papers that had been published years before the attempt. ...
... 5. Datasets and programs failed to combine to produce viable and successful replications for a list of reasons that is, by turns, tragic, comic, incredible, and instructive. Dewald, Thursby, and Anderson (1986) describe a common situation: a regularly updated government dataset is used in analysis, but neither a copy of the relevant vintage of the public dataset nor the precise date when it was obtained is included in the replication files, thus preventing would-be replicators from knowing whether the dataset they obtain is the same as what the original study authors had used. Chang and Li (2015) mention, as the 1986 paper also did, the occasional problem of confidential datasets and unavailable software packages. ...
... In addition, empirical papers receive more citations than theoretical papers (Angrist et al., 2017). It is 35 years since Dewald et al. (1986) emphasized the importance of replications in empirical research. Out of a sample of 54 published papers in the Journal of Money, Credit and Banking in 1980, they found only eight (15%) were replicable without problems and 14 (26%) were incomplete. ...
... The literature suggests two reasons for the lack of replication studies (Dewald et al., 1986; Galiani et al., 2017; Vilhuber, 2020): ...
... Specifically, discussions about useful editorial procedures in dealing with empirical or other data-driven contributions have been raised frequently in the last decades. Since the seminal paper of Dewald et al. (1986), in which the authors systematically attempted to reproduce the results of published papers, the editorial procedures of journals in dealing with data have been under fire. The study of Dewald et al. suggested that errors in published empirical articles are "a commonplace rather than a rare occurrence" (Dewald et al., 1986, p. 587f). ...
... Studies have been dealing with questions of reproducible research, and particularly with the data policies and data archives of economics journals, since the late 1980s. The publication of Dewald et al. (1986) set the starting point for the ongoing debate. Their paper presented the findings of a project in which the authors collected programs and data from authors of the Journal of Money, Credit and Banking (JMCB). ...
Article
Full-text available
In the field of social sciences and particularly in economics, studies have frequently reported a lack of reproducibility of published research. Most often, this is due to the unavailability of data reproducing the findings of a study. However, over the past years, debates on open science practices and reproducible research have become stronger and louder among research funders, learned societies, and research organisations. Many of these have started to implement data policies to overcome these shortcomings. Against this background, the article asks if there have been changes in the way economics journals handle data and other materials that are crucial to reproduce the findings of empirical articles. For this purpose, all journals listed in the Clarivate Analytics Journal Citation Reports edition for economics have been evaluated for policies on the disclosure of research data. The article describes the characteristics of these data policies and explicates their requirements. Moreover, it compares the current findings with the situation some years ago. The results show significant changes in the way journals handle data in the publication process. Research libraries can use the findings of this study for their advisory activities to best support researchers in submitting and providing data as required by journals.
... Such metascientific studies in economics sometimes examine published work through the lens of its unconsidered statistical properties (Ioannidis et al., 2017; Young, 2018). More often, these checks operate through attempts to replicate published papers, focused either on individual papers or on many at once (Camerer et al., 2016; Chang & Li, 2017; Dewald et al., 1986). ...
... Pure replications will intentionally make the exact same choices as in the original study unless a clear error is spotted, and replications using new data or methods generally attempt to make the same choices except for the specific data set or method being changed, so as to isolate the source of any difference. Incentives to replicate, which favor results that overturn an original study (Dewald et al., 1986; Gertler et al., 2018; Hamermesh, 2007), do not favor looking into these choices. Even if results are found to be sensitive to researcher degrees of freedom, as long as the original choices are not obviously incorrect, it is difficult to make a convincing case to an editor that the results have been overturned. ...
Article
Researchers make hundreds of decisions about data collection, preparation, and analysis in their research. We use a many‐analysts approach to measure the extent and impact of these decisions. Two published causal empirical results are replicated by seven replicators each. We find large differences in data preparation and analysis decisions, many of which would not likely be reported in a publication. No two replicators reported the same sample size. Statistical significance varied across replications, and for one of the studies the effect's sign varied as well. The standard deviation of estimates across replications was 3–4 times the mean reported standard error.
... This paper adds to the growing literature on the reproducibility and credibility of academic research in economics and finance (see the surveys of Christensen and Miguel 2018 and Colliard, Hurlin, and Pérignon 2023). Early evidence was provided by Dewald, Thursby, and Anderson (1986) and McCullough, McGeary, and Harrison (2006) for the Journal of Money, Credit and Banking and by McCullough and Vinod (2003) and Glandon (2011) for the American Economic Review. ...
Article
Full-text available
We analyze the computational reproducibility of more than 1,000 empirical answers to 6 research questions in finance provided by 168 research teams. Running the researchers’ code on the same raw data regenerates exactly the same results only 52% of the time. Reproducibility is higher for researchers with better coding skills and those exerting more effort. It is lower for more technical research questions, more complex code, and results lying in the tails of the distribution. Researchers exhibit overconfidence when assessing the reproducibility of their own research. We provide guidelines for finance researchers and discuss implementable reproducibility policies for academic journals. (JEL C80, C87)
... The Journal of Money, Credit and Banking (JMCB) was one of the first journals to introduce a "data availability policy" and one of the first ones to be evaluated. Dewald et al. (1986) assess the first 54 studies subject to the policy. Only eight studies (14.8%) submitted materials that were deemed sufficient to attempt a reproduction, and only four of these studies could be reproduced without major issues. ...
Article
Full-text available
With the help of more than 700 reviewers, we assess the reproducibility of nearly 500 articles published in the journal Management Science before and after the introduction of a new Data and Code Disclosure policy in 2019. When considering only articles for which data accessibility and hardware and software requirements were not an obstacle for reviewers, the results of more than 95% of articles under the new disclosure policy could be fully or largely computationally reproduced. However, for 29% of articles, at least part of the data set was not accessible to the reviewer. Considering all articles in our sample reduces the share of reproduced articles to 68%. These figures represent a significant increase compared with the period before the introduction of the disclosure policy, where only 12% of articles voluntarily provided replication materials, of which 55% could be (largely) reproduced. Substantial heterogeneity in reproducibility rates across different fields is mainly driven by differences in data set accessibility. Other reasons for unsuccessful reproduction attempts include missing code, unresolvable code errors, weak or missing documentation, and software and hardware requirements and code complexity. Our findings highlight the importance of journal code and data disclosure policies and suggest potential avenues for enhancing their effectiveness. This paper was accepted by David Simchi-Levi, behavioral economics and decision analysis–fast track. Supplemental Material: The online appendices are available at https://doi.org/10.1287/mnsc.2023.03556 .
... Two-thirds of all political science publications in the American Political Science Review (APSR) between 2013 and 2014 did not furnish replication materials, according to Key [15]. The same problem persists in genetics [20], medicine [24], economics [3,5,17], and sociology [19]. As per Vines et al. [26], the availability of research data declined dramatically over time following publication, by 17% annually, in 516 studies with article ages ranging from 2 to 22 years. ...
Conference Paper
Full-text available
Kheir Eddine Farfar, Allard Oelen, Oliver Karras, and Sören Auer. Abstract. One of the pillars of the scientific method is reproducibility: the ability to replicate the results of a prior study if the same procedures are followed. A lack of reproducibility can lead to wasted resources, false conclusions, and a loss of public trust in science. Ensuring reproducibility is challenging due to the heterogeneity of the methods used in different fields of science. In this article, we present an approach for increasing the reproducibility of research results, by semantically describing and interlinking relevant artifacts such as data, software scripts or simulations in a knowledge graph. In order to ensure the flexibility to adapt the approach to different fields of science, we devise a template model, which allows defining typical descriptions required to increase reproducibility of a certain type of study. We provide a scoring model for gradually assessing the reproducibility of a certain study based on the templates and provide a knowledge graph infrastructure for curating reproducibility descriptions along with semantic research contribution descriptions. We demonstrate the feasibility of our approach with an example in data science.
... The Journal of Money, Credit and Banking (JMCB) was one of the first journals to introduce a "data availability policy", and one of the first ones to be evaluated. Dewald et al. (1986) assess the first 54 studies subject to the policy. Only 8 studies (14.8%) submitted materials that were deemed sufficient to attempt a reproduction, and only 4 of these studies could be reproduced without major issues. ...
Preprint
Full-text available
With the help of more than 700 reviewers we assess the reproducibility of nearly 500 articles published in the journal Management Science before and after the introduction of a new Data and Code Disclosure policy in 2019. When considering only articles for which data accessibility and hard- and software requirements were not an obstacle for reviewers, the results of more than 95% of articles under the new disclosure policy could be fully or largely computationally reproduced. However, for almost 29% of articles at least part of the dataset was not accessible for the reviewer. Considering all articles in our sample reduces the share of reproduced articles to 68%. The introduction of the disclosure policy increased reproducibility significantly, since only 12% of articles accepted before the introduction of the disclosure policy voluntarily provided replication materials, out of which 55% could be (largely) reproduced. Substantial heterogeneity in reproducibility rates across different fields is mainly driven by differences in dataset accessibility. Other reasons for unsuccessful reproduction attempts include missing code, unresolvable code errors, weak or missing documentation, but also soft- and hardware requirements and code complexity. Our findings highlight the importance of journal code and data disclosure policies, and suggest potential avenues for enhancing their effectiveness.
... Researchers' willingness to share data/code in the literature. Note: Dot-plot of data/code sharing rates in the literature[4,[19][20][21][24][25][26][27][28][29][30][31][32][33][34][35][36][37][38]. Sample sizes of the respective sharing studies are denoted by dot size. ...
Article
Full-text available
Transparency and peer control are cornerstones of good scientific practice and entail the replication and reproduction of findings. The feasibility of replications, however, hinges on the premise that original researchers make their data and research code publicly available. This applies in particular to large-N observational studies, where analysis code is complex and may involve several ambiguous analytical decisions. To investigate which specific factors influence researchers' code sharing behavior upon request, we emailed code requests to 1,206 authors who published research articles based on data from the European Social Survey between 2015 and 2020. In this preregistered multifactorial field experiment, we randomly varied three aspects of our code request's wording in a 2x4x2 factorial design: the overall framing of our request (enhancement of social science research, response to replication crisis), the appeal why researchers should share their code (FAIR principles, academic altruism, prospect of citation, no information), and the perceived effort associated with code sharing (no code cleaning required, no information). Overall, 37.5% of successfully contacted authors supplied their analysis code. Of our experimental treatments, only framing affected researchers' code sharing behavior, though in the opposite direction from what we expected: Scientists who received the negative wording alluding to the replication crisis were more likely to share their research code. Taken together, our results highlight that the availability of research code will hardly be enhanced by small-scale individual interventions but instead requires large-scale institutional norms.
... Within the social sciences, the vast majority of datasets produced by sponsored research are never deposited or shared [35], and, as a result, reproducing published tables and figures, and directly extending prior results is often difficult or impossible [36], [37], [38]. Similar problems exist in other fields. ...
... Second is the potential for data entry and computational errors. Dewald et al. [2] attempted to replicate studies that had appeared in the Journal of Money, Credit, and Banking. They ran into significant problems even though they attempted to use exactly the same data and analytic methods. ...
... Although the symposium focused on education and pedagogy, the last few decades have witnessed a parallel increase in attention to the reproducibility of professional research. Evidence of widespread nonreproducibility of published research across the social and natural sciences has been accumulating for more than 35 years (see, e.g., Dewald, Thursby, and Anderson 1986; Bollen et al. 2015; Maniadis and Tufano 2017; Christensen and Miguel 2018). ...
Article
Full-text available
This paper synthesizes ideas that emerged over the course of a ten-week symposium titled “Teaching Reproducible Research: Educational Outcomes” that took place in the spring of 2021. The speakers included one linguist, three political scientists, seven psychologists, and three statisticians; about half of them were based in the US and about half in the UK. The symposium focused on a particular form of reproducibility, namely computational reproducibility, and the paper begins with an exposition of what computational reproducibility is and how it can be achieved. Drawing on talks by the speakers and comments from participants, the paper then enumerates several reasons for which learning reproducible research methods enhances the education of college and university students; the benefits have partly to do with developing computational skills that prepare students for future education and employment, but they also have to do with their intellectual development more broadly. The paper also distills insights from the symposium about practical strategies instructors can adopt to integrate reproducibility into their teaching, as well as to promote the practice among colleagues and throughout departmental curricula. The conceptual framework about the meaning and purposes of teaching reproducibility, and the practical guidance about how to get started, add up to an invitation to instructors to explore the potential for introducing reproducibility in their classes and research supervision.
... It publishes eight journals, including one of the top five journals in the discipline, in addition to several well-respected field journals. Concerns about the reliability and robustness of economic research have circulated in the AEA's membership for more than 30 years (Dewald et al. 1986; McCullough and Vinod 1999). The policy to require that articles provide copies of their replication materials, first implemented in 2004 (Bernanke 2004), was highly innovative at the time, but reflective of the membership's requests. ...
Article
Full-text available
We describe a unique environment in which undergraduate students from various STEM and social science disciplines are trained in data provenance and reproducible methods, and then apply that knowledge to real, conditionally accepted manuscripts and associated replication packages. We describe in detail the recruitment, training, and regular activities. While the activity is not part of a regular curriculum, the skills and knowledge taught through explicit training of reproducible methods and principles, and reinforced through repeated application in a real-life workflow, contribute to the education of these undergraduate students, and prepare them for post-graduation jobs and further studies. Supplementary materials for this article are available online.
... 1 The main conclusion from this literature is that the average reproducibility level remains generally low. Early evidence was provided by Dewald et al. (1986) and McCullough et al. (2006) for the Journal of Money, Credit and Banking and by McCullough and Vinod (2003) and Glandon (2011) for the American Economic Review. In a study of 67 articles published in 13 well-regarded economics journals, Chang and Li (2017) were able to reproduce the results for one-third of these papers from the code and data available on the journals' repositories. ...
Article
Full-text available
We analyze the computational reproducibility of more than 1,000 empirical answers to six research questions in finance provided by 168 international research teams. Surprisingly, neither researcher seniority nor the quality of the research paper seems related to the level of reproducibility. Moreover, researchers exhibit strong overconfidence when assessing the reproducibility of their own research and underestimate the difficulty faced by their peers when attempting to reproduce their results. We further find that reproducibility is higher for researchers with better coding skills and for those exerting more effort. It is lower for more technical research questions and more complex code.
... It publishes 8 journals, including one of the top 5 journals in the discipline, in addition to several well-respected field journals. Concerns about the reliability and robustness of economic research have circulated in the AEA's membership for more than 30 years (Dewald et al., 1986; McCullough & Vinod, 1999). The policy to require that articles provide copies of their replication materials, first implemented in 2004 (Bernanke, 2004), was highly innovative at the time, but reflective of the membership's requests. ...
Preprint
Full-text available
We describe a unique environment in which undergraduate students from various STEM and social science disciplines are trained in data provenance and reproducible methods, and then apply that knowledge to real, conditionally accepted manuscripts and associated replication packages. We describe in detail the recruitment, training, and regular activities. While the activity is not part of a regular curriculum, the skills and knowledge taught through explicit training of reproducible methods and principles, and reinforced through repeated application in a real-life workflow, contribute to the education of these undergraduate students, and prepare them for post-graduation jobs and further studies.
... This is the domain of computational reproducibility studies that go far back in economics and finance. In economics, the American Economic Review (AER) was the first to introduce a Data Availability Policy in 2005 after the AER had published two studies that illustrated how hard it was to reproduce empirical studies (Dewald, Thursby, and Anderson, 1986; McCullough and Vinod, 2003). Reproducibility issues persisted as, at most, half of the empirical studies published in the top economics journals could be computationally reproduced (Glandon, 2011; Chang and Li, 2017; Gertler, Galiani, and Romero, 2018). ...
Article
Full-text available
We conducted two large-scale, highly powered, randomized controlled trials intended to encourage consumer debt repayments. In Study 1, we implemented five treatments varying the design of envelopes sent to debtors. We did not find any treatment effects on response and repayment rates compared to the control condition. In Study 2, we varied the letters’ contents in nine treatments, implementing factorial combinations of social norm and (non-)deterrence nudges, which were either framed emotively or non-emotively. We find that all nudges are ineffective compared to the control condition and even tend to induce backfiring effects compared to the agency’s original letter. Since comparable nudges have been shown to be highly effective in other studies, our study supports the literature, emphasizing that the success of nudging interventions crucially depends on the domain of application.
... This is the domain of computational reproducibility studies that go far back in economics and finance. In economics, the American Economic Review (AER) was the first to introduce a Data Availability Policy in 2005 after the AER had published two studies that illustrated how hard it was to reproduce empirical studies (Dewald, Thursby, and Anderson, 1986; McCullough and Vinod, 2003). Reproducibility issues persisted as, at most, half of the empirical studies published in the top economics journals could be computationally reproduced (Glandon, 2011; Chang and Li, 2017; Gertler, Galiani, and Romero, 2018). ...
Article
Full-text available
In statistics, samples are drawn from a population in a data-generating process (DGP). Standard errors measure the uncertainty in sample estimates of population parameters. In science, evidence is generated to test hypotheses in an evidence-generating process (EGP). We claim that EGP variation across researchers adds uncertainty: non-standard errors. To study them, we let 164 teams test six hypotheses on the same sample. We find that non-standard errors are sizeable, on par with standard errors. Their size (i) co-varies only weakly with team merits, reproducibility, or peer rating, (ii) declines significantly after peer-feedback, and (iii) is underestimated by participants.
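To make the comparison in this abstract concrete, one plausible formalization is sketched below in LaTeX; the exact definition and notation are assumptions for illustration, not taken from this page. For a given hypothesis, the non-standard error is taken to be the dispersion of point estimates across the participating teams, which is then set against the standard errors the teams report.

% Hypothetical notation: \hat\beta_{h,j} is team j's estimate for hypothesis h,
% J is the number of teams, and \mathrm{SE}_{h,j} is team j's reported standard error.
\[
  \mathrm{NSE}_h = \sqrt{\tfrac{1}{J-1}\sum_{j=1}^{J}\bigl(\hat\beta_{h,j}-\bar\beta_h\bigr)^2},
  \qquad \bar\beta_h = \tfrac{1}{J}\sum_{j=1}^{J}\hat\beta_{h,j} .
\]
% "On par with standard errors" then means \mathrm{NSE}_h is of similar magnitude
% to the average reported standard error, \tfrac{1}{J}\sum_{j=1}^{J}\mathrm{SE}_{h,j}.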
... Reproducibility and replicability of scientific findings have been given great scrutiny in recent years (Camerer et al., 2016; Open Science Collaboration, 2015; Fanelli, 2018; Klein et al., 2014). Actual published individual reproductions or replications are notably not very common (in economics, see Bell & Miller, 2013; Duvendack et al., 2017). In part, this is because it often was difficult to find the materials required to conduct reproducibility or replication exercises (Dewald et al., 1986; McCullough et al., 2006; McCullough & Vinod, 2003). Scientific journals, whether run by publishing companies (Springer, Elsevier, etc.) or learned societies (American Economic Association, Midwest Political Science Association, American Statistical Association, Royal Statistical Society, to name just a few in the social and statistical sciences), have been playing an important role in supporting these efforts for many years (Stodden et al., 2016), and continue to explore novel and better ways of doing so. More and more journals are adopting "data and code availability" policies, though some doubt has been cast on their effectiveness (Chang & Li, 2017a; Höffler, 2017a; Stodden et al., 2013; Stodden et al., 2018). ...
Article
Full-text available
We propose a metadata package that is intended to provide academic journals with a lightweight means of registering, at the time of publication, the existence and disposition of supplementary materials. Information about the supplementary materials is, in most cases, critical for the reproducibility and replicability of scholarly results. In many instances, these materials are curated by a third party, which may or may not follow developing standards for the identification and description of those materials. As such, the vocabulary described here complements existing initiatives that specify vocabularies to describe the supplementary materials or the repositories and archives in which they have been deposited. Where possible, it reuses elements of relevant other vocabularies, facilitating coexistence with them. Furthermore, it provides an “at publication” record of reproducibility characteristics of a particular article that has been selected for publication. The proposed metadata package documents the key characteristics that journals care about in the case of supplementary materials that are held by third parties: existence, accessibility, and permanence. It does so in a robust, time-invariant fashion at the time of publication, when the editorial decisions are made. It also allows for better documentation of less accessible (non-public) data, by treating it symmetrically from the point of view of the journal, therefore increasing the transparency of what up until now has been very opaque.
... Christensen and Miguel (2018) summarize evidence of widespread replicability problems in economics. Many have reported difficulty obtaining materials necessary for replication even when journals publishing the work have adopted explicit disclosure policies (Dewald et al. 1986, McCullough 2009, Glandon 2010). The American Economic Review's Annual Report of the Editors shows a concerning trend related to papers for which the journal waives data disclosure requirements: the share increased fairly steadily from 6% in 2005 to 46% in 2016. ...
Article
As part of a broader methodological reform movement, scientists are increasingly interested in improving the replicability of their research. Replicability allows others to perform replications to explore potential errors and statistical issues that might call the original results into question. Little attention, however, has been paid to the state of replicability in the field of empirical legal research (ELR). Quality is especially important in this field because empirical legal researchers produce work that is regularly relied upon by courts and other legal bodies. In this review, we summarize the current state of ELR relative to the broader movement toward replicability in the social sciences. As part of that aim, we summarize recent collective replication efforts in ELR and transparency and replicability guidelines adopted by journals that publish ELR. Based on this review, ELR seems to be lagging other fields in implementing reforms. We conclude with suggestions for reforms that might encourage improved replicability.
... The results suggest that top economics journals have no trouble implementing transparency standards. In fact, they made the first move in that direction in the 1980s when certain economics journals made publication conditional on providing data and code allowing the reproduction of results (Dewald, Thursby, and Anderson 1986). Thus, journal policies could require authors of quantitative studies based on publicly available data to meet higher-level requirements. ...
Article
Full-text available
Growing concerns about the credibility of scientific findings have sparked a debate on new transparency and openness standards in research. Management and organization studies scholars generally support the new standards, while emphasizing the unique challenges associated with their implementation in this paradigmatically diverse discipline. In this study, I analyze the costs to authors and journals associated with the implementation of new transparency and openness standards, and provide a progress report on the implementation level thus far. Drawing on an analysis of the submission guidelines of 60 empirical management journals, I find that the call for greater transparency was received, but resulted in implementations that were limited in scope and depth. Even standards that could have been easily adopted were left unimplemented, producing a paradoxical situation in which research designs that need transparency standards the most are not exposed to any, likely because the standards are irrelevant to other research designs.
... Stressing that confirmation through replication is a key part of the scientific method, Dewald et al. (1986) experience a general inability to replicate a set of empirical economic papers due to the inability or unwillingness of authors to provide data and programs along with clear documentation. For the nine papers they attempt to replicate, they find a number of programming errors and in some instances are unable to reproduce the published estimates. ...
Article
Full-text available
This paper is the second of a two-part series that provides essential context for any serious study of alternative risk premium (ARP) strategies. Practitioners uniformly emphasize the academic lineage of ARP strategies, regularly citing seminal papers. However, a single, comprehensive review of the copious research underpinning the category does not exist. This paper provides a comprehensive review of ARP’s academic roots, explaining that it sits at the confluence of decades of research on empirical anomalies, hedge fund replication, multi-factor models, and data snooping.
... In economics, one catalyst for these changes was the data-sharing policy adopted by the American Economic Association (AEA) in 2005, which came in response to growing evidence that many, if not most, published empirical analyses in economics could not be readily reproduced (Dewald, Thursby, and Anderson 1986; Bernanke 2004). The policy led to an almost immediate increase in the posting of data and analysis code for the American Economic Review (Christensen, Dafoe et al. 2019). ...
Article
A decade ago, the term “research transparency” was not on economists' radar screen, but in a few short years a scholarly movement has emerged to bring new open science practices, tools and norms into the mainstream of our discipline. The goal of this article is to lay out the evidence on the adoption of these approaches – in three specific areas: open data, pre-registration and pre-analysis plans, and journal policies – and, more tentatively, begin to assess their impacts on the quality and credibility of economics research. The evidence to date indicates that economics (and related quantitative social science fields) are in a period of rapid transition toward new transparency-enhancing norms. While solid data on the benefits of these practices in economics is still limited, in part due to their relatively recent adoption, there is growing reason to believe that critics' worst fears regarding onerous adoption costs have not been realized. Finally, the article presents a set of frontier questions and potential innovations.
... This report treats "reliability," reproducibility, and replicability as synonymous, grouping together studies of different phenomena. For example, the first paragraph cites concerns about mass irreplicability (including a citation to Open Science Collaboration 2015, a replication study), while the second paragraph cites Dewald, Thursby, and Anderson (1988) and Moffitt and Glandon (2011), both of which, in the RSS terminology, studied reproducibility and state explicitly that they did not attempt to replicate any of the papers examined. Later, Lutter and Zorn (2018) argue that "Access to the data necessary to replicate scientific studies is essential because the results of so many peer-reviewed scientific publications have proven to be impossible to reproduce" (Lutter and Zorn 2018, 15, my emphasis), supporting this with a mix of studies that examine either reproducibility or replicability but in no case both. ...
Article
Full-text available
Concerns about a crisis of mass irreplicability across scientific fields (“the replication crisis”) have stimulated a movement for open science, encouraging or even requiring researchers to publish their raw data and analysis code. Recently, a rule at the US Environmental Protection Agency (US EPA) would have imposed a strong open data requirement. The rule prompted significant public discussion about whether open science practices are appropriate for fields of environmental public health. The aims of this paper are to assess (1) whether the replication crisis extends to fields of environmental public health; and (2) in general whether open science requirements can address the replication crisis. There is little empirical evidence for or against mass irreplicability in environmental public health specifically. Without such evidence, strong claims about whether the replication crisis extends to environmental public health — or not — seem premature. By distinguishing three concepts — reproducibility, replicability, and robustness — it is clear that open data initiatives can promote reproducibility and robustness but do little to promote replicability. I conclude by reviewing some of the other benefits of open science, and offer some suggestions for funding streams to mitigate the costs of adoption of open science practices in environmental public health.
... Recommendations about the treatment of disease are based on the findings of medical research, but how can such findings be trusted when, according to Ioannidis, most of them are false? Clearly, there is a predictability problem that extends beyond medicine to practically all fields of social science, including economics (Camerer et al., 2016; Dewald, Thursby, & Anderson, 1986). Fortunately, empirical studies in the field of forecasting have provided us with some objective evidence that allows us to both determine the accuracy of predictions and estimate the level of uncertainty. ...
... With recent efforts showing some high profile works failing to reproduce [40][41][42], attempts have been made to determine why such works fail to reproduce [43,44], what policies can be taken to decrease reproduction failures [36,45] and whether such policies are effective [11,42,46]. Despite these efforts, scientific results remain challenging to reproduce across many disciplines [11,41,[47][48][49][50][51][52][53][54][55][56][57][58]. We return to these issues in the Discussion section, when we contrast our work with previous and related efforts. ...
Article
Full-text available
We carry out efforts to reproduce computational results for seven published articles and identify barriers to computational reproducibility. We then derive three principles to guide the practice and dissemination of reproducible computational research: (i) Provide transparency regarding how computational results are produced; (ii) When writing and releasing research software, aim for ease of (re-)executability; (iii) Make any code upon which the results rely as deterministic as possible. We then exemplify these three principles with 12 specific guidelines for their implementation in practice. We illustrate the three principles of reproducible research with a series of vignettes from our experimental reproducibility work. We define a novel Reproduction Package, a formalism that specifies a structured way to share computational research artifacts that implements the guidelines generated from our reproduction efforts to allow others to build, reproduce and extend computational science. We make our reproduction efforts in this paper publicly available as exemplar Reproduction Packages. This article is part of the theme issue ‘Reliability and reproducibility in computational science: implementing verification, validation and uncertainty quantification in silico’.
Article
Placebo tests provide incentives to underreport statistically significant tests, a form of reversed p‐hacking. We test for such underreporting in 11 top economics journals between 2009 and 2021 based on a pre‐registered analysis plan. If the null hypothesis is true in all tests, 2.5% of them should be significant at the 5% level with an effect in the same direction as the main test (and 5% in total). The actual fraction of statistically significant placebo tests with an effect in the same direction is 1.29% (95% CI [0.83, 1.63]), and the overall fraction of statistically significant placebo tests is 3.10% (95% CI [2.2, 4.0]).
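The benchmark rates quoted in this abstract follow from standard reasoning about two-sided tests; a minimal sketch in LaTeX, assuming that under the null a significant placebo effect is equally likely to take either sign:

% Under the null hypothesis in every placebo test, at the 5% significance level:
\[
  \Pr(\text{placebo test significant}) = 0.05, \qquad
  \Pr(\text{significant and same sign as the main test}) = \tfrac{0.05}{2} = 0.025 .
\]
% The reported rates (1.29% same-sign, 3.10% overall) fall below these benchmarks,
% consistent with underreporting of significant placebo results.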
Article
This paper reviews the impact of replications published as comments in the American Economic Review between 2010 and 2020. We examine their citations and influence on the original papers' (OPs) subsequent citations. Our results show that comments are barely cited, and they do not affect the OP's citations—even if the comment diagnoses substantive problems. Furthermore, we conduct an opinion survey among replicators and authors and find that there often is no consensus on whether the OP's contribution sustains. We conclude that the economics literature does not self‐correct, and that robustness and replicability are hard to define in economics.
Chapter
List of works referred to in Armstrong & Green (2022) The Scientific Method
Article
We assess the impact of mandating data-sharing in economics journals on two dimensions of research credibility: statistical significance and excess statistical significance. Excess statistical significance is a necessary condition for publication selection bias. Quasi-experimental difference-in-differences analysis of 20,121 estimates published in 24 general interest and leading field journals shows that data-sharing policies have reduced reported statistical significance and the associated t-values. The magnitude of this reduction is large and of practical significance. We also find suggestive evidence that mandatory data-sharing reduces excess statistical significance and hence decreases publication bias.
Article
Full-text available
We examine the predictability of 299 capital market anomalies enhanced by 30 machine learning approaches and over 250 models in a dataset with more than 500 million firm-month anomaly observations. We find significant monthly (out-of-sample) returns of around 1.8–2.0%, and over 80% of the models yield returns equal to or larger than our linearly constructed baseline factor. For the best performing models, the risk-adjusted returns are significant across alternative asset pricing models, considering transaction costs with round-trip costs of up to 2% and including only anomalies after publication. Our results indicate that non-linear models can reveal market inefficiencies (mispricing) that are hard to conciliate with risk-based explanations.
Article
Recent research suggests educators can enhance the depth of capstone projects by assigning projects with real-world applications. We illustrate how the case method can be adapted for an undergraduate research experience course. We present an example case study project used in an economic consulting capstone course. Student teams receive a case narrative that includes a real-world request for a proposal. In response, they must formulate a research question, identify and analyze appropriate data to address it, and prepare several business memorandums, a final proposal, and an oral presentation to share their findings. We argue that while traditional undergraduate research helps students develop general data literacy and critical thinking skills, a case study format is better suited to simulate how these skills will be used once students enter their professional lives.
Technical Report
Reproducibility and transparency can be regarded (at least in experimental research) as a hallmark of research. The ability to reproduce research results in order to check their reliability is an important cornerstone of research that helps to guarantee the quality of research and to build on existing knowledge. The digital turn has brought more opportunities to document, share and verify research processes and outcomes. Consequently, there is an increasing demand for more transparency with regard to research processes and outcomes. This fits well with the open science agenda requiring, among other things, open software, open data, and open access to publications, even if openness alone does not guarantee reproducibility. The purpose of this activity of Knowledge Exchange was to explore current practices and barriers in the area of research reproducibility, with a focus on the publication and dissemination stage. We wanted to determine how technical and social infrastructures can support future developments in this area. In this work, we defined research reproducibility as cases where data and procedures shared by the authors of a study are used to obtain the same results as in their original work.
Article
A testable implication of the modern quantity theory of money, when viewed as a theory of inflation, is the joint hypothesis that (i) there is a one‐to‐one positive relationship between inflation and the money stock growth rate, (ii) there is a one‐to‐one negative relationship between inflation and the aggregate output growth rate, and (iii) there are no other determinants of inflation besides the money stock and aggregate output expansion rates. This implication is the theory's linchpin prediction. A recent prior study published in this journal examines cross‐country data and reports that this hypothesis cannot be rejected. The present study reexamines the prior study's data and finds that the joint hypothesis is decisively rejected, an unpleasant finding from a monetarist perspective. The article then goes on to propose an alternative to the prior study's model of the inflation process and reports findings that are, from the perspective of a monetarist, at least mildly pleasant.
Article
The emerging or at least threatening “significance test crisis” in accounting has been prompted by a chorus across multiple physical and social sciences of dissatisfaction with conventional frequentist statistical research methods and behaviors, particularly the use and abuse of p-levels. There are now hundreds of published papers and statements, echoing what has been said behind closed doors for decades, namely that much if not most empirical research is unreliable, simply wrong or at worst fabricated. The problems are a mixture of flawed statistical logic (as Bayesians have claimed for decades), “p-hacking” by way of fishing for significant results and publications, selective reporting or “the file drawer problem”, and ultimately the “agency problem” that researchers charged by funding bodies (their Universities, governments and taxpayers) with conducting disinterested “objective science” are motivated more by the personal need to publish and please other researchers. Expanding on that theme, the supply of empirical research in the “market for statistical significance” is described in terms of “market failure” and “the market for lemons”.