Article

From consolidation to disruption: A novel way to measure the impact of scientists and identify laureates


Abstract

This study proposes a novel approach for evaluating the impact of scientists by introducing a new set of metrics and a dual measurement framework that combines the concepts of disruption and consolidation. Traditional metrics such as total citation count and the h-index are limited in their ability to capture the full range of a scientist's influence; the Scientists' Disruptive Citation (SDC), the Disruptive h-index (D h-index), and consolidating metrics are therefore introduced to provide a more comprehensive evaluation of scientists' disruptive and consolidating influence. Using a dataset of 463,348 papers, 234,086 disambiguated scientists, and data on three major awards in physics (the Nobel Prize, the Wolf Prize, and the Dirac Medal), this study demonstrates that the SDC and D h-index are superior to all benchmark metrics, including conventional and normalized disruption-based measures, in terms of convergent validity. Second, this study analyzes the distribution of academic characteristics between laureates and non-laureates, explores various metrics for scientists with high SDC and high Scientists' Consolidating Citation (SCC), and finds that disruptive impact can distinguish successful scientists from their counterparts and serve as an early signal of future success. Third, this study shows that the proposed disruptive citation is less susceptible to manipulation, making it a more reliable metric than the CD-index for assessing the disruptive impact of a scientist or a single paper. The results suggest that the SDC and D h-index are reliable metrics for measuring scientists' innovative influence and can aid the development of future scientific research. Overall, this study provides a scientifically sound and effective new perspective on measuring scientists through a dual measurement of disruptive and consolidating influence.
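The abstract does not reproduce the formal definitions of the SDC and D h-index, but the author-level aggregation step can be sketched roughly as follows, assuming each paper carries a disruptive citation count supplied by the paper-level measure. This is a minimal illustration, not the authors' exact formula.

```python
# Minimal sketch of aggregating per-paper disruptive citation counts to an
# author level. The exact SDC / D h-index definitions are in the full paper;
# the per-paper counts used here are assumed inputs, not the authors' formula.

def h_index(values):
    """Largest h such that at least h values are >= h."""
    values = sorted(values, reverse=True)
    h = 0
    for i, v in enumerate(values, start=1):
        if v >= i:
            h = i
        else:
            break
    return h

def author_metrics(disruptive_citations_per_paper):
    """Aggregate per-paper disruptive citation counts for one scientist."""
    sdc = sum(disruptive_citations_per_paper)      # author-level total
    d_h = h_index(disruptive_citations_per_paper)  # h-index analogue
    return sdc, d_h

if __name__ == "__main__":
    papers = [12, 7, 3, 1, 0]       # hypothetical disruptive citation counts
    print(author_metrics(papers))   # (23, 3)
```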


... It is important to note that we chose a selection of commonly used benchmarks for comparison and did not include more sophisticated measures such as the PageRank (Brin & Page, 1998), disruptive indegree, or SDC (Yang, Hu, et al., 2023) as benchmarks. However, we did compare the k-step h-index with these state-of-the-art benchmarks and found that the k-step h-index may not be the optimal method for ranking expert-selected items (refer to S.M. ...
... However, it is important to acknowledge that k-step h-indices are not the state-of-the-art measures for identifying milestone papers or laureates. In fact, they perform worse than certain network-based metrics, such as the PageRank (Brin & Page, 1998) and disruptive indegree (Yang, Hu, et al., 2023) at the paper level and SDC (Yang, Hu, et al., 2023) at the author level (refer to S.M. Note 4 for detailed information). ...
Article
The evaluation of scientific impact plays a crucial role in assessing research contributions. In this study, we introduce the concept of the k-step h-index and investigate its applicability in citation networks at different levels, including papers, authors, and institutions. By incorporating higher generations of citation information, the k-step h-index provides a comprehensive and nuanced measure of scientific influence. It demonstrates exponential growth in k-step citations, capturing valuable information from the Hirsch core and tail. Through power law distribution analysis, we uncover the presence of highly influential entities coexisting with less influential ones, revealing the heterogeneity of impact within citation networks. To validate the effectiveness of the k-step h-index, we utilize a vast dataset from APS, conducting a thorough examination of its consistency and convergent validity. Our findings demonstrate strong correlations between the k-step h-index and conventional metrics, as well as alignment with measures of innovation. This confirms the reliability of the k-step h-index and its ability to capture innovative contributions. Notably, when compared to benchmarks, the k-step h-index outperforms in accurately ranking expert-selected items, including milestone papers, distinguished authors, and prestigious institutions. Higher values of the k-step h-index consistently exhibit superior performance, showcasing their predictive power in identifying prominent scientific entities. These findings hold significant implications for research evaluation, policy-making, and strategic planning, as they pave the way for a more holistic understanding of scholarly contributions.
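One plausible reading of the k-step construction, offered only as a sketch: count the distinct papers reachable within k citing generations of each focal paper, then take an h-index over those counts. The exact counting rule in the article may differ.

```python
# Sketch of a k-step citation count and a k-step h-index, assuming "k-step
# citations" are the distinct papers reaching a focal paper within k citing
# generations. The article's exact counting rule may differ.

def k_step_citers(cites_of, paper, k):
    """Distinct papers that reach `paper` within k citation steps.
    `cites_of[p]` lists the papers that cite p directly (assumed input)."""
    frontier, seen = {paper}, set()
    for _ in range(k):
        frontier = {c for p in frontier for c in cites_of.get(p, [])} - seen - {paper}
        seen |= frontier
        if not frontier:
            break
    return seen

def k_step_h_index(cites_of, author_papers, k):
    counts = sorted((len(k_step_citers(cites_of, p, k)) for p in author_papers),
                    reverse=True)
    return max([i for i, c in enumerate(counts, 1) if c >= i], default=0)
```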
... The indices are "heavily dependent on the references cited in the focal paper. For instance, a researcher could artificially inflate their disruptive citation by citing references with minimal citation or by citing fewer references or even omitting references altogether" (Yang et al., 2023 ...
... Furthermore, Yang, Hu, et al. (2023) and R. proposed different ways to incorporate the DI1 into the evaluation of scientists' research impact. ...
Preprint
Full-text available
The purpose of this paper is to provide a review of the literature on the original disruption index (DI1) and its variants in scientometrics. The DI1 has received much media attention and prompted a public debate about science policy implications, since a study published in Nature found that papers in all disciplines and patents are becoming less disruptive over time. This review explains in the first part the DI1 and its variants in detail by examining their technical and theoretical properties. The remaining parts of the review are devoted to studies that examine the validity and the limitations of the indices. Particular focus is placed on (1) possible biases that affect disruption indices, (2) the convergent and predictive validity of disruption scores, and (3) the comparative performance of the DI1 and its variants. The review shows that, while the literature on convergent validity is not entirely conclusive, it is clear that some modified index variants, in particular DI5, show higher degrees of convergent validity than DI1. The literature draws attention to the fact that (some) disruption indices suffer from inconsistency, time-sensitive biases, and several data-induced biases. The limitations of disruption indices are highlighted and best practice guidelines are provided. The review encourages users of the index to inform themselves about the variety of DI1 variants and to apply the most appropriate variant. More research on the validity of disruption scores as well as a more precise understanding of disruption as a theoretical construct is needed before the indices can be used in research evaluation practice.
Article
Delayed recognition, exemplified by the phenomenon of sleeping beauties, presents a compelling narrative within the dynamics of scientific impact and innovation. Our investigation delves into the nuanced facets of delayed acknowledgement, uncovering its profound implications and innovation pathways. Through the analysis of extensive datasets and advanced methodologies, we elucidate the intricate connections between delayed recognition and the realms of scientific and technological influence. Our study not only reveals correlations between atypical combinations of knowledge and the emergence of sleeping beauties but also sheds light on the relationship between delayed recognition and disruptive paradigm shifts in scientific evolution, suggesting their potential role in shaping scientific breakthroughs. Furthermore, our analysis highlights the journey of delayed recognition, often culminating in significant contributions across diverse fields, including notable achievements, such as Nobel-worthy milestones. This article advances our understanding of scientific evolution and the complex landscape of acknowledging pioneering research.
Article
In the dynamic landscape of contemporary scientific research characterized by increasing collaboration, this study, leveraging a comprehensive dataset spanning six decades and encompassing 16 diverse fields with 30 million journal papers, conducts the first large-scale analysis of age structure within scientific teams. Our findings illuminate a consistent upward trajectory in the average team age over time, coupled with a concurrent decline in team age diversity. Examining their intricate relationships with scientific impact, we unveil intriguing inverted-U associations between team age, team age diversity, and scientific impact. This underscores the optimal performance of moderately aged and diverse teams in terms of team impact. Additionally, our research uncovers a U-shaped relationship between team age, team age diversity, and scientific disruption, emphasizing the disruptive potential of extreme team age patterns. Importantly, these discerned patterns hold robustly across various fields and team sizes, offering valuable insights for strategically composing scientific teams and enhancing their productivity in the collaborative landscape of scientific research.
Article
Full-text available
Encompassing an intricately profound propensity for revolutionary, paradigm-shifting ramifications and the potential to wield an irrefutably disruptive sway on forthcoming research endeavors, the notion of the Disruption Index (DI) has surfaced as an object of fervent scientific scrutiny within the realm of scientometrics. Nevertheless, its implementation faces multifaceted constraints. Through a meticulous inquiry, we methodically dissect the limitations of DI, encompassing: (a) susceptibility to variations in reference numbers, (b) vulnerability to intentional author manipulations, (c) heterogeneous manifestations across diverse subject fields, (d) disparities across publication years, (e) misalignment with established scientific impact measures, (f) inadequacy in convergent validity with expert-selected milestones, and (g) a prevalent concentration around zero in its distribution. Unveiling the root causes of these challenges, we propose a viable solution encapsulated in the Rescaled Disruption Index (RDI), achieved through comprehensive rescaling across fields, years, and references. Our empirical investigations unequivocally demonstrate the efficacy of RDI, unveiling the universal nature of disruption distributions in science. This introduces a robust and refined framework for assessing disruptive potential in the scientific landscape while preserving the core principles of the index.
Article
In the relentless pursuit of scientific advancement, comprehending the profound impact and innovation nature inherent in funded research projects assumes paramount significance. To illuminate this matter, I delve into the realm of research supported by the National Institutes of Health (NIH) and the National Science Foundation (NSF). The evaluative framework encompasses a spectrum of metrics, including citations by papers, patents, and Tweets, as markers of research impact. Moreover, I embrace ex-ante innovation (Novelty) and ex-post innovation (Disruption) as dual indispensable yardsticks for evaluating the innovative nature of research projects. Novelty denotes the manifestation of atypical combinations of existing knowledge, while Disruption signifies the extent of paradigm-shifting potential and the ability to exert a disruptive influence on future research endeavors. First, the analysis reveals that funded research projects manifest a conspicuously heightened impact in comparison to their non-funded counterparts. Second, I uncover a noteworthy finding: funded research demonstrates significantly higher levels of ex-ante innovation (Novelty). However, in a surprising twist, the impact of funding on ex-post innovation (Disruption) appears to be faint. Additionally, I undertake a meticulous scrutiny of the robustness of the research findings by scrutinizing patterns across years and fields. Despite the uneven distribution of NIH and NSF funded research and inconspicuous heterogeneity across fields, the patterns of the impact and dual innovation of funded research are consistent across almost all fields.
Article
Full-text available
Theories of scientific and technological change view discovery and invention as endogenous processes1,2, wherein previous accumulated knowledge enables future progress by allowing researchers to, in Newton’s words, ‘stand on the shoulders of giants’3–7. Recent decades have witnessed exponential growth in the volume of new scientific and technological knowledge, thereby creating conditions that should be ripe for major advances8,9. Yet contrary to this view, studies suggest that progress is slowing in several major fields10,11. Here, we analyse these claims at scale across six decades, using data on 45 million papers and 3.9 million patents from six large-scale datasets, together with a new quantitative metric—the CD index12—that characterizes how papers and patents change networks of citations in science and technology. We find that papers and patents are increasingly less likely to break with the past in ways that push science and technology in new directions. This pattern holds universally across fields and is robust across multiple different citation- and text-based metrics1,13–17. Subsequently, we link this decline in disruptiveness to a narrowing in the use of previous knowledge, allowing us to reconcile the patterns we observe with the ‘shoulders of giants’ view. We find that the observed declines are unlikely to be driven by changes in the quality of published science, citation practices or field-specific factors. Overall, our results suggest that slowing rates of disruption may reflect a fundamental shift in the nature of science and technology. A decline in disruptive science and technology over time is reported, representing a substantive shift in science and technology, which is attributed in part to the reliance on a narrower set of existing knowledge.
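For orientation, the CD index referred to above is commonly presented in the following form; this is a restatement of the usual formulation rather than a quotation from the paper.

```latex
% Common formulation of the CD index for a focal paper f, computed over a
% fixed citation window:
%   n_F = papers citing f but none of f's cited references
%   n_B = papers citing both f and at least one of f's cited references
%   n_R = papers citing at least one of f's cited references but not f
\[
  \mathrm{CD}_f \;=\; \frac{n_F - n_B}{n_F + n_B + n_R},
  \qquad -1 \le \mathrm{CD}_f \le 1 .
\]
```

Values near +1 mark disruptive papers (later work cites the focal paper while bypassing its references); values near -1 mark consolidating papers.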
Article
Full-text available
Compared to previous studies that generally detect scientific breakthroughs based on citation patterns, this article proposes a knowledge entity‐based disruption indicator by quantifying the change of knowledge directly created and inspired by scientific breakthroughs to their evolutionary trajectories. Two groups of analytic units, including MeSH terms and their co‐occurrences, are employed independently by the indicator to measure the change of knowledge. The effectiveness of the proposed indicators was evaluated against the four datasets of scientific breakthroughs derived from four recognition trials. In terms of identifying scientific breakthroughs, the proposed disruption indicator based on MeSH co‐occurrences outperforms that based on MeSH terms and three earlier citation‐based disruption indicators. It is also shown that in our indicator, measuring the change of knowledge inspired by the focal paper in its evolutionary trajectory is a larger contributor than measuring the change created by the focal paper. Our study not only offers empirical insights into conceptual understanding of scientific breakthroughs but also provides practical disruption indicator for scientists and science management agencies searching for valuable research.
Article
Full-text available
In scientific research, collaboration is one of the most effective ways to take advantage of new ideas, skills, and resources and for performing interdisciplinary research. Although collaboration networks have been intensively studied, the question of how individual scientists choose collaborators to study a new research topic remains almost unexplored. Here, we investigate the statistics and mechanisms of collaborations of individual scientists along their careers, revealing that, in general, collaborators are involved in significantly fewer topics than expected from a controlled surrogate. In particular, we find that highly productive scientists tend to have a higher fraction of single-topic collaborators, while highly cited—i.e., impactful—scientists have a higher fraction of multitopic collaborators. We also suggest a plausible mechanism for this distinction. Moreover, we investigate the cases where scientists involve existing collaborators in a new topic. We find that, compared to productive scientists, impactful scientists show strong preference of collaboration with high-impact scientists on a new topic. Finally, we validate our findings by investigating active scientists in different years and across different disciplines.
Article
Full-text available
Newton’s centuries-old wisdom of standing on the shoulders of giants raises a crucial yet underexplored question: Out of all the prior works cited by a discovery, which one is its giant? Here, we develop a novel, discipline-independent method to identify the giant for any individual paper, allowing us to systematically examine the role and characteristics of giants in science. We find that across disciplines, about 95% of papers stand on the shoulders of giants, yet the weight of scientific progress rests on relatively few shoulders. Defining a new measure of giant index, we find that, while papers with high citations are more likely to be giants, for papers with the same citations, their giant index sharply predicts a paper’s future impact and prize-winning probabilities. Giants tend to originate from both small and large teams, being either highly disruptive or highly developmental. And papers that did not have a giant but later became a giant tend to be home-run papers that are highly disruptive to science. Given the crucial importance of citation-based measures in science, the developed concept of giants may offer a useful new dimension in assessing scientific impact that goes beyond sheer citation counts.
Article
Full-text available
Finding the lineage of a research topic is crucial for understanding the prior state of the art and advancing scientific displacement. The deluge of scholarly articles makes it difficult to locate the most relevant previous work, causing researchers to spend a considerable amount of time building up their literature list. Citations play a crucial role in discovering relevant literature. However, not all citations are created equal. The majority of the citations that a paper receives provide contextual and background information to the citing papers; in those cases, the cited paper is not central to the theme of the citing papers. Some papers, however, build upon a given paper and push the research frontier further. In those cases, the cited paper plays a pivotal role in the citing paper, and the nature of the citation it receives is significant. In this work, we discuss our investigations towards discovering significant citations of a given paper. We further show how we can leverage significant citations to build a research lineage via a significant citation graph. We demonstrate the efficacy of our idea with two real-life case studies. Our experiments yield promising results with respect to the current state-of-the-art in classifying significant citations, outperforming the earlier ones by a relative margin of 20 points in terms of precision. We hypothesize that such an automated system can facilitate relevant literature discovery and help identify knowledge flow for a particular category of papers.
Article
Full-text available
Science and technology develop not only along historical trajectories, but also as next-order regimes that periodically change the landscape. Regimes can incur on trajectories which are then disrupted. Using citations and references for the operationalization, we discuss and quantify both the recently proposed “disruption indicator” and the older indicator for “critical transitions” among reference lists as changes which may necessitate a rewriting of history. We elaborate this with three examples in order to provide a proof of concept. We shall show how the indicators can be calculated using Web-of-Science data. The routine is automated (available at <http://www.leydesdorff.net/software/di/index.htm>) so that it can be upscaled in future research. We suggest that “critical transitions” can be used to indicate disruption at the regime level, whereas disruption is developed at the trajectory level. Both conceptually and empirically, however, continuity is grasped more easily than disruption.
Article
Full-text available
The disruption index (DI) based on bibliographic coupling and uncoupling between a document and its references was first proposed by Funk & Owen-Smith (2017) for citation relations among patents and then adapted for scholarly papers by Wu et al. (2019). However, Wu & Wu (2019) argued that this indicator would be inconsistent. We propose revised disruption indices (DI* and DI#) which make the indicator theoretically more robust and consistent. Along similar lines, Chen et al. (2020) developed the indicator into two dimensions: disruption and consolidation. We elaborate the improvements in simulations and empirically. The relations between disruption, consolidation, and bibliographic coupling are further specified. Bibliographic coupling of a focal paper with its cited references generates historical continuity. A two-dimensional framework is used to conceptualize dis-continuity not as a residual, but a dimension which can further be specified.
Article
Full-text available
The citation impact of a scientific publication is usually seen as a one-dimensional concept. We introduce a multi-dimensional framework for characterizing the citation impact of a publication. In addition to the level of citation impact, quantified by the number of citations received by a publication, we also conceptualize and operationalize the depth and breadth and the dependence and independence of the citation impact of a publication. The proposed framework distinguishes between publications that have a deep citation impact, typically in a relatively narrow research area, and publications that have a broad citation impact, probably covering a wider area of research. It also makes a distinction between publications that are strongly dependent on earlier work and publications that make a more independent scientific contribution. We use our multi-dimensional citation impact framework to report basic descriptive statistics on the citation impact of highly cited publications in all scientific disciplines. In addition, we present a detailed case study focusing on the field of scientometrics. The proposed citation impact framework provides a more in-depth understanding of the citation impact of a publication than a traditional one-dimensional perspective.
Article
Full-text available
Recently, Wu, Wang, and Evans (2019) proposed a new family of indicators, which measure whether a scientific publication is disruptive to a field or tradition of research. Such disruptive influences are characterized by citations to a focal paper, but not its cited references. In this study, we are interested in the question of convergent validity. We used external criteria of newness to examine convergent validity: in the post-publication peer review system of F1000Prime, experts assess whether the reported research fulfills these criteria (e.g., reports new findings). This study is based on 120,179 papers from F1000Prime published between 2000 and 2016. In the first part of the study we discuss the indicators. Based on the insights from the discussion, we propose alternate variants of disruption indicators. In the second part, we investigate the convergent validity of the indicators and the (possibly) improved variants. Although the results of a factor analysis show that the different variants measure similar dimensions, the results of regression analyses reveal that one variant (DI5) performs slightly better than the others.
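The DI5 variant referred to here is, in the common reading of this literature, obtained by tightening the bibliographic-coupling condition of the original index; the following is a hedged sketch of the DI_l family, and the exact operationalization in the published variants may differ.

```latex
% Hedged sketch of the DI_l family in which DI_1 is the original index and
% DI_5 the better-performing variant discussed above: a citing paper counts
% towards n_B^l only if it cites at least l of the focal paper's cited
% references, and towards n_F^l otherwise. The handling of the n_R term in
% the published variants may differ from this sketch.
\[
  \mathrm{DI}_l \;=\; \frac{n_F^{\,l} - n_B^{\,l}}{n_F^{\,l} + n_B^{\,l} + n_R},
  \qquad l \in \{1, 5, \dots\} .
\]
```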
Article
Full-text available
Significance: By analyzing data from nearly all US PhD recipients and their dissertations across three decades, this paper finds demographically underrepresented students innovate at higher rates than majority students, but their novel contributions are discounted and less likely to earn them academic positions. The discounting of minorities’ innovations may partly explain their underrepresentation in influential positions of academia.
Article
Full-text available
Wu et al. (Nature 566:378–382, 2019) introduced a new indicator measuring disruption (DI1). Bornmann et al. (Do disruption index indicators measure what they propose to measure? The comparison of several indicator variants with assessments by peers, 2019. https://arxiv.org/abs/1911.08775) compared variants of the disruption index and pointed to DI5 as an interesting variant. The calculation of a field-specific version of DI5 (focusing on disruptiveness within the same field) for Scientometrics papers in the current study reveals that the variant is possibly able to identify landmark papers in scientometrics. This result is in contrast to the Scientometrics analysis previously published by Bornmann and Tekles (Scientometrics 120(1):331–336, 2019) based on the original disruption index (DI1).
Article
Full-text available
An ongoing project explores the extent to which artificial intelligence (AI), specifically in the areas of natural language processing and semantic reasoning, can be exploited to facilitate the studies of science by deploying software agents equipped with natural language understanding capabilities to read scholarly publications on the web. The knowledge extracted by these AI agents is organized into a heterogeneous graph, called Microsoft Academic Graph (MAG), where the nodes and the edges represent the entities engaging in scholarly communications and the relationships among them, respectively. The frequently updated data set and a few software tools central to the underlying AI components are distributed under an open data license for research and commercial applications. This paper describes the design, schema, and technical and business motivations behind MAG and elaborates how MAG can be used in analytics, search, and recommendation scenarios. How AI plays an important role in avoiding various biases and human induced errors in other data sets and how the technologies can be further improved in the future are also discussed.
Article
Full-text available
Human achievements are often preceded by repeated attempts that fail, but little is known about the mechanisms that govern the dynamics of failure. Here, building on previous research relating to innovation1–7, human dynamics8–11 and learning12–17, we develop a simple one-parameter model that mimics how successful future attempts build on past efforts. Solving this model analytically suggests that a phase transition separates the dynamics of failure into regions of progression or stagnation and predicts that, near the critical threshold, agents who share similar characteristics and learning strategies may experience fundamentally different outcomes following failures. Above the critical point, agents exploit incremental refinements to systematically advance towards success, whereas below it, they explore disjoint opportunities without a pattern of improvement. The model makes several empirically testable predictions, demonstrating that those who eventually succeed and those who do not may initially appear similar, but can be characterized by fundamentally distinct failure dynamics in terms of the efficiency and quality associated with each subsequent attempt. We collected large-scale data from three disparate domains and traced repeated attempts by investigators to obtain National Institutes of Health (NIH) grants to fund their research, innovators to successfully exit their startup ventures, and terrorist organizations to claim casualties in violent attacks. We find broadly consistent empirical support across all three domains, which systematically verifies each prediction of our model. Together, our findings unveil detectable yet previously unknown early signals that enable us to identify failure dynamics that will lead to ultimate success or failure. Given the ubiquitous nature of failure and the paucity of quantitative approaches to understand it, these results represent an initial step towards the deeper understanding of the complex dynamics underlying failure.
Article
Full-text available
One of the most universal trends in science and technology today is the growth of large teams in all areas, as solitary researchers and small teams diminish in prevalence1–3. Increases in team size have been attributed to the specialization of scientific activities³, improvements in communication technology4,5, or the complexity of modern problems that require interdisciplinary solutions6–8. This shift in team size raises the question of whether and how the character of the science and technology produced by large teams differs from that of small teams. Here we analyse more than 65 million papers, patents and software products that span the period 1954–2014, and demonstrate that across this period smaller teams have tended to disrupt science and technology with new ideas and opportunities, whereas larger teams have tended to develop existing ones. Work from larger teams builds on more-recent and popular developments, and attention to their work comes immediately. By contrast, contributions by smaller teams search more deeply into the past, are viewed as disruptive to science and technology and succeed further into the future—if at all. Observed differences between small and large teams are magnified for higher-impact work, with small teams known for disruptive work and large teams for developing work. Differences in topic and research design account for a small part of the relationship between team size and disruption; most of the effect occurs at the level of the individual, as people move between smaller and larger teams. These results demonstrate that both small and large teams are essential to a flourishing ecology of science and technology, and suggest that, to achieve this, science policies should aim to support a diversity of team sizes.
Article
Full-text available
Background Understanding the impact of a publication by using bibliometric indices becomes an essential activity not only for universities and research institutes but also for individual academicians. This paper aims to provide a brief review of the current bibliometric tools used by authors and editors and proposes an algorithm to assess the relevance of the most common bibliometric tools to help the researchers select the fittest journal and know the trends of published submissions by using self-evaluation. Methods We present a narrative review answering at least two related consecutive questions triggered by the topics mentioned above. How prestigious is a journal based on its most recent bibliometrics, so authors may choose it to submit their next manuscript? And, how can they self-evaluate/understand the impact of their whole publishing scientific life? Results We presented the main relevant definitions of each bibliometrics and grouped them in those oriented to evaluated journals or individuals. Also, we share with our readers our algorithm to assess journals before manuscript submission. Conclusions Since there is a journal performance market and an article performance market, each one with its patterns, an integrative use of these metrics, rather than just the impact factor alone, might represent the fairest and most legitimate approach to assess the influence and importance of an acceptable research issue, and not only a sound journal in their respective disciplines.
Article
Full-text available
The hot streak, loosely defined as 'winning begets more winnings', highlights a specific period during which an individual's performance is substantially better than his or her typical performance. Although hot streaks have been widely debated in sports1,2, gambling3-5 and financial markets6,7 over the past several decades, little is known about whether they apply to individual careers. Here, building on rich literature on the lifecycle of creativity8-22, we collected large-scale career histories of individual artists, film directors and scientists, tracing the artworks, films and scientific publications they produced. We find that, across all three domains, hit works within a career show a high degree of temporal regularity, with each career being characterized by bursts of high-impact works occurring in sequence. We demonstrate that these observations can be explained by a simple hot-streak model, allowing us to probe quantitatively the hot streak phenomenon governing individual careers. We find this phenomenon to be remarkably universal across diverse domains: hot streaks are ubiquitous yet usually unique across different careers. The hot streak emerges randomly within an individual's sequence of works, is temporally localized, and is not associated with any detectable change in productivity. We show that, because works produced during hot streaks garner substantially more impact, the uncovered hot streaks fundamentally drive the collective impact of an individual, and ignoring this leads us to systematically overestimate or underestimate the future impact of a career. These results not only deepen our quantitative understanding of patterns that govern individual ingenuity and success, but also may have implications for identifying and nurturing individuals whose work will have lasting impact.
Article
Full-text available
The whys and wherefores of SciSci The science of science (SciSci) is based on a transdisciplinary approach that uses large data sets to study the mechanisms underlying the doing of science—from the choice of a research problem to career trajectories and progress within a field. In a Review, Fortunato et al. explain that the underlying rationale is that with a deeper understanding of the precursors of impactful science, it will be possible to develop systems and policies that improve each scientist's ability to succeed and enhance the prospects of science as a whole.
Article
Full-text available
This work presents a new approach for analysing the ability of existing research metrics to identify research which has strongly influenced future developments. More specifically, we focus on the ability of citation counts and Mendeley reader counts to distinguish between publications regarded as seminal and publications regarded as literature reviews by field experts. The main motivation behind our research is to gain a better understanding of whether and how well the existing research metrics relate to research quality. For this experiment we have created a new dataset which we call TrueImpactDataset and which contains two types of publications, seminal papers and literature reviews. Using the dataset, we conduct a set of experiments to study how citation and reader counts perform in distinguishing these publication types, following the intuition that causing a change in a field signifies research quality. Our research shows that citation counts work better than a random baseline (by a margin of 10%) in distinguishing important seminal research papers from literature reviews while Mendeley reader counts do not work better than the baseline.
Article
Full-text available
The science of science (SOS) is a rapidly developing field which aims to understand, quantify and predict scientific research and the resulting outcomes. The problem is essentially related to almost all scientific disciplines and thus has attracted attention of scholars from different backgrounds. Progress on SOS will lead to better solutions for many challenging issues, ranging from the selection of candidate faculty members by a university to the development of research fields to which a country should give priority. While different measurements have been designed to evaluate the scientific impact of scholars, journals and academic institutions, the multiplex structure, dynamics and evolution mechanisms of the whole system have been much less studied until recently. In this article, we review the recent advances in SOS, aiming to cover the topics from empirical study, network analysis, mechanistic models, ranking, prediction, and many important related issues. The results summarized in this review significantly deepen our understanding of the underlying mechanisms and statistical rules governing the science system. Finally, we review the forefront of SOS research and point out the specific difficulties as they arise from different contexts, so as to stimulate further efforts in this emerging interdisciplinary field.
Article
Full-text available
Citations between scientific papers and related bibliometric indices, such as the h-index for authors and the impact factor for journals, are being increasingly used -- often in controversial ways -- as quantitative tools for research evaluation. Yet, a fundamental research question remains open: to what extent do quantitative metrics capture the significance of scientific works? We analyze the network of citations among the 449,935 papers published by the American Physical Society (APS) journals between 1893 and 2009, and focus on the comparison of metrics built on the citation count with network-based metrics. We contrast five article-level metrics with respect to the rankings that they assign to a set of fundamental papers, called Milestone Letters, carefully selected by the APS editors for "making long-lived contributions to physics, either by announcing significant discoveries, or by initiating new areas of research". A new metric, which combines PageRank centrality with the explicit requirement that paper score is not biased by paper age, outperforms the others in identifying the Milestone Letters shortly after they are published. The lack of time bias in the new metric also makes it possible to use it to compare papers of different age on the same scale. We find that network-based metrics generally identify the Milestone Letters better than metrics based on the citation count, which suggests that the structure of the citation network contains information that can be used to improve the ranking of scientific publications. The methods and results presented here are relevant for all evolving systems where network centrality metrics are applied, for example the World Wide Web and online social networks.
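A minimal sketch of the network-based approach described above, using networkx. The age-rescaling step shown here (a z-score among papers of similar age) is only a schematic illustration of the "no time bias" requirement, not the exact metric defined in the article.

```python
# Sketch: PageRank on a citation network, plus a schematic age-rescaling
# (z-score among papers of similar publication age). The rescaling only
# illustrates the idea of removing time bias; it is not the article's metric.
import networkx as nx
import numpy as np

def ranked_scores(edges, year_of, window=100):
    """edges: (citing, cited) pairs; year_of: paper -> publication year."""
    G = nx.DiGraph(edges)
    pr = nx.pagerank(G, alpha=0.85)
    papers = sorted(pr, key=lambda p: year_of[p])      # order papers by age
    scores = np.array([pr[p] for p in papers])
    rescaled = {}
    for i, p in enumerate(papers):
        lo, hi = max(0, i - window // 2), min(len(papers), i + window // 2)
        mu, sd = scores[lo:hi].mean(), scores[lo:hi].std() or 1.0
        rescaled[p] = (pr[p] - mu) / sd                # compare to similar-age papers
    return pr, rescaled
```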
Article
Full-text available
Identifying influential nodes in dynamical processes is crucial in understanding network structure and function. Degree, H-index and coreness are widely used metrics, but previously treated as unrelated. Here we show their relation by constructing an operator, in terms of which degree, H-index and coreness are the initial, intermediate and steady states of the sequences, respectively. We obtain a family of H-indices that can be used to measure a node's importance. We also prove that the convergence to coreness can be guaranteed even under an asynchronous updating process, allowing a decentralized local method of calculating a node's coreness in large-scale evolving networks. Numerical analyses of the susceptible-infected-removed spreading dynamics on disparate real networks suggest that the H-index is a good tradeoff that in many cases can better quantify node influence than either degree or coreness.
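The operator described above can be sketched as repeated application of the h-index to neighbours' values, starting from node degrees; a minimal illustration on a standard small graph follows.

```python
# Sketch of the operator described above: start from node degrees (H^0), then
# repeatedly replace each node's value with the h-index of its neighbours'
# current values; per the paper's convergence result, the fixed point is the
# node's coreness.
import networkx as nx

def h_of(values):
    values = sorted(values, reverse=True)
    return max([i for i, v in enumerate(values, 1) if v >= i], default=0)

def h_sequence(G, steps=50):
    h = {n: G.degree(n) for n in G}                       # H^0 = degree
    for _ in range(steps):
        new = {n: h_of([h[m] for m in G[n]]) for n in G}
        if new == h:                                      # fixed point reached
            break
        h = new
    return h

G = nx.karate_club_graph()
print(h_sequence(G) == nx.core_number(G))                 # True: fixed point equals coreness
```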
Article
Full-text available
Significance: Scientists perform a tiny subset of all possible experiments. What characterizes the experiments they choose? And what are the consequences of those choices for the pace of scientific discovery? We model scientific knowledge as a network and science as a sequence of experiments designed to gradually uncover it. By analyzing millions of biomedical articles published over 30 y, we find that biomedical scientists pursue conservative research strategies exploring the local neighborhood of central, important molecules. Although such strategies probably serve scientific careers, we show that they slow scientific advance, especially in mature fields, where more risk and less redundant experimentation would accelerate discovery of the network. We also consider institutional arrangements that could help science pursue these more efficient strategies.
Article
Full-text available
Currently the ranking of scientists is based on the h-index, which is widely perceived as an imprecise and simplistic though still useful metric. We find that the h-index actually favours modestly performing researchers and propose a simple criterion for proper ranking.
Article
The identification and ranking of vital nodes in complex networks have been a critical issue for a long time. In this paper, we present an extension of existing disruptive metrics and introduce new ones, namely the disruptive coefficient (D) and 2-step disruptive coefficient (2-step D), as innovative tools for identifying critical nodes in complex networks. Our approach emphasizes the importance of disruptiveness in characterizing nodes within the network and detecting their criticality. Our new measures take into account both prior and posterior information of the focal nodes, by evaluating their ability to disrupt the previous network paradigm, setting them apart from traditional measures. We conduct an empirical analysis of four real-world networks to compare the rankings or identification of nodes using D and 2stepD with those obtained from four renowned benchmark measures, namely, degree, h-index, PageRank, and the CD index. Our analysis reveals significant differences between the nodes identified by D and 2stepD and those identified by the benchmark measures. We also examine the correlation coefficient and efficiency of the metrics and find that D and 2stepD have significant correlations with the CD index, but have weak correlations with the benchmark measures. Furthermore, we show that D and 2stepD outperform CD index and random ways in intentional attacks. We find power law distributions for D, 2stepD, and CD, indicating a small number of highly disruptive nodes and a large number of less disruptive nodes in the networks. Our results suggest that D and 2stepD are capable of providing valuable and distinct insights for identifying critical nodes in complex networks.
Article
A well-designed method for evaluating scientists is vital for the scientific community. It can be used to rank scientists in various practical tasks, such as hiring, funding applications, and promotion. However, a large number of evaluation methods are designed based on citation counts, which can merely evaluate scientists' scientific impact but cannot evaluate their innovation ability, which is actually a crucial characteristic for scientists. In addition, when evaluating scientists, it has become increasingly common to focus only on their representative works rather than all of their papers. Accordingly, we here propose a hybrid method that combines scientific impact with innovation under a representative-works framework to evaluate scientists. Our results are validated on the American Physical Society journals dataset and the prestigious laureates datasets. The results suggest that the correlation between citation and disruption is weak, which enables us to incorporate them. In addition, the analysis shows that using the representative-works framework to evaluate scientists is advantageous and that our hybrid method can effectively identify Nobel Prize laureates and laureates of several other prestigious prizes with higher precision and better mean ranking. The evaluation performance of the hybrid method is shown to be the best compared with the mainstream methods. This study provides policy makers an effective way to evaluate scientists along more comprehensive dimensions.
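A hedged sketch of a representative-works style evaluation: a scientist is scored only by their top-k papers, combining a citation-based and a disruption-based score per paper. The actual combination rule and value of k used in the article are not reproduced here; the per-paper scores are assumed to be pre-normalized inputs.

```python
# Hedged sketch of a representative-works evaluation: score each scientist by
# their top-k papers only, combining a (pre-normalized) citation score and a
# disruption score per paper. The weighting and k are illustrative only.
def representative_score(papers, k=5, weight=0.5):
    """papers: list of (norm_citations, norm_disruption) pairs for one scientist."""
    per_paper = [weight * c + (1 - weight) * d for c, d in papers]
    return sum(sorted(per_paper, reverse=True)[:k])
```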
Article
Hiring appropriate editors, chairs and committee members for academic journals and conferences is challenging. It requires a targeted search for high profile scholars who are active in the field as well as in the publication venue. Many author-level metrics have been employed for this task, such as the h-index, PageRank and their variants. However, these metrics are global measures which evaluate authors’ productivity and impact without differentiating the publication venues. From the perspective of a venue, it is also important to have a localised metric which can specifically indicate the significance of academic authors for the particular venue. In this paper, we propose a relevance-based author ranking algorithm to measure the significance of authors to individual venues. Specifically, we develop a co-authorship network considering the author-venue relationship which integrates the statistical relevance of authors to individual venues. The RelRank, an improved PageRank algorithm embedding author relevance, is then proposed to rank authors for each venue. Extensive experiments are carried out to analyse the proposed RelRank in comparison with classic author-level metrics on three datasets of different research domains. We also evaluate the effectiveness of the RelRank and comparison metrics in recommending editorial boards of three venues using test data. Results demonstrate that the RelRank is able to identify not only the high profile scholars but also those who are particularly significant for individual venues.
Article
A large-scale study provides a causal test for a cornerstone of social science
Article
What science does, what science could do, and how to make science work? If we want to know the answers to these questions, we need to be able to uncover the mechanisms of science, going beyond metrics that are easily collectible and quantifiable. In this perspective piece, we link metrics to mechanisms by demonstrating how emerging metrics of science not only complement existing ones, but also shed light on the hidden structure and mechanisms of science. Based on fundamental properties of science, we classify existing theories and findings into: hot and cold science, referring to attention shifts between scientific fields; fast and slow science, reflecting the productivity of scientists and teams; and soft and hard science, revealing the reproducibility of scientific research. We suggest that the long-standing interest in the mechanisms of science, from Derek J. de Solla Price, Robert K. Merton, Eugene Garfield, and many others, complements the zeitgeist of pursuing new, complex metrics without understanding the underlying processes. We propose that understanding and modeling the mechanisms of science condition the effective development and application of metrics.
Article
Science is built on scholarly consensus that shifts with time. This raises the question of how new and revolutionary ideas are evaluated and become accepted into the canon of science. Using two recently proposed metrics, atypicality and disruption, we measure how research draws upon novel combinations of prior research and the degree to which it creates a new direction by eclipsing its intellectual forebears in subsequent work. Atypical papers are nearly two times more likely to disrupt science than conventional papers, but this is a slow process, taking ten years or longer for disruption scores to converge. We provide the first computational model reformulating atypicality as the distance across latent knowledge spaces learned by neural networks. The evolution of this knowledge space characterizes how yesterday's novelty forms today's scientific conventions, which condition the novelty of tomorrow's breakthroughs.
Article
Recent works aimed to understand how to identify “milestone” scientific papers of great significance from large-scale citation networks. To this end, previous results found that global ranking metrics that take into account the whole network structure (such as Google’s PageRank) outperform local metrics such as the citation count. Here, we show that by leveraging the recursive equation that defines the PageRank algorithm, we can propose a family of local impact metrics. Our results reveal that the obtained PageRank-based local metrics outperform the citation count and other local metrics in identifying the seminal papers. Compared with global metrics, these local metrics can reach similar performance in the identification of seminal papers in shorter computational time, without requiring the whole network topology. Our findings could help to better understand the nature of groundbreaking research from citation network analysis and find practical applications in large-scale data.
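A schematic of the truncation idea, offered only as an illustration: cutting the PageRank recursion after one or two steps around the focal paper yields a local score that needs only the paper's citation neighbourhood. The exact truncation used in the article may differ.

```python
# Schematic local score obtained by truncating the PageRank recursion around a
# focal paper: one step credits each citer, damped by alpha and divided by the
# citer's out-degree as in PageRank; two steps also credit the citers' citers.
# The exact truncation used in the article may differ.
def local_pagerank(cites_of, out_deg, paper, alpha=0.85, steps=2):
    """cites_of[p]: papers citing p; out_deg[p]: number of references of p."""
    score, frontier = 0.0, {paper: 1.0}
    for step in range(1, steps + 1):
        nxt = {}
        for p, w in frontier.items():
            for c in cites_of.get(p, []):                 # c cites p
                nxt[c] = nxt.get(c, 0.0) + w / max(out_deg.get(c, 1), 1)
        score += (alpha ** step) * sum(nxt.values())
        frontier = nxt
    return score
```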
Article
Wu et al. (2019) used the disruption (D) index to measure scientific and technological advances in Nature. Their findings spurred extensive discussion in academia on whether we can measure the disruption (i.e., innovation or novelty) of a research paper or a patent based on the number of citations. In this paper, we calculate the D index of ∼0.76 million publications published between 1954 and 2013 in six disciplines including both sciences and social sciences in English and Chinese. We found that the number of references has a negative effect on the D index of a paper with a relatively small number of references, and a positive effect on the D index of a paper with a large number of references. We also found that low coverage of a citation database boosts D values. Specifically, low coverage of non-journal literature in the Web of Science (WOS) boosted D values in social sciences, and the exclusion of non-Chinese language literature in the Chinese Social Sciences Citation Index (CSSCI) resulted in the inflation of D values in Chinese language literature. Limitations of the D index observed in scientific papers also exist in technological patents. This paper sheds light on the use of citation-based measurements of scientific and technological advances and highlights the limitations of this index.
Article
Scientists leaving scientific work at an age when they should have continued generating scientific output inevitably limits their success as researchers. However, this phenomenon has received relatively little attention in the science of science and scientometrics, given how frequently it occurs in academia. Motivated by this gap, this study seeks to identify and characterize scientists who leave the profession before the expected time in science, i.e., scientists whose productivity deviates from the normal pattern at a stage when they should be capable of publishing more papers. We first argue that simply using the termination of publishing as the sign of leaving the profession is inappropriate. Using termination of publishing as the only standard also leads to an underestimation of the overall productivity of certain types of scientists. Thus, we designed a novel measure to evaluate whether a scientist has left science. Utilizing a large-scale bibliographic dataset from Microsoft Academic Graph (MAG) in the field of mathematics, we identified over 10,000 scientists who left before their 20th career year, and paired each of them with a scientist who survived for longer than 25 years. We found that the characteristics in the first five years of the careers of the scientists who left early can be summarized as incompetency in their research abilities and a lack of collaboration with senior or highly cited authors. The implications of the findings are discussed.
Article
BACKGROUND . The increasing availability of digital data on scholarly inputs and outputs – from research funding, productivity, and collaboration to paper citations and scientist mobility – offers unprecedented opportunities to explore the structure and evolution of science. The science of science (SciSci) offers a quantitative understanding of the interactions among scientific agents across diverse geographic and temporal scales: It provides insights into the conditions underlying creativity and the genesis of scientific discovery, with the ultimate goal of developing tools and policies that have the potential to accelerate science. In the past decade, SciSci has benefited from an influx of natural, computational, and social scientists who together have developed big data–based capabilities for empirical analysis and generative modeling that capture the unfolding of science, its institutions, and its workforce. The value proposition of SciSci is that with a deeper understanding of the factors that drive successful science, we can more effectively address environmental, societal, and technological problems. ADVANCES . Science can be described as a complex, self-organizing, and evolving network of scholars, projects, papers, and ideas. This representation has unveiled patterns characterizing the emergence of new scientific fields through the study of collaboration networks and the path of impactful discoveries through the study of citation networks. Microscopic models have traced the dynamics of citation accumulation, allowing us to predict the future impact of individual papers. SciSci has revealed choices and trade-offs that scientists face as they advance both their own careers and the scientific horizon. For example, measurements indicate that scholars are risk-averse, preferring to study topics related to their current expertise, which constrains the potential of future discoveries. Those willing to break this pattern engage in riskier careers but become more likely to make major breakthroughs. Overall, the highest-impact science is grounded in conventional combinations of prior work but features unusual combinations. Last, as the locus of research is shifting into teams, SciSci is increasingly focused on the impact of team research, finding that small teams tend to disrupt science and technology with new ideas drawing on older and less prevalent ones. In contrast, large teams tend to develop recent, popular ideas, obtaining high, but often short-lived, impact. OUTLOOK . SciSci offers a deep quantitative understanding of the relational structure between scientists, institutions, and ideas because it facilitates the identification of fundamental mechanisms responsible for scientific discovery. These interdisciplinary data-driven efforts complement contributions from related fields such as scientometrics and the economics and sociology of science. Although SciSci seeks long-standing universal laws and mechanisms that apply across various fields of science, a fundamental challenge going forward is accounting for undeniable differences in culture, habits, and preferences between different fields and countries. This variation makes some cross-domain insights difficult to appreciate and associated science policies difficult to implement. The differences among the questions, data, and skills specific to each discipline suggest that further insights can be gained from domain-specific SciSci studies, which model and identify opportunities adapted to the needs of individual research fields. 
Abstract. Identifying fundamental drivers of science and developing predictive models to capture its evolution are instrumental for the design of policies that can improve the scientific enterprise – for example, through enhanced career paths for scientists, better performance evaluation for organizations hosting research, discovery of novel effective funding vehicles, and even identification of promising regions along the scientific frontier. The science of science uses large-scale data on the production of science to search for universal and domain-specific patterns. Here, we review recent developments in this transdisciplinary field.
Article
This study focuses on a recently introduced type of indicator measuring disruptiveness in science. Disruptive research diverges from current lines of research by opening up new lines. In the current study, we included the initially proposed indicator of this new type, DI1 (Funk & Owen-Smith, 2017; Wu, Wang, & Evans, 2019), and several variants: DI5, DI1n, DI5n, and DEP. Since indicators should measure what they propose to measure, we investigated the convergent validity of the indicators. We used a list of milestone papers, selected and published by editors of Physical Review Letters, and investigated whether this expert-based list is related to the values of the disruption indicator variants and – if so – which variants show the highest correlation with expert judgements. We used bivariate statistics, multiple regression models, and coarsened exact matching (CEM) to investigate the convergent validity of the indicators. The results show that the indicators correlate differently with the milestone paper assignments by the editors. It is not the initially proposed disruption index (DI1) that performed best, but the variant DI5, which was introduced by Bornmann, Devarakonda, Tekles, and Chacko (2020a). In the CEM analysis of this study, the DEP variant – introduced by Bu, Waltman, and Huang (in press) – also showed favorable results.
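For reference, a minimal sketch of how the initially proposed indicator (DI1) can be computed from a citation network; the data layout (dictionaries mapping papers to sets of references and citers) and function names are illustrative, and the DI5/DI1n/DI5n/DEP variants change which citing papers and references are counted.

```python
def disruption_di1(focal, cites, cited_by):
    """DI1 for a focal paper (sketch).

    cites[p]    -> set of papers that p references
    cited_by[p] -> set of papers that cite p
    """
    refs = cites.get(focal, set())
    citers = cited_by.get(focal, set())

    # papers citing at least one of the focal paper's references
    ref_citers = set().union(*(cited_by.get(r, set()) for r in refs)) if refs else set()
    ref_citers.discard(focal)

    n_i = sum(1 for p in citers if not (cites.get(p, set()) & refs))  # cite focal but none of its refs
    n_j = len(citers) - n_i                                           # cite focal and at least one ref
    n_k = len(ref_citers - citers)                                    # cite the refs but not the focal paper

    denom = n_i + n_j + n_k
    return (n_i - n_j) / denom if denom else 0.0
```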
Article
Citation distributions are lognormal. We use 30 lognormally distributed synthetic series of numbers that simulate real series of citations to investigate the consistency of the h index. Using the lognormal cumulative distribution function, the equation that defines the h index can be formulated; this equation shows that h has a complex dependence on the number of papers (N). We also investigate the correlation between h and the number of papers exceeding various citation thresholds, from 5 to 500 citations. The best correlation is for the 100 threshold, but numerous data points deviate from the general trend. The size-independent indicator h/N shows no correlation with the probability of publishing a paper exceeding any of the citation thresholds. In contrast with the h index, the total number of citations shows a high correlation with the number of papers exceeding the thresholds of 10 and 50 citations; the mean number of citations correlates with the probability of publishing a paper that exceeds any level of citations. Thus, in synthetic series, the number of citations and the mean number of citations are much better indicators of research performance than h and h/N. We also discuss additional difficulties that arise in real citation distributions.
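A hedged sketch of the kind of synthetic experiment described above: draw lognormally distributed citation counts for one simulated scientist and compare the h index with simple threshold counts; the lognormal parameters and series size are illustrative, not those used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def h_index(citations):
    """h = the largest h such that at least h papers have >= h citations."""
    c = np.sort(citations)[::-1]
    return int(np.sum(c >= np.arange(1, len(c) + 1)))

# one synthetic "scientist": N papers with lognormal citation counts
N = 200
citations = rng.lognormal(mean=1.5, sigma=1.1, size=N).astype(int)

print("h          :", h_index(citations))
print(">=100 cites:", int(np.sum(citations >= 100)))
print("total cites:", int(citations.sum()), " mean:", round(citations.mean(), 2))
```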
Article
The conventional wisdom classifies technologies into dichotomous types, such as competence-enhancing versus competence-destroying or sustaining versus disruptive. This categorization corresponds to the two routes of technology evolution: either consolidating or destabilizing past achievements. However, the combinational view suggests that a technology is a recombination of existing components, and hence it may consolidate some elements of its prior art while destabilizing others. Therefore, we propose that a technology can be simultaneously destabilizing and consolidating – a dual technology. To identify dual technologies, we develop the destabilization index (D) and the consolidation index (C) using patent citation networks. To validate the proposed indexes, we first select representative US patent examples to illustrate face validity by showing how the D and C indexes capture the dual characteristics. Second, we assess convergent and discriminant validity by examining the correlations between the D and C indexes and other innovation measures. Finally, we evaluate nomological validity by studying the antecedents and the predictive power of the dual characteristics using 2.6 million patents from the USPTO dataset covering 1976 to 2006. Regression results show that theory-driven factors such as patent novelty and government interest are associated with the D and C indexes as expected. We also find that high D and C indexes are positively associated with patent value as perceived by its owner, and that entity size moderates the effects of the D and C indexes on patent value differently.
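A minimal sketch of the dual view under the simplest possible operationalization: a focal item's citers are split into those that ignore its prior art (destabilizing) and those that also cite it (consolidating), and the two groups are counted separately rather than collapsed into a single difference. This is only an assumption about the spirit of the D and C indexes, not the authors' exact definitions.

```python
def dual_counts(focal, cites, cited_by):
    """Split a focal item's citers into destabilizing vs. consolidating ones (sketch)."""
    refs = cites.get(focal, set())
    citers = cited_by.get(focal, set())
    destabilizing = {p for p in citers if not (cites.get(p, set()) & refs)}
    consolidating = citers - destabilizing
    return len(destabilizing), len(consolidating)
```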
Article
This paper introduces the perspective of the dynamic citation process to identify citation patterns of scientific breakthroughs. We construct a series of citation metrics and apply them to over 100 pairs of Nobel and non-Nobel papers with millions of citations. As expected, we find that most metrics cannot distinguish the two groups under similar conditions of discipline, publication year, venue, and citation impact. Some metrics, however, not only show significant discriminative power, but also reflect scientific breakthroughs' temporal and structural characteristics – namely, prematurity and fruitfulness. Breakthrough works have long-lasting impact, but their recognition lags behind; they do not just solve a problem but, more importantly, open up new questions. Three metrics – average clustering coefficient, connectivity, and density of citing literature networks – show particular promise for early identification of breakthrough works. Our findings bear significant implications for science and technology management practices: from a science policy standpoint, our work demonstrates the utility of this citation process-based approach and provides a new dimension for both innovation researchers and decision makers in search of emerging scientific breakthroughs.
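A hedged sketch of how the three highlighted metrics might be computed with networkx on the network formed by a paper's citing literature; treating "connectivity" as the share of citing papers in the largest connected component is an assumption, and the graph construction is illustrative.

```python
import networkx as nx

def citing_network_metrics(focal, cites, cited_by):
    """Average clustering, connectivity, and density of the network
    among the papers that cite `focal` (sketch)."""
    citers = cited_by.get(focal, set())
    G = nx.Graph()
    G.add_nodes_from(citers)
    for p in citers:
        for q in cites.get(p, set()) & citers:   # citation links among citing papers
            if p != q:
                G.add_edge(p, q)
    if G.number_of_nodes() == 0:
        return 0.0, 0.0, 0.0
    clustering = nx.average_clustering(G)
    largest = max(nx.connected_components(G), key=len)
    connectivity = len(largest) / G.number_of_nodes()   # assumed operationalization
    density = nx.density(G)
    return clustering, connectivity, density
```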
Article
Despite the increasing use of citation-based metrics for research evaluation purposes, we do not know yet which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics’ ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics’ performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.
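A minimal sketch, assuming a simple score transformation: compute PageRank on the citation graph and standardize each paper's score against papers of similar age, which is one way to suppress the age bias the authors describe. The window size and damping factor are illustrative choices, not the authors' exact procedure.

```python
import networkx as nx
import numpy as np

def age_rescaled_pagerank(G, year, window=200):
    """PageRank on a citation graph, rescaled against papers of similar age (sketch).

    G      : directed graph with edges citing_paper -> cited_paper
    year   : dict mapping paper -> publication year
    window : number of papers of similar age used for rescaling (illustrative)
    """
    pr = nx.pagerank(G, alpha=0.5)                 # low damping is common for citation networks
    papers = sorted(pr, key=lambda p: year[p])
    scores = np.array([pr[p] for p in papers])
    rescaled = {}
    for i, p in enumerate(papers):
        lo, hi = max(0, i - window // 2), min(len(papers), i + window // 2)
        mu, sigma = scores[lo:hi].mean(), scores[lo:hi].std()
        rescaled[p] = (pr[p] - mu) / sigma if sigma > 0 else 0.0
    return rescaled
```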
Article
Significance. Past studies have shown that faculty at prestigious universities tend to be more productive and prominent than faculty at less prestigious universities. This pattern is usually attributed to a competitive job market that selects inherently productive faculty into prestigious positions. Here, we test the extent to which, instead, faculty’s work environments drive their productivity. Using comprehensive data on an entire field of research, we use a matched-pair experimental design to isolate the effects of training at, versus working in, prestigious environments. We find that faculty’s work environments, not selection effects, drive their productivity and prominence, establishing that where a researcher works serves as a mechanism for cumulative advantage, locking in past success via job placement and thereby facilitating future success.
Article
The application of a new citation metric prompts a reassessment of the relationship between the size of scientific teams and research impact, and calls into question the trend to emphasize ‘big team’ science.
Article
Understanding quantitatively how scientists choose and shift their research focus over time is of high importance, because it affects the ways in which scientists are trained, science is funded, knowledge is organized and discovered, and excellence is recognized and rewarded [1–9]. Despite extensive investigation into various factors that influence a scientist’s choice of research topics [8–21], quantitative assessments of the mechanisms that give rise to macroscopic patterns characterizing the research-interest evolution of individual scientists remain limited. Here we perform a large-scale analysis of publication records, and we show that changes in research interests follow a reproducible pattern characterized by an exponential distribution. We identify three fundamental features responsible for the observed exponential distribution, which arise from a subtle interplay between exploitation and exploration in research-interest evolution [5, 22]. We developed a random-walk-based model, which allows us to accurately reproduce the empirical observations. This work uncovers and quantitatively analyses macroscopic patterns that govern changes in research interests, thereby showing that there is a high degree of regularity underlying scientific research and individual careers.
Article
Scientific impact – that is the Q. Are there quantifiable patterns behind a successful scientific career? Sinatra et al. analyzed the publications of 2887 physicists, as well as data on scientists publishing in a variety of fields. When productivity (which is usually greatest early in the scientist's professional life) is accounted for, the paper with the greatest impact occurs randomly in a scientist's career. However, the process of generating a high-impact paper is not an entirely random one. The authors developed a quantitative model of impact, based on an element of randomness, productivity, and a factor Q that is particular to each scientist and remains constant during the scientist's career. Science, this issue p. 596
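The factorization behind the Q model can be written compactly; the notation below (c_{10} for ten-year citation impact of paper alpha by scientist i) is assumed from the usual presentation of the model rather than quoted from the summary above.

```latex
% Hedged sketch of the Q-model factorization (notation assumed):
% the impact of paper \alpha by scientist i splits into a scientist-specific
% ability Q_i and a paper-specific chance factor p_\alpha drawn from a
% distribution common to all scientists.
\[
  c_{10,i\alpha} = Q_i \, p_\alpha
  \quad\Longleftrightarrow\quad
  \log c_{10,i\alpha} = \log Q_i + \log p_\alpha ,
\]
% so, up to an additive constant, Q_i can be estimated from the average
% log-impact of scientist i's papers.
```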
Article
This article outlines a network approach to the study of technological change. We propose that new inventions reshape networks of interlinked technologies by shifting inventors’ attention to or away from the knowledge on which those inventions build. Using this approach, we develop novel indexes of the extent to which a new invention consolidates or destabilizes existing technology streams. We apply these indexes in analyses of university research commercialization and find that, although federal research funding pushes campuses to create inventions that are more destabilizing, deeper commercial ties lead them to produce technologies that consolidate the status quo. By quantifying the effects that new technologies have on their predecessors, the indexes we propose allow patent-based studies of innovation to capture conceptually important phenomena that are not detectable with established measures. The measurement approach presented here offers empirical insights that support theoretical development in studies of innovation, entrepreneurship, technology strategy, science policy, and social network theory. This paper was accepted by Lee Fleming, entrepreneurship and innovation.
Article
Research funding organizations invest substantial resources to monitor mission-relevant research findings to identify and support promising new lines of inquiry. To that end, we have been pursuing the development of tools to identify research publications that have a strong likelihood of driving new avenues of research. This paper describes our work towards incorporating multiple time-dependent and -independent features of publications into a model to identify candidate breakthrough papers as early as possible following publication. We used multiple random forest models to assess the ability of indicators to reliably distinguish a gold standard set of breakthrough publications as identified by subject matter experts from among a comparison group of similar Thomson Reuters Web of Science™ publications. These indicators were then tested for their predictive value in random forest models. Model parameter optimization and variable selection were used to construct a final model based on indicators that can be measured within 6 months post-publication; the final model had an estimated true positive rate of 0.77 and false positive rate of 0.01.
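A hedged scikit-learn sketch of the modelling setup described above; the placeholder features and labels stand in for the early (within 6 months post-publication) indicators and the expert-labelled breakthrough set, which are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

# X: early post-publication indicators per paper (placeholder values here);
# y: 1 for expert-labelled breakthroughs, 0 for matched comparison papers.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                    # illustrative feature matrix
y = (rng.random(500) < 0.1).astype(int)          # illustrative labels

clf = RandomForestClassifier(n_estimators=500, class_weight="balanced", random_state=0)
pred = cross_val_predict(clf, X, y, cv=5)

tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
tpr = tp / (tp + fn) if (tp + fn) else 0.0
fpr = fp / (fp + tn) if (fp + tn) else 0.0
print(f"true positive rate = {tpr:.2f}, false positive rate = {fpr:.2f}")
```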
Article
Citation impact indicators nowadays play an important role in research evaluation, and consequently these indicators have received a lot of attention in the bibliometric and scientometric literature. This paper provides an in-depth review of the literature on citation impact indicators. First, an overview is given of the literature on bibliographic databases that can be used to calculate citation impact indicators (Web of Science, Scopus, and Google Scholar). Next, selected topics in the literature on citation impact indicators are reviewed in detail. The first topic is the selection of publications and citations to be included in the calculation of citation impact indicators. The second topic is the normalization of citation impact indicators, in particular normalization for field differences. Counting methods for dealing with co-authored publications are the third topic, and citation impact indicators for journals are the last topic. The paper concludes by offering some recommendations for future research.
Article
A Sleeping Beauty (SB) in science refers to a paper whose importance is not recognized for several years after publication. Its citation history exhibits a long hibernation period followed by a sudden spike of popularity. Previous studies suggest a relative scarcity of SBs. The reliability of this conclusion is, however, heavily dependent on identification methods based on arbitrary threshold parameters for sleeping time and number of citations, applied to small or monodisciplinary bibliographic datasets. Here we present a systematic, large-scale, and multidisciplinary analysis of the SB phenomenon in science. We introduce a parameter-free measure that quantifies the extent to which a specific paper can be considered an SB. We apply our method to 22 million scientific papers published in all disciplines of natural and social sciences over a time span longer than a century. Our results reveal that the SB phenomenon is not exceptional. There is a continuous spectrum of delayed recognition where both the hibernation period and the awakening intensity are taken into account. Although many cases of SBs can be identified by looking at monodisciplinary bibliographic data, the SB phenomenon becomes much more apparent with the analysis of multidisciplinary datasets, where we can observe many examples of papers achieving delayed yet exceptional importance in disciplines different from those where they were originally published. Our analysis emphasizes a complex feature of citation dynamics that so far has received little attention, and also provides empirical evidence against the use of short-term citation metrics in the quantification of scientific impact.
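A minimal sketch of a parameter-free sleeping-beauty measure in the spirit described above, following the commonly cited definition: the deviation of the yearly citation curve from the straight line joining the publication-year count to the citation peak. The indexing convention (year 0 = publication year) is assumed.

```python
def beauty_coefficient(yearly_citations):
    """Parameter-free 'beauty coefficient' B for a paper's yearly citation
    history (sketch).  Large B = long hibernation followed by a sharp awakening."""
    c = yearly_citations
    t_m = max(range(len(c)), key=lambda t: c[t])   # year of the citation peak
    if t_m == 0:
        return 0.0
    c0, cm = c[0], c[t_m]
    slope = (cm - c0) / t_m                        # reference line from year 0 to the peak
    return sum((slope * t + c0 - c[t]) / max(1, c[t]) for t in range(t_m + 1))

# example: a paper that sleeps for a decade and then awakens
print(beauty_coefficient([0, 1, 0, 1, 0, 0, 1, 0, 2, 1, 5, 20, 60, 110]))
```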