Article

Abstract

Data occupy a key role in our information society. However, although the amount of published data continues to grow and terms such as “data deluge” and “big data” today characterize numerous (research) initiatives, much work is still needed to publish data in ways that make them effectively discoverable, available, and reusable by others. Several barriers hinder data publishing, from lack of attribution and rewards, vague citation practices, and quality issues to a rather general lack of a data-sharing culture. Lately, data journals have emerged to overcome some of these barriers. In this study of more than 100 currently existing data journals, we describe the approaches they promote for data set description, availability, citation, quality, and open access. We close by identifying ways to expand and strengthen the data journals approach as a means to promote data set access and exploitation.


Supplementary resource (1)

... characteristics different from conventional articles, methodically describing datasets shared in repositories, from their structure and format to the methods used in their collection, organization, and processing, with the aim of supporting data reproducibility and reuse. The metadata that compose such papers may be more or less specific, depending on the level of description adopted for the datasets they are intended to represent (Candela et al., 2015). ...
... In a general overview, it can be seen that, although there is no single definitive standard for the typology of data papers in the field, all journals with specific guidelines for their publication adopt a similar description of what this document is. In short, it is a paper oriented toward describing a dataset made available in a repository, generally under open licenses, so as to facilitate its reuse and to guarantee researchers due authorship rights over the shared datasets (Candela et al., 2015; Kim, 2020). ...
... This research was limited to a comparative analysis of the editorial guidelines for publishing data papers in the Health Sciences, based on the instructions to authors provided by the journals consulted. Although few journals provide a structured data paper template for download, all 26 journals that treated data papers in some way as an independent typology followed the pattern observed by Candela et al. (2015) ...
Article
Full-text available
This research aims to analyze the editorial guidelines of Health Sciences journals that accept data papers and to identify common points to guide their preparation, with the goal of supporting editors of Brazilian journals. The sample comprised journals indexed in the Scopus and Web of Science databases that publish data papers, and data were collected directly from the publications' websites, from the instructions or guidelines to authors of the 54 retrieved Health Sciences journals. Of the total, 26 journals (48%) provided some kind of guidance on the submission of data papers, and four provided a guidance template. We conclude that, although few journals offer a structured data paper template for download, the journals are clearly advancing in the construction of a consistent description of the main components of data papers. The scope adopted in this research made it possible to build a clearer view of the implicit standards adopted in data papers in the health sciences.
... Over time, more journals have begun accepting data papers. In this research, all periodicals accepting data papers are termed data journals; more specifically, we distinguish journals that primarily publish data papers (i.e., exclusively data journals; the operationalization of this concept is discussed in the Methods section) from those that accept data papers merely as one genre alongside research articles (i.e., mixed data journals), following how these categories are defined in previous studies 9,14 . ...
... As data papers are becoming a popular way for researchers to publish their research data in many disciplines 14,15 , this new genre has become an important data source for investigating how data is used by scientists. This echoes increasing interest in research data from the field of quantitative science studies 16,17 . ...
... The absence of data papers from large-scale empirical studies represents a major gap in the existing research infrastructure for effectively tracing data papers. Efforts have been made to identify data journals 14,29 , but to our knowledge, no research has been conducted to understand how these journals and their publications are indexed in scholarly databases, such as the Web of Science (WoS) and Scopus, which are frequently used as the direct data source in quantitative science studies. This gap makes it harder for researchers to easily extract a large body of data papers from scholarly databases and analyze them, especially using quantitative methods. ...
Article
Full-text available
The data paper is becoming a popular way for researchers to publish their research data. The growing numbers of data papers and journals hosting them have made them an important data source for understanding how research data is published and reused. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how a list of 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases, which creates major challenges that future efforts to study them quantitatively will need to address.
... Moreover, data papers are making it easier for research data to be peer-reviewed, a significant prerequisite for the integration of data objects into the research system (Costello et al., 2013;Mayernik et al., 2015). Over time, more journals have begun accepting data papers; these periodicals are termed data journals (Candela et al., 2015). ...
... As data papers are becoming a popular way for researchers to publish their research data in many disciplines (Candela et al., 2015;Griffiths, 2009), this new genre has become an important data source for investigating how data is used by scientists. This echoes increasing interest in research data from the field of quantitative science studies (Cousijn et al., 2019;Silvello, 2018). ...
... The absence of data papers from large-scale empirical studies represents a major gap in the existing research infrastructure for effectively tracing data papers. Efforts have been made to identify data journals (Candela et al., 2015;Walters, 2020), but to our knowledge, no research has been conducted to understand how these journals and their publications are indexed in scholarly databases, such as the Web of Science and Scopus, which are frequently used as the direct data source in quantitative science studies. This gap makes it harder for researchers to easily extract a large body of data papers from the scholarly databases and analyze them, especially using quantitative methods. ...
Preprint
Full-text available
As part of the data-driven paradigm and open science movement, the data paper is becoming a popular way for researchers to publish their research data, based on academic norms that cross knowledge domains. Data journals have also been created to host this new academic genre. The growing number of data papers and journals has made them an important large-scale data source for understanding how research data is published and reused in our research system. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how a list of 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases, which creates major challenges for future efforts to study them quantitatively. As a result, we argue that efforts should be made by data journals and databases to improve the quality of metadata for this emerging genre.
... are multidisciplinarily oriented journals (cf. Candela et al. 2015b, p. 1750; Walters 2020); the terms "data article" and "data descriptor" are used (cf. ...
... The literature review in Chapter 2 showed that the characteristics of data journals have already been studied comprehensively. However, the study with the largest sample, 116 data journals, was published seven years ago (cf. Candela et al. 2015b). Another study relevant to the research question of this thesis examined a smaller sample of 39 journals, with a focus on peer-review criteria (cf. Carpenter 2017). Studies published since then are limited to smaller samples or pure data journals, or do not have quality assurance as a thematic focus ...
... reference can be made to another journal that has since been discontinued; however, new journals have also appeared. The data journals were examined with regard to their characteristics. Based on their listing in the DOAJ, the share of journals published in open access is 83.7%. Other studies reached similar results (cf. Candela et al. 2015b), and data journals are sometimes even defined as purely open access journals (cf. Austin, Bloom et al. 2017, p. 82). Almost all data journals are not only freely accessible but also offer open licenses. The disciplinary focus of data journals lies in the natural and life sciences, which is also confirmed by other ...
Thesis
Full-text available
Quality assurance of research data is an important topic in the context of open science. If shared data are to help make research results traceable and to enable data reuse, corresponding requirements apply to their quality. However, data quality and quality assurance in the context of data publications are complex concepts that are used in diverse ways. So far, the quality assurance of data publications has been described in detail only selectively; a treatment that systematically describes the possible measures is missing. Likewise, little is known about how widespread individual measures are among repositories. The dissertation works out how quality and quality assurance for research data can be defined and systematized. On this basis, a theoretical approach for systematizing quality assurance measures is developed. It serves as the basic structure for the examination of data journals and repositories. To this end, guidelines of 135 data journals and certification documents of 99 repositories that received the CoreTrustSeal certificate in the 2017–2019 version are analyzed. The analyses show how data quality is defined in data journal guidelines and by repositories, and they provide insight into repositories' quality assurance practice. The results form the basis for a survey on the prevalence of quality assurance measures, which also considers open quality assurance processes, responsibilities, and the transparent documentation of data quality. In 2021, 332 repositories indexed in the re3data registry took part in the survey.
The results of the studies show the status quo of quality assurance and the definition of data quality at data journals and research data repositories. They also show that repositories contribute to the quality assurance of data publications through a wide range of measures. The results feed into a framework for the quality assurance of data publications in repositories.
... There was no significant difference between the major fields of science. Given the results presented, there is evidence that the scientific community is aware that good practices and data-sharing policies can originate from the higher bodies that fund research, as discussed by Candela et al. (2015). By promoting this understanding, data reuse can be fostered. ...
... However, these corresponded to only 19.55% of the total of 1,171 individuals. There was a statistically significant difference at the 5% level between the major fields, with a chi-square p-value of 0.0007 (see Table 7). It is therefore necessary to overcome the barrier cited by Candela et al. (2015): the lack of a data-sharing culture involving everyone who participates in research. ...
... Note that, among the 26.39% who cited this problem as a reason for not sharing data, the Humanities (6.75%) and Social Sciences (6.06%) groups together account for approximately 13% of the total of 1,171 individuals, that is, half of those who chose the category "Yes". Data publication is seen as a prerequisite for data sharing (Candela et al., 2015) and can also ensure data preservation over time (Austin et al., 2017). In this sense, there are several approaches that allow data to be published; Curty and Aventurier (2017) present three options: scientific data repositories, enhanced publications, and data papers, the last of which puts the data front and center. ...
Article
Full-text available
Objective: The objective of this work was to analyze the data obtained and integrated from a survey of professor-researchers affiliated with Brazilian graduate programs in Information Science, together with the other respondents of the study entitled "Practices and perceptions of Brazilian researchers", regarding the reasons for not sharing their data. Methodology: This is a bibliographic and exploratory study with a quantitative-qualitative approach. The data were processed and then submitted to the chi-square test. Results: The Humanities and Social Sciences are the fields facing the greatest challenges in the context of data sharing. The lack of requirements for data publication and the lack of infrastructure were the main barriers reported by the researchers. It was found that the field of Information Science needs an infrastructure that encourages researchers to share their data. Conclusions: We conclude that more effective policies aimed at making research data available need to be implemented, so as to facilitate data use/reuse by the entire scientific community.
... We also observe that data occupy a crucial role today in research, emerging as a driving instrument in science (Candela, Castelli, Manghi, & Tani, 2015). Data citations should be given the same scholarly status of traditional citations and contribute to bibliometrics indicators (Belter, 2014;Peters, Kraker, Lex, Gumpenberger, & Gorraiz, 2016). ...
... For these databases, systems such as Mendeley 11 store data alongside the publication, so that a citation to the publication also serves as a citation to the data. Data journals (Candela et al., 2015), i.e., journals publishing papers describing data sets, are also employed as proxies to cite static data sets. ...
... Data journals (Candela et al., 2015) enable the publication of papers describing a database that works as a proxy for it and its authors and receives its citations. This is a possible solution, but it is not complete since it does not consider citations referring to general queries. ...
Article
Full-text available
The citation graph is a computational artifact that is widely used to represent the domain of published literature. It represents connections between published works, such as citations and authorship. Among other things, the graph supports the computation of bibliometric measures such as h-indexes and impact factors. There is now an increasing demand that we should treat the publication of data in the same way that we treat conventional publications. In particular, we should cite data for the same reasons that we cite other publications. In this paper we discuss what is needed for the citation graph to represent data citation. We identify two challenges: (i) to model the evolution of credit appropriately (through references) over time and (ii) to model data citation not only to a dataset treated as a single object but also to parts of it. We describe an extension of the current citation graph model that addresses these challenges. It is built on two central concepts: citable units and reference subsumption. We discuss how this extension would enable data citation to be represented within the citation graph and how it allows for improvements in current practices for bibliometric computations both for scientific publications and for data.
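The two concepts in the abstract above, citable units and reference subsumption, can be illustrated with a minimal sketch. The class and method names here are invented for illustration and are not the paper's formal model; the sketch only shows the core idea that a citation to a part of a dataset should also propagate credit to the containing dataset.

```python
# Illustrative sketch: a citation graph in which parts of a dataset
# ("citable units") can be cited independently, and a reference to a
# part is subsumed by (also counted toward) the containing dataset.
from collections import defaultdict

class CitationGraph:
    def __init__(self):
        self.parent = {}                    # citable unit -> containing dataset
        self.citations = defaultdict(int)   # node -> citation count

    def add_unit(self, unit, dataset=None):
        if dataset:
            self.parent[unit] = dataset

    def cite(self, unit):
        # count the direct citation to the unit...
        self.citations[unit] += 1
        # ...and subsume it into every enclosing dataset
        d = self.parent.get(unit)
        while d:
            self.citations[d] += 1
            d = self.parent.get(d)

g = CitationGraph()
g.add_unit("table_3", dataset="dataset_A")
g.cite("table_3")      # a paper cites only one table of the dataset
g.cite("dataset_A")    # another paper cites the dataset as a whole
# table_3 has 1 citation; dataset_A has 2 (one direct, one subsumed)
```

With subsumption, a bibliometric measure computed over `dataset_A` reflects both whole-dataset citations and citations to its parts, which is the behavior the extended model aims to support.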
... The rise of FAIR (Findability, Accessibility, Interoperability, and Reusability) principles [57] further propels the need to cite data and count data citations. Such necessary practices shed light on the work of the creators and curators of datasets, work that would otherwise remain uncredited [6,13,20,58,62]. Much of the work in the current literature considers the development of data citation as a driving force to "facilitate giving scholar credit" [41]. ...
... If we assign all the credit x to r 1 , then all and only the curators of r 1 are credited. Whereas, if we assign x to D, then all and only the database administrators are credited -this is what typically happens when we cite data papers as proxies for the databases they describe [20]. On the other hand, if we distribute x in part to r 1 and in part to other records, say r 2 and r 7 , which somehow contributed to the generation of r 1 , then the curators of r 1 , r 2 and r 7 are credited. ...
... A database may be cited as a whole, even though only parts of it are used in the papers or datasets. Alternatively, the so-called "data papers" can be cited: traditional papers that describe a database [20]. An example is paper [33], which, every few years, describes the information contained in GtoPdb. ...
Article
Full-text available
It is widely accepted that data is fundamental for research and should therefore be cited in the same way as textual scientific publications. However, issues such as data citation and the handling and counting of the credit generated by such citations remain open research questions. Data credit is a new measure of value built on top of data citation, which enables us to annotate data with a value representing its importance. Data credit can be considered a new tool that, together with traditional citations, helps to recognize the value of data and its creators in a world that is ever more dependent on data. In this paper we define data credit distribution (DCD) as a process by which credit generated by citations is given to the single elements of a database. We focus on a scenario where a paper cites data from a database obtained by issuing a query. The citation generates credit which is then divided among the database entities responsible for generating the query output. One key aspect of our work is to credit not only the explicitly cited entities, but also those that contribute to their existence yet are not accounted for in the query output. We propose a data credit distribution strategy (CDS) based on data provenance and implement a system that uses the information provided by data citations to distribute the credit in a relational database accordingly. As a use case and for evaluation purposes, we adopt the IUPHAR/BPS Guide to Pharmacology (GtoPdb), a curated relational database. We show how credit can be used to highlight areas of the database that are frequently used. Moreover, we underline how credit rewards data and authors based on their research impact, and not merely on the number of citations. This can lead to designing new bibliometrics for data citations.
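The credit distribution idea in the abstract above can be sketched in a few lines. This is not the paper's actual provenance-based CDS; the function names, the equal split among cited records, and the fixed "forward half to ancestors" rule are all simplifying assumptions made for illustration.

```python
# Hedged sketch of a data credit distribution (DCD) strategy: credit
# from a citation is split among the cited records, and each record
# forwards half of its share to the records it was derived from
# (its provenance). The 50/50 split is an illustrative assumption.

def distribute_credit(credit, cited_records, provenance):
    """Return a dict mapping each record to its share of `credit`."""
    shares = {}
    per_record = credit / len(cited_records)
    for r in cited_records:
        ancestors = provenance.get(r, [])
        if ancestors:
            # keep half, forward the other half to contributing records
            shares[r] = shares.get(r, 0.0) + per_record / 2
            forwarded = per_record / 2 / len(ancestors)
            for a in ancestors:
                shares[a] = shares.get(a, 0.0) + forwarded
        else:
            shares[r] = shares.get(r, 0.0) + per_record
    return shares

# A paper cites r1 with credit 1.0; r1 was generated from r2 and r7.
prov = {"r1": ["r2", "r7"]}
print(distribute_credit(1.0, ["r1"], prov))
# r1 keeps 0.5; r2 and r7 receive 0.25 each
```

This mirrors the scenario in the snippets above: the curators of r1 are credited directly, while the curators of r2 and r7, whose records contributed to r1's existence, also receive a share even though their records do not appear in the query output.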
... The first data journal, the Journal of Chemical and Engineering Data, was launched in 1956 [13], but the number of data journals has grown only in recent times. Schöpfel et al. [4], who updated the study performed by Candela et al. [14] on the number of data journals and the areas of interest they cover, show that the number of data journals did not increase dramatically over those five years (from 20 data journals in 2015 to 28 in 2019, some of them no longer active), while the number of data papers published rose sharply, from 846 in 2013 to 11,500 in 2019. Likewise, Walters [15] found that of the 169 journals that reported publishing research relating to data, only 19 journals (11.2%) were classified as "pure" data journals, in that at least half of the journals' publications were data reports; 109 (64.5%) devoted some publications to data reports (about 1.6%) but prioritised other types of publications; 21 (12.4%) ...
... The fields available for filtering are those of the Australian and New Zealand Standard Research Classification (ANZSRC), which Dimensions has implemented in its Fields of Research (FOR) system. We agreed upon the following as loosely defining the bulk of HSS publications currently in JOHD and RDJ: Since several publications span different fields, we filtered out duplicate entries, which resulted in a final dataset containing 358,770 titles, each with information on publication date, total citations, and Altmetric score. ...
Article
Full-text available
The humanities and social sciences (HSS) have recently witnessed an exponential growth in data-driven research. In response, attention has been afforded to datasets and accompanying data papers as outputs of the research and dissemination ecosystem. In 2015, two data journals dedicated to HSS disciplines appeared in this landscape: Journal of Open Humanities Data (JOHD) and Research Data Journal for the Humanities and Social Sciences (RDJ). In this paper, we analyse the state of the art in the landscape of data journals in HSS using JOHD and RDJ as exemplars by measuring performance and the deep impact of data-driven projects, including metrics (citation count; Altmetrics, views, downloads, tweets) of data papers in relation to associated research papers and the reuse of associated datasets. Our findings indicate: that data papers are published following the deposit of datasets in a repository and usually following research articles; that data papers have a positive impact on both the metrics of research papers associated with them and on data reuse; and that Twitter hashtags targeted at specific research campaigns can lead to increases in data papers’ views and downloads. HSS data papers improve the visibility of datasets they describe, support accompanying research articles, and add to transparency and the open research agenda.
... 21 Examples of journals publishing data descriptors include Medical Physics and Nature Scientific Data. 22 Finally, code repositories, such as GitHub, have allowed for the rapid and dynamic development of code related to scientific studies in the medical domain, and fostered community engagement for future developments. 23 While the current components of scientific dissemination remain relatively independent, in the future, one could envision a modular framework where automated processes link these components in an integrated fashion (Fig. 2). ...
... Collection contents are described through "wiki pages", which also list relevant publications and instructions for data use. Increasingly, focused "data descriptors," in-depth manuscripts detailing individual datasets, such as those published through Nature Scientific Data, 22 are also generated for TCIA collections to engender greater transparency in data generation, collection protocols, and intended use-cases. For end-users, TCIA provides web interfaces and software (National Biomedical Imaging Archive Data Retriever) to easily retrieve and catalog collections on local computing infrastructure. ...
Article
Full-text available
Artificial intelligence (AI) has exceptional potential to positively impact the field of radiation oncology. However, large curated datasets - often involving imaging data and corresponding annotations - are required to develop radiation oncology AI models. Importantly, the recent establishment of Findable, Accessible, Interoperable, Reusable (FAIR) principles for scientific data management have enabled an increasing number of radiation oncology related datasets to be disseminated through data repositories, thereby acting as a rich source of data for AI model building. This manuscript reviews the current and future state of radiation oncology data dissemination, with a particular emphasis on published imaging datasets, AI data challenges, and associated infrastructure. Moreover, we provide historical context of FAIR data dissemination protocols, difficulties in the current distribution of radiation oncology data, and recommendations regarding data dissemination for eventual utilization in AI models. Through FAIR principles and standardized approaches to data dissemination, radiation oncology AI research has nothing to lose and everything to gain.
... Data publication is an important instrument for making data easier to access, understand, and reuse (Parsons & Fox, 2013), and eventually for supporting more effective reuse of data. Data papers have been generally recognized as a new form of data publication and have been increasingly adopted in many research communities (Candela et al., 2015; Gorgolewski et al., 2013). A major distinction between data papers and research articles is their focus: while both types of publications are normally peer-reviewed, data papers are designed to describe data objects (Carlson & Oda, 2018), a major divergence from the IMRaD paper structure (Sollaci & Pereira, 2004). ...
... During the past few years, data papers have become an increasingly popular type of scientific publication. Candela et al. (2015) identified over 100 journals that accept data papers, and there is evidence that the number of data papers has continued to increase since then (El-Tawil & Agrawal, 2019). Empirical evidence has also shown that data papers are frequently cited, although the citation pattern normally takes on a highly long-tailed distribution (Kotti & Spinellis, 2019) and the papers may be cited for reasons other than reuse of the data (Jiao & Darch, 2020). ...
Preprint
Full-text available
The data paper is an emerging academic genre that focuses on the description of research data objects. However, there is a lack of empirical knowledge about this rising genre in quantitative science studies, particularly from the perspective of its linguistic features. To fill this gap, this research aims to offer a first quantitative examination of which rhetorical moves (rhetorical units performing a coherent narrative function) are used in data paper abstracts, as well as how these moves are used. To this end, we developed a new classification scheme for rhetorical moves in data paper abstracts by expanding a well-received system that focuses on English-language research article abstracts. We used this expanded scheme to classify and analyze rhetorical moves used in two flagship data journals, Scientific Data and Data in Brief. We found that data papers exhibit a combination of IMRaD- and data-oriented moves and that the usage differences between the journals can be largely explained by journal policies concerning abstract and paper structure. This research offers a novel examination of how the data paper, a novel data-oriented knowledge representation, is composed, which greatly contributes to a deeper understanding of data and data publication in the scholarly communication system.
... The efforts to improve data-sharing practices in the scientific field have triggered the creation of research data journals and tracks (see these compilations of data journals [51][52][53]), devoted mainly to publishing data papers (more than 50% of their total publication volume). To build our data sample, we selected from this list the journals meeting the following criteria: 1) the journal is active as of July 2023 and publishes data papers regularly, 2) it publishes data papers from different scientific fields (interdisciplinary scope), and 3) it publishes data papers written in English. ...
Preprint
Full-text available
To ensure the fairness and trustworthiness of machine learning (ML) systems, recent legislative initiatives and relevant research in the ML community have pointed out the need to document the data used to train ML models. Besides, data-sharing practices in many scientific domains have evolved in recent years for reproducibility purposes. In this sense, the adoption of these practices by academic institutions has encouraged researchers to publish their data and technical documentation in peer-reviewed publications such as data papers. In this study, we analyze how this scientific data documentation meets the needs of the ML community and regulatory bodies for its use in ML technologies. We examine a sample of 4041 data papers of different domains, assessing their completeness and coverage of the requested dimensions, and trends in recent years, putting special emphasis on the most and least documented dimensions. As a result, we propose a set of recommendation guidelines for data creators and scientific data publishers to increase their data's preparedness for its transparent and fairer use in ML technologies.
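The completeness assessment described above can be illustrated with a small sketch: score each data paper by the fraction of documentation dimensions it covers. The dimension list and the sample papers below are invented for illustration; the study's actual dimensions and coding scheme may differ.

```python
# Hedged sketch of a documentation-completeness score for data papers:
# the fraction of requested dimensions a paper documents. The dimension
# names are illustrative assumptions, not the study's actual scheme.

DIMENSIONS = ["collection", "uses", "distribution", "maintenance", "composition"]

def completeness(documented_sections):
    """Fraction of the requested dimensions covered by a paper."""
    covered = [d for d in DIMENSIONS if d in documented_sections]
    return len(covered) / len(DIMENSIONS)

# Hypothetical coding of two papers' documented dimensions.
papers = {
    "paper_1": {"collection", "composition", "uses"},
    "paper_2": {"collection"},
}
scores = {p: completeness(s) for p, s in papers.items()}
print(scores)  # paper_1 -> 0.6, paper_2 -> 0.2
```

Aggregating such per-paper scores by dimension is one simple way to surface the "most and least documented dimensions" the study reports on.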
... In addition, IICF recommender systems can facilitate interdisciplinary research by uncovering related data-collections across different research areas. By analyzing the usage patterns of researchers from various domains, the system can identify and recommend data-collections that are of potential interest to researchers from other disciplines [220]. This cross-disciplinary recommendation can promote interdisciplinary collaboration and enable researchers to leverage insights from related fields, ultimately leading to novel findings and scientific advancements. ...
Thesis
Full-text available
Effective Research Data Management (RDM) practices are essential for fostering research collaboration, increasing discoverability and repurposing research data, and advancing scientific progress in higher education. In recent years, adopting Open Science Platforms (OSPs) and the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles has highlighted the need for improved RDM methodologies and tools for flourishing higher education achievements. However, existing literature has provided limited guidance on monitoring RDM processes, their adoption, and their use. This dissertation addresses this gap by investigating how to enable dis- covering and enhancing process-aware RDM activities via modeling the underlying researcher’s actual practices. This dissertation presents a series of methodologies as a framework combining data acquisition, abstraction, knowledge discovery, and operation enhancement techniques. Furthermore, the case studies highlight the challenges associated with RDM-related activities by assessing the proposed methodologies’ validity in real-world environments. Initially, this work presents a universal reference software architecture for RDM ser- vices; then, it proposes four approaches for data acquisition, including a novel Hybrid logger technique for acquiring datasets from information systems that operate on distributed settings, providing a comprehensive view of user activities by evaluating corresponding software component executions. This approach enables a projection of user behavior and facilitates the development of further machine-learning studies. Furthermore, this work introduces a semi-supervised learning approach for abstract- ing datasets by accommodating non-sequential events in distributed systems while balancing data granularity and model fitness. 
The methodology for discovering process-aware activities incorporates a modular and layered architecture, providing insights into RDM compliance, identifying deviations, and optimizing user experience. Additionally, it outlines a method for determining and visualizing user and system interactions and discovers the RDM phases of research projects, providing a practical understanding of the progression and activities of different research groups. Finally, this thesis proposes and evaluates two recommender systems, demonstrating the potential of Content-Based and Collaborative Filtering recommender systems in enabling the reusability of research data repositories and fostering cooperation among researchers. The findings contribute significantly to the expanding body of literature on RDM and provide valuable insights into the potential of the presented methodologies for enhancing RDM practices in OSPs. In conclusion, this dissertation offers holistic strategies for addressing the difficulties related to facilitating RDM in OSPs, providing guidelines for implementing the necessary architecture and demonstrating the applicability of the proposed methods to other RDM services that adhere to the reference software architecture of RDM systems.
... Until recently, the time-consuming efforts to improve data quality and accessibility have received limited recognition. Many authors have advocated for increased recognition of data publication in the form of citations (e.g., Callaghan et al., 2012; Kratz and Strasser, 2014; Candela et al., 2015), and some journals have developed publication types that are specifically intended to describe data. Receiving credit in the form of citations for published datasets without an accompanying manuscript would further incentivize proper archiving and documenting of data (Callaghan et al., 2012). ...
Article
Full-text available
Given the high costs of constructing, maintaining, monitoring, and sampling paired watersheds, it is prudent to ask “Are paired watershed studies still worth the effort?” We present a compilation of 90 North American paired watershed studies and use examples from the Caspar Creek Experimental Watersheds to contend that paired watershed studies are still worth the effort and will continue to remain relevant in an era of big data and short funding cycles. We offer three reasons to justify this assertion. First, paired watersheds allow for watershed-scale experiments that have produced insights into hydrologic processes, water quality, and nutrient cycling for over 100 years. Paired watersheds remain an important guide to inform best management practices for timber harvesting and other land-management concerns. Second, paired watersheds can produce long climate, streamflow, and water quality records because sites are frequently maintained over the course of multiple experiments or long post-treatment periods. Long-term datasets can reveal ecological surprises, such as changes in climate-streamflow relationships driven by slow successional processes. Having multiple watershed records helps identify the cause of these changes. Third, paired watersheds produce data that are ideal for developing and testing hydrologic models. Ultimately, the fate of paired watersheds is up to the scientific community and funding agencies. We hope that their importance continues to be recognized.
... Both disciplines also less frequently cite or mention data papers. This reflects the slower emergence of data papers and journals in SSH (Candela et al., 2015) and possibly a history of using data from governmental sources, where data papers may not be as relevant. There is some evidence that the landscape of data papers in SSH may be changing, and that data papers may have an effect on metrics of associated papers and data (McGillivray et al., 2022). ...
Article
Full-text available
Data citations, or citations in reference lists to data, are increasingly seen as an important means to trace data reuse and incentivize data sharing. Although disciplinary differences in data citation practices have been well documented via scientometric approaches, we do not yet know how representative these practices are within disciplines. Nor do we yet have insight into researchers' motivations for citing, or not citing, data in their academic work. Here, we present the results of the largest known survey (n = 2,492) to explicitly investigate data citation practices, preferences, and motivations, using a representative sample of academic authors by discipline, as represented in the Web of Science (WoS). We present findings about researchers' current practices and motivations for reusing and citing data and also examine their preferences for how they would like their own data to be cited. We conclude by discussing disciplinary patterns in two broad clusters, focusing on patterns in the social sciences and humanities, and consider the implications of our results for tracing and rewarding data sharing and reuse.
... Our results show that R software papers are published in both journals dedicated to software papers and those that accept software papers along with research articles. This is also the situation reported for the new academic genre of data papers (Candela et al., 2015), which makes it more difficult to trace software publishing activities through the bibliographic universe. Moreover, some of the top journals to publish R papers are those highly specialized journals that are not indexed in major research databases, most notably The R Journal and R News. ...
Preprint
Full-text available
Under the data-driven research paradigm, research software has come to play crucial roles in nearly every stage of scientific inquiry. Scholars are advocating for the formal citation of software in academic publications, treating it on par with traditional research outputs. However, software is rarely cited consistently: one software entity can be cited as different objects, and the citations can change over time. These issues, however, are largely overlooked in existing empirical research on software citation. To fill these gaps, the present study compares and analyzes a longitudinal dataset of citation formats of all R packages collected in 2021 and 2022, in order to understand the citation formats of R-language packages, important members of the open-source software family, and how the citations evolve over time. In particular, we investigate the different document types underlying the citations and which metadata elements in the citation formats changed over time. Furthermore, we offer an in-depth analysis of the disciplinarity of journal articles cited as software (software papers). By undertaking this research, we aim to contribute to a better understanding of the complexities associated with software citation, shedding light on future software citation policies and infrastructure.
... However, the current number of data repositories in China is only 48 (Li et al., 2022). Therefore, the FAIR principles, which require data to be findable, accessible, interoperable, and reusable, have been proposed to guide scientists when sharing their research datasets. Even so, data journals, a new form of data publication intended to open scientific data, have exposed some problems, such as the lack of rewards, vague data descriptions, and quality issues (Candela et al., 2015). These issues highlight the need for better data service repositories for scientists. ...
Article
Full-text available
This paper explores the effect of publishing a data paper in the Open Access journal Data in Brief (DIB) on the citation counts of the related research paper. Using regression analysis, citation content analysis, and a survey method, we investigate whether research papers with a related data paper have higher citation counts and the potential reasons. After controlling for variables that correlate with citation counts, research papers with a related data paper were found to have higher citation counts than those published in the same issue of the same journal. Next, we explored the causal relationship between the two variables by surveying the corresponding authors of 618 papers who shared datasets in DIB from 2014 to 2021. The results show that the authors acknowledge the benefits of sharing data in DIB, including citation increase and career reputation enhancement. We further explored how the data papers in DIB increase the citations of the related research papers by using citation content analysis. We found that scientists co-cite data papers and their related research papers for the purpose of reusing the underlying data or conveying a better understanding of the underlying data and the related research articles.
... Developing more review papers is important to provide an overview of the state-of-the-art of ML for healthcare to the African research audience . Limited interest is shown in short communications like notes, editorials, short surveys and letters to the editor as these kinds of publications have a less significant weight than articles, reviews, and conference papers in evidence-based research (Candela et al., 2015). The lack of documented and local datasets limits the development of customized solutions for digital health in the continent. ...
Preprint
Full-text available
Machine learning has seen enormous growth in the last decade, with healthcare being a prime application for advanced diagnostics and improved patient care. The application of machine learning for healthcare is particularly pertinent in Africa, where many countries are resource-scarce. However, it is unclear how much research on this topic is arising from African institutes themselves, which is a crucial aspect for applications of machine learning to unique contexts and challenges on the continent. Here, we conduct a bibliometric study of African contributions to research publications related to machine learning for healthcare, as indexed in Scopus, between 1993 and 2022. We identified 3,772 research outputs, with most of these published since 2020. North African countries currently lead the way with 64.5% of publications for the reported period, yet Sub-Saharan Africa is rapidly increasing its output. We found that international support in the form of funding and collaborations is correlated with research output generally for the continent, with local support garnering less attention. Understanding African research contributions to machine learning for healthcare is a crucial first step in surveying the broader academic landscape, forming stronger research communities, and providing advanced and contextually aware biomedical access to Africa.
... Besides, African scientists do not significantly publish data papers to describe their datasets for ML for healthcare. Data papers are very important to provide detailed information about Africa-related datasets and ensure their availability for other scientists working on biomedical applications in the African context [82]. The lack of documented and local datasets limits the development of customized solutions for digital health in the continent. ...
... Depending on the organization of open access, such publishing houses allow scientists to distribute, by self-archiving, not only research results (postprints) but also manuscripts of their research (preprints) (Suber, 2012). On the other hand, the emphasis on the importance of data sharing and reuse, which has been growing in recent years, makes data journals a new channel for realizing this goal by facilitating scientists' dissemination, beyond the article, of primary scientific products generated during research, such as datasets, software, program code, and experiments (Candela et al., 2015). ...
Article
Full-text available
Today, one can observe shifts in the research landscape, shaped by digitization and open science principles. The open science movement continues to gain momentum, attention, and debate. In parallel with the principle of unity, open science gives rise to a taxonomy of several related ideas, guidelines, and concepts, such as open access, open replicable research, and open data. Over the past fifteen years, research institutions have focused on open access to publications. Recently, however, the focus of attention has shifted to research data as a "new currency" in research activities and their distribution in open access, and the guiding principles of data management are becoming crucial for the wide implementation of open science practices and the effective use of data in research, industry, business, and other sectors of the economy. In this context, it is relevant to carry out a thorough study of primary scientific works on open science issues and to study the role of the concept of "open research data" in the paradigm of a holistic open science ecosystem and business ecosystem. In this work, it is proposed to use methods of quantitative and qualitative bibliometric analysis, which make it possible to identify the main trends and form the basis for further research. The information base for this work was the international scientometric database Scopus, which makes it possible to analyze bibliographic data using built-in tools and to import them for external use in the VOSviewer software. The study revealed an increasing trend in the number of publications on the subject under study, with the highest annual growth rates in 2017 (76%) and 2019 (66%). Qualitative bibliographic analysis made it possible to analyze the most cited and, therefore, trending works on the selected topic.
In terms of the number of citations per year, the results show that studies on open-science topics such as open source code; data and research reproducibility and research data management; and open access to publications are the most popular. In addition, a cluster analysis of keyword co-occurrence was conducted. It formed clusters dedicated to both institutional and infrastructural problems of the development of open science and research data. The results of the analysis also create a scientific basis for further research into the key determinants of the effectiveness of implementing a proper research data management system at the micro, meso, and macro levels. This will improve the transfer of scientific developments from one field of knowledge to another while fostering interdisciplinary research. In parallel, stakeholders in the real sector of the economy are given the opportunity to analyze scientific results and determine whether they can be adopted in their own activities.
... There are many examples of scientific collection data published in data papers (see, for example, [31,194,[196][197][198]]), which highlights their importance. This has been achieved in part thanks to the growth in the number of interdisciplinary data journals [195,199] in which to publish these articles; even leading scientific journals, such as Nature, now have their own data journal (Scientific Data; https://www.nature.com/sdata/journal-information). Chapter II and Chapter IV conclude that effective biodiversity conservation requires taking the social factor into account. Chapter II also identifies human-nature reconnection as a useful tool for achieving this goal. ...
Thesis
Full-text available
For centuries, the scientific community has collected animals, plants, rocks, and minerals on a global scale in order to study various aspects of science and technology. Part of this material is currently held in scientific collections around the world. Scientific collections are essentially systematized repositories, accessible to the scientific community, that hold spatio-temporal records of the biological and geological diversity known on planet Earth. Biological collections intended for research preserve organisms or parts of organisms (unique, tangible, durable, and irreplaceable specimens) and their derived samples (such as preserved tissues, seeds, etc.). Technological advances, digitization, and improvements in the accessibility of collections, together with their combined use with other sources of biodiversity data, have revolutionized research on scientific collections in recent decades. Despite their importance, their contributions are widely underestimated by both society and administrations. The aim of this thesis is to determine the current value of scientific collections through the study of physical specimens or their associated metadata, complemented with other sources of biodiversity data.
To achieve this goal, we set the following specific objectives: i) to analyze the possibilities offered by the direct study of specimens preserved in scientific collections; ii) to determine the importance of records from scientific collections in assessing the potential impacts of land-use change on threatened flora; iii) to determine the importance of records from scientific collections in assessing areas of interest for the conservation of threatened flora and their possible future impacts from climate change; iv) to develop methodological proposals that link data from scientific collections with other sources of environmental information when addressing conservation questions at a global scale. Overall, this thesis has uncovered new cuticular structures in crickets, some of which have been linked to reproduction, and has determined the degree of troglomorphism in the genus Petaloptila; it has also analyzed the effects of agricultural use on threatened Spanish flora and possible future climate-driven changes in the Canary Island hotspots, and developed a methodology for the conservation of migratory birds. Issues such as the importance of collecting, biases and their possible solutions, open science, the value of citizen-science data, and data papers are addressed. Our results reaffirm that scientific collections are a key element in research and education. It is necessary to foster the growth of collections, continue digitization efforts, and ensure funding and staffing for collection centers so they can continue their work in the future.
... Fortunately, researchers can take steps to reduce or avoid the inappropriate use of data and code. Data and code can be published alongside detailed metadata information, or with a data paper in an indexed, peer-reviewed journal, including a thorough description of datasets and processes, terms and considerations for reuse, and any limitations, assumptions, caveats, and shortcomings [65]. When one is accustomed to the nuances or assumptions of methods that they frequently use, it can be easy to forget to include important information that would allow others to replicate the study. ...
Article
Full-text available
The biological sciences community is increasingly recognizing the value of open, reproducible and transparent research practices for science and society at large. Despite this recognition, many researchers fail to share their data and code publicly. This pattern may arise from knowledge barriers about how to archive data and code, concerns about its reuse, and misaligned career incentives. Here, we define, categorize and discuss barriers to data and code sharing that are relevant to many research fields. We explore how real and perceived barriers might be overcome or reframed in the light of the benefits relative to costs. By elucidating these barriers and the contexts in which they arise, we can take steps to mitigate them and align our actions with the goals of open science, both as individual scientists and as a scientific community.
... However, with the recent emphasis on the importance of data sharing and reuse, data journals have emerged as a new channel for this purpose. Data journals publish data papers that describe facts about data, such as data collection methods and data features, and the described data are disclosed and maintained in data repositories [3]. In data journals, data and data papers are shared in a citable format through a peer-reviewed quality assurance process so that they can be recognized as research achievements [4,5]. ...
Article
Full-text available
Purpose: This study investigated the usefulness and limitations of data journals by analyzing motivations for submission and the review and publication processes, according to researchers with experience publishing in data journals. Methods: Among 79 data journals indexed in Web of Science, we selected four data journals where data papers accounted for more than 20% of the publication volume and whose corresponding authors belonged to South Korean research institutes. A qualitative analysis was conducted of the subjective experiences of seven corresponding authors who agreed to participate in interviews. To analyze interview transcriptions, clusters were created by restructuring the theme nodes using Nvivo 12. Results: The most important element of data journals to researchers was their usefulness for obtaining credit for research performance. Since the data in repositories linked to data papers are screened through the journals' review processes, the validity, accuracy, reusability, and reliability of the data are ensured. In addition, data journals provide a basis for data sharing using repositories and for data-centered follow-up research using citations, and they offer detailed descriptions of data. Conclusion: Data journals play a leading role in data-centered research. Data papers are recognized as research achievements through citations in the same way as research papers published in conventional journals, but there was also a perception that it is difficult to attain a similar level of academic recognition with data papers as with research papers. However, researchers highly valued the usefulness of data journals, and data journals should thus be developed into new academic communication channels that enhance data sharing and reuse.
... In certain cases, public recognition, such as badges of open data for articles following the best data-sharing practices and increasing numbers of citations, may promote data release by an order of magnitude [42]. Citable data papers are certainly another way forward [43,44], because these provide access to a well-organised dataset and add to the authors' publication record. Encouraging the listing of published datasets with download and citation metrics in grant and job applications, alongside other bibliometric indicators, should promote data sharing. ...
Article
Full-text available
Data sharing is one of the cornerstones of modern science that enables large-scale analyses and reproducibility. We evaluated data availability in research articles across nine disciplines in Nature and Science magazines and recorded corresponding authors’ concerns, requests and reasons for declining data sharing. Although data sharing has improved in the last decade and particularly in recent years, data availability and willingness to share data still differ greatly among disciplines. We observed that statements of data availability upon (reasonable) request are inefficient and should not be allowed by journals. To improve data sharing at the time of manuscript acceptance, researchers should be better motivated to release their data with real benefits such as recognition, or bonus points in grant and job applications. We recommend that data management costs should be covered by funding agencies; publicly available research data ought to be included in the evaluation of applications; and surveillance of data sharing should be enforced by both academic publishers and funders. These cross-discipline survey data are available from the plutoF repository.
... Data publications could also play a major role in this issue, which are stand-alone peer-reviewed publications that do not answer a research question, but instead spend the entire paper describing the creation of a dataset in rich detail (Costello, 2009;Smith, 2009;Chavan and Penev, 2011;Candela et al., 2015). In seeking to bring the work of data labeling from the background to the foreground, our work is also aligned with scholars who have focused on the often under-compensated labor of crowdworkers and have called for researchers to detail how much they pay for data labeling (Silberman et al., 2018). ...
Article
Full-text available
Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data. This study builds on prior work that investigated to what extent 'best practices' around labeling training data were followed in applied ML publications within a single domain (social media platforms). In this paper, we expand by studying publications that apply supervised ML in a far broader spectrum of disciplines, focusing on human-labeled data. We report to what extent a random sample of ML application papers across disciplines gives specific details about whether best practices were followed, while acknowledging that a greater range of application fields necessarily produces greater diversity of labeling and annotation methods. Because much of machine learning research and education focuses only on what is done once a "ground truth" or "gold standard" of training data is available, it is especially relevant to discuss issues around the equally important aspect of whether such data is reliable in the first place. This determination becomes increasingly complex when applied to a variety of specialized fields, as labeling can range from a task requiring little to no background knowledge to one that must be performed by someone with career expertise.
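A standard way to quantify whether human-labeled training data is reliable in the first place, as the study above discusses, is inter-annotator agreement. The sketch below implements Cohen's kappa, a common chance-corrected agreement measure for two annotators; the example labels are invented for illustration and are not from the study.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where the annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance of matching given each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["spam", "ham", "spam", "ham", "spam", "spam"]
b = ["spam", "ham", "ham", "ham", "spam", "spam"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

Values near 1 indicate strong agreement; values near 0 mean the annotators agree no more often than chance, a warning sign that the "gold standard" may not be gold.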
... We understood that the development of Pecheker started at the end of a former period when data acquisition programs were strongly based on well-delimited templates, whose design necessarily preceded and constrained the field work (FAO, 1999). The changeover to a new time in the world of databases occurred at the end of the 2000s with the new era of "Big Data" (Candela et al., 2015; Davenport et al., 2012; Marx, 2013; Waller and Fawcett, 2013). In the world of fisheries science, this new era is characterized by a reversal of the hierarchy between the data and the database, as we observed at the beginning of the Pecheker project: nowadays, the design of a fishery database must not only be performed upstream of the organisation of the data flow but must also, at the same time, be considered downstream of the chain; the main goal is to obtain a tool that is not only able to generate the data flow but is also able to absorb the data flow, however shaped, diverse, and abundant it is. ...
Article
Full-text available
The scientific monitoring of the Southern Ocean French fishing industry is based on the use of the Pecheker database. Pecheker is dedicated to the digital curation of the data collected in the field by scientific observers, whose analysis allows the scientists of the Muséum national d'Histoire naturelle to provide guidelines and advice for the regulation of fishing activity, the protection of fish stocks, and the protection of marine ecosystems. The template of Pecheker has been developed to make the database suited to the ecosystem-based management concept. Considering the global context of biodiversity erosion, this modern approach to management aims to take account of the environmental background of fisheries to ensure their sustainable development. Completeness and high quality of the raw data are key elements for an ecosystem-based management database such as Pecheker. Here, we present the development of this database as a case study of fisheries data curation to be shared with readers. Full code to deploy a database based on the Pecheker template is provided in the supplementary materials. Considering the success factors we could identify, we discuss how the community could build a global fisheries information system based on a network of small databases incorporating interoperability standards.
... Although standardized metadata are a prerequisite for interoperability and comprehensive discovery services, less structured descriptions can convey the context necessary for understanding and reusing data. One example of unstructured metadata is the data paper, which mirrors traditional scientific publication formats but focuses on an in-depth description of data collection and processing (Candela et al., 2015). ...
Thesis
Full-text available
Structured metadata are of particular importance in the context of facilitating research data (re-)use. Although research data repositories create and manage metadata records, existing research offers limited insights into the relationship between repositories and metadata for research data. Therefore, in conducting a quantitative assessment informed by metadata quality requirements, this thesis aims at making distinctive features of metadata for research data visible, specifying the potential influence of repository characteristics on metadata, and exploring changes to metadata records. The analysis showed variations in metadata completeness across repositories. Within repositories, metadata descriptions are relatively homogeneous. These findings suggest that repositories have developed distinctive and consistent practices for describing data. On average, descriptions comprise 487.3 characters, and 5.52 years passed between the year a dataset was published and the year its metadata record was registered. Differences in the completeness of metadata records, description length, and timeliness were significant across repository types and certification status, whereas differences in collection homogeneity were not. Overall, most metadata records in the sample were changed, which is consistent with the conceptualization of metadata for research data as dynamic and changeable objects. Differences in the number of changes are significant across repository types.
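A metadata-completeness measure of the kind used in such quantitative assessments can be sketched as the fraction of required fields that are present and non-empty. The required field set below is a DataCite-like assumption for illustration, not the actual scheme used in the thesis.

```python
# Hypothetical required-field set, loosely modeled on common repository schemas.
REQUIRED = {"title", "creator", "publication_year", "description", "license"}

def completeness(record: dict) -> float:
    """Fraction of required metadata fields that are present and non-empty."""
    filled = {k for k, v in record.items()
              if k in REQUIRED and v not in (None, "", [])}
    return len(filled) / len(REQUIRED)

# An empty description counts as missing, so 3 of 5 required fields are filled.
record = {"title": "Survey data 2020", "creator": "Doe, J.",
          "description": "", "license": "CC-BY-4.0"}
print(completeness(record))  # → 0.6
```

Averaging this score over a repository's records gives the kind of per-repository completeness comparison the analysis describes.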
... The author gets a peer-reviewed article, perhaps in a high-impact journal, and the reader obtains a dataset that has been more rigorously evaluated and more fully described than it might otherwise have been (Walters, 2020). In this respect, Candela et al. (2015) studied more than 100 data journals and described the approaches they promote for data set description, availability, citation, quality, and open access. Buneman et al. (2020) proposed creating a separate system for publishing citation summaries so that authorship could be recognized by citation analyzers. ...
Article
Full-text available
Data sharing by researchers is a centerpiece of Open Science principles and scientific progress. For a sample of 6019 researchers, we analyze the extent/frequency of their data sharing. Specifically, we analyze its relationship with the following four variables: how much they value data citations, the extent to which their data-sharing activities are formally recognized, their perceptions of whether sufficient credit is awarded for data sharing, and the reported extent to which data citations motivate their data sharing. In addition, we analyze the extent to which researchers have reused openly accessible data, as well as how data sharing varies by professional age cohort and its relationship to the value they place on data citations. Furthermore, we consider most of the explanatory variables simultaneously by estimating a multiple linear regression that predicts the extent/frequency of their data sharing. We use the dataset of the State of Open Data Survey 2019 by Springer Nature and Digital Science. The results allow us to conclude that a desire for recognition/credit is a major incentive for data sharing. Thus, the possibility of receiving data citations is highly valued when sharing data, especially among younger researchers, irrespective of the frequency with which it is practiced. Finally, the practice of data sharing was found to be more prevalent at late research career stages, despite this being when citations are less valued and have a lower motivational impact. This could be because later-career researchers may benefit less from keeping their data private.
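The study above estimates a multiple linear regression predicting data-sharing frequency from attitude variables. The sketch below shows the general shape of such an estimation via ordinary least squares on synthetic data; the variable names only echo the study, and the data are invented, not the survey's.

```python
import numpy as np

# Synthetic illustration of a multiple linear regression like the one
# described above. True coefficients are 0.5 and 0.3; the fit should
# recover them approximately despite the added noise.
rng = np.random.default_rng(0)
n = 200
value_citations = rng.uniform(1, 5, n)   # how much data citations are valued
credit_received = rng.uniform(1, 5, n)   # perceived credit for sharing
career_stage = rng.integers(1, 4, n)     # 1 = early career, 3 = late career
sharing = 0.5 * value_citations + 0.3 * career_stage + rng.normal(0, 0.2, n)

# Design matrix with an intercept column, fitted by least squares.
X = np.column_stack([np.ones(n), value_citations, credit_received, career_stage])
coef, *_ = np.linalg.lstsq(X, sharing, rcond=None)
print(np.round(coef, 2))  # intercept and three slopes
```

With real survey data, the estimated slopes and their significance tests would indicate which attitudes predict sharing frequency.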
... number of articles) are two data journals, which are open to all fields: Elsevier's "Data in Brief" and Springer's "Scientific Data". The change in this area is addressed in [13], where the current state is compared to the one described in [14] from 2015. The number of data journals grows more slowly today, while the number of published data papers is increasing rapidly. ...
Article
Full-text available
Open research data practices are a relatively new, and thus still evolving, part of scientific work, and their usage varies strongly across scientific domains. In the literature, the investigation of open research data practices covers the whole range from big empirical studies spanning multiple scientific domains to smaller, in-depth studies analysing a single field of research. Despite the richness of literature on this topic, there is still a lack of knowledge on (open) research data awareness and practices in materials science and engineering. While most current studies focus only on some aspects of open research data practices, we aim for a comprehensive understanding of all practices with respect to the considered scientific domain. Hence this study aims at 1) drawing the whole picture of search, reuse and sharing of research data 2) while focusing on materials science and engineering. The chosen approach allows exploring the connections between different aspects of open research data practices, e.g. between data sharing and data search. In-depth interviews with 13 researchers in this field were conducted, transcribed verbatim, coded and analysed using content analysis. The main findings characterised research data in materials science and engineering as extremely diverse, often generated for a very specific research focus and requiring a precise description of the data and the complete generation process for possible reuse. Results on research data search and reuse showed that the interviewees intended to reuse data but were mostly unfamiliar with (yet interested in) modern methods such as dataset search engines, data journals or searching public repositories. Current research data sharing is not open but bilateral, and usually encouraged by supervisors or employers.
Project funding affects data sharing in two ways: some researchers choose to share their data openly due to their funding agency's policy, while others face legal restrictions on sharing because their projects are partly funded by industry. The time needed for a precise description of the data and their generation process is named as the biggest obstacle to data sharing. From these findings, a precise set of actions suitable to support Open Data is derived, involving training for researchers and introducing rewards for data sharing at the level of universities and funding bodies.
... [6] Data journals may overcome several barriers to open data, as they promote the publication of data papers in a way that reflects the scientific publication model (Candela et al., 2015). Another significant issue considered by the librarians' community to communicate about the availability and accessibility of substantial quantities of textual material, in particular, full texts of journal papers and books under the title of alternative metrics (often ...
Article
Full-text available
Purpose – The purpose of the paper is to identify the different tasks and responsibilities that UAE academic libraries must undertake in response to current changes in researchers' information-seeking behavior and to the advances brought about by the emergence of Research 2.0. Design/methodology/approach – The researchers comprehensively reviewed the literature related to academic libraries' activities, viz., information literacy (IL) education, research data services (RDS), awareness-raising, and support for individual faculty members in the United Arab Emirates. Findings – UAE librarians organize information literacy education for students of all programs, primarily for research scholars and faculty, in both Arabic and English. Faculty members are supported with discipline-specific databases and print and digital versions of books and journals, along with other online services. Regarding awareness-raising, library professionals in the country are actively involved in communicating all types of knowledge sources and their updates to all stakeholders in education, whereas research data services are slowly gearing up in many academic libraries. Originality/value – The paper proposes to add to the body of knowledge about academic library support through information literacy, awareness-raising, faculty attention and research data services to researchers in the UAE.
Article
Under the data-driven research paradigm, research software has come to play crucial roles in nearly every stage of scientific inquiry. Scholars are advocating for the formal citation of software in academic publications, treating it on par with traditional research outputs. However, software is rarely cited consistently: one software entity can be cited as different objects, and the citations can change over time. These issues, however, are largely overlooked in existing empirical research on software citation. To fill these gaps, the present study compares and analyzes a longitudinal dataset of citation formats of all R packages collected in 2021 and 2022, in order to understand the citation formats of R-language packages, important members of the open-source software family, and how these citations evolve over time. In particular, we investigate the different document types underlying the citations and which metadata elements in the citation formats changed over time. Furthermore, we offer an in-depth analysis of the disciplinarity of journal articles cited as software (software papers). By undertaking this research, we aim to contribute to a better understanding of the complexities associated with software citation, shedding light on future software citation policies and infrastructure.
Article
Full-text available
A data paper describing research data helps credit the researchers producing the data while helping other researchers verify previous research and start new research by reusing the data. With these benefits, publishing data papers and depositing data in public data repositories are increasing. A domestic academic society that plans to publish data papers faces challenges, including acquiring, in a timely manner, substantial knowledge concerning data paper structures and templates, peer-review policy and process, and trustworthy data repositories, as a data paper has characteristics that differ from those of a research paper. Moreover, the lack of research and information concerning the critical elements of data papers and the peer-review process makes it difficult to operate data paper review and publication. To address these issues, we propose essential concepts of the data paper and data paper peer review, including a process model of the peer review, based on an in-depth analysis of five data journals' data paper templates, articles, and other guides worldwide. Academic societies intending to publish or add data papers as a new type of paper may establish policies and define a peer-review process by adopting the proposed conceptual models, effectively streamlining the preparation of data paper publication.
Article
Open data as an integral part of the open science movement enhances the openness and sharing of scientific datasets. Nevertheless, the normative utilization of data journals, data papers, scientific datasets, and data citations necessitates further research. This study aims to investigate the citation practices associated with data papers and to explore the role of data papers in disseminating scientific datasets. Dataset accession numbers from NCBI databases were employed to analyze the prevalence of data citations for data papers from PubMed Central. A dataset citation practice identification rule was subsequently established. The findings indicate a consistent growth in the number of biomedical data journals published in recent years, with data papers gaining attention and recognition as both publications and data sources. Although the use of data papers as citation sources for data remains relatively rare, there has been a steady increase in data paper citations for data utilization through formal data citations. Furthermore, the increasing proportion of datasets reported in data papers that are employed for analytical purposes highlights the distinct value of data papers in facilitating the dissemination and reuse of datasets to support novel research.
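The study above identifies dataset citation practices by matching NCBI accession numbers in article text. As a hedged illustration of such an identification rule, the sketch below matches a few common NCBI accession formats with regular expressions; the patterns are illustrative assumptions, not the study's actual rule set.

```python
import re

# Hypothetical accession-number patterns for a few NCBI databases;
# real accession grammars are broader than these simplified forms.
ACCESSION_PATTERNS = {
    "GEO series": re.compile(r"\bGSE\d+\b"),
    "SRA run": re.compile(r"\bSRR\d+\b"),
    "BioProject": re.compile(r"\bPRJNA\d+\b"),
}

def find_accessions(text: str) -> dict:
    """Return all accession numbers found in a passage, keyed by database."""
    return {db: pat.findall(text) for db, pat in ACCESSION_PATTERNS.items()}

passage = ("Raw reads (SRR1234567) and the series GSE55555 "
           "were deposited under PRJNA98765.")
print(find_accessions(passage))
```

Applied to full-text corpora such as PubMed Central, matches like these can then be cross-checked against formal reference lists to distinguish formal data citations from informal mentions.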
Article
Full-text available
The exponential increase of published data and the diversity of systems require the adoption of good practices to achieve quality indexes that enable discovery, access, and reuse. To identify good practices, an integrative review was used, along with procedures from the ProKnow-C methodology. After applying the ProKnow-C procedures to the documents retrieved from the Web of Science, Scopus, and Library, Information Science & Technology Abstracts databases, 31 items were analyzed. This analysis showed that in the last 20 years the guidelines for publishing open government data had a great impact on the implementation of the Linked Data model in several domains, and that the FAIR principles and the Data on the Web Best Practices are currently the most prominent in the literature. These guidelines provide orientation on various aspects of data publication in order to help optimize quality, independently of the context in which they are applied. The CARE and FACT principles, on the other hand, although not formulated with the same objective as FAIR and the Best Practices, represent great challenges for information and technology scientists regarding ethics, responsibility, confidentiality, impartiality, security, and transparency of data. Keywords: Best practices; Data publishing on the Web; Linked Open Data; Data quality
Thesis
Full-text available
The most important element of scientific studies is research data. The theories and hypotheses put forward in research are shaped and/or proved depending on the data. Therefore, all scientists produce data during their research. By sharing these data, scientific productivity is ensured, as well as the verification and transparency of research. For this reason, it is important to share and open up research data. Sharing data is possible through effective data management. Research data management consists of several interrelated stages: planning; data collection and creation; metadata creation; storage, preservation and security; and sharing. Each stage should be considered an important process that shapes the scientific activities of academics. These processes, which result in sharing, enable data reuse. This study aims to reveal the attitudes of researchers about data management processes. By these means, it will be possible to propose a model for the management of research data at Ankara University that can serve as a sample for similar institutions. For this purpose, the data management attitudes of researchers who conducted Scientific Research Projects (BAP) at Ankara University between 2013-2018 were determined using a survey. The research population consists of 376 people in total. A total of 194 BAP project leaders participated in the study; the number of participants required for each discipline was reached, and the sample statistically represents the target population. The following important conclusions were reached concerning the data management processes of the researchers who participated in the study: • Most academics do not have any data management plan, and planning activities do not seem to be at the desired level. • Familiarity with metadata is low and standards are not used.
• Researchers who store large amounts of data indefinitely use their personal storage space extensively and do not consider the associated costs. Also, the use of institutional storage is low in storage and backup activities during the research process. This situation threatens data security. • The researchers find it sufficient to share data only through publication. In this context, it is difficult to say that the participants share their research data. • One of the biggest obstacles to sharing data is ethical concern about plagiarism, which would result in the loss of publication and career opportunities. Additionally, the structure of the academic system, which brings awards, prestige and reputation only through publication, makes sharing the publication preferable to sharing the data. According to the findings, researchers do not plan data management processes comprehensively, and they need services, training and regulations in the context of these processes. In this sense, two fundamental hypotheses were confirmed: that "Academics conducting Scientific Research Projects at Ankara University do not have a comprehensive data management plan for the management of research data" and that "Academics conducting Scientific Research Projects at Ankara University need services, training and regulations (policies, directives, instructions, etc.) in the context of the management of research data". Unshared data cannot be reused, and it is impossible for data that are not reused to become a project idea. The conclusions listed above show that there is a one-dimensional data management process rather than a life-cycle process in which data are reused. Instead, the model proposed in our study aims to realize open research data and make them reusable.
Thesis
Full-text available
Energy appears as a major issue in the face of the current socio-ecological crisis. Energy modelling can be used to explore the design and management possibilities of components and systems, and thus to discern sustainable energy pathways. However, historical energy modelling and the main current approaches are proprietary and lack transparency, although the emergence of open energy modelling is promising. This thesis introduces the practices, interests and obstacles of open energy modelling, before presenting the ORUCE (Open and Reproducible Use Cases For Energy) method, designed as a transferable process to make these practices accessible to researchers in the field. This method focuses in particular on use cases as good vectors for reproducibility and capitalising on knowledge. Actual use cases in contact with energy stakeholders are presented, on the topics of waste heat recovery and photovoltaic self-consumption, illustrating the variety of uses of the ORUCE method. Finally, a concept of a collaborative open energy modelling platform is presented. This concept was refined in a user experience inquiry, and the resulting platform aims to make energy studies and associated resources accessible to stakeholders in research, public authorities and citizen collectives.
Article
Full-text available
Introduction: The structuring of biodiversity datasets is being disseminated in a language reserved for describing the substrate of scientific communication called Data Papers, that is, the data underpinning scientific research in this field of knowledge, independently of the traditional model of scientific communication. Objective: To analyze publications in Data Paper format in the field of biodiversity at the international level. Methodology: Documentary research with a qualitative approach, applying techniques for collecting and examining information through Content Analysis. It examines the status of 33 journals indicated by the Global Biodiversity Information Facility (GBIF) that offer publications in Data Paper format. The following are identified: themes related to biodiversity; the types of licenses; indexers; the number of Data Papers published; the titles with open or closed access; the journals that publish the most Data Papers on biodiversity; and the language in which they were published. Results: The number of Data Papers grew exponentially between 2017 and May 2022; accordingly, articles in the field of biodiversity have also increased across the many themes involving its entire ecosystem. Conclusion: The Data Papers analyzed are characterized as peer-reviewed documents and represent datasets indexed with metadata standards suitable for digitally preserving the data recorded in the journals covered by this analysis.
Preprint
Full-text available
The biological sciences community is increasingly recognizing the value of open, reproducible, and transparent research practices for science and society at large. Despite this recognition, many researchers remain reluctant to share their data and code publicly. This hesitation may arise from knowledge barriers about how to archive data and code, concerns about its re-use, and misaligned career incentives. Here, we define, categorise, and discuss barriers to data and code sharing that are relevant to many research fields. We explore how real and perceived barriers might be overcome or reframed in light of the benefits relative to costs. By elucidating these barriers and the contexts in which they arise, we can take steps to mitigate them and align our actions with the goals of open science, both as individual scientists and as a scientific community.
Article
Digital data is a basic form of research product for which citation, and the generation of credit or recognition for authors, are still not well understood. The notion of data credit has therefore recently emerged as a new measure, defined and based on data citation groundwork. Data credit is a real value representing the importance of data cited by a research entity. We can use credit to annotate data contained in a curated scientific database and then as a proxy of the significance and impact of that data in the research world. It is a method that, together with citations, helps recognize the value of data and its creators. In this paper, we explore the problem of Data Credit Distribution, the process by which credit is distributed to the database parts responsible for producing data being cited by a research entity. We adopt as use case the IUPHAR/BPS Guide to Pharmacology (GtoPdb), a widely-used curated scientific relational database. We focus on Select-Project-Join (SPJ) queries under bag semantics, and we define three distribution strategies based on how-provenance, responsibility, and the Shapley value. Using these distribution strategies, we show how credit can highlight frequently used database areas and how it can be used as a new bibliometric measure for data and their curators. In particular, credit rewards data and authors based on their research impact, not only on the citation count. We also show how these distribution strategies vary in their sensitivity to the role of an input tuple in the generation of the output data and reward input tuples differently.
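One of the distribution strategies named above is based on the Shapley value: each input tuple is credited with its average marginal contribution to producing the cited output, over all orderings of the tuples. The toy sketch below computes this for an invented query whose output exists iff tuple 'a' joins with 'b' or 'c'; it is a brute-force illustration of the concept, not the paper's implementation.

```python
from itertools import permutations

# Toy provenance game: the cited output is produced iff the coalition
# contains tuple 'a' together with 'b' or 'c'. Tuples are invented.
def produces_output(coalition):
    return "a" in coalition and ("b" in coalition or "c" in coalition)

def shapley(tuples, value):
    """Average marginal contribution of each tuple over all orderings."""
    credit = {t: 0.0 for t in tuples}
    perms = list(permutations(tuples))
    for order in perms:
        seen = set()
        for t in order:
            before = value(seen)
            seen.add(t)
            credit[t] += value(seen) - before
    return {t: c / len(perms) for t, c in credit.items()}

print(shapley(["a", "b", "c"], produces_output))
```

Here the indispensable tuple 'a' receives 2/3 of the credit, while the interchangeable tuples 'b' and 'c' receive 1/6 each, showing how this strategy is sensitive to a tuple's role in generating the output rather than to mere citation counts.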
Article
Full-text available
This article proposes a 4-step model for scientific dissemination that aims to promote evidence-based professional practice in Operations Management or Human Resource Management, as well as research with a more transparent and reproducible process. These 4 steps are: (1) social network announcements, (2) dissemination to scientific journals, (3) dissemination to social networks, and (4) scientific dissemination to professional journals. Central to the 4-step model is a three-stage publication process within the second step, which adds an additional stage to the two previously proposed (Marin-Garcia, 2015). These three publication stages begin with a protocol paper, are followed by a data paper, and finish with a traditional article. Each stage promotes research with merit that is citable and recognizable as such before scientific evaluation bodies. As two of these stages are largely unknown within the fields of Business and Management, I define the details of a protocol paper and a data paper, including their contents. In addition, I provide examples of both papers as well as of the other steps of the science dissemination model. This model can be adopted by researchers as a means of achieving greater impact and transfer of research results. This work intends to help researchers understand, evaluate, and make better decisions about how their research reaches society at large outside of academia. In this way, WPOM aligns with the recommendations of several leading journals in the field of business management on the need to promote transparent, accessible, and replicable science (Beugelsdijk et al., 2020). WPOM goes one step further in this direction by not only accepting, but also actively encouraging, the publication of protocol papers and data papers.
WPOM strives to be a pioneer in this field of Business and Management. This article also explores the potential prevalence of protocol papers and data papers within the set of all articles published in journals indexed in Clarivate Web of Science and Scopus. With this editorial, WPOM commits to promoting this model by accepting for review any of the three types of scientific contributions: protocol papers, data papers, and traditional papers.
Chapter
This chapter addresses questions related to the complex relationships between information, data, and human beings, frequently treated as the foundation of information and data ecologies. We focus on issues that have varied interfaces with literacies, but are not literacies in the proper sense of the word. The first part of this chapter focuses on openness, reproducibility, credibility, and sharing of digital data. Attention is given to research data’s Findability, Accessibility, Interoperability, and Reuse. There is also a short discussion of the relationship between research data and copyright. In the second part, data journals and data papers are targeted, and attention is paid to the problems of measuring and evaluating research data. The third part touches on varied issues, such as possible coauthorships between librarians and researchers, research data management, reputation management, information and data overload, posttruth phenomena and the influence of posttruth, as well as the deluge of publications related to the COVID-19 pandemic.
Chapter
This chapter acquaints the reader with the general and often changing nature of research on data quality. It is emphasized that research data quality is closely related to business data; however, the goals of scholarly research have become different, especially as the environments shaping the two are different. From among data quality’s attributes, trust receives particular attention. Technical and scientific quality, the relationship of data quality to data reuse, and other quality factors are also examined, including big data quality, intrinsic and extrinsic data quality, and the semiotic representation of quality attributes, as well as their time-related dimensions and retrievability. Although data reuse was addressed in an earlier chapter, its relationship to data quality is touched on in this chapter as well. Sharing the previously mentioned origin with data quality and being closely associated with it, data governance is also portrayed.
Chapter
Research data management (RDM) should be central for both researchers and academic libraries. The latter provide related services that are described in this chapter. RDM embraces the entire research cycle, aiming at making the research process as efficient as possible and facilitating cooperation with other players involved in it. To get a clear picture of the nature of RDM services, a short history of the academic library’s readiness and involvement is described. Skills and competencies necessary for serving research and researchers are enumerated, followed by a portrayal of the planning and building of services, giving particular attention to the research data life cycle and to the importance of data management plans. The tasks related to data reference, data citation, and data retrieval are presented. The relationship between RDM and data curation, as well as between RDM and research support services, is characterized.
Article
As data becomes omnipresent in the scientific system, a new academic genre aiming to describe data objects (data papers) and the venue to publish these articles (data journals) gradually emerged from the end of the 2000s. However, it is largely unknown how much these scientific outputs are indexed in scientific databases, which has greatly prevented them from being thoroughly studied in large‐scale, quantitative studies. This poster presents our preliminary efforts to address this gap, by compiling a list of data journals that primarily accept data papers (i.e., exclusively data journals) and examining their presence in four major scientific databases. Our results indicate that exclusively data journals are comprehensively indexed in Crossref and Dimensions, two relatively new scientific databases, which can be used to conduct future studies on data papers and journals. The next steps of our project are also discussed in this poster.
Article
The data paper is an emerging academic genre that focuses on the description of research data objects. However, there is a lack of empirical knowledge about this rising genre in quantitative science studies, particularly from the perspective of its linguistic features. To fill this gap, this research aims to offer a first quantitative examination of which rhetorical moves—rhetorical units performing a coherent narrative function—are used in data paper abstracts, as well as how these moves are used. To this end, we developed a new classification scheme for rhetorical moves in data paper abstracts by expanding a well‐received system that focuses on English‐language research article abstracts. We used this expanded scheme to classify and analyze rhetorical moves used in two flagship data journals, Scientific Data and Data in Brief. We found that data papers exhibit a combination of introduction, method, results, and discussion‐ and data‐oriented moves and that the usage differences between the journals can be largely explained by journal policies concerning abstract and paper structure. This research offers a novel examination of how the data paper, a data‐oriented knowledge representation, is composed, which greatly contributes to a deeper understanding of research data and its publication in the scholarly communication system.
Article
Full-text available
In the era of open science, data publishing is attracting attention because accelerating the release of research data, improving its accessibility and citability, and providing standardized descriptive documentation for research data can contribute to further scientific discovery. Data papers have also emerged as a way for published data to attain a status equal to that of research articles, and the launch of data journals, a new type of scholarly publishing, is on the rise. Ecology in particular is a field in which large-scale research data must be produced and managed, and data journals in this field are actively published worldwide. In Korea, by contrast, research on data journals is at an early stage, and there is no data journal in the field of ecology. This study therefore explores and proposes strategies for launching an ecology data journal. We first surveyed the publication status of data journals at home and abroad, as well as domestic journal publishing. We also conducted interviews with an expert group consisting of specialists in scholarly publishing and open access policy and in publishing ecology journals. Reflecting domestic scholarly publishing practices, in which the infrastructure for data journal publication and a corresponding evaluation system are not yet in place, and based on the results of the domestic and international surveys and the expert focus group interviews, we propose strategies concerning the direction of data journal publication in ecology, data paper submission guidelines, journal composition and publication frequency, editorial board composition, and manuscript acquisition.
Chapter
The Writing Center is the newest, innovative service, established as a project-based initiative within the organization of the Library of Corvinus University Budapest. The present and future goals of the Writing Center require a wide spectrum of services in order to cater to the needs of doctoral students and faculty members. These include traditional and novel tasks, such as fostering publication activities, combating information overload, familiarity with abstract writing, and Open Access, offered to experienced and early-career researchers. The goal of this chapter is to demonstrate how the learning and research support activities of a library, comprising curricular and extra-curricular courses, trainings, and consultations, can be integrated into the knowledge structures of the university as a whole. The authors place special emphasis on the role of group-based and individual mentoring throughout a university career, spanning from student to researcher, and on the development of transversal skills through the training programs of the Writing Center.
Research
Full-text available
Scientific research revolves around the production, analysis, storage, management, and re-use of data. Data sharing offers important benefits for scientific progress and the advancement of knowledge. However, several limitations and barriers to the general adoption of data sharing remain in place. Probably the most important challenge is that data sharing is not yet very common among scholars and is not yet seen as a regular activity among scientists, although important efforts are being invested in promoting data sharing. This report seeks to further explore the possibilities of metrics for datasets (i.e. the creation of reliable data metrics) and an effective reward system that aligns the main interests of the stakeholders involved in the process. The report reviews the current literature on data sharing and data metrics. It presents interviews with the main stakeholders on data sharing and data metrics. It also analyses the existing repositories and tools in the field of data sharing that have special relevance for the promotion and development of data metrics.
Article
Full-text available
PREFACE The growth in the capacity of the research community to collect and distribute data presents huge opportunities. It is already transforming old methods of scientific research and permitting the creation of new ones. However, the exploitation of these opportunities depends upon more than computing power, storage, and network connectivity. Among the promises of our growing universe of online digital data are the ability to integrate data into new forms of scholarly publishing to allow peer-examination and review of conclusions or analysis of experimental and observational data, and the ability for subsequent researchers to make new analyses of the same data, including their combination with other data sets and uses that may have been unanticipated by the original producer or collector. The use of published digital data, like the use of digitally published literature, depends upon the ability to identify, authenticate, locate, access, and interpret them. Data citations provide necessary support for these functions, as well as other functions such as attribution of credit and establishment of provenance. References to data, however, present challenges not encountered in references to literature. For example, how can one specify a particular subset of data in the absence of familiar conventions such as page numbers or chapters? The traditions and good practices for maintaining the scholarly record by proper references to a work are well established and understood in regard to journal articles and other literature, but attributing credit through bibliographic references to data is not yet so broadly implemented. Recognition of the need for better data referencing and citation practices, and investment in addressing it, have come at different rates in different fields and disciplines. As competing conventions and practices emerge in separate communities, inconsistencies and incompatibilities can interfere with promoting the sharing and use of research data. To reconcile this problem, sharing experiences across communities may be necessary, or at least helpful, to achieving the full potential of published data.
Article
Full-text available
International attention to scientific data continues to grow. Opportunities emerge to re-visit long-standing approaches to managing data and to critically examine new capabilities. We describe the cognitive importance of metaphor. We describe several metaphors for managing, sharing, and stewarding data and examine their strengths and weaknesses. We particularly question the applicability of a "publication" approach to making data broadly available. Our preliminary conclusions are that no one metaphor satisfies enough key data system attributes and that multiple metaphors need to co-exist in support of a healthy data ecosystem. We close with proposed research questions and a call for continued discussion.
Article
Full-text available
The two pillars of modern scientific communication are Data Centers and Research Digital Libraries (RDLs), whose technologies and admin staff support researchers in storing, curating, sharing, and discovering the data and the publications they produce. Because they were built to maintain and give access to the results of complementary phases of the scientific research process, such systems are poorly integrated with one another and generally do not draw on each other's strengths. Today this gap hampers achieving the objectives of modern scientific communication, that is, publishing, interlinking, and discovery of all outcomes of the research process, from experimental and observational datasets to the final paper. In this work, we envision that the construction of "Scientific Communication Infrastructures" is instrumental to bridging the gap. The main goal of these infrastructures is to facilitate interoperability between Data Centers and RDLs and to provide services that simplify the implementation of the large variety of modern scientific communication patterns.
Article
Full-text available
Dissemination of research outcomes via traditional publications, in either paper or digital form, does not suffice to satisfy modern e-Research and e-Science scholarly communication requirements, which demand sharing of, and immediate access to, scientific publications, datasets, and the experimental context of research activities. "Enhanced publications" emerged as a possible means to address these new needs. They are digital publications, with their own "identity" and "descriptive metadata", made of several "parts": a mandatory publication "text" plus "related material" (e.g. datasets, other publications, images, tables, workflows, devices). The state of the art on enhanced publications has today reached the point where some kind of common understanding is required, in order to provide the tools and language for scientists to compare, analyze, or simply discuss the multitude of solutions in the field. In this paper we propose a classification of enhanced publication solutions based on the structure and semantics of the given enhanced publications ("document model features") and the functionality they support to manage and consume them ("consuming purposes").
Article
Full-text available
Research on bias in peer review examines scholarly communication and funding processes to assess the epistemic and social legitimacy of the mechanisms by which knowledge communities vet and self-regulate their work. Despite vocal concerns, a closer look at the empirical and methodological limitations of research on bias raises questions about the existence and extent of many hypothesized forms of bias. In addition, the notion of bias is predicated on an implicit ideal that, once articulated, raises questions about the normative implications of research on bias in peer review. This review provides a brief description of the function, history, and scope of peer review; articulates and critiques the conception of bias unifying research on bias in peer review; characterizes and examines the empirical, methodological, and normative claims of bias in peer review research; and assesses possible alternatives to the status quo. We close by identifying ways to expand conceptions and studies of bias to contend with the complexity of social interactions among actors involved directly and indirectly in peer review.
Article
Full-text available
The NERC Science Information Strategy Data Citation and Publication project aims to develop and formalise a method for citing and publishing the datasets stored in its environmental data centres. It is believed that this will act as an incentive for scientists, who often invest a great deal of effort in creating datasets, to submit their data to a suitable data repository where it can be properly archived and curated. Data citation and publication will also provide a mechanism for data producers to receive credit for their work, thereby encouraging them to share their data more freely.
Article
Full-text available
INTRODUCTION Data citation should be a necessary corollary of data publication and reuse. Many researchers are reluctant to share their data, yet they are increasingly encouraged to do just that. Reward structures must be in place to encourage data publication, and citation is the appropriate tool for scholarly acknowledgment. Data citation also allows for the identification, retrieval, replication, and verification of data underlying published studies. METHODS This study examines author behavior and sources of instruction in disciplinary and cultural norms for writing style and citation via a content analysis of journal articles, author instructions, style manuals, and data publishers. Instances of data citation are benchmarked against a Data Citation Adequacy Index. RESULTS Roughly half of journals point toward a style manual that addresses data citation, but the majority of journal articles failed to include an adequate citation to data used in secondary analysis studies. DISCUSSION Full citation of data is not currently a normative behavior in scholarly writing. Multiplicity of data types and lack of awareness regarding existing standards contribute to the problem. CONCLUSION Citations for data must be promoted as an essential component of data publication, sharing, and reuse. Despite confounding factors, librarians and information professionals are well-positioned and should persist in advancing data citation as a normative practice across domains. Doing so promotes a value proposition for data sharing and secondary research broadly, thereby accelerating the pace of scientific research.
Article
Full-text available
This article discusses recent innovations in how peer review is conducted in light of the various functions journals fulfill in scholarly communities.
Article
Full-text available
This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax.
Article
Full-text available
Concerns over data quality impede the use of public biodiversity databases and subsequent benefits to society. Data publication could follow the well-established publication process: with automated quality checks, peer review, and editorial decisions. This would improve data accuracy, reduce the need for users to 'clean' the data, and might increase data use. Authors and editors would get due credit for a peer-reviewed (data) publication through use and citation metrics. Adopting standards related to data citation, accessibility, metadata, and quality control would facilitate integration of data across data sets. Here, we propose a staged publication process involving editorial and technical quality controls, of which the final (and optional) stage includes peer review, the most meritorious publication standard in science.
Article
Full-text available
The 'Berlin Declaration' was published in 2003 as a guideline to policy makers to promote the Internet as a functional instrument for a global scientific knowledge base. Because knowledge is derived from data, the principles of the 'Berlin Declaration' should apply to data as well. Today, access to scientific data is hampered by structural deficits in the publication process. Data publication needs to offer authors an incentive to publish data through long-term repositories. Data publication also requires an adequate licence model that protects the intellectual property rights of the author while allowing further use of the data by the scientific community.
Article
Full-text available
With the launch of GigaScience journal, here we provide insight into the accompanying database GigaDB, which allows the integration of manuscript publication with supporting data and tools. Reinforcing and upholding GigaScience's goals to promote open-data and reproducibility of research, GigaDB also aims to provide a home, when a suitable public repository does not exist, for the supporting data or tools featured in the journal and beyond.
Article
Full-text available
This report presents recent metadata developments for Dryad, a digital repository hosting datasets underlying publications in the field of evolutionary biology. We review our efforts to bring the Dryad application profile into conformance with the Singapore Framework and discuss practical issues underlying the application profile implementation in a DSpace environment. The report concludes by outlining the next steps planned as Dryad moves into the next phase of development.
Article
Full-text available
Existing norms for scientific communication are rooted in anachronistic practices of bygone eras, making them needlessly inefficient. We outline a path that moves away from the existing model of scientific communication to improve efficiency in meeting the purpose of public science: knowledge accumulation. We call for six changes: (1) full embrace of digital communication; (2) open access to all published research; (3) disentangling publication from evaluation; (4) breaking the "one article, one journal" model with a grading system for evaluation and diversified dissemination outlets; (5) publishing peer review; and (6) allowing open, continuous peer review. We address conceptual and practical barriers to change and provide examples showing how the suggested practices are being used already. The critical barriers to change are not technical or financial; they are social. While scientists guard the status quo, they also have the power to change it.
Article
Full-text available
Free and open access to primary biodiversity data is essential for informed decision-making to achieve conservation of biodiversity and sustainable development. However, primary biodiversity data are neither easily accessible nor discoverable. Among several impediments, one is a lack of incentives for data publishers to publish their data resources. One mechanism currently lacking is recognition through conventional scholarly publication of enriched metadata, which should ensure rapid discovery of 'fit-for-use' biodiversity data resources. We review the state of the art of data discovery options and the mechanisms in place for incentivizing data publishers' efforts towards easy, efficient and enhanced publishing, dissemination, sharing and re-use of biodiversity data. We propose the establishment of the 'biodiversity data paper' as one possible mechanism to offer scholarly recognition for efforts and investment by data publishers in authoring rich metadata and publishing them as citable academic papers. While detailing the benefits to data publishers, we describe the objectives, workflow and outcomes of the pilot project commissioned by the Global Biodiversity Information Facility in collaboration with scholarly publishers and pioneered by Pensoft Publishers through its journals ZooKeys, PhytoKeys, MycoKeys, BioRisk, NeoBiota, Nature Conservation and the forthcoming Biodiversity Data Journal. We then debate further enhancements of the data paper beyond the pilot project and attempt to forecast the future uptake of data papers as an incentivization mechanism by the stakeholder communities. We believe that in addition to recognition for those involved in the data publishing enterprise, data papers will also expedite publishing of fit-for-use biodiversity data resources. However, uptake and establishment of the data paper as a mechanism of scholarly recognition requires a high degree of commitment and investment by the cross-sectional stakeholder communities.
Conference Paper
Full-text available
Software systems are designed and engineered to process data. However, software is data too. The size and variety of today's software artifacts and the multitude of stakeholder activities result in so much data that individuals can no longer reason about all of it. We argue in this position paper that data mining, statistical analysis, machine learning, information retrieval, data integration, etc., are necessary solutions to deal with software data. New research is needed to adapt existing algorithms and tools for software engineering data and processes, and new ones will have to be created. In order for this type of research to succeed, it should be supported with new approaches to empirical work, where data and results are shared globally among researchers and practitioners. Software engineering researchers can get inspired by other fields, such as, bioinformatics, where results of mining and analyzing biological data are often stored in databases shared across the world.
Article
Full-text available
Concerns that the growing competition for funding and citations might distort science are frequently discussed, but have not been verified directly. Of the hypothesized problems, perhaps the most worrying is a worsening of positive-outcome bias. A system that disfavours negative results not only distorts the scientific literature directly, but might also discourage high-risk projects and pressure scientists to fabricate and falsify their data. This study analysed over 4,600 papers published in all disciplines between 1990 and 2007, measuring the frequency of papers that, having declared to have “tested” a hypothesis, reported a positive support for it. The overall frequency of positive supports has grown by over 22% between 1990 and 2007, with significant differences between disciplines and countries. The increase was stronger in the social and some biomedical disciplines. The United States had published, over the years, significantly fewer positive results than Asian countries (and particularly Japan) but more than European countries (and in particular the United Kingdom). Methodological artefacts cannot explain away these patterns, which support the hypotheses that research is becoming less pioneering and/or that the objectivity with which results are produced and published is decreasing.
Article
Full-text available
Scientific research in the 21st century is more data intensive and collaborative than in the past. It is important to study the data practices of researchers: data accessibility, discovery, re-use, preservation and, particularly, data sharing. Data sharing is a valuable part of the scientific method, allowing for verification of results and extending research from prior results. A total of 1329 scientists participated in this survey exploring current data sharing practices and perceptions of the barriers and enablers of data sharing. Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation. Many organizations do not provide support to their researchers for data management in either the short or the long term. If certain conditions are met (such as formal citation and sharing reprints), respondents agree they are willing to share their data. There are also significant differences in data management practices and approaches based on primary funding agency, subject discipline, age, work focus, and world region. Barriers to effective data sharing and preservation are deeply rooted in the practices and culture of the research process as well as the researchers themselves. New mandates for data management plans from NSF and other federal agencies and world-wide attention to the need to share and preserve data could lead to changes. Large scale programs, such as the NSF-sponsored DataNET (including projects like DataONE), will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.
Article
Full-text available
The rapid growth of the internet and related technologies has already had a tremendous impact on scientific publishing. This journal has given attention to open access publishing (Ascoli 2005; Bug 2005; Merkel-Sobotta 2005; Velterop 2005), to reforming the review process (De Schutter 2007; Saper and Maunsell 2009), to the problems with getting authors to share their data (Ascoli 2006; Kennedy 2006; Teeters et al. 2008; Van Horn and Ball 2008), and to how to enhance the use of shared data (Gardner et al. 2008; Kennedy 2010). But the impact of the internet and data warehousing on science will be much larger, and there is a growing interest in how these technologies can be leveraged to improve the scientific process (Hey et al. 2009). Let's travel towards the future and imagine that not only are the tools and infrastructure available to share scientific data at any time after it is generated, but that it has also become standard practice for the community to do so. How this can be achieved is not the focus of this editorial; instead I want to speculate on the relationship between scientific papers and data repositories (Bourne 2005, 2010; Cinkosky et al. 1991) in such an environment. It is important for the scientific community to discuss these issues now because, while these technologies are expected to radically improve the scientific process, they will also change the way in which our work is evaluated. I propose that we should distinguish data publishing from paper publishing (Callaghan et al. 2009; Cinkosky et al. 1991) and, when established for specific scientific fields, promote data publishing as the primary outlet for much of the scientific output. A good metaphor for data publishing is to look at how complete organism genomic sequences are published in high impact journals now (Srivastava et al. 2010; Warren et al. 2010). Such papers really serve two goals: to announce the availability of the genome sequence in GenBank and to describe some scientific conclusions based on the analysis of the genome. The perceived importance of the latter determines whether a high impact journal will accept the paper, and therefore the authors spend a lot of effort in hyping this part. But are these two components irrevocably intertwined? Couldn't one just publish the data, in this case by depositing the complete sequence in a database, and announce this fact through a form of publication? The analysis can then be published separately at a later time or distributed over different papers, etc. This is not done because at present the publication of the paper in the high impact journal is considered to be the optimal reward for the researchers, both for career advancement and for success in obtaining new grants (Bourne 2005). I call data publication a method where the data providers, who may be different from the people who analyze the data, receive credit for their work when they deposit the sequence in the database, and where subsequent access to the data is tracked and considered equivalent to paper citation. There are a number of advantages to considering data publication as a separate process. First, credit assignment becomes more explicitly defined among the authors. Several journals (like Nature, Science, the PLoS series, etc.) have taken steps towards a more granular credit assignment by asking authors to explicitly list their contributions.
Article
Full-text available
When I took on the role of Editor-in-Chief of this open-access journal, I began, for the first time, to think about scholarly communication beyond submitting my papers and getting them published. This thinking led to previous Perspectives [1]–[3], all of which shared an underlying theme—there are many opportunities to achieve better dissemination and comprehension of our science, and as producers of that output I believe authors have a responsibility to see it used in the best possible way.
Article
Full-text available
The demands of data-intensive science represent a challenge for diverse scientific communities.
Article
High-throughput scientific instruments are generating massive amounts of data. Today one of the main challenges faced by researchers is to make the best use of the world's growing wealth of data. Data (re)usability is becoming a distinct characteristic of modern scientific practice, as it allows reanalysis of evidence, reproduction and verification of results, minimizing duplication of effort, and building on the work of others. The paper addresses the technological dimension of data reusability: the scientific data universe; the impediments to data (re)use; the data publication process as a bridge between data author and user; and the relevant technologies enabling this process.
Article
The DataCite Metadata Scheme is being designed to support dataset citation and discovery. It features a small set of mandatory properties, and an additional set of optional properties for more detailed description. Among these is a powerful mechanism for describing relationships between the registered dataset and other objects. The scheme is supported organizationally and will allow for community input on an ongoing basis.
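As a rough illustration of the kernel the abstract describes, the sketch below builds a minimal DataCite-style record as a plain Python dict and checks the small set of mandatory properties. The property names follow the DataCite kernel (identifier, creators, titles, publisher, publication year) and the related-identifier mechanism, but the exact field spellings, the example DOIs, and the dict serialization are assumptions for illustration, not the scheme's normative XML.

```python
# A minimal DataCite-style metadata record (illustrative only; the DOIs
# and field spellings are hypothetical, not taken from the actual schema).
record = {
    "identifier": {"value": "10.1234/example.dataset", "type": "DOI"},
    "creators": [{"name": "Doe, Jane"}],
    "titles": ["Example observational dataset"],
    "publisher": "Example Data Centre",
    "publicationYear": 2014,
    # Optional property: relate the dataset to the paper describing it.
    "relatedIdentifiers": [
        {"value": "10.1234/example.paper", "type": "DOI",
         "relationType": "IsDescribedBy"}
    ],
}

def validate_mandatory(rec):
    """Return the mandatory kernel properties missing from a record."""
    required = ["identifier", "creators", "titles",
                "publisher", "publicationYear"]
    return [key for key in required if not rec.get(key)]

missing = validate_mandatory(record)  # empty list when the record is complete
```

The design point the scheme makes is that a very small mandatory core keeps registration cheap, while optional properties such as related identifiers carry the richer links between a dataset and the objects around it.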
Article
Article processing charges (APCs) are a central mechanism for funding open access (OA) scholarly publishing. We studied the APCs charged and article volumes of journals that were listed in the Directory of Open Access Journals as charging APCs. These included 1,370 journals that published 100,697 articles in 2010. The average APC was $906 U.S. dollars (USD) calculated over journals and $904 USD calculated over articles. The price range varied between $8 and $3,900 USD, with the lowest prices charged by journals published in developing countries and the highest by journals with high-impact factors from major international publishers. Journals in biomedicine represent 59% of the sample and 58% of the total article volume. They also had the highest APCs of any discipline. Professionally published journals, both for profit and nonprofit, had substantially higher APCs than journals published by societies, universities, or scholars/researchers. These price estimates are lower than some previous studies of OA publishing and much lower than is generally charged by subscription publishers making individual articles OA in what are termed hybrid journals. © 2012 Wiley Periodicals, Inc.
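The two averages the study reports, over journals and over articles, differ only in weighting: the per-article figure weights each journal's APC by its article output. The toy numbers below are hypothetical; only the computation pattern mirrors the study's two statistics.

```python
# Hypothetical journals, each with an APC (USD) and an annual article count.
journals = [
    {"apc": 500, "articles": 10},   # small, cheap journal
    {"apc": 1500, "articles": 90},  # large, expensive journal
]

# Average over journals: each journal counts once, regardless of size.
avg_over_journals = sum(j["apc"] for j in journals) / len(journals)

# Average over articles: each journal's APC is weighted by its output.
total_articles = sum(j["articles"] for j in journals)
avg_over_articles = sum(j["apc"] * j["articles"] for j in journals) / total_articles
```

With these toy numbers the two averages diverge sharply (1000 vs 1400 USD); the study's near-identical figures ($906 vs $904) indicate that APC levels were not strongly correlated with journal size in that sample.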
Chapter
“To make progress in science, we need to be open and share.” This quote from Neelie Kroes (2012), vice president of the European Commission, describes the growing public demand for an Open Science. Alongside Open Access to peer-reviewed publications, Open Science includes Open Access to research data, the basis of scholarly knowledge. The opportunities and challenges of Data Sharing are discussed widely in the scholarly sector. The cultures of Data Sharing differ between the scholarly disciplines; disciplines such as biomedicine and the earth sciences, for example, are well advanced. Today, more and more funding agencies require proper Research Data Management and the possibility of data re-use. Many researchers see the potential of Data Sharing, but they act cautiously. This situation shows a clear ambivalence between the demand for Data Sharing and its current practice. Starting from a baseline study on current discussions, practices and developments, the article describes the challenges of Open Research Data. The authors briefly discuss the barriers and drivers to Data Sharing. Furthermore, the article analyses strategies and approaches to promote and implement Data Sharing. This comprises an analysis of the current landscape of data repositories, enhanced publications and data papers. In this context the authors also shed light on incentive mechanisms, data citation practices and the interaction between data repositories and journals. In the conclusions the authors outline requirements of a future Data Sharing culture.
Article
This paper examines how scientists working in government agencies in the U.S. are reacting to the “ethos of sharing” government-generated data. For scientists to leverage the value of existing government data sets, critical data sets must be identified and made as widely available as possible. However, government data sets can only be leveraged when policy makers first assess the value of data, in much the same way they decide the value of grants for research outside government. We argue that legislators should also remove structural barriers to interoperability by funding technical infrastructure according to issue clusters rather than administrative programs. As developers attempt to make government data more accessible through portals, they should consider a range of other nontechnical constraints attached to the data. We find that agencies react to the large number of constraints by mostly posting their data on their own websites only rather than in data portals that can facilitate sharing. Despite the nontechnical constraints, we find that scientists working in government agencies exercise some autonomy in data decisions, such as data documentation, which determine whether or not the data can be widely shared. Fortunately, scientists indicate a willingness to share the data they collect or maintain. However, we argue further that a complete measure of access should also consider the normative decisions to collect (or not) particular data.
Article
Many authors appear to think that most open access (OA) journals charge authors for their publications. This brief communication examines the basis for such beliefs and finds it wanting. Indeed, in this study of over 9,000 OA journals included in the Directory of Open Access Journals, only 28% charged authors for publishing in their journals. This figure, however, was highest in various disciplines in medicine (47%) and the sciences (43%) and lowest in the humanities (4%) and the arts (0%).
Article
Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2-4], and journal [5, 6] level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term [7], and indeed many studies have found that authors are often unable or unwilling to share their data [8-11]. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested data sets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% per year. In addition, the odds that we could find a working e-mail address for the first, last, or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.
Article
A response to 21 commentaries on Nosek & Bar-Anan, Scientific Utopia I: Opening Scientific Communication. We make four points: (1) the potential for things to go wrong is not a justification to do nothing; (2) some changes, particularly open access, appear to be inevitable; (3) when authors control publishing, articles will get better, not worse; and (4) despite the substantial cumulative changes, if our proposal were adopted in whole, those that wished to produce and consume their science as they do today could retain most of those practices. However, faced with the alternatives, we believe that they would not choose to do so. We close with practical steps that individual scientists can take to embody the value of openness in scientific communication.
Article
In 2008, ESSD was established to provide a venue for publishing highly important research data, with two main aims: to provide reward for data "authors" through fully qualified citation of research data, classically aligned with the certification of quality of a peer-reviewed journal. A major step towards this goal was the definition and rationale of article structure and review criteria for articles about datasets.
Article
Cheap open-access journals raise questions about the value publishers add for their money.
Article
We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support of, much-improved data curation. (Hanson, Sugden, & Alberts, 2011) Researchers are producing an unprecedented deluge of data by using new methods and instrumentation. Others may wish to mine these data for new discoveries and innovations. However, research data are not readily available as sharing is common in only a few fields such as astronomy and genomics. Data sharing practices in other fields vary widely. Moreover, research data take many forms, are handled in many ways, using many approaches, and often are difficult to interpret once removed from their initial context. Data sharing is thus a conundrum. Four rationales for sharing data are examined, drawing examples from the sciences, social sciences, and humanities: (1) to reproduce or to verify research, (2) to make results of publicly funded research available to the public, (3) to enable others to ask new questions of extant data, and (4) to advance the state of research and innovation. These rationales differ by the arguments for sharing, by beneficiaries, and by the motivations and incentives of the many stakeholders involved. The challenges are to understand which data might be shared, by whom, with whom, under what conditions, why, and to what effects. Answers will inform data policy and practice.
Article
We introduce a set of integrated developments in web application software, networking, data citation standards, and statistical methods designed to put some of the universe of data and data sharing practices on somewhat firmer ground. We have focused on social science data, but aspects of what we have developed may apply more widely. The idea is to facilitate the public distribution of persistent, authorized, and verifiable data, with powerful but easy-to-use technology, even when the data are confidential or proprietary. We intend to solve some of the sociological problems of data sharing via technological means, with the result intended to benefit both the scientific community and the sometimes apparently contradictory goals of individual researchers.
Article
Authors, reviewers and editors must act to protect the quality of research.
Hey, T., Tansley, S., & Tolle, K. (Eds.) (2009). The fourth paradigm: Data-intensive scientific discovery. Microsoft Research.
It's not about the data. (2012). Nature Genetics, 44(2), 111. doi:10.1038/ng.1099
Lawrence, R. (2012). Data publishing: Peer review, shared standards and collaboration. Presentation at 8th Research Data Management Forum, Southampton. Retrieved April, 2014, from http://www.dcc.ac.uk/webfm_send/798
Thanos, C. (2014). Scientific data reusability: Conceptual foundations, impediments and enabling technologies (Tech. Rep.). Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo," CNR.
Van Noorden, R. (2013). Open access: The true cost of science publishing.