Article

Abstract

Data occupy a key role in our information society. However, although the amount of published data continues to grow and terms such as “data deluge” and “big data” today characterize numerous (research) initiatives, much work is still needed to publish data in ways that make them effectively discoverable, available, and reusable by others. Several barriers hinder data publishing, from lack of attribution and rewards, vague citation practices, and quality issues to a rather general lack of a data-sharing culture. Lately, data journals have emerged to overcome some of these barriers. In this study of more than 100 currently existing data journals, we describe the approaches they promote for data set description, availability, citation, quality, and open access. We close by identifying ways to expand and strengthen the data journals approach as a means to promote data set access and exploitation.


Supplementary resource (1)

... characteristics different from conventional articles, methodically describing datasets shared in repositories, from their structure and format to the methods used in their collection, organization, and processing, with the aim of supporting data reproducibility and reuse. The metadata that compose such papers may be more or less specific, depending on the level of description adopted for the datasets they are intended to represent (Candela et al., 2015). ...
... In a general overview, it can be seen that, although there is no single definitive standard for the typology of data papers in the field, all journals with specific guidelines for their publication adopt a similar description of what this document is. In short, it is a paper oriented toward describing a dataset made available in a repository, generally under open licenses, so as to facilitate its reuse and to guarantee researchers due authorship rights over the shared datasets (Candela et al., 2015; Kim, 2020). ...
... This research was limited to a comparative analysis of the editorial guidelines for publishing data papers in the Health Sciences, based on the instructions to authors provided by the journals consulted. Although few journals provide a structured data paper template for download, all 26 journals that treated data papers in some way as an independent typology followed the pattern observed by Candela et al. (2015) ...
Article
Full-text available
This research aims to analyze the editorial guidelines of Health Sciences journals that accept data papers and to identify common points to guide their preparation, with the goal of supporting editors of Brazilian journals. The sample comprised journals indexed in the Scopus and Web of Science databases that publish data papers, and data were collected directly from the publications' websites, from the instructions or guidelines to authors of the 54 retrieved Health Sciences journals. Of the total, 26 journals (48%) provided some kind of guidance on the submission of data papers, and four provided a guidance template. We conclude that, although few journals offer a structured data paper template for download, the journals are clearly advancing in the construction of a consistent description of the main components of data papers. The scope adopted in this research made it possible to build a clearer view of the implicit standards adopted in data papers in the health sciences.
... Over time, more journals have begun accepting data papers. In this research, all periodicals accepting data papers are termed data journals; more specifically, we distinguish journals that primarily publish data papers (i.e., exclusively data journals; the operationalization of this concept is discussed in the Methods section) from those that accept data papers merely as one genre alongside research articles (i.e., mixed data journals), following how these categories are defined in previous studies 9,14 . ...
... As data papers are becoming a popular way for researchers to publish their research data in many disciplines 14,15 , this new genre has become an important data source for investigating how data is used by scientists. This echoes increasing interest in research data from the field of quantitative science studies 16,17 . ...
... The absence of data papers from large-scale empirical studies represents a major gap in the existing research infrastructure for effectively tracing data papers. Efforts have been made to identify data journals 14,29 , but to our knowledge, no research has been conducted to understand how these journals and their publications are indexed in scholarly databases, such as the Web of Science (WoS) and Scopus, which are frequently used as the direct data source in quantitative science studies. This gap makes it harder for researchers to easily extract a large body of data papers from scholarly databases and analyze them, especially using quantitative methods. ...
Article
Full-text available
The data paper is becoming a popular way for researchers to publish their research data. The growing numbers of data papers and journals hosting them have made them an important data source for understanding how research data is published and reused. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how a list of 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases, which creates major challenges that future efforts to study them quantitatively will need to address.
... Moreover, data papers are making it easier for research data to be peer-reviewed, a significant prerequisite for the integration of data objects into the research system (Costello et al., 2013;Mayernik et al., 2015). Over time, more journals have begun accepting data papers; these periodicals are termed data journals (Candela et al., 2015). ...
... As data papers are becoming a popular way for researchers to publish their research data in many disciplines (Candela et al., 2015;Griffiths, 2009), this new genre has become an important data source for investigating how data is used by scientists. This echoes increasing interest in research data from the field of quantitative science studies (Cousijn et al., 2019;Silvello, 2018). ...
... The absence of data papers from large-scale empirical studies represents a major gap in the existing research infrastructure for effectively tracing data papers. Efforts have been made to identify data journals (Candela et al., 2015;Walters, 2020), but to our knowledge, no research has been conducted to understand how these journals and their publications are indexed in scholarly databases, such as the Web of Science and Scopus, which are frequently used as the direct data source in quantitative science studies. This gap makes it harder for researchers to easily extract a large body of data papers from the scholarly databases and analyze them, especially using quantitative methods. ...
Preprint
Full-text available
As part of the data-driven paradigm and open science movement, the data paper is becoming a popular way for researchers to publish their research data, based on academic norms that cross knowledge domains. Data journals have also been created to host this new academic genre. The growing number of data papers and journals has made them an important large-scale data source for understanding how research data is published and reused in our research system. One barrier to this research agenda is a lack of knowledge as to how data journals and their publications are indexed in the scholarly databases used for quantitative analysis. To address this gap, this study examines how a list of 18 exclusively data journals (i.e., journals that primarily accept data papers) are indexed in four popular scholarly databases: the Web of Science, Scopus, Dimensions, and OpenAlex. We investigate how comprehensively these databases cover the selected data journals and, in particular, how they present the document type information of data papers. We find that the coverage of data papers, as well as their document type information, is highly inconsistent across databases, which creates major challenges for future efforts to study them quantitatively. As a result, we argue that efforts should be made by data journals and databases to improve the quality of metadata for this emerging genre.
... are multidisciplinarily oriented journals (cf. Candela et al. 2015b, p. 1750; Walters 2020); the terms "data article" and "data descriptor" are used (cf. ...
... The literature review in Chapter 2 showed that the characteristics of data journals have already been studied comprehensively. However, the study with the largest sample, 116 data journals, was published seven years ago (cf. Candela et al. 2015b). Another study relevant to the research question of this thesis examined a smaller sample of 39 journals, with a focus on peer-review criteria (cf. Carpenter 2017). Studies published since then are limited to smaller samples or pure data journals, or do not have quality assurance as a thematic focus ...
... reference can be made to another journal that has since been discontinued; however, new journals have also appeared. The data journals were examined with regard to their characteristics. Based on their listing in the DOAJ, the share of journals published in open access is 83.7%. Other studies reached similar results (cf. Candela et al. 2015b), and data journals are sometimes even defined as purely open access journals (cf. Austin, Bloom et al. 2017, p. 82). Almost all data journals are not only freely accessible but also offer open licenses. The disciplinary focus of data journals lies in the natural and life sciences, which is also confirmed by other ...
Thesis
Full-text available
Quality assurance of research data is an important topic in the context of open science. If shared data are to help make research results traceable and to enable data reuse, corresponding requirements apply to their quality. However, data quality and quality assurance in the context of data publications are complex concepts that are used in diverse ways. So far, the quality assurance of data publications has been described in detail only selectively; a treatment that systematically describes the possible measures is missing. Likewise, little is known about how widespread individual measures are among repositories. The dissertation works out how quality and quality assurance for research data can be defined and systematized. On this basis, a theoretical approach for systematizing quality assurance measures is developed. It serves as the basic structure for the examination of data journals and repositories. To this end, guidelines of 135 data journals and certification documents of 99 repositories that received the CoreTrustSeal certificate in the 2017–2019 version are analyzed. The analyses show how data quality is defined in data journal guidelines and by repositories, and they provide insight into repositories' quality assurance practice. The results form the basis for a survey on the prevalence of quality assurance measures, which also considers open quality assurance processes, responsibilities, and the transparent documentation of data quality. In 2021, 332 repositories indexed in the re3data registry took part in the survey.
The results of the studies show the status quo of quality assurance and the definition of data quality at data journals and research data repositories. They also show that repositories contribute to the quality assurance of data publications through a wide range of measures. The results feed into a framework for the quality assurance of data publications in repositories.
... There was no significant difference between the major fields of science. Given the results presented, there is evidence that the scientific community is aware that good practices and data-sharing policies can originate from the higher bodies that fund research, as discussed by Candela et al. (2015). By promoting this understanding, data reuse can be fostered. ...
... However, these corresponded to only 19.55% of the total of 1,171 individuals. There was a statistically significant difference at the 5% level between the major fields, with a chi-square p-value of 0.0007 (see Table 7). It is therefore necessary to overcome the barrier cited by Candela et al. (2015): the lack of a data-sharing culture involving everyone who participates in research. ...
... Note that, among the 26.39% who cited this problem as a reason for not sharing data, the Humanities (6.75%) and Social Sciences (6.06%) groups together account for approximately 13% of the total of 1,171 individuals, that is, half of those who chose the category "Yes". Data publication is seen as a prerequisite for data sharing (Candela et al., 2015) and can also ensure data preservation over time (Austin et al., 2017). In this sense, there are several approaches that allow data to be published; Curty and Aventurier (2017) present three options: scientific data repositories, enhanced publications, and data papers, the last of which puts the data front and center. ...
Article
Full-text available
Objective: The objective of this work was to analyze the data obtained and integrated from a survey of professor-researchers affiliated with Brazilian graduate programs in Information Science, together with the other respondents of the study entitled "Practices and perceptions of Brazilian researchers", regarding the reasons for not sharing their data. Methodology: This is a bibliographic and exploratory study with a quantitative-qualitative approach. The data were processed and then submitted to the chi-square test. Results: The Humanities and Social Sciences are the fields facing the greatest challenges in the context of data sharing. The lack of requirements for data publication and the lack of infrastructure were the main barriers reported by the researchers. It was found that the field of Information Science needs an infrastructure that encourages researchers to share their data. Conclusions: We conclude that more effective policies aimed at making research data available need to be implemented, so as to facilitate data use/reuse by the entire scientific community.
... We also observe that data occupy a crucial role today in research, emerging as a driving instrument in science (Candela, Castelli, Manghi, & Tani, 2015). Data citations should be given the same scholarly status of traditional citations and contribute to bibliometrics indicators (Belter, 2014;Peters, Kraker, Lex, Gumpenberger, & Gorraiz, 2016). ...
... For these databases, systems such as Mendeley 11 store data alongside the publication, so that a citation to the publication also serves as a citation to the data. Data journals (Candela et al., 2015), i.e., journals publishing papers describing data sets, are also employed as proxies to cite static data sets. ...
... Data journals (Candela et al., 2015) enable the publication of papers describing a database that works as a proxy for it and its authors and receives its citations. This is a possible solution, but it is not complete since it does not consider citations referring to general queries. ...
Article
Full-text available
The citation graph is a computational artifact that is widely used to represent the domain of published literature. It represents connections between published works, such as citations and authorship. Among other things, the graph supports the computation of bibliometric measures such as h-indexes and impact factors. There is now an increasing demand that we should treat the publication of data in the same way that we treat conventional publications. In particular, we should cite data for the same reasons that we cite other publications. In this paper we discuss what is needed for the citation graph to represent data citation. We identify two challenges: (i) to model the evolution of credit appropriately (through references) over time and (ii) to model data citation not only to a dataset treated as a single object but also to parts of it. We describe an extension of the current citation graph model that addresses these challenges. It is built on two central concepts: citable units and reference subsumption. We discuss how this extension would enable data citation to be represented within the citation graph and how it allows for improvements in current practices for bibliometric computations both for scientific publications and for data.
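The two concepts in the abstract above, citable units and reference subsumption, can be illustrated with a minimal sketch. The class and method names here are invented for illustration and are not the paper's formal model; the sketch only shows the core idea that a citation to a part of a dataset should also propagate credit to the containing dataset.

```python
# Illustrative sketch: a citation graph in which parts of a dataset
# ("citable units") can be cited independently, and a reference to a
# part is subsumed by (also counted toward) the containing dataset.
from collections import defaultdict

class CitationGraph:
    def __init__(self):
        self.parent = {}                    # citable unit -> containing dataset
        self.citations = defaultdict(int)   # node -> citation count

    def add_unit(self, unit, dataset=None):
        if dataset:
            self.parent[unit] = dataset

    def cite(self, unit):
        # count the direct citation to the unit...
        self.citations[unit] += 1
        # ...and subsume it into every enclosing dataset
        d = self.parent.get(unit)
        while d:
            self.citations[d] += 1
            d = self.parent.get(d)

g = CitationGraph()
g.add_unit("table_3", dataset="dataset_A")
g.cite("table_3")      # a paper cites only one table of the dataset
g.cite("dataset_A")    # another paper cites the dataset as a whole
# table_3 has 1 citation; dataset_A has 2 (one direct, one subsumed)
```

With subsumption, a bibliometric measure computed over `dataset_A` reflects both whole-dataset citations and citations to its parts, which is the behavior the extended model aims to support.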
... The rise of FAIR (Findability, Accessibility, Interoperability, and Reusability) principles [57] further propels the need to cite data and count data citations. Such necessary practices shed light on the work of the creators and curators of datasets, work that would otherwise remain uncredited [6,13,20,58,62]. Much of the work in the current literature considers the development of data citation as a driving force to "facilitate giving scholar credit" [41]. ...
... If we assign all the credit x to r 1 , then all and only the curators of r 1 are credited. Whereas, if we assign x to D, then all and only the database administrators are credited -this is what typically happens when we cite data papers as proxies for the databases they describe [20]. On the other hand, if we distribute x in part to r 1 and in part to other records, say r 2 and r 7 , which somehow contributed to the generation of r 1 , then the curators of r 1 , r 2 and r 7 are credited. ...
... A database may be cited as a whole, even though only parts of it are used in the papers or datasets. Alternatively, the so-called "data papers" can be cited: traditional papers that describe a database [20]. An example is paper [33], which, every few years, describes the information contained in GtoPdb. ...
Article
Full-text available
It is widely accepted that data is fundamental for research and should therefore be cited in the same way as textual scientific publications. However, issues such as data citation and the handling and counting of the credit generated by such citations remain open research questions. Data credit is a new measure of value built on top of data citation, which enables us to annotate data with a value representing its importance. Data credit can be considered a new tool that, together with traditional citations, helps to recognize the value of data and its creators in a world that is ever more dependent on data. In this paper we define data credit distribution (DCD) as a process by which credit generated by citations is given to the single elements of a database. We focus on a scenario where a paper cites data from a database obtained by issuing a query. The citation generates credit which is then divided among the database entities responsible for generating the query output. One key aspect of our work is to credit not only the explicitly cited entities, but also those that contribute to their existence yet are not accounted for in the query output. We propose a data credit distribution strategy (CDS) based on data provenance and implement a system that uses the information provided by data citations to distribute the credit in a relational database accordingly. As a use case and for evaluation purposes, we adopt the IUPHAR/BPS Guide to Pharmacology (GtoPdb), a curated relational database. We show how credit can be used to highlight areas of the database that are frequently used. Moreover, we underline how credit rewards data and authors based on their research impact, and not merely on the number of citations. This can lead to designing new bibliometrics for data citations.
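The credit distribution idea in the abstract above can be sketched in a few lines. This is not the paper's actual provenance-based CDS; the function names, the equal split among cited records, and the fixed "forward half to ancestors" rule are all simplifying assumptions made for illustration.

```python
# Hedged sketch of a data credit distribution (DCD) strategy: credit
# from a citation is split among the cited records, and each record
# forwards half of its share to the records it was derived from
# (its provenance). The 50/50 split is an illustrative assumption.

def distribute_credit(credit, cited_records, provenance):
    """Return a dict mapping each record to its share of `credit`."""
    shares = {}
    per_record = credit / len(cited_records)
    for r in cited_records:
        ancestors = provenance.get(r, [])
        if ancestors:
            # keep half, forward the other half to contributing records
            shares[r] = shares.get(r, 0.0) + per_record / 2
            forwarded = per_record / 2 / len(ancestors)
            for a in ancestors:
                shares[a] = shares.get(a, 0.0) + forwarded
        else:
            shares[r] = shares.get(r, 0.0) + per_record
    return shares

# A paper cites r1 with credit 1.0; r1 was generated from r2 and r7.
prov = {"r1": ["r2", "r7"]}
print(distribute_credit(1.0, ["r1"], prov))
# r1 keeps 0.5; r2 and r7 receive 0.25 each
```

This mirrors the scenario in the snippets above: the curators of r1 are credited directly, while the curators of r2 and r7, whose records contributed to r1's existence, also receive a share even though their records do not appear in the query output.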
... The first data journal, the Journal of Chemical and Engineering Data, was launched in 1956 [13], but the number of data journals has grown only in recent times. Schöpfel et al. [4], who updated the study performed by Candela et al. [14] on the number of data journals and the areas of interest they cover, show that the number of data journals did not increase dramatically over those five years (from 20 data journals in 2015 to 28 in 2019, some of them no longer active), while the number of data papers published rose sharply, from 846 in 2013 to 11,500 in 2019. Likewise, Walters [15] found that of the 169 journals that reported publishing research relating to data, only 19 journals (11.2%) were classified as "pure" data journals, in that at least half of the journals' publications were data reports; 109 (64.5%) devoted some publications to data reports (about 1.6%) but prioritised other types of publications; 21 (12.4%) ...
... The fields available for filtering are those of the Australian and New Zealand Standard Research Classification (ANZSRC), which Dimensions has implemented in its Fields of Research (FOR) system. We agreed upon the following as loosely defining the bulk of HSS publications currently in JOHD and RDJ: Since several publications span different fields, we filtered out duplicate entries, which resulted in a final dataset containing 358,770 titles, each with information on publication date, total citations, and Altmetric score. ...
Article
Full-text available
The humanities and social sciences (HSS) have recently witnessed an exponential growth in data-driven research. In response, attention has been afforded to datasets and accompanying data papers as outputs of the research and dissemination ecosystem. In 2015, two data journals dedicated to HSS disciplines appeared in this landscape: Journal of Open Humanities Data (JOHD) and Research Data Journal for the Humanities and Social Sciences (RDJ). In this paper, we analyse the state of the art in the landscape of data journals in HSS using JOHD and RDJ as exemplars by measuring performance and the deep impact of data-driven projects, including metrics (citation count; Altmetrics, views, downloads, tweets) of data papers in relation to associated research papers and the reuse of associated datasets. Our findings indicate: that data papers are published following the deposit of datasets in a repository and usually following research articles; that data papers have a positive impact on both the metrics of research papers associated with them and on data reuse; and that Twitter hashtags targeted at specific research campaigns can lead to increases in data papers’ views and downloads. HSS data papers improve the visibility of datasets they describe, support accompanying research articles, and add to transparency and the open research agenda.
... 21 Examples of journals publishing data descriptors include Medical Physics and Nature Scientific Data. 22 Finally, code repositories, such as GitHub, have allowed for the rapid and dynamic development of code related to scientific studies in the medical domain, and fostered community engagement for future developments. 23 While the current components of scientific dissemination remain relatively independent, in the future, one could envision a modular framework where automated processes link these components in an integrated fashion (Fig. 2). ...
... Collection contents are described through "wiki pages", which also list relevant publications and instructions for data use. Increasingly, focused "data descriptors," in-depth manuscripts detailing individual datasets, such as those published through Nature Scientific Data, 22 are also generated for TCIA collections to engender greater transparency in data generation, collection protocols, and intended use-cases. For end-users, TCIA provides web interfaces and software (National Biomedical Imaging Archive Data Retriever) to easily retrieve and catalog collections on local computing infrastructure. ...
Article
Full-text available
Artificial intelligence (AI) has exceptional potential to positively impact the field of radiation oncology. However, large curated datasets - often involving imaging data and corresponding annotations - are required to develop radiation oncology AI models. Importantly, the recent establishment of Findable, Accessible, Interoperable, Reusable (FAIR) principles for scientific data management have enabled an increasing number of radiation oncology related datasets to be disseminated through data repositories, thereby acting as a rich source of data for AI model building. This manuscript reviews the current and future state of radiation oncology data dissemination, with a particular emphasis on published imaging datasets, AI data challenges, and associated infrastructure. Moreover, we provide historical context of FAIR data dissemination protocols, difficulties in the current distribution of radiation oncology data, and recommendations regarding data dissemination for eventual utilization in AI models. Through FAIR principles and standardized approaches to data dissemination, radiation oncology AI research has nothing to lose and everything to gain.
... Data publication is an important instrument for making data easier to access, understand, and reuse (Parsons & Fox, 2013), and eventually for supporting more effective reuse of data. Data papers have been generally recognized as a new form of data publication and have been increasingly adopted in many research communities (Candela et al., 2015; Gorgolewski et al., 2013). A major distinction between data papers and research articles is their focus: while both types of publications are normally peer-reviewed, data papers are designed to describe data objects (Carlson & Oda, 2018), a major divergence from the IMRaD paper structure (Sollaci & Pereira, 2004). ...
... During the past few years, data papers have become an increasingly popular type of scientific publication. Candela et al. (2015) identified over 100 journals that accept data papers, and there is evidence that the number of data papers has continued to increase since then (El-Tawil & Agrawal, 2019). Empirical evidence has also shown that data papers are frequently cited, although the citation pattern normally takes on a highly long-tailed distribution (Kotti & Spinellis, 2019) and the papers may be cited for reasons other than reuse of the data (Jiao & Darch, 2020). ...
Preprint
Full-text available
The data paper is an emerging academic genre that focuses on the description of research data objects. However, there is a lack of empirical knowledge about this rising genre in quantitative science studies, particularly from the perspective of its linguistic features. To fill this gap, this research aims to offer a first quantitative examination of which rhetorical moves (rhetorical units performing a coherent narrative function) are used in data paper abstracts, as well as how these moves are used. To this end, we developed a new classification scheme for rhetorical moves in data paper abstracts by expanding a well-received system that focuses on English-language research article abstracts. We used this expanded scheme to classify and analyze rhetorical moves used in two flagship data journals, Scientific Data and Data in Brief. We found that data papers exhibit a combination of IMRaD- and data-oriented moves and that the usage differences between the journals can be largely explained by journal policies concerning abstract and paper structure. This research offers a novel examination of how the data paper, a novel data-oriented knowledge representation, is composed, which greatly contributes to a deeper understanding of data and data publication in the scholarly communication system.
... The efforts to improve data-sharing practices in the scientific field have triggered the creation of research data journals and tracks (see these compilations of data journals [51][52][53]), devoted mainly to publishing data papers (more than 50% of their total publication volume). To build our data sample, we selected from this list the journals meeting the following criteria: 1) the journal is active as of July 2023 and publishes data papers regularly, 2) it publishes data papers from different scientific fields (interdisciplinary scope), and 3) it publishes data papers written in English. ...
Preprint
Full-text available
To ensure the fairness and trustworthiness of machine learning (ML) systems, recent legislative initiatives and relevant research in the ML community have pointed out the need to document the data used to train ML models. Besides, data-sharing practices in many scientific domains have evolved in recent years for reproducibility purposes. In this sense, the adoption of these practices by academic institutions has encouraged researchers to publish their data and technical documentation in peer-reviewed publications such as data papers. In this study, we analyze how this scientific data documentation meets the needs of the ML community and regulatory bodies for its use in ML technologies. We examine a sample of 4041 data papers of different domains, assessing their completeness and coverage of the requested dimensions, and trends in recent years, putting special emphasis on the most and least documented dimensions. As a result, we propose a set of recommendation guidelines for data creators and scientific data publishers to increase their data's preparedness for its transparent and fairer use in ML technologies.
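The completeness assessment described above can be illustrated with a small sketch: score each data paper by the fraction of documentation dimensions it covers. The dimension list and the sample papers below are invented for illustration; the study's actual dimensions and coding scheme may differ.

```python
# Hedged sketch of a documentation-completeness score for data papers:
# the fraction of requested dimensions a paper documents. The dimension
# names are illustrative assumptions, not the study's actual scheme.

DIMENSIONS = ["collection", "uses", "distribution", "maintenance", "composition"]

def completeness(documented_sections):
    """Fraction of the requested dimensions covered by a paper."""
    covered = [d for d in DIMENSIONS if d in documented_sections]
    return len(covered) / len(DIMENSIONS)

# Hypothetical coding of two papers' documented dimensions.
papers = {
    "paper_1": {"collection", "composition", "uses"},
    "paper_2": {"collection"},
}
scores = {p: completeness(s) for p, s in papers.items()}
print(scores)  # paper_1 -> 0.6, paper_2 -> 0.2
```

Aggregating such per-paper scores by dimension is one simple way to surface the "most and least documented dimensions" the study reports on.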
... In addition, IICF recommender systems can facilitate interdisciplinary research by uncovering related data-collections across different research areas. By analyzing the usage patterns of researchers from various domains, the system can identify and recommend data-collections that are of potential interest to researchers from other disciplines [220]. This cross-disciplinary recommendation can promote interdisciplinary collaboration and enable researchers to leverage insights from related fields, ultimately leading to novel findings and scientific advancements. ...
Thesis
Full-text available
Effective Research Data Management (RDM) practices are essential for fostering research collaboration, increasing discoverability and repurposing research data, and advancing scientific progress in higher education. In recent years, adopting Open Science Platforms (OSPs) and the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles has highlighted the need for improved RDM methodologies and tools for flourishing higher education achievements. However, existing literature has provided limited guidance on monitoring RDM processes, their adoption, and their use. This dissertation addresses this gap by investigating how to enable dis- covering and enhancing process-aware RDM activities via modeling the underlying researcher’s actual practices. This dissertation presents a series of methodologies as a framework combining data acquisition, abstraction, knowledge discovery, and operation enhancement techniques. Furthermore, the case studies highlight the challenges associated with RDM-related activities by assessing the proposed methodologies’ validity in real-world environments. Initially, this work presents a universal reference software architecture for RDM ser- vices; then, it proposes four approaches for data acquisition, including a novel Hybrid logger technique for acquiring datasets from information systems that operate on distributed settings, providing a comprehensive view of user activities by evaluating corresponding software component executions. This approach enables a projection of user behavior and facilitates the development of further machine-learning studies. Furthermore, this work introduces a semi-supervised learning approach for abstract- ing datasets by accommodating non-sequential events in distributed systems while balancing data granularity and model fitness. 
The methodology for discovering process-aware activities incorporates a modular and layered architecture, providing insights into RDM compliance, identifying deviations, and optimizing user experience. Additionally, it outlines a method for determining and visualizing user and system interactions and discovers the RDM phases of research projects, providing a practical understanding of the progression and activities of different research groups. Finally, this thesis proposes and evaluates two recommender systems, demonstrating the potential of Content-Based and Collaborative Filtering recommender systems in enabling the reusability of research data repositories and fostering cooperation among researchers. The findings contribute significantly to the expanding body of literature on RDM and provide valuable insights into the potential of the presented methodologies for enhancing RDM practices in OSPs. In conclusion, this dissertation offers holistic strategies for addressing the difficulties related to facilitating RDM in OSPs, providing guidelines for implementing the necessary architecture and demonstrating the applicability of the proposed methods to other RDM services that adhere to the reference software architecture of RDM systems.
... Until recently, the time-consuming efforts to improve data quality and accessibility have received limited recognition. Many authors have advocated for increased recognition of data publication in the form of citations (e.g., Callaghan et al., 2012; Kratz and Strasser, 2014; Candela et al., 2015), and some journals have developed publication types that are specifically intended to describe data. Receiving credit in the form of citations for published datasets without an accompanying manuscript would further incentivize proper archiving and documenting of data (Callaghan et al., 2012). ...
Article
Full-text available
Given the high costs of constructing, maintaining, monitoring, and sampling paired watersheds, it is prudent to ask “Are paired watershed studies still worth the effort?” We present a compilation of 90 North American paired watershed studies and use examples from the Caspar Creek Experimental Watersheds to contend that paired watershed studies are still worth the effort and will continue to remain relevant in an era of big data and short funding cycles. We offer three reasons to justify this assertion. First, paired watersheds allow for watershed-scale experiments that have produced insights into hydrologic processes, water quality, and nutrient cycling for over 100 years. Paired watersheds remain an important guide to inform best management practices for timber harvesting and other land-management concerns. Second, paired watersheds can produce long climate, streamflow, and water quality records because sites are frequently maintained over the course of multiple experiments or long post-treatment periods. Long-term datasets can reveal ecological surprises, such as changes in climate-streamflow relationships driven by slow successional processes. Having multiple watershed records helps identify the cause of these changes. Third, paired watersheds produce data that are ideal for developing and testing hydrologic models. Ultimately, the fate of paired watersheds is up to the scientific community and funding agencies. We hope that their importance continues to be recognized.
... Both disciplines also less frequently cite or mention data papers. This reflects the slower emergence of data papers and journals in SSH (Candela et al., 2015) and possibly a history of using data from governmental sources, where data papers may not be as relevant. There is some evidence that the landscape of data papers in SSH may be changing, and that data papers may have an effect on metrics of associated papers and data (McGillivray et al., 2022). ...
Article
Full-text available
Data citations, or citations in reference lists to data, are increasingly seen as an important means to trace data reuse and incentivize data sharing. Although disciplinary differences in data citation practices have been well documented via scientometric approaches, we do not yet know how representative these practices are within disciplines. Nor do we yet have insight into researchers' motivations for citing, or not citing, data in their academic work. Here, we present the results of the largest known survey (n = 2,492) to explicitly investigate data citation practices, preferences, and motivations, using a representative sample of academic authors by discipline, as represented in the Web of Science (WoS). We present findings about researchers' current practices and motivations for reusing and citing data and also examine their preferences for how they would like their own data to be cited. We conclude by discussing disciplinary patterns in two broad clusters, focusing on patterns in the social sciences and humanities, and consider the implications of our results for tracing and rewarding data sharing and reuse.
... Our results show that R software papers are published in both journals dedicated to software papers and those that accept software papers along with research articles. This is also the situation reported for the new academic genre of data papers (Candela et al., 2015), which makes it more difficult to trace software publishing activities through the bibliographic universe. Moreover, some of the top journals to publish R papers are those highly specialized journals that are not indexed in major research databases, most notably The R Journal and R News. ...
Preprint
Full-text available
Under the data-driven research paradigm, research software has come to play crucial roles in nearly every stage of scientific inquiry. Scholars are advocating for the formal citation of software in academic publications, treating it on par with traditional research outputs. However, software is rarely cited consistently: one software entity can be cited as different objects, and the citations can change over time. These issues, however, are largely overlooked in existing empirical research on software citation. To fill these gaps, the present study compares and analyzes a longitudinal dataset of citation formats of all R packages collected in 2021 and 2022, in order to understand the citation formats of R-language packages, important members of the open-source software family, and how the citations evolve over time. In particular, we investigate the different document types underlying the citations and which metadata elements in the citation formats changed over time. Furthermore, we offer an in-depth analysis of the disciplinarity of journal articles cited as software (software papers). By undertaking this research, we aim to contribute to a better understanding of the complexities associated with software citation, shedding light on future software citation policies and infrastructure.
... However, the current number of data repositories in China is only 48 (Li et al., 2022). Therefore, the FAIR principles, which require data to be findable, accessible, interoperable, and reusable, have been proposed to guide scientists when sharing their research datasets. Even so, data journals, a new form of data publication intended to open scientific data, have exposed some problems, such as the lack of rewards, vague data descriptions, and quality issues (Candela et al., 2015). These issues highlight the need for better data service repositories for scientists. ...
Article
Full-text available
This paper explores the effect of publishing a data paper in the Open Access journal Data in Brief (DIB) on the citation counts of the related research paper. Using regression analysis, citation content analysis, and a survey method, we investigate whether research papers with a related data paper have higher citation counts and the potential reasons. After controlling for variables that correlate with citation counts, research papers with a related data paper were found to have higher citation counts than those published in the same issue of the same journal. Next, we explored the causal relationship between the two variables by surveying the corresponding authors of 618 papers who shared datasets in DIB from 2014 to 2021. The results show that the authors acknowledge the benefits of sharing data in DIB, including citation increase and career reputation enhancement. We further explored how the data papers in DIB increase the citations of the related research papers by using citation content analysis. We found that scientists co-cite data papers and their related research papers for the purpose of reusing the underlying data or conveying a better understanding of the underlying data and the related research articles.
... Developing more review papers is important to provide an overview of the state-of-the-art of ML for healthcare to the African research audience . Limited interest is shown in short communications like notes, editorials, short surveys and letters to the editor as these kinds of publications have a less significant weight than articles, reviews, and conference papers in evidence-based research (Candela et al., 2015). The lack of documented and local datasets limits the development of customized solutions for digital health in the continent. ...
Preprint
Full-text available
Machine learning has seen enormous growth in the last decade, with healthcare being a prime application for advanced diagnostics and improved patient care. The application of machine learning for healthcare is particularly pertinent in Africa, where many countries are resource-scarce. However, it is unclear how much research on this topic is arising from African institutes themselves, which is a crucial aspect for applications of machine learning to unique contexts and challenges on the continent. Here, we conduct a bibliometric study of African contributions to research publications related to machine learning for healthcare, as indexed in Scopus, between 1993 and 2022. We identified 3,772 research outputs, with most of these published since 2020. North African countries currently lead the way with 64.5% of publications for the reported period, yet Sub-Saharan Africa is rapidly increasing its output. We found that international support in the form of funding and collaborations is correlated with research output generally for the continent, with local support garnering less attention. Understanding African research contributions to machine learning for healthcare is a crucial first step in surveying the broader academic landscape, forming stronger research communities, and providing advanced and contextually aware biomedical access to Africa.
... Besides, African scientists do not significantly publish data papers to describe their datasets for ML for healthcare. Data papers are very important to provide detailed information about Africa-related datasets and ensure their availability for other scientists working on biomedical applications in the African context [82]. The lack of documented and local datasets limits the development of customized solutions for digital health in the continent. ...
... Depending on the organization of open access, such publishing houses allow scientists to distribute, by self-archiving, not only research results (postprints) but also manuscripts of their research (preprints) (Suber, 2012). On the other hand, the emphasis on the importance of data sharing and reuse, which has been growing in recent years, makes data journals a new channel for realizing this goal by facilitating scientists' dissemination, beyond the article, of primary scientific products generated during research, such as datasets, software, program code, and experiments (Candela et al., 2015). ...
Article
Full-text available
Today, one can observe shifts in the research landscape, shaped by digitization and open science principles. The open science movement continues to gain momentum, attention, and debate. In parallel with the principle of unity, open science gives rise to a taxonomy of several related ideas, guidelines, and concepts, such as open access, open replicable research, and open data. Over the past fifteen years, research institutions have focused on open access to publications. Recently, however, the focus of attention has shifted to research data as a "new currency" in research activities and their distribution in open access, and the guiding principles of data management are becoming crucial for the wide implementation of open science practices and the effective use of data in research, industry, business, and other sectors of the economy. In this context, it is relevant to carry out a thorough study of primary scientific works on open science issues and to study the role of the concept of "open research data" in the paradigm of a holistic open science ecosystem and business ecosystem. In this work, it is proposed to use methods of quantitative and qualitative bibliometric analysis, which make it possible to identify the main trends and form the basis for further research. The information base for this work was the international scientometric database Scopus, which makes it possible to analyze bibliographic data using built-in tools and to import them for external use in the VOSviewer software. The study revealed an increasing trend in the number of publications on the subject under study, with the highest annual growth rates in 2017 (76%) and 2019 (66%). Qualitative bibliographic analysis made it possible to analyze the most cited and, therefore, trending works on the selected topic.
In terms of the number of citations per year, the results show that studies on open-science topics such as open source code; data and research reproducibility and research data management; and open access to publications are the most popular. In addition, a cluster analysis of keyword co-occurrence was conducted. It formed clusters dedicated to both institutional and infrastructural problems of the development of open science and research data. The results of the analysis also create a scientific basis for further research into the key determinants of the effectiveness of implementing a proper research data management system at the micro, meso, and macro levels. This will improve the transfer of scientific developments from one field of knowledge to another while fostering interdisciplinary research. In parallel, stakeholders in the real sector of the economy are given the opportunity to analyze scientific results and determine whether they can be adopted in their own activities.
... There are many examples of scientific collection data published in data papers (see, for example, [31,194,[196][197][198]]), which highlights their importance. This has been achieved in part thanks to the growth in the number of interdisciplinary data journals [195,199] in which to publish these articles; even leading scientific journals, such as Nature, now have their own data journal (Scientific Data; https://www.nature.com/sdata/journal-information). Chapter II and Chapter IV conclude that effective biodiversity conservation requires taking the social factor into account. Chapter II also identifies human-nature reconnection as a useful tool for achieving this goal. ...
Thesis
Full-text available
For centuries, the scientific community has collected animals, plants, rocks, and minerals on a global scale in order to study various aspects of science and technology. Part of this material is currently held in scientific collections around the world. Scientific collections are essentially systematized repositories, accessible to the scientific community, that hold spatio-temporal records of the biological and geological diversity known on planet Earth. Biological collections intended for research preserve organisms or parts of organisms (unique, tangible, durable, and irreplaceable specimens) and their derived samples (such as preserved tissues, seeds, etc.). Technological advances, digitization, and improvements in the accessibility of collections, together with their combined use with other sources of biodiversity data, have revolutionized research on scientific collections in recent decades. Despite their importance, their contributions are widely underestimated by both society and administrations. The aim of this thesis is to determine the current value of scientific collections through the study of physical specimens or their associated metadata, complemented with other sources of biodiversity data.
To achieve this goal, we set the following specific objectives: i) to analyze the possibilities offered by the direct study of specimens preserved in scientific collections; ii) to determine the importance of records from scientific collections in assessing the potential impacts of land-use change on threatened flora; iii) to determine the importance of records from scientific collections in assessing areas of interest for the conservation of threatened flora and their possible future impacts from climate change; iv) to develop methodological proposals that link data from scientific collections with other sources of environmental information when addressing conservation questions at a global scale. Overall, this thesis has uncovered new cuticular structures in crickets, some of which have been linked to reproduction, and has determined the degree of troglomorphism in the genus Petaloptila; it has also analyzed the effects of agricultural use on threatened Spanish flora and possible future climate-driven changes in the Canary Island hotspots, and developed a methodology for the conservation of migratory birds. Issues such as the importance of collecting, biases and their possible solutions, open science, the value of citizen-science data, and data papers are addressed. Our results reaffirm that scientific collections are a key element in research and education. It is necessary to foster the growth of collections, continue digitization efforts, and ensure funding and staffing for collection centers so they can continue their work in the future.
... Fortunately, researchers can take steps to reduce or avoid the inappropriate use of data and code. Data and code can be published alongside detailed metadata information, or with a data paper in an indexed, peer-reviewed journal, including a thorough description of datasets and processes, terms and considerations for reuse, and any limitations, assumptions, caveats, and shortcomings [65]. When one is accustomed to the nuances or assumptions of methods that they frequently use, it can be easy to forget to include important information that would allow others to replicate the study. ...
Article
Full-text available
The biological sciences community is increasingly recognizing the value of open, reproducible and transparent research practices for science and society at large. Despite this recognition, many researchers fail to share their data and code publicly. This pattern may arise from knowledge barriers about how to archive data and code, concerns about its reuse, and misaligned career incentives. Here, we define, categorize and discuss barriers to data and code sharing that are relevant to many research fields. We explore how real and perceived barriers might be overcome or reframed in the light of the benefits relative to costs. By elucidating these barriers and the contexts in which they arise, we can take steps to mitigate them and align our actions with the goals of open science, both as individual scientists and as a scientific community.
... However, with the recent emphasis on the importance of data sharing and reuse, data journals have emerged as a new channel for this purpose. Data journals publish data papers that describe facts about data, such as data collection methods and data features, and the described data are disclosed and maintained in data repositories [3]. In data journals, data and data papers are shared in a citable format through a peer-reviewed quality assurance process so that they can be recognized as research achievements [4,5]. ...
Article
Full-text available
Purpose: This study investigated the usefulness and limitations of data journals by analyzing motivations for submission and the review and publication processes, according to researchers with experience publishing in data journals. Methods: Among 79 data journals indexed in Web of Science, we selected four data journals where data papers accounted for more than 20% of the publication volume and whose corresponding authors belonged to South Korean research institutes. A qualitative analysis was conducted of the subjective experiences of seven corresponding authors who agreed to participate in interviews. To analyze interview transcriptions, clusters were created by restructuring the theme nodes using Nvivo 12. Results: The most important element of data journals to researchers was their usefulness for obtaining credit for research performance. Since the data in repositories linked to data papers are screened through the journals' review processes, the validity, accuracy, reusability, and reliability of the data are ensured. In addition, data journals provide a basis for data sharing using repositories and for data-centered follow-up research using citations, and they offer detailed descriptions of data. Conclusion: Data journals play a leading role in data-centered research. Data papers are recognized as research achievements through citations in the same way as research papers published in conventional journals, but there was also a perception that it is difficult to attain a similar level of academic recognition with data papers as with research papers. However, researchers highly valued the usefulness of data journals, and data journals should thus be developed into new academic communication channels that enhance data sharing and reuse.
... In certain cases, public recognition, such as badges of open data for articles following the best data-sharing practices and increasing numbers of citations, may promote data release by an order of magnitude [42]. Citable data papers are certainly another way forward [43,44], because these provide access to a well-organised dataset and add to the authors' publication record. Encouraging the listing of published datasets with download and citation metrics in grant and job applications, alongside other bibliometric indicators, should promote data sharing. ...
Article
Full-text available
Data sharing is one of the cornerstones of modern science that enables large-scale analyses and reproducibility. We evaluated data availability in research articles across nine disciplines in Nature and Science magazines and recorded corresponding authors’ concerns, requests and reasons for declining data sharing. Although data sharing has improved in the last decade and particularly in recent years, data availability and willingness to share data still differ greatly among disciplines. We observed that statements of data availability upon (reasonable) request are inefficient and should not be allowed by journals. To improve data sharing at the time of manuscript acceptance, researchers should be better motivated to release their data with real benefits such as recognition, or bonus points in grant and job applications. We recommend that data management costs should be covered by funding agencies; publicly available research data ought to be included in the evaluation of applications; and surveillance of data sharing should be enforced by both academic publishers and funders. These cross-discipline survey data are available from the plutoF repository.
... Data publications could also play a major role in this issue, which are stand-alone peer-reviewed publications that do not answer a research question, but instead spend the entire paper describing the creation of a dataset in rich detail (Costello, 2009;Smith, 2009;Chavan and Penev, 2011;Candela et al., 2015). In seeking to bring the work of data labeling from the background to the foreground, our work is also aligned with scholars who have focused on the often under-compensated labor of crowdworkers and have called for researchers to detail how much they pay for data labeling (Silberman et al., 2018). ...
Article
Full-text available
Supervised machine learning, in which models are automatically derived from labeled training data, is only as good as the quality of that data. This study builds on prior work that investigated to what extent 'best practices' around labeling training data were followed in applied ML publications within a single domain (social media platforms). In this paper, we expand by studying publications that apply supervised ML in a far broader spectrum of disciplines, focusing on human-labeled data. We report to what extent a random sample of ML application papers across disciplines gives specific details about whether best practices were followed, while acknowledging that a greater range of application fields necessarily produces greater diversity of labeling and annotation methods. Because much of machine learning research and education focuses only on what is done once a "ground truth" or "gold standard" of training data is available, it is especially relevant to discuss issues around the equally important aspect of whether such data is reliable in the first place. This determination becomes increasingly complex when applied to a variety of specialized fields, as labeling can range from a task requiring little to no background knowledge to one that must be performed by someone with career expertise.
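A standard way to quantify whether human-labeled training data is reliable in the first place, as the study above discusses, is inter-annotator agreement. The sketch below implements Cohen's kappa, a common chance-corrected agreement measure for two annotators; the example labels are invented for illustration and are not from the study.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Agreement between two annotators, corrected for chance agreement."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items where the annotators match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement: chance of matching given each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

a = ["spam", "ham", "spam", "ham", "spam", "spam"]
b = ["spam", "ham", "ham", "ham", "spam", "spam"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

Values near 1 indicate strong agreement; values near 0 mean the annotators agree no more often than chance, a warning sign that the "gold standard" may not be gold.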
... We understood that the development of Pecheker started at the end of a former period when data acquisition programs were strongly based on well-delimited templates, whose design necessarily preceded and constrained the field work (FAO, 1999). The changeover to a new time in the world of databases occurred at the end of the 2000s with the new era of "Big Data" (Candela et al., 2015; Davenport et al., 2012; Marx, 2013; Waller and Fawcett, 2013). In the world of fisheries science, this new era is characterized by a reversal of the hierarchy between the data and the database, as we observed at the beginning of the Pecheker project: nowadays, the design of a fishery database must not only be performed upstream of the organisation of the data flow but must also, at the same time, be considered downstream of the chain; the main goal is to obtain a tool that is not only able to generate the data flow but is also able to absorb the data flow, however shaped, diverse, and abundant it is. ...
Article
Full-text available
The scientific monitoring of the Southern Ocean French fishing industry is based on the use of the Pecheker database. Pecheker is dedicated to the digital curation of the data collected in the field by scientific observers, whose analysis allows the scientists of the Muséum national d'Histoire naturelle to provide guidelines and advice for the regulation of fishing activity, the protection of fish stocks, and the protection of marine ecosystems. The template of Pecheker has been developed to make the database suited to the ecosystem-based management concept. Considering the global context of biodiversity erosion, this modern approach to management aims to take account of the environmental background of fisheries to ensure their sustainable development. Completeness and high quality of the raw data are key elements for an ecosystem-based management database such as Pecheker. Here, we present the development of this database as a case study of fisheries data curation to be shared with readers. Full code to deploy a database based on the Pecheker template is provided in the supplementary materials. Considering the success factors we could identify, we discuss how the community could build a global fisheries information system based on a network of small databases incorporating interoperability standards.
... Although standardized metadata are a prerequisite for interoperability and comprehensive discovery services, less structured descriptions can convey the context necessary for understanding and reusing data. One example of unstructured metadata is the data paper, which mirrors traditional scientific publication formats but focuses on an in-depth description of data collection and processing (Candela et al., 2015). ...
Thesis
Full-text available
Structured metadata are of particular importance in the context of facilitating research data (re-)use. Although research data repositories create and manage metadata records, existing research offers limited insights into the relationship between repositories and metadata for research data. Therefore, in conducting a quantitative assessment informed by metadata quality requirements, this thesis aims at making distinctive features of metadata for research data visible, specifying the potential influence of repository characteristics on metadata, and exploring changes to metadata records. The analysis showed variations in metadata completeness across repositories. Within repositories, metadata descriptions are relatively homogeneous. These findings suggest that repositories have developed distinctive and consistent practices for describing data. On average, descriptions comprise 487.3 characters, and 5.52 years passed between the year a dataset was published and the year its metadata record was registered. Differences in the completeness of metadata records, description length, and timeliness were significant across repository types and certification status, whereas differences in collection homogeneity were not. Overall, most metadata records in the sample were changed, which is consistent with the conceptualization of metadata for research data as dynamic and changeable objects. Differences in the number of changes are significant across repository types.
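A metadata-completeness measure of the kind used in such quantitative assessments can be sketched as the fraction of required fields that are present and non-empty. The required field set below is a DataCite-like assumption for illustration, not the actual scheme used in the thesis.

```python
# Hypothetical required-field set, loosely modeled on common repository schemas.
REQUIRED = {"title", "creator", "publication_year", "description", "license"}

def completeness(record: dict) -> float:
    """Fraction of required metadata fields that are present and non-empty."""
    filled = {k for k, v in record.items()
              if k in REQUIRED and v not in (None, "", [])}
    return len(filled) / len(REQUIRED)

# An empty description counts as missing, so 3 of 5 required fields are filled.
record = {"title": "Survey data 2020", "creator": "Doe, J.",
          "description": "", "license": "CC-BY-4.0"}
print(completeness(record))  # → 0.6
```

Averaging this score over a repository's records gives the kind of per-repository completeness comparison the analysis describes.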
... The author gets a peer-reviewed article, perhaps in a high-impact journal, and the reader obtains a dataset that has been more rigorously evaluated and more fully described than it might otherwise have been (Walters, 2020). In this respect, Candela et al. (2015) studied more than 100 data journals and described the approaches they promote for data set description, availability, citation, quality, and open access. Buneman et al. (2020) proposed creating a separate system for publishing citation summaries so that authorship could be recognized by citation analyzers. ...
Article
Full-text available
Data sharing by researchers is a centerpiece of Open Science principles and scientific progress. For a sample of 6019 researchers, we analyze the extent/frequency of their data sharing. Specifically, we analyze its relationship with the following four variables: how much they value data citations, the extent to which their data-sharing activities are formally recognized, their perceptions of whether sufficient credit is awarded for data sharing, and the reported extent to which data citations motivate their data sharing. In addition, we analyze the extent to which researchers have reused openly accessible data, as well as how data sharing varies by professional age cohort and its relationship to the value they place on data citations. Furthermore, we consider most of the explanatory variables simultaneously by estimating a multiple linear regression that predicts the extent/frequency of their data sharing. We use the dataset of the State of Open Data Survey 2019 by Springer Nature and Digital Science. The results allow us to conclude that a desire for recognition/credit is a major incentive for data sharing. Thus, the possibility of receiving data citations is highly valued when sharing data, especially among younger researchers, irrespective of the frequency with which it is practiced. Finally, the practice of data sharing was found to be more prevalent at late research career stages, despite this being when citations are less valued and have a lower motivational impact. This could be because later-career researchers may benefit less from keeping their data private.
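The study above estimates a multiple linear regression predicting data-sharing frequency from attitude variables. The sketch below shows the general shape of such an estimation via ordinary least squares on synthetic data; the variable names only echo the study, and the data are invented, not the survey's.

```python
import numpy as np

# Synthetic illustration of a multiple linear regression like the one
# described above. True coefficients are 0.5 and 0.3; the fit should
# recover them approximately despite the added noise.
rng = np.random.default_rng(0)
n = 200
value_citations = rng.uniform(1, 5, n)   # how much data citations are valued
credit_received = rng.uniform(1, 5, n)   # perceived credit for sharing
career_stage = rng.integers(1, 4, n)     # 1 = early career, 3 = late career
sharing = 0.5 * value_citations + 0.3 * career_stage + rng.normal(0, 0.2, n)

# Design matrix with an intercept column, fitted by least squares.
X = np.column_stack([np.ones(n), value_citations, credit_received, career_stage])
coef, *_ = np.linalg.lstsq(X, sharing, rcond=None)
print(np.round(coef, 2))  # intercept and three slopes
```

With real survey data, the estimated slopes and their significance tests would indicate which attitudes predict sharing frequency.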
... number of articles) are two data journals, which are open to all fields: Elsevier's "Data in Brief" and Springer's "Scientific Data". The change in this area is addressed in [13], where the current state is compared to the one described in [14] from 2015. The number of data journals grows more slowly today, while the number of published data papers is increasing rapidly. ...
Article
Full-text available
Open research data practices are a relatively new, and thus still evolving, part of scientific work, and their usage varies strongly across scientific domains. In the literature, the investigation of open research data practices covers the whole range from big empirical studies spanning multiple scientific domains to smaller, in-depth studies analysing a single field of research. Despite the richness of literature on this topic, there is still a lack of knowledge on (open) research data awareness and practices in materials science and engineering. While most current studies focus only on some aspects of open research data practices, we aim for a comprehensive understanding of all practices with respect to the considered scientific domain. Hence this study aims at 1) drawing the whole picture of search, reuse and sharing of research data 2) while focusing on materials science and engineering. The chosen approach allows exploring the connections between different aspects of open research data practices, e.g. between data sharing and data search. In-depth interviews with 13 researchers in this field were conducted, transcribed verbatim, coded and analysed using content analysis. The main findings characterised research data in materials science and engineering as extremely diverse, often generated for a very specific research focus and requiring a precise description of the data and the complete generation process for possible reuse. Results on research data search and reuse showed that the interviewees intended to reuse data but were mostly unfamiliar with (yet interested in) modern methods such as dataset search engines, data journals or searching public repositories. Current research data sharing is not open but bilateral, and usually encouraged by supervisors or employers.
Project funding affects data sharing in two ways: some researchers choose to share their data openly due to their funding agency's policy, while others face legal restrictions on sharing because their projects are partly funded by industry. The time needed for a precise description of the data and their generation process is named as the biggest obstacle to data sharing. From these findings, a precise set of actions suitable to support Open Data is derived, involving training for researchers and introducing rewards for data sharing at the level of universities and funding bodies.
... [6] Data journals may overcome several barriers to open data, as they promote the publication of data papers in a way that reflects the scientific publication model (Candela et al., 2015). Another significant issue considered by the librarians' community to communicate about the availability and accessibility of substantial quantities of textual material, in particular, full texts of journal papers and books under the title of alternative metrics (often ...
Article
Full-text available
Purpose – The purpose of the paper is to identify the different tasks and responsibilities that UAE academic libraries must undertake in response to current changes in researchers' information-seeking behavior and to the advances brought about by the emergence of Research 2.0. Design/methodology/approach – The researchers comprehensively reviewed the literature related to academic libraries' activities, viz., information literacy (IL) education, research data services (RDS), awareness-raising, and support for individual faculty members in the United Arab Emirates. Findings – UAE librarians organize information literacy education for students of all programs, primarily for research scholars and faculty, in both Arabic and English. Faculty members are supported with discipline-specific databases and print and digital versions of books and journals, along with other online services. Regarding awareness-raising, library professionals in the country are actively involved in communicating all types of knowledge sources and their updates to all stakeholders in education, whereas research data services are slowly gearing up in many academic libraries. Originality/value – The paper proposes to add to the body of knowledge about academic library support through information literacy, awareness-raising, faculty attention and research data services to researchers in the UAE.
Article
Under the data-driven research paradigm, research software has come to play crucial roles in nearly every stage of scientific inquiry. Scholars are advocating for the formal citation of software in academic publications, treating it on par with traditional research outputs. However, software is rarely cited consistently: one software entity can be cited as different objects, and the citations can change over time. These issues, however, are largely overlooked in existing empirical research on software citation. To fill these gaps, the present study compares and analyzes a longitudinal dataset of citation formats of all R packages collected in 2021 and 2022, in order to understand the citation formats of R-language packages, important members of the open-source software family, and how these citations evolve over time. In particular, we investigate the different document types underlying the citations and which metadata elements in the citation formats changed over time. Furthermore, we offer an in-depth analysis of the disciplinarity of journal articles cited as software (software papers). By undertaking this research, we aim to contribute to a better understanding of the complexities associated with software citation, shedding light on future software citation policies and infrastructure.
Article
Full-text available
A data paper describing research data helps credit the researchers producing the data while helping other researchers verify previous research and start new research by reusing the data. With these benefits, publishing data papers and depositing data in public data repositories are increasing. A domestic academic society that plans to publish data papers faces challenges, including acquiring, in a timely manner, substantial knowledge concerning data paper structures and templates, peer-review policy and process, and trustworthy data repositories, as a data paper has characteristics that differ from those of a research paper. Moreover, the lack of research and information concerning the critical elements of data papers and the peer-review process makes it difficult to operate data paper review and publication. To address these issues, we propose essential concepts of the data paper and data paper peer review, including a process model of the peer review, based on an in-depth analysis of five data journals' data paper templates, articles, and other guides worldwide. Academic societies intending to publish or add data papers as a new type of paper may establish policies and define a peer-review process by adopting the proposed conceptual models, effectively streamlining the preparation of data paper publication.
Article
Open data as an integral part of the open science movement enhances the openness and sharing of scientific datasets. Nevertheless, the normative utilization of data journals, data papers, scientific datasets, and data citations necessitates further research. This study aims to investigate the citation practices associated with data papers and to explore the role of data papers in disseminating scientific datasets. Dataset accession numbers from NCBI databases were employed to analyze the prevalence of data citations for data papers from PubMed Central. A dataset citation practice identification rule was subsequently established. The findings indicate a consistent growth in the number of biomedical data journals published in recent years, with data papers gaining attention and recognition as both publications and data sources. Although the use of data papers as citation sources for data remains relatively rare, there has been a steady increase in data paper citations for data utilization through formal data citations. Furthermore, the increasing proportion of datasets reported in data papers that are employed for analytical purposes highlights the distinct value of data papers in facilitating the dissemination and reuse of datasets to support novel research.
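The study above identifies dataset citation practices by matching NCBI accession numbers in article text. As a hedged illustration of such an identification rule, the sketch below matches a few common NCBI accession formats with regular expressions; the patterns are illustrative assumptions, not the study's actual rule set.

```python
import re

# Hypothetical accession-number patterns for a few NCBI databases;
# real accession grammars are broader than these simplified forms.
ACCESSION_PATTERNS = {
    "GEO series": re.compile(r"\bGSE\d+\b"),
    "SRA run": re.compile(r"\bSRR\d+\b"),
    "BioProject": re.compile(r"\bPRJNA\d+\b"),
}

def find_accessions(text: str) -> dict:
    """Return all accession numbers found in a passage, keyed by database."""
    return {db: pat.findall(text) for db, pat in ACCESSION_PATTERNS.items()}

passage = ("Raw reads (SRR1234567) and the series GSE55555 "
           "were deposited under PRJNA98765.")
print(find_accessions(passage))
```

Applied to full-text corpora such as PubMed Central, matches like these can then be cross-checked against formal reference lists to distinguish formal data citations from informal mentions.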
Article
Full-text available
The exponential increase of published data and the diversity of systems require the adoption of good practices to achieve quality indexes that enable discovery, access, and reuse. To identify good practices, an integrative review was used, along with procedures from the ProKnow-C methodology. After applying the ProKnow-C procedures to the documents retrieved from the Web of Science, Scopus, and Library, Information Science & Technology Abstracts databases, 31 items were analyzed. This analysis showed that in the last 20 years the guidelines for publishing open government data had a great impact on the implementation of the Linked Data model in several domains, and that the FAIR principles and the Data on the Web Best Practices are currently the most prominent in the literature. These guidelines provide orientation on various aspects of data publication in order to help optimize quality, independently of the context in which they are applied. The CARE and FACT principles, on the other hand, although not formulated with the same objective as FAIR and the Best Practices, represent great challenges for information and technology scientists regarding ethics, responsibility, confidentiality, impartiality, security, and transparency of data. Keywords: Best practices; Data publishing on the Web; Linked Open Data; Data quality
Thesis
Full-text available
The most important element of scientific studies is research data. The theories and hypotheses put forward in research are shaped and/or proved depending on the data. Therefore, all scientists produce data during their research. By sharing these data, scientific productivity is ensured, as well as the verification and transparency of research. For this reason, it is important to share and open up research data. Sharing data is possible through effective data management. Research data management consists of several interrelated stages: planning; data collection and creation; metadata creation; storage, preservation and security; and sharing. Each stage should be considered an important process that shapes the scientific activities of academics. These processes, which result in sharing, enable data reuse. This study aims to reveal the attitudes of researchers about data management processes. By these means, it will be possible to propose a model for the management of research data at Ankara University that can serve as a sample for similar institutions. For this purpose, the data management attitudes of researchers who conducted Scientific Research Projects (BAP) at Ankara University between 2013-2018 were determined using a survey. The research population consists of 376 people in total. A total of 194 BAP project leaders participated in the study; the number of participants required for each discipline was reached, and the sample statistically represents the target population. The following important conclusions were reached concerning the data management processes of the researchers who participated in the study: • Most academics do not have any data management plan, and planning activities do not seem to be at the desired level. • Familiarity with metadata is low and standards are not used.
• Researchers who store large amounts of data indefinitely use their personal storage space extensively and do not consider the associated costs. Also, the use of institutional storage is low in storage and backup activities during the research process. This situation threatens data security. • The researchers find it sufficient to share data only through publication. In this context, it is difficult to say that the participants share their research data. • One of the biggest obstacles to sharing data is ethical concern about plagiarism, which would result in the loss of publication and career opportunities. Additionally, the structure of the academic system, which brings awards, prestige and reputation only through publication, makes sharing the publication preferable to sharing the data. According to the findings, researchers do not plan data management processes comprehensively, and they need services, training and regulations in the context of these processes. In this sense, two fundamental hypotheses were confirmed: that "Academics conducting Scientific Research Projects at Ankara University do not have a comprehensive data management plan for the management of research data" and that "Academics conducting Scientific Research Projects at Ankara University need services, training and regulations (policies, directives, instructions, etc.) in the context of the management of research data". Unshared data cannot be reused, and it is impossible for data that are not reused to become a project idea. The conclusions listed above show that there is a one-dimensional data management process rather than a life-cycle process in which data are reused. Instead, the model proposed in our study aims to realize open research data and make them reusable.
Thesis
Full-text available
Energy appears as a major issue in the face of the current socio-ecological crisis. Energy modelling can be used to explore the design and management possibilities of components and systems, and thus to discern sustainable energy pathways. However, historical energy modelling and the main current approaches are proprietary and lack transparency, although the emergence of open energy modelling is promising. This thesis introduces the practices, interests and obstacles of open energy modelling, before presenting the ORUCE (Open and Reproducible Use Cases For Energy) method, designed as a transferable process to make these practices accessible to researchers in the field. This method focuses in particular on use cases as good vectors for reproducibility and capitalising on knowledge. Actual use cases in contact with energy stakeholders are presented, on the topics of waste heat recovery and photovoltaic self-consumption, illustrating the variety of uses of the ORUCE method. Finally, a concept of a collaborative open energy modelling platform is presented. This concept was refined in a user experience inquiry, and the resulting platform aims to make energy studies and associated resources accessible to stakeholders in research, public authorities and citizen collectives.
Article
Full-text available
Introduction: The structuring of biodiversity datasets is being disseminated in a language reserved for describing the substrate of scientific communication called Data Papers, that is, the data underpinning scientific research in this field of knowledge, independently of the traditional model of scientific communication. Objective: To analyze publications in Data Paper format in the field of biodiversity at the international level. Methodology: Documentary research with a qualitative approach, applying techniques for collecting and examining information through Content Analysis. It examines the status of 33 journals indicated by the Global Biodiversity Information Facility (GBIF) that offer publications in Data Paper format. The following are identified: themes related to biodiversity; the types of licenses; indexers; the number of Data Papers published; the titles with open or closed access; the journals that publish the most Data Papers on biodiversity; and the language in which they were published. Results: The number of Data Papers grew exponentially between 2017 and May 2022; accordingly, articles in the field of biodiversity have also increased across the many themes involving its entire ecosystem. Conclusion: The Data Papers analyzed are characterized as peer-reviewed documents and represent datasets indexed with metadata standards suitable for digitally preserving the data recorded in the journals covered by this analysis.
Preprint
Full-text available
The biological sciences community is increasingly recognizing the value of open, reproducible, and transparent research practices for science and society at large. Despite this recognition, many researchers remain reluctant to share their data and code publicly. This hesitation may arise from knowledge barriers about how to archive data and code, concerns about its re-use, and misaligned career incentives. Here, we define, categorise, and discuss barriers to data and code sharing that are relevant to many research fields. We explore how real and perceived barriers might be overcome or reframed in light of the benefits relative to costs. By elucidating these barriers and the contexts in which they arise, we can take steps to mitigate them and align our actions with the goals of open science, both as individual scientists and as a scientific community.
Article
Digital data is a basic form of research product for which citation, and the generation of credit or recognition for authors, are still not well understood. The notion of data credit has therefore recently emerged as a new measure, defined and based on data citation groundwork. Data credit is a real value representing the importance of data cited by a research entity. We can use credit to annotate data contained in a curated scientific database and then as a proxy of the significance and impact of that data in the research world. It is a method that, together with citations, helps recognize the value of data and its creators. In this paper, we explore the problem of Data Credit Distribution, the process by which credit is distributed to the database parts responsible for producing data being cited by a research entity. We adopt as use case the IUPHAR/BPS Guide to Pharmacology (GtoPdb), a widely-used curated scientific relational database. We focus on Select-Project-Join (SPJ) queries under bag semantics, and we define three distribution strategies based on how-provenance, responsibility, and the Shapley value. Using these distribution strategies, we show how credit can highlight frequently used database areas and how it can be used as a new bibliometric measure for data and their curators. In particular, credit rewards data and authors based on their research impact, not only on the citation count. We also show how these distribution strategies vary in their sensitivity to the role of an input tuple in the generation of the output data and reward input tuples differently.
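One of the distribution strategies named above is based on the Shapley value: each input tuple is credited with its average marginal contribution to producing the cited output, over all orderings of the tuples. The toy sketch below computes this for an invented query whose output exists iff tuple 'a' joins with 'b' or 'c'; it is a brute-force illustration of the concept, not the paper's implementation.

```python
from itertools import permutations

# Toy provenance game: the cited output is produced iff the coalition
# contains tuple 'a' together with 'b' or 'c'. Tuples are invented.
def produces_output(coalition):
    return "a" in coalition and ("b" in coalition or "c" in coalition)

def shapley(tuples, value):
    """Average marginal contribution of each tuple over all orderings."""
    credit = {t: 0.0 for t in tuples}
    perms = list(permutations(tuples))
    for order in perms:
        seen = set()
        for t in order:
            before = value(seen)
            seen.add(t)
            credit[t] += value(seen) - before
    return {t: c / len(perms) for t, c in credit.items()}

print(shapley(["a", "b", "c"], produces_output))
```

Here the indispensable tuple 'a' receives 2/3 of the credit, while the interchangeable tuples 'b' and 'c' receive 1/6 each, showing how this strategy is sensitive to a tuple's role in generating the output rather than to mere citation counts.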
Article
Full-text available
This article proposes a 4-step model for scientific dissemination that aims to promote evidence-based professional practice in Operations Management or Human Resource Management, as well as research with a more transparent and reproducible process. These 4 steps are: (1) social network announcements, (2) dissemination to scientific journals, (3) dissemination to social networks, and (4) scientific dissemination to professional journals. Central to the 4-step model is a three-stage publication process within the second step, which adds an additional stage to the two previously proposed (Marin-Garcia, 2015). These three publication stages begin with a protocol paper, are followed by a data paper, and finish with a traditional article. Each stage promotes research with merit that is citable and recognizable as such before scientific evaluation bodies. As two of these stages are largely unknown within the fields of Business and Management, I define the details of a protocol paper and a data paper, including their contents. In addition, I provide examples of both papers as well as of the other steps of the science dissemination model. This model can be adopted by researchers as a means of achieving greater impact and transfer of research results. This work intends to help researchers understand, evaluate, and make better decisions about how their research reaches society at large outside of academia. In this way, WPOM aligns with the recommendations of several leading journals in the field of business management on the need to promote transparent, accessible, and replicable science (Beugelsdijk et al., 2020). WPOM goes one step further in this direction by not only accepting, but also actively encouraging, the publication of protocol papers and data papers.
WPOM strives to be a pioneer in this field of Business and Management. This article also explores the potential prevalence of protocol papers and data papers within the set of all articles published in journals indexed in Clarivate Web of Science and Scopus. With this editorial, WPOM commits to promoting this model by accepting for review any of the three types of scientific contributions: protocol papers, data papers, and traditional papers.
Chapter
This chapter addresses questions related to the complex relationships between information, data, and human beings, frequently treated as the foundation of information and data ecologies. We focus on issues that have varied interfaces with literacies, but are not literacies in the proper sense of the word. The first part of this chapter focuses on openness, reproducibility, credibility, and sharing of digital data. Attention is given to research data’s Findability, Accessibility, Interoperability, and Reuse. There is also a short discussion of the relationship between research data and copyright. In the second part, data journals and data papers are targeted, and attention is paid to the problems of measuring and evaluating research data. The third part touches on varied issues, such as possible coauthorships between librarians and researchers, research data management, reputation management, information and data overload, posttruth phenomena and the influence of posttruth, as well as the deluge of publications related to the COVID-19 pandemic.
Chapter
This chapter acquaints the reader with the general and often changing nature of research on data quality. It is emphasized that research data quality is closely related to business data; however, the goals of scholarly research have become different, especially as the environments shaping the two are different. From among data quality’s attributes, trust receives particular attention. Technical and scientific quality, the relationship of data quality to data reuse, and other quality factors are also examined, including big data quality, intrinsic and extrinsic data quality, and the semiotic representation of quality attributes, as well as their time-related dimensions and retrievability. Although data reuse was addressed in an earlier chapter, its relationship to data quality is touched on in this chapter as well. Sharing the previously mentioned origin with data quality and being closely associated with it, data governance is also portrayed.
Chapter
Research data management (RDM) should be central for both researchers and academic libraries. The latter provide related services that are described in this chapter. RDM embraces the entire research cycle, aiming at making the research process as efficient as possible and facilitating cooperation with other players involved in it. To get a clear picture of the nature of RDM services, a short history of the academic library’s readiness and involvement is described. Skills and competencies necessary for serving research and researchers are enumerated, followed by a portrayal of the planning and building of services, giving particular attention to the research data life cycle and to the importance of data management plans. The tasks related to data reference, data citation, and data retrieval are presented. The relationship between RDM and data curation, as well as between RDM and research support services, is characterized.
Article
As data becomes omnipresent in the scientific system, a new academic genre aiming to describe data objects (data papers) and the venue to publish these articles (data journals) gradually emerged from the end of the 2000s. However, it is largely unknown how much these scientific outputs are indexed in scientific databases, which has greatly prevented them from being thoroughly studied in large‐scale, quantitative studies. This poster presents our preliminary efforts to address this gap, by compiling a list of data journals that primarily accept data papers (i.e., exclusively data journals) and examining their presence in four major scientific databases. Our results indicate that exclusively data journals are comprehensively indexed in Crossref and Dimensions, two relatively new scientific databases, which can be used to conduct future studies on data papers and journals. The next steps of our project are also discussed in this poster.
Article
The data paper is an emerging academic genre that focuses on the description of research data objects. However, there is a lack of empirical knowledge about this rising genre in quantitative science studies, particularly from the perspective of its linguistic features. To fill this gap, this research aims to offer a first quantitative examination of which rhetorical moves—rhetorical units performing a coherent narrative function—are used in data paper abstracts, as well as how these moves are used. To this end, we developed a new classification scheme for rhetorical moves in data paper abstracts by expanding a well‐received system that focuses on English‐language research article abstracts. We used this expanded scheme to classify and analyze rhetorical moves used in two flagship data journals, Scientific Data and Data in Brief. We found that data papers exhibit a combination of introduction, method, results, and discussion‐ and data‐oriented moves and that the usage differences between the journals can be largely explained by journal policies concerning abstract and paper structure. This research offers a novel examination of how the data paper, a data‐oriented knowledge representation, is composed, which greatly contributes to a deeper understanding of research data and its publication in the scholarly communication system.
Article
Full-text available
In the era of open science, data publishing is attracting attention because accelerating the release of research data, improving its accessibility and citability, and providing standardized descriptive documentation for research data can contribute to further scientific discovery. Data papers have also emerged as a way for published data to attain a status equal to that of research articles, and the launch of data journals, a new type of scholarly publishing, is on the rise. Ecology in particular is a field in which large-scale research data must be produced and managed, and data journals in this field are actively published worldwide. In Korea, by contrast, research on data journals is at an early stage, and there is no data journal in the field of ecology. This study therefore explores and proposes strategies for launching an ecology data journal. We first surveyed the publication status of data journals at home and abroad, as well as domestic journal publishing. We also conducted interviews with an expert group consisting of specialists in scholarly publishing and open access policy and in publishing ecology journals. Reflecting domestic scholarly publishing practices, in which the infrastructure for data journal publication and a corresponding evaluation system are not yet in place, and based on the results of the domestic and international surveys and the expert focus group interviews, we propose strategies concerning the direction of data journal publication in ecology, data paper submission guidelines, journal composition and publication frequency, editorial board composition, and manuscript acquisition.
Chapter
The Writing Center is the newest, innovative service, established as a project-based initiative within the organization of the Library of Corvinus University Budapest. The present and future goals of the Writing Center require a wide spectrum of services in order to cater to the needs of doctoral students and faculty members. These include traditional and novel tasks, such as fostering publication activities, combating information overload, familiarity with abstract writing, and Open Access, offered to experienced and early-career researchers. The goal of this chapter is to demonstrate how the learning and research support activities of a library, comprising curricular and extra-curricular courses, trainings, and consultations, can be integrated into the knowledge structures of the university as a whole. The authors place special emphasis on the role of group-based and individual mentoring throughout a university career, spanning from student to researcher, and on the development of transversal skills through the training programs of the Writing Center.
Research
Full-text available
Scientific research revolves around the production, analysis, storage, management, and re-use of data. Data sharing offers important benefits for scientific progress and the advancement of knowledge. However, several limitations and barriers to the general adoption of data sharing remain in place. Probably the most important challenge is that data sharing is not yet very common among scholars and is not yet seen as a regular activity among scientists, although important efforts are being invested in promoting data sharing. This report seeks to further explore the possibilities of metrics for datasets (i.e. the creation of reliable data metrics) and an effective reward system that aligns the main interests of the stakeholders involved in the process. The report reviews the current literature on data sharing and data metrics. It presents interviews with the main stakeholders on data sharing and data metrics. It also analyses the existing repositories and tools in the field of data sharing that have special relevance for the promotion and development of data metrics.
Article
Full-text available
PREFACE The growth in the capacity of the research community to collect and distribute data presents huge opportunities. It is already transforming old methods of scientific research and permitting the creation of new ones. However, the exploitation of these opportunities depends upon more than computing power, storage, and network connectivity. Among the promises of our growing universe of online digital data are the ability to integrate data into new forms of scholarly publishing to allow peer-examination and review of conclusions or analysis of experimental and observational data, and the ability for subsequent researchers to make new analyses of the same data, including their combination with other data sets and uses that may have been unanticipated by the original producer or collector. The use of published digital data, like the use of digitally published literature, depends upon the ability to identify, authenticate, locate, access, and interpret them. Data citations provide necessary support for these functions, as well as other functions such as attribution of credit and establishment of provenance. References to data, however, present challenges not encountered in references to literature. For example, how can one specify a particular subset of data in the absence of familiar conventions such as page numbers or chapters? The traditions and good practices for maintaining the scholarly record by proper references to a work are well established and understood in regard to journal articles and other literature, but attributing credit through bibliographic references to data is not yet so broadly implemented. Recognition of the need for better data referencing and citation practices, and investment in addressing it, have come at different rates in different fields and disciplines. As competing conventions and practices emerge in separate communities, inconsistencies and incompatibilities can interfere with promoting the sharing and use of research data. To reconcile this problem, sharing experiences across communities may be necessary, or at least helpful, to achieving the full potential of published data.
Article
Full-text available
International attention to scientific data continues to grow. Opportunities emerge to re-visit long-standing approaches to managing data and to critically examine new capabilities. We describe the cognitive importance of metaphor. We describe several metaphors for managing, sharing, and stewarding data and examine their strengths and weaknesses. We particularly question the applicability of a "publication" approach to making data broadly available. Our preliminary conclusions are that no one metaphor satisfies enough key data system attributes and that multiple metaphors need to co-exist in support of a healthy data ecosystem. We close with proposed research questions and a call for continued discussion.
Article
Full-text available
The two pillars of modern scientific communication are Data Centers and Research Digital Libraries (RDLs), whose technologies and admin staff support researchers in storing, curating, sharing, and discovering the data and the publications they produce. Because they were built to maintain and give access to the results of complementary phases of the scientific research process, such systems are poorly integrated with one another and generally do not draw on each other's strengths. Today this gap hampers achieving the objectives of modern scientific communication, that is, publishing, interlinking, and discovery of all outcomes of the research process, from experimental and observational datasets to the final paper. In this work, we envision that the construction of "Scientific Communication Infrastructures" is instrumental to bridging the gap. The main goal of these infrastructures is to facilitate interoperability between Data Centers and RDLs and to provide services that simplify the implementation of the large variety of modern scientific communication patterns.
Article
Full-text available
Dissemination of research outcomes via traditional publications, in either paper or digital form, does not suffice to satisfy modern e-Research and e-Science scholarly communication requirements, which demand sharing of, and immediate access to, scientific publications, datasets, and the experimental context of research activities. "Enhanced publications" emerged as a possible means to address these new needs. They are digital publications, with their own "identity" and "descriptive metadata", made of several "parts": a mandatory publication "text" plus "related material" (e.g. datasets, other publications, images, tables, workflows, devices). The state of the art on enhanced publications has today reached the point where some kind of common understanding is required, in order to provide the tools and language for scientists to compare, analyze, or simply discuss the multitude of solutions in the field. In this paper we propose a classification of enhanced publication solutions based on the structure and semantics of the given enhanced publications ("document model features") and the functionality they support to manage and consume them ("consuming purposes").
Article
Full-text available
Research on bias in peer review examines scholarly communication and funding processes to assess the epistemic and social legitimacy of the mechanisms by which knowledge communities vet and self-regulate their work. Despite vocal concerns, a closer look at the empirical and methodological limitations of research on bias raises questions about the existence and extent of many hypothesized forms of bias. In addition, the notion of bias is predicated on an implicit ideal that, once articulated, raises questions about the normative implications of research on bias in peer review. This review provides a brief description of the function, history, and scope of peer review; articulates and critiques the conception of bias unifying research on bias in peer review; characterizes and examines the empirical, methodological, and normative claims of bias in peer review research; and assesses possible alternatives to the status quo. We close by identifying ways to expand conceptions and studies of bias to contend with the complexity of social interactions among actors involved directly and indirectly in peer review.
Article
Full-text available
The NERC Science Information Strategy Data Citation and Publication project aims to develop and formalise a method for citing and publishing the datasets stored in its environmental data centres. It is believed that this will act as an incentive for scientists, who often invest a great deal of effort in creating datasets, to submit their data to a suitable data repository where it can be properly archived and curated. Data citation and publication will also provide a mechanism for data producers to receive credit for their work, thereby encouraging them to share their data more freely.
Article
Full-text available
INTRODUCTION Data citation should be a necessary corollary of data publication and reuse. Many researchers are reluctant to share their data, yet they are increasingly encouraged to do just that. Reward structures must be in place to encourage data publication, and citation is the appropriate tool for scholarly acknowledgment. Data citation also allows for the identification, retrieval, replication, and verification of data underlying published studies. METHODS This study examines author behavior and sources of instruction in disciplinary and cultural norms for writing style and citation via a content analysis of journal articles, author instructions, style manuals, and data publishers. Instances of data citation are benchmarked against a Data Citation Adequacy Index. RESULTS Roughly half of journals point toward a style manual that addresses data citation, but the majority of journal articles failed to include an adequate citation to data used in secondary analysis studies. DISCUSSION Full citation of data is not currently a normative behavior in scholarly writing. Multiplicity of data types and lack of awareness regarding existing standards contribute to the problem. CONCLUSION Citations for data must be promoted as an essential component of data publication, sharing, and reuse. Despite confounding factors, librarians and information professionals are well-positioned and should persist in advancing data citation as a normative practice across domains. Doing so promotes a value proposition for data sharing and secondary research broadly, thereby accelerating the pace of scientific research.
Article
Full-text available
This article discusses recent innovations in how peer review is conducted in light of the various functions journals fulfill in scholarly communities.
Article
Full-text available
This paper discusses many of the issues associated with formally publishing data in academia, focusing primarily on the structures that need to be put in place for peer review and formal citation of datasets. Data publication is becoming increasingly important to the scientific community, as it will provide a mechanism for those who create data to receive academic credit for their work and will allow the conclusions arising from an analysis to be more readily verifiable, thus promoting transparency in the scientific process. Peer review of data will also provide a mechanism for ensuring the quality of datasets, and we provide suggestions on the types of activities one expects to see in the peer review of data. A simple taxonomy of data publication methodologies is presented and evaluated, and the paper concludes with a discussion of dataset granularity, transience and semantics, along with a recommended human-readable citation syntax.
Article
Full-text available
Concerns over data quality impede the use of public biodiversity databases and subsequent benefits to society. Data publication could follow the well-established publication process: with automated quality checks, peer review, and editorial decisions. This would improve data accuracy, reduce the need for users to 'clean' the data, and might increase data use. Authors and editors would get due credit for a peer-reviewed (data) publication through use and citation metrics. Adopting standards related to data citation, accessibility, metadata, and quality control would facilitate integration of data across data sets. Here, we propose a staged publication process involving editorial and technical quality controls, of which the final (and optional) stage includes peer review, the most meritorious publication standard in science.
Article
Full-text available
The 'Berlin Declaration' was published in 2003 as a guideline to policy makers to promote the Internet as a functional instrument for a global scientific knowledge base. Because knowledge is derived from data, the principles of the 'Berlin Declaration' should apply to data as well. Today, access to scientific data is hampered by structural deficits in the publication process. Data publication needs to offer authors an incentive to publish data through long-term repositories. Data publication also requires an adequate licence model that protects the intellectual property rights of the author while allowing further use of the data by the scientific community.
Article
Full-text available
With the launch of GigaScience journal, here we provide insight into the accompanying database GigaDB, which allows the integration of manuscript publication with supporting data and tools. Reinforcing and upholding GigaScience's goals to promote open-data and reproducibility of research, GigaDB also aims to provide a home, when a suitable public repository does not exist, for the supporting data or tools featured in the journal and beyond.
Article
Full-text available
This report presents recent metadata developments for Dryad, a digital repository hosting datasets underlying publications in the field of evolutionary biology. We review our efforts to bring the Dryad application profile into conformance with the Singapore Framework and discuss practical issues underlying the application profile implementation in a DSpace environment. The report concludes by outlining the next steps planned as Dryad moves into the next phase of development.
Article
Full-text available
Existing norms for scientific communication are rooted in anachronistic practices of bygone eras, making them needlessly inefficient. We outline a path that moves away from the existing model of scientific communication to improve efficiency in meeting the purpose of public science: knowledge accumulation. We call for six changes: (1) full embrace of digital communication; (2) open access to all published research; (3) disentangling publication from evaluation; (4) breaking the "one article, one journal" model with a grading system for evaluation and diversified dissemination outlets; (5) publishing peer review; and (6) allowing open, continuous peer review. We address conceptual and practical barriers to change and provide examples showing how the suggested practices are being used already. The critical barriers to change are not technical or financial; they are social. While scientists guard the status quo, they also have the power to change it.
Article
Full-text available
Free and open access to primary biodiversity data is essential for informed decision-making to achieve conservation of biodiversity and sustainable development. However, primary biodiversity data are neither easily accessible nor discoverable. Among several impediments, one is a lack of incentives for data publishers to publish their data resources. One mechanism currently lacking is recognition through conventional scholarly publication of enriched metadata, which should ensure rapid discovery of 'fit-for-use' biodiversity data resources. We review the state of the art of data discovery options and the mechanisms in place for incentivizing data publishers' efforts towards easy, efficient and enhanced publishing, dissemination, sharing and re-use of biodiversity data. We propose the establishment of the 'biodiversity data paper' as one possible mechanism to offer scholarly recognition for efforts and investment by data publishers in authoring rich metadata and publishing them as citable academic papers. While detailing the benefits to data publishers, we describe the objectives, workflow and outcomes of the pilot project commissioned by the Global Biodiversity Information Facility in collaboration with scholarly publishers and pioneered by Pensoft Publishers through its journals ZooKeys, PhytoKeys, MycoKeys, BioRisk, NeoBiota, Nature Conservation and the forthcoming Biodiversity Data Journal. We then debate further enhancements of the data paper beyond the pilot project and attempt to forecast the future uptake of data papers as an incentivization mechanism by the stakeholder communities. We believe that in addition to recognition for those involved in the data publishing enterprise, data papers will also expedite publishing of fit-for-use biodiversity data resources. However, uptake and establishment of the data paper as a mechanism of scholarly recognition requires a high degree of commitment and investment by the cross-sectional stakeholder communities.
Conference Paper
Full-text available
Software systems are designed and engineered to process data. However, software is data too. The size and variety of today's software artifacts and the multitude of stakeholder activities result in so much data that individuals can no longer reason about all of it. We argue in this position paper that data mining, statistical analysis, machine learning, information retrieval, data integration, etc., are necessary solutions to deal with software data. New research is needed to adapt existing algorithms and tools for software engineering data and processes, and new ones will have to be created. In order for this type of research to succeed, it should be supported with new approaches to empirical work, where data and results are shared globally among researchers and practitioners. Software engineering researchers can get inspired by other fields, such as, bioinformatics, where results of mining and analyzing biological data are often stored in databases shared across the world.
Article
Full-text available
Concerns that the growing competition for funding and citations might distort science are frequently discussed, but have not been verified directly. Of the hypothesized problems, perhaps the most worrying is a worsening of positive-outcome bias. A system that disfavours negative results not only distorts the scientific literature directly, but might also discourage high-risk projects and pressure scientists to fabricate and falsify their data. This study analysed over 4,600 papers published in all disciplines between 1990 and 2007, measuring the frequency of papers that, having declared to have “tested” a hypothesis, reported a positive support for it. The overall frequency of positive supports has grown by over 22% between 1990 and 2007, with significant differences between disciplines and countries. The increase was stronger in the social and some biomedical disciplines. The United States had published, over the years, significantly fewer positive results than Asian countries (and particularly Japan) but more than European countries (and in particular the United Kingdom). Methodological artefacts cannot explain away these patterns, which support the hypotheses that research is becoming less pioneering and/or that the objectivity with which results are produced and published is decreasing.
Article
Full-text available
Scientific research in the 21st century is more data intensive and collaborative than in the past. It is important to study the data practices of researchers: data accessibility, discovery, re-use, preservation and, particularly, data sharing. Data sharing is a valuable part of the scientific method, allowing for verification of results and extending research from prior results. A total of 1329 scientists participated in this survey exploring current data sharing practices and perceptions of the barriers and enablers of data sharing. Scientists do not make their data electronically available to others for various reasons, including insufficient time and lack of funding. Most respondents are satisfied with their current processes for the initial and short-term parts of the data or research lifecycle (collecting their research data; searching for, describing or cataloging, analyzing, and short-term storage of their data) but are not satisfied with long-term data preservation. Many organizations do not provide support to their researchers for data management in either the short or the long term. If certain conditions are met (such as formal citation and sharing reprints), respondents agree they are willing to share their data. There are also significant differences in data management practices and approaches based on primary funding agency, subject discipline, age, work focus, and world region. Barriers to effective data sharing and preservation are deeply rooted in the practices and culture of the research process as well as the researchers themselves. New mandates for data management plans from NSF and other federal agencies and world-wide attention to the need to share and preserve data could lead to changes. Large scale programs, such as the NSF-sponsored DataNET (including projects like DataONE), will both bring attention and resources to the issue and make it easier for scientists to apply sound data management principles.
Article
Full-text available
The rapid growth of the internet and related technologies has already had a tremendous impact on scientific publishing. This journal has given attention to open access publishing (Ascoli 2005; Bug 2005; Merkel-Sobotta 2005; Velterop 2005), to reforming the review process (De Schutter 2007; Saper and Maunsell 2009), to the problems with getting authors to share their data (Ascoli 2006; Kennedy 2006; Teeters et al. 2008; Van Horn and Ball 2008), and to how to enhance the use of shared data (Gardner et al. 2008; Kennedy 2010). But the impact of the internet and data warehousing on science will be much larger, and there is a growing interest in how these technologies can be leveraged to improve the scientific process (Hey et al. 2009). Let's travel towards the future and imagine that not only are the tools and infrastructure available to share scientific data at any time after it is generated, but that it has also become standard practice for the community to do so. How this can be achieved is not the focus of this editorial; instead I want to speculate on the relationship between scientific papers and data repositories (Bourne 2005, 2010; Cinkosky et al. 1991) in such an environment. It is important for the scientific community to discuss these issues now because, while these technologies are expected to radically improve the scientific process, they will also change the way in which our work is evaluated. I propose that we should distinguish data publishing from paper publishing (Callaghan et al. 2009; Cinkosky et al. 1991) and, when established for specific scientific fields, promote data publishing as the primary outlet for much of the scientific output. A good metaphor for data publishing is to look at how complete organism genomic sequences are published in high impact journals now (Srivastava et al. 2010; Warren et al. 2010). Such papers really serve two goals: to announce the availability of the genome sequence in GenBank and to describe some scientific conclusions based on the analysis of the genome. The perceived importance of the latter determines whether a high impact journal will accept the paper, and therefore the authors spend a lot of effort in hyping this part. But are these two components irrevocably intertwined? Couldn't one just publish the data, in this case by depositing the complete sequence in a database, and announce this fact through a form of publication? The analysis can then be published separately at a later time or distributed over different papers, etc. This is not done because at present the publication of the paper in the high impact journal is considered to be the optimal reward for the researchers, both for career advancement and for success in obtaining new grants (Bourne 2005). I call data publication a method where the data providers, who may be different from the people who analyze the data, receive credit for their work when they deposit the sequence in the database, and where subsequent access to the data is tracked and considered equivalent to paper citation. There are a number of advantages to considering data publication as a separate process. First, credit assignment becomes more explicitly defined among the authors. Several journals (like Nature, Science, the PLoS series, etc.) have taken steps towards a more granular credit assignment by asking authors to explicitly list their contributions.
Article
Full-text available
When I took on the role of Editor-in-Chief of this open-access journal, I began, for the first time, to think about scholarly communication beyond submitting my papers and getting them published. This thinking led to previous Perspectives [1]–[3], all of which shared an underlying theme—there are many opportunities to achieve better dissemination and comprehension of our science, and as producers of that output I believe authors have a responsibility to see it used in the best possible way.
Article
Full-text available
The demands of data-intensive science represent a challenge for diverse scientific communities.
Article
High-throughput scientific instruments are generating massive amounts of data. Today one of the main challenges faced by researchers is to make the best use of the world's growing wealth of data. Data (re)usability is becoming a distinct characteristic of modern scientific practice, as it allows reanalysis of evidence, reproduction and verification of results, minimizing duplication of effort, and building on the work of others. The paper addresses the technological dimension of data reusability: the scientific data universe; the impediments to data (re)use; the data publication process as a bridge between data author and user; and the relevant technologies enabling this process.
Article
The DataCite Metadata Scheme is being designed to support dataset citation and discovery. It features a small set of mandatory properties, and an additional set of optional properties for more detailed description. Among these is a powerful mechanism for describing relationships between the registered dataset and other objects. The scheme is supported organizationally and will allow for community input on an ongoing basis.
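As a rough illustration of the kernel the abstract describes, the sketch below builds a minimal DataCite-style record as a plain Python dict and checks the small set of mandatory properties. The property names follow the DataCite kernel (identifier, creators, titles, publisher, publication year) and the related-identifier mechanism, but the exact field spellings, the example DOIs, and the dict serialization are assumptions for illustration, not the scheme's normative XML.

```python
# A minimal DataCite-style metadata record (illustrative only; the DOIs
# and field spellings are hypothetical, not taken from the actual schema).
record = {
    "identifier": {"value": "10.1234/example.dataset", "type": "DOI"},
    "creators": [{"name": "Doe, Jane"}],
    "titles": ["Example observational dataset"],
    "publisher": "Example Data Centre",
    "publicationYear": 2014,
    # Optional property: relate the dataset to the paper describing it.
    "relatedIdentifiers": [
        {"value": "10.1234/example.paper", "type": "DOI",
         "relationType": "IsDescribedBy"}
    ],
}

def validate_mandatory(rec):
    """Return the mandatory kernel properties missing from a record."""
    required = ["identifier", "creators", "titles",
                "publisher", "publicationYear"]
    return [key for key in required if not rec.get(key)]

missing = validate_mandatory(record)  # empty list when the record is complete
```

The design point the scheme makes is that a very small mandatory core keeps registration cheap, while optional properties such as related identifiers carry the richer links between a dataset and the objects around it.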
Article
Article processing charges (APCs) are a central mechanism for funding open access (OA) scholarly publishing. We studied the APCs charged and article volumes of journals that were listed in the Directory of Open Access Journals as charging APCs. These included 1,370 journals that published 100,697 articles in 2010. The average APC was $906 U.S. dollars (USD) calculated over journals and $904 USD calculated over articles. The price range varied between $8 and $3,900 USD, with the lowest prices charged by journals published in developing countries and the highest by journals with high-impact factors from major international publishers. Journals in biomedicine represent 59% of the sample and 58% of the total article volume. They also had the highest APCs of any discipline. Professionally published journals, both for profit and nonprofit, had substantially higher APCs than journals published by societies, universities, or scholars/researchers. These price estimates are lower than some previous studies of OA publishing and much lower than is generally charged by subscription publishers making individual articles OA in what are termed hybrid journals. © 2012 Wiley Periodicals, Inc.
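The two averages the study reports, over journals and over articles, differ only in weighting: the per-article figure weights each journal's APC by its article output. The toy numbers below are hypothetical; only the computation pattern mirrors the study's two statistics.

```python
# Hypothetical journals, each with an APC (USD) and an annual article count.
journals = [
    {"apc": 500, "articles": 10},   # small, cheap journal
    {"apc": 1500, "articles": 90},  # large, expensive journal
]

# Average over journals: each journal counts once, regardless of size.
avg_over_journals = sum(j["apc"] for j in journals) / len(journals)

# Average over articles: each journal's APC is weighted by its output.
total_articles = sum(j["articles"] for j in journals)
avg_over_articles = sum(j["apc"] * j["articles"] for j in journals) / total_articles
```

With these toy numbers the two averages diverge sharply (1000 vs 1400 USD); the study's near-identical figures ($906 vs $904) indicate that APC levels were not strongly correlated with journal size in that sample.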
Chapter
“To make progress in science, we need to be open and share.” This quote from Neelie Kroes (2012), vice president of the European Commission, describes the growing public demand for an Open Science. Alongside Open Access to peer-reviewed publications, Open Science includes Open Access to research data, the basis of scholarly knowledge. The opportunities and challenges of Data Sharing are discussed widely in the scholarly sector. The cultures of Data Sharing differ between the scholarly disciplines; disciplines such as biomedicine and the earth sciences, for example, are well advanced. Today, more and more funding agencies require proper Research Data Management and the possibility of data re-use. Many researchers see the potential of Data Sharing, but they act cautiously. This situation shows a clear ambivalence between the demand for Data Sharing and its current practice. Starting from a baseline study on current discussions, practices and developments, the article describes the challenges of Open Research Data. The authors briefly discuss the barriers and drivers to Data Sharing. Furthermore, the article analyses strategies and approaches to promote and implement Data Sharing. This comprises an analysis of the current landscape of data repositories, enhanced publications and data papers. In this context the authors also shed light on incentive mechanisms, data citation practices and the interaction between data repositories and journals. In the conclusions the authors outline requirements of a future Data Sharing culture.
Article
This paper examines how scientists working in government agencies in the U.S. are reacting to the “ethos of sharing” government-generated data. For scientists to leverage the value of existing government data sets, critical data sets must be identified and made as widely available as possible. However, government data sets can only be leveraged when policy makers first assess the value of data, in much the same way they decide the value of grants for research outside government. We argue that legislators should also remove structural barriers to interoperability by funding technical infrastructure according to issue clusters rather than administrative programs. As developers attempt to make government data more accessible through portals, they should consider a range of other nontechnical constraints attached to the data. We find that agencies react to the large number of constraints by mostly posting their data on their own websites only rather than in data portals that can facilitate sharing. Despite the nontechnical constraints, we find that scientists working in government agencies exercise some autonomy in data decisions, such as data documentation, which determine whether or not the data can be widely shared. Fortunately, scientists indicate a willingness to share the data they collect or maintain. However, we argue further that a complete measure of access should also consider the normative decisions to collect (or not) particular data.
Article
Many authors appear to think that most open access (OA) journals charge authors for their publications. This brief communication examines the basis for such beliefs and finds it wanting. Indeed, in this study of over 9,000 OA journals included in the Directory of Open Access Journals, only 28% charged authors for publishing in their journals. This figure, however, was highest in various disciplines in medicine (47%) and the sciences (43%) and lowest in the humanities (4%) and the arts (0%).
Article
Policies ensuring that research data are available on public archives are increasingly being implemented at the government [1], funding agency [2-4], and journal [5, 6] level. These policies are predicated on the idea that authors are poor stewards of their data, particularly over the long term [7], and indeed many studies have found that authors are often unable or unwilling to share their data [8-11]. However, there are no systematic estimates of how the availability of research data changes with time since publication. We therefore requested data sets from a relatively homogenous set of 516 articles published between 2 and 22 years ago, and found that availability of the data was strongly affected by article age. For papers where the authors gave the status of their data, the odds of a data set being extant fell by 17% per year. In addition, the odds that we could find a working e-mail address for the first, last, or corresponding author fell by 7% per year. Our results reinforce the notion that, in the long term, research data cannot be reliably preserved by individual researchers, and further demonstrate the urgent need for policies mandating data sharing via public archives.
Article
A response to 21 commentaries on Nosek & Bar-Anan, Scientific Utopia I: Opening Scientific Communication. We make four points: (1) the potential for things to go wrong is not a justification to do nothing; (2) some changes, particularly open access, appear to be inevitable; (3) when authors control publishing, articles will get better, not worse; and (4) despite the substantial cumulative changes, if our proposal were adopted in whole, those that wished to produce and consume their science as they do today could retain most of those practices. However, faced with the alternatives, we believe that they would not choose to do so. We close with practical steps that individual scientists can take to embody the value of openness in scientific communication.
Article
In 2008, ESSD was established to provide a venue for publishing highly important research data, with two main aims: to provide reward for data "authors" through fully qualified citation of research data, classically aligned with the certification of quality of a peer-reviewed journal. A major step towards this goal was the definition and rationale of article structure and review criteria for articles about datasets.
Article
Cheap open-access journals raise questions about the value publishers add for their money.
Article
We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support of, much-improved data curation. (Hanson, Sugden, & Alberts, 2011) Researchers are producing an unprecedented deluge of data by using new methods and instrumentation. Others may wish to mine these data for new discoveries and innovations. However, research data are not readily available as sharing is common in only a few fields such as astronomy and genomics. Data sharing practices in other fields vary widely. Moreover, research data take many forms, are handled in many ways, using many approaches, and often are difficult to interpret once removed from their initial context. Data sharing is thus a conundrum. Four rationales for sharing data are examined, drawing examples from the sciences, social sciences, and humanities: (1) to reproduce or to verify research, (2) to make results of publicly funded research available to the public, (3) to enable others to ask new questions of extant data, and (4) to advance the state of research and innovation. These rationales differ by the arguments for sharing, by beneficiaries, and by the motivations and incentives of the many stakeholders involved. The challenges are to understand which data might be shared, by whom, with whom, under what conditions, why, and to what effects. Answers will inform data policy and practice.
Article
We introduce a set of integrated developments in web application software, networking, data citation standards, and statistical methods designed to put some of the universe of data and data sharing practices on somewhat firmer ground. We have focused on social science data, but aspects of what we have developed may apply more widely. The idea is to facilitate the public distribution of persistent, authorized, and verifiable data, with powerful but easy-to-use technology, even when the data are confidential or proprietary. We intend to solve some of the sociological problems of data sharing via technological means, with the result intended to benefit both the scientific community and the sometimes apparently contradictory goals of individual researchers.
Article
Authors, reviewers and editors must act to protect the quality of research.
Hey, T., Tansley, S., & Tolle, K. (Eds.) (2009). The fourth paradigm: Data-intensive scientific discovery. Microsoft Research.
It's not about the data. (2012). Nature Genetics, 44(2), 111. doi:10.1038/ng.1099
Lawrence, R. (2012). Data publishing: Peer review, shared standards and collaboration. Presentation at 8th Research Data Management Forum, Southampton. Retrieved April, 2014, from http://www.dcc.ac.uk/webfm_send/798
Thanos, C. (2014). Scientific data reusability: Conceptual foundations, impediments and enabling technologies (Tech. Rep.). Istituto di Scienza e Tecnologie dell'Informazione "A. Faedo," CNR.
Van Noorden, R. (2013). Open access: The true cost of science publishing.