Figure - available from: Journal of Archaeological Method and Theory
Workflow diagram showing key steps and software components. The boxes with a bold outline indicate key steps and tools that enable computational reproducibility in our project


Source publication
Article
Full-text available
The use of computers and complex software is pervasive in archaeology, yet their role in the analytical pipeline is rarely exposed for other researchers to inspect or reuse. This limits the progress of archaeology because researchers cannot easily reproduce each other’s work to verify or extend it. Four general principles of reproducible research t...

Similar publications

Preprint
Data sharing by researchers is a centerpiece of Open Science principles and scientific progress. For a sample of 6019 researchers, we analyze the extent and frequency of their data sharing. Specifically, we examine its relationship with the following four variables: how much they value data citations, the extent to which their data-sharing activities are formally...

Citations

... Nowadays, especially for researchers starting their research work alone on their own machines or in small teams, Linux container technology is one of the essential reproducibility tools, helping to provision more homogeneous computational environments (Marwick, 2017; Wiebels & Moreau, 2021). This technology allows all dependencies of a computational application, such as libraries, packages, compilers, interpreters, databases, and their respective versions, to be specified and configured programmatically so that the environment is reproduced exactly as specified. ...
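The programmatic specification of dependencies described in this excerpt can be illustrated with a minimal container definition. This is a sketch only: the base image tag, package versions, directory names, and the `run_all.R` script are illustrative placeholders, not taken from any of the cited works.

```dockerfile
# Hypothetical container spec that pins an analysis environment.
# Base image and all version numbers below are illustrative.
FROM rocker/r-ver:4.3.2

# System libraries that common R packages need to compile
RUN apt-get update && apt-get install -y --no-install-recommends \
    libxml2-dev libcurl4-openssl-dev && rm -rf /var/lib/apt/lists/*

# Pin R package versions so the environment rebuilds exactly as specified
RUN R -e "install.packages('remotes'); \
          remotes::install_version('dplyr', version = '1.1.4')"

# Copy the analysis code into the image and run it by default
COPY analysis/ /home/analysis/
WORKDIR /home/analysis
CMD ["Rscript", "run_all.R"]
```

Because every dependency and version is declared in this one file, any machine that builds the image recreates the same environment, which is the homogeneity property the excerpt attributes to containers.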
Article
Objective: The purpose of this paper is to present a case study on how a recently proposed reproducibility framework named Environment Code-First (ECF), based on the Infrastructure-as-Code approach, can improve the implementation and reproduction of computing environments by reducing complexity and manual intervention. Methodology: The study compares the manual way of implementing a pipeline and the automated method proposed by the ECF framework, showing real metrics regarding time consumption, effort, manual intervention, and platform agnosticism. It details the steps needed to implement the computational environment of a bioinformatics pipeline named MetaWorks from the perspective of the scientist who owns the research work. We also present the steps taken to recreate the environment from the point of view of one who wants to reproduce the published results of a research work. Findings and Conclusion: The results demonstrate considerable benefits in adopting the ECF framework, particularly in maintaining the same application behavior across different machines. Such empirical evidence underscores the significance of reducing manual intervention, as it ensures the consistent recreation of the environment as many times as needed, especially by non-original researchers. Originality/Value: Verifying published findings in bioinformatics through independent validation is challenging, mainly when accounting for differences in software and hardware to recreate computational environments. Reproducing a computational environment that closely mimics the original proves intricate and demands a significant investment of time. This study contributes to educating and assisting researchers in enhancing the reproducibility of their work by creating self-contained computational environments that are highly reproducible, isolated, portable, and platform-agnostic.
... Whether or not we can anticipate what future modules may be useful, it is important to construct the software in such a way that it will not be difficult to accommodate them. Further, creating the software with open-source tools such as Python and R, and hosting the code in accessible repositories such as GitHub, or by hosting interactive webapps, ensures equitable long-term access to the tools beyond the core team that initially created them (Marwick 2016; Marwick et al. 2017). Additionally, using accessible platforms such as GitHub facilitates central management of the project while still encouraging community-wide co-development through "branch-and-fork" dispersed collaboration. ...
Preprint
Designing an effective archaeological survey can be complicated and confidence that it was effective requires post-survey evaluation. The goal of SPACE is to develop software to facilitate survey designers’ decisions and partially automate tools that depend on mathematical models so that archaeologists can conduct surveys that accomplish their goals and evaluate their results more easily. We aim for SPACE to be a modular and accessible web-based platform for survey planning and quality assurance, with a “front end” that has a non-threatening, question-and-answer format. Its several interacting modules will ultimately include ones for evaluating visibility, estimating sweep widths and coverage, costing, determining sample sizes, transect and test-pit intervals, allocating effort optimally for stratified samples and predictive surveys, and quality assurance. In this paper, we focus on the module for estimating fieldwalkers’ sweep widths on the basis of “calibrations” on fields seeded with artifacts, while also reviewing the overall structure of the project. Sweep widths are critical for estimating coverage, evaluating survey effectiveness and quality, and planning transect intervals.
... Archaeological research has moved towards the adoption of open data, open science policies, free data accessibility and transferable workflows that allow analyses to be replicated, comparing different regional and chronological settings and site compositions [8][9][10][11]. In this context, we advocate computational reproducibility to enhance transparency of analysis and its results [8,12], as well as empirical reproducibility that allows the transferability of code to new data and to other domains for achieving similar results [12][13][14]. Our aim is also to promote interdisciplinary collaboration and the exploration of synergies across archaeology and environmental sciences (and the humanities and natural sciences in general) through shared computational methods. ...
Article
Point Pattern Analysis (PPA) has gained momentum in archaeological research, particularly in the recognition of site distribution patterns relative to supra-regional environmental variables. While PPA is now a statistically well-established method, most of the data necessary for the analyses are not freely accessible, complicating reproducibility and transparency. In this article, we present a fully reproducible methodical framework for PPA using an open access database of archaeological sites located in southwest Germany and open source explanatory covariates to understand site location processes and patterning. The workflow and research question are tailored to a regional case study, but the code underlying the analysis is provided as an R Markdown file and can be adjusted and manipulated to fit any archaeological database across the globe. The Early Iron Age north of the Alps, and particularly in southwest Germany, is marked by numerous social and cultural changes that reflect the use and inhabitation of the landscape. In this work we show that the use of quantitative methods in the study of site distribution processes is essential for a more complete understanding of archaeological and environmental dynamics. Furthermore, the use of a completely transparent and easily adaptable approach can fuel the understanding of large-scale site location preferences and catchment compositions in archaeological, geographical and ecological research.
... org/10.17605/osf.io/dqna8 to enable re-use of materials and improve reproducibility and transparency (Marwick, 2017). All of the figures, tables, and statistical test results presented here can be independently reproduced with the code and data in this repository. ...
Article
The transition from the Early to Late Paleolithic in Korea is characterized by the introduction of blade technology, stemmed points, end scrapers, burins, denticulates, and higher proportions of finer grained materials. Stemmed points have been considered a representative tool that led this set of changes. In this study, we examine the possible role that stemmed points played during this technological transition, as well as throughout the Late Paleolithic period (approx. 40~12 ka). Our main questions are as follows: What were the best-fit ballistic probabilities for the stemmed points if they were hafted as weapon tips? How diverse were their likely uses? What are the temporal and spatial patterns of stemmed point use? We measured tip cross-sectional area (TCSA) to distinguish different likely use classes of projectile points, for example, as poisoned arrow tips or as stabbing spears. We analyzed TCSA with other variables, including raw materials, weight, radiocarbon dates, and locations. Our results show that the stemmed points likely served as javelin tips and stabbing spear tips, with smaller numbers as dart tips and un-poisoned arrow tips. TCSA values were controlled mostly by size rather than raw material types. We found different TCSA ranges of stemmed points at different sites, which could indicate people used stemmed points in different ways depending on the local environment. Some sites show a wide range of TCSA values that represent multi-purpose usage of stemmed points. The temporal pattern of TCSA values is one of little change throughout the Late Paleolithic period, but points were predominantly produced before the Last Glacial Maximum (LGM). We observed that stemmed points were mostly located in certain ecoregions in Korea, but no clear spatial pattern was apparent. We conclude that stemmed points were multi-functional tools, with many likely designed for use as javelin and stabbing spear tips.
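The tip cross-sectional area (TCSA) metric used in the study above is conventionally computed as half the product of a point's maximum width and maximum thickness. A minimal sketch follows; the measurements in the example are hypothetical, not values from the Korean stemmed-point dataset:

```python
def tcsa(max_width_mm: float, max_thickness_mm: float) -> float:
    """Tip cross-sectional area (TCSA): conventionally half of a point's
    maximum width times its maximum thickness, in square millimetres."""
    return 0.5 * max_width_mm * max_thickness_mm

# Hypothetical measurements for a single stemmed point:
area = tcsa(24.0, 7.0)  # 84.0 mm^2
```

Values computed this way can then be compared against published TCSA ranges for arrow, dart, javelin, and stabbing-spear tips to assess the likely delivery mode of each point, which is the classification logic the abstract describes.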
... While data collection and statistical processing reliant on high precision equipment and ML models is increasingly accessible (Mora et al., 2022), there is still much work to do to address the multiple potential sources of bias in the (re)production of this type of research outside the highly specialized research centers that have supported these approaches. If we add to this that data production and circulation within archaeology has yet to improve past point-and-click spreadsheets to store, share and manage data (Marwick, 2017), we can then assume that these types of highly specialized ML approaches for interpreting BSM information could, at the very least, be cumbersome for many zooarchaeologists to manage, collect or enact, thereby risking that this new set of statistical approaches could perpetuate common vices associated with misunderstanding statistical methods in zooarchaeology, particularly regarding sampling issues or not fully disclosing or understanding assumptions related to most inferential statistics (Wolverton et al., 2016; McPherron et al., 2021; Otárola-Castillo et al., 2022). ...
Article
The taphonomic evaluation of zooarchaeological assemblages is paramount for compelling interpretations of past subsistence strategies, palaeoecological interactions and overall biasing of the anthropogenic bone deposition and modification by non-anthropogenic agents. A recent trend in this field has focused on bone surface modifications (BSM) to successfully generate reference frameworks for critical evaluation of the interpretive potential of selected zooarchaeological samples by examining BSM patterns produced by known agents in experimental settings. These approaches compare anthropogenic and non-human generated patterns using different machine learning (ML) algorithms, classifying and assigning sets of traces to specific processes such as gnawing, scoring or intentional fracture. We propose a simplification in two key aspects to make this new arising area of ML-assisted taphonomic interpretation more accessible for zooarchaeologists. First, we encourage a trace-oriented multivariate data recording protocol for taphonomic analysis including the most commonly recorded BSM plus others such as shape modification, penetrative modification and element or tissue loss modifications, thus expanding comparative potential. Second, we offer a ML approach based on a Naïve Bayes Classifier (NBC) that discriminates each given observation by comparing it to actualistic taphonomic patterns to determine the average percentage of non-anthropogenic modification within zooarchaeological samples. To illustrate the potential value of these proposals for taphonomic interpretation, we analyzed seven different Mid to Late Holocene assemblages containing avian and mammal remains from the coastal platform in the hyperarid core of the Atacama Desert. We compared their alteration with a local sample of actualistic data recovered to characterize regional taphonomic patterns. A NBC was trained with a 75% random selection of the total archaeological and actualistic samples. Afterwards, we tested the remaining sample, achieving over 90% accuracy in determining whether a random specimen from any sample corresponded to an actualistic or an archaeological context, implying key differences in taphonomic trace patterns.
... It is widely acknowledged that open science and open archaeology have been steadily developing worldwide in recent years (Kansa and Kansa 2013; Lake 2012; Marwick et al. 2017). Digital archaeology has undoubtedly facilitated progress in discussions about sharing the knowledge generated, not only through publications but also through databases and methodologies that enable better reproducibility of results (Marwick 2017; Marwick and Birch 2018; Marwick and Schmidt 2020). This has allowed more and more professionals and those interested in the discipline to start sharing data that contribute to a better understanding of our past and present. ...
Article
BaDACor is a database that contains a comprehensive inventory of archaeological sites located in the province of Córdoba, Argentina. The creation of this database was the result of a top-down approach, which involved the collaboration of decision-makers and professionals from the academic and state-governmental sectors. Furthermore, the database has also been utilised in a bottom-up approach, whereby interest groups and citizens concerned with heritage preservation have made use of it. This has been particularly important in light of the construction of Highway 38, which has resulted in damage to natural habitats and the destruction of territories of communities with traditional ways of life. Additionally, the construction of the highway has also endangered the integrity of ancestral territories loaded with symbolism for aboriginal communities. BaDACor has been employed in legal claims in cases of conflict with the state, and has proved to be an invaluable tool for heritage management. This is especially significant for local communities and indigenous groups who have historically had their heritage desecrated, destroyed, and hidden. The availability of BaDACor on different platforms has facilitated better access to information while also ensuring the preservation of digital data. The use of digital media has been reinforced through talks, conferences, and meetings with stakeholders to ensure that the voices of affected communities are heard in decision-making processes.
... IO/FP7TA). To produce those files, we followed the procedures described by Marwick [99] for the creation of research compendiums to enhance the reproducibility of research. The files provided contain all the raw data used in our analysis as well as a custom R project [100] holding the code to produce all tables and figures. ...
... The files provided contain all the raw data used in our analysis as well as a custom R project [100] holding the code to produce all tables and figures. To enable maximum reuse, code is released under the MIT license, data as CC-0, and figures as CC-BY (for more information, see [99]). ...
Article
Southwestern Iberia has played a key role in characterizing Late Pleistocene human ecodynamics. Among other aspects of human behavior, chert procurement and management studies in this region have received increasing attention in the past two decades, especially focusing on the sites showing repeated human occupation, such as the case of Vale Boi (Southern Portugal). However, these studies have been very limited in their geographical scope, and mostly focused on brief macroscopic descriptions of the raw materials. To further our knowledge of the relationship between regional availability of raw materials and its impact on human adaptations and mobility, a more detailed approach to characterizing geological sources is needed. This paper characterizes the location, diversity, and availability of chert raw materials in a geologically well-defined region of southern Portugal ‐ the Algarve. Through macroscopic and petrographic approaches, we provide a detailed characterization of geological chert sources to build a frame of reference for chert exploitation in the region. Our results show that there are four main chert formations in Algarve, and that despite the within-source variability, sufficient differences at macroscopic and petrographic levels are present to allow clear source attribution. These results provide a baseline for raw material studies in archaeological assemblages across southwestern Iberia, which will be essential to further characterize the dynamics of human behavior in some of the most important eco-cultural niches.
... The data is presented in a clear and understandable manner, ensuring that it can be readily used, converted and reproduced. To ensure proper disclosure, we adhere to a series of basic principles and guidelines outlined in previous works on reproducible research in archaeology [16][17][18]. ...
... In [15], Ben Marwick emphasizes the significance of simulation studies in archaeology, which are conducted through computationally intensive analyses utilizing mathematical functions based on single-precision floating-point arithmetic. These studies raise concerns about result variations across different operating systems. ...
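The concern about result variation across platforms can be illustrated with a minimal example of floating-point non-associativity, one reason identical code can produce slightly different numbers when compilers or math libraries reorder operations:

```python
# Floating-point addition is not associative, so summation order matters.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)      # False: the two groupings disagree in the last bit
print(abs(a - b))  # a tiny, but nonzero, discrepancy
```

Discrepancies of this size are harmless in a single calculation but can accumulate in long simulations, which is why bit-for-bit reproducibility across operating systems is hard to guarantee without pinning the full computational environment.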
... Often working independently, scientists predominantly employ a personal computer (i.e., desktop or laptop) using one of these three operating systems, Microsoft Windows, MacOS, or Linux. Their attention is solely dedicated to the scientific predicament [15,21]. In a survey conducted among 60 developers of scientific software, approximately 80% of the participants worked on their applications alone [22]. ...
... Requirements identification. During the analysis step, approximately 60 minutes were dedicated to reviewing and completing the form containing the environmental requirements, with many of them already outlined in [10]. Table 1 displays the form containing the questions and answers utilized in the creation of the container module for the MetaWorks environment. ...
Preprint
Verifying published findings in bioinformatics through independent validation is challenging, mainly when accounting for differences in software and hardware to recreate computational environments. Reproducing a computational environment that closely mimics the original proves intricate and demands a significant investment of time. In this paper, we present a case study on how a recently proposed reproducibility framework named Environment Code-First (ECF) based on the Infrastructure-as-Code approach can improve the implementation and reproduction of computing environments by reducing complexity and manual intervention. We detail the steps needed to implement the computational environment of a bioinformatics pipeline named MetaWorks from the perspective of the scientist who owns the research work. Also, we present the steps taken to recreate the environment from the point of view of one who wants to reproduce the published results of a research work. This exercise compares the manual way of implementing the pipeline and the automated method proposed by the ECF framework, showing real metrics regarding time consumption, efforts, manual intervention, and platform agnosticism.
... The entire R code (R Core Team 2019) used for all the analysis and visualizations contained in this paper is included in the Supplementary Online Materials at https://doi.org/10.17605/OSF.IO/ZFB36 to enable re-use of materials and improve reproducibility and transparency (Marwick 2017). Also in this version-controlled compendium (Marwick, Boettiger, and Mullen 2018) are the raw data for our study, which are equivalent to the publicly available data in the conference programs but are organized here in a tabular structure convenient for computational analysis. ...
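A research compendium of the kind cited here typically separates raw data, derived outputs, and the manuscript that regenerates them. The sketch below shows one common layout in the style described by Marwick, Boettiger, and Mullen (2018); the specific file and directory names are illustrative, not those of the compendium above:

```text
compendium/
├── DESCRIPTION          # R package metadata declaring dependencies
├── analysis/
│   ├── paper/           # R Markdown manuscript that generates the paper
│   ├── data/
│   │   ├── raw_data/    # read-only input data, never modified by code
│   │   └── derived_data/# outputs regenerated by running the analysis
│   └── figures/         # figures produced by the analysis code
└── .git/                # version control history
```

Keeping raw data read-only and regenerating everything else from code is what lets a reader reproduce all figures and tables from the compendium alone.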