Fig 1 - uploaded by Jonathan Robert Wood
a Map showing Ain Sinu and other sites in the Parthian and Sasanian empires. b Aerial photograph of Ain Sinu from the British Academy, Sir M. Aurel Stein Collection (item: ASA/3/35; Obverse: 15098). Use of photograph permitted by British Academy. c Parthian blue-glazed pottery jar (second-third century CE) recovered in North

Source publication
Article
Chemical compositional data sets of archaeological artefacts are often analysed using standard statistical procedures. Adopting a different approach, we examine the major element oxides found in Parthian and Sasanian glazed pottery by identifying statistically important ratios of oxides in conjunction with the expert knowledge of the archaeological...

Context in source publication

Context 1
... was shown recently that glazes on Parthian pottery recovered at the early 3rd century CE Roman military outpost of Ain Sinu in northern Iraq ( Fig. 1) had compositions similar to Roman glass, suggesting that Roman glass was recycled and reapplied as a glaze by Parthian potters (Wood and Hsu 2020). More significantly, this practice indicates that glass and glazes were regarded as the same material in Mesopotamia and that the presence of Roman glass reapplied as a glaze on Parthian ...

Citations

... Furthermore, since there are always several logratios that compete closely in this search, expert knowledge can be incorporated to select certain logratios that have more geochemical meaning. This strategy has been successful in two studies, in biochemistry (Graeve and Greenacre, 2020) and archaeology (Wood and Greenacre, 2021), leading to a reduction in the number of compositional parts that need to be considered and a consequent simplification of the results. ...
Fig. 7. On the left, the proportion of between-cluster sum of squares (BSS) out of total sum of squares (TSS) for increasing numbers of clusters in k-means non-hierarchical clustering. ...
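The BSS/TSS proportion in the Fig. 7 caption can be illustrated with a short Python/NumPy sketch. It assumes plain Lloyd's k-means with a deterministic farthest-point start and hypothetical two-dimensional data; the function names `kmeans` and `bss_over_tss` are illustrative and not taken from the cited study.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-point (maximin) start."""
    centres = [X[0]]
    while len(centres) < k:
        # next starting centre: the row farthest from every centre chosen so far
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[d2.argmax()])
    centres = np.array(centres, dtype=float)
    for _ in range(n_iter):
        # assign each row to its nearest centre (squared Euclidean distance)
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # move each centre to the mean of its rows (keep it if the cluster is empty)
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centres[c]
                        for c in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return labels

def bss_over_tss(X, labels):
    """Between-cluster sum of squares as a proportion of the total sum of squares."""
    grand = X.mean(axis=0)
    tss = ((X - grand) ** 2).sum()
    bss = sum((labels == c).sum() * ((X[labels == c].mean(axis=0) - grand) ** 2).sum()
              for c in np.unique(labels))
    return bss / tss

# hypothetical data: three well-separated groups of 30 points each
rng = np.random.default_rng(1)
true_centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in true_centres])
ratios = {k: bss_over_tss(X, kmeans(X, k)) for k in range(1, 6)}
for k, r in ratios.items():
    print(k, round(r, 3))
```

BSS/TSS is 0 for a single cluster and rises towards 1 as clusters are added; the number of clusters at which the gain levels off is one informal guide to how many groups the data support.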
... Several stopping criteria are possible for these three variants, optimising information measures or ensuring significance of the logratios with the Bonferroni correction. All variants allow the researcher to see the explanatory power of several candidate logratios and to modify the LR entered at a given step based on his or her expert knowledge, as incorporated into the selection process in [20,48,55]. This includes the possibility to force the inclusion of certain logratios or non-compositional variables from the start. ...
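One way to implement the Bonferroni stopping rule is sketched below. It assumes that each step is a likelihood-ratio test with one degree of freedom (one logratio enters) and that the Bonferroni factor is the number of logratios competing at that step; both are assumptions, not details given in the excerpt, and `bonferroni_continue` is a hypothetical helper.

```python
import math

def chi2_1df_sf(x):
    """P(chi-square with 1 df > x), using chi2_1 = Z**2 for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

def bonferroni_continue(deviance_drop, n_candidates, alpha=0.05):
    """Return True if the best candidate logratio is still significant.

    deviance_drop : decrease in -2logLik when the best candidate enters the model
    n_candidates  : number of logratios competing at this step (Bonferroni factor)
    """
    p_value = chi2_1df_sf(deviance_drop)
    return p_value < alpha / n_candidates

n_parts = 10
n_pairs = n_parts * (n_parts - 1) // 2      # 45 candidate pairwise logratios at step 1
print(bonferroni_continue(15.0, n_pairs))   # True: p is about 1.1e-4, below 0.05/45
print(bonferroni_continue(3.84, n_pairs))   # False: p is about 0.05
```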
... With respect to the last of the above-mentioned options, there are already three published studies [20,48,55] where the expert with domain knowledge has interacted with the statistical algorithm to make choices of LRs from a list of those competing to enter. The idea in the present context is to present the expert with the 'top 20' LRs, say, in decreasing order of importance in the modelling, that is, in increasing order of −2logLik. ...
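A 'top 20' list ordered by −2logLik can be sketched for a binary outcome. The sketch assumes a logistic GLM fitted by Newton-Raphson and a hypothetical five-part composition in which the outcome depends on log(A/B); the optional `forced` argument shows how non-compositional covariates could be kept in every candidate model. All function names are hypothetical.

```python
import itertools
import numpy as np

def logistic_neg2loglik(X, y, n_iter=50, tol=1e-10):
    """Fit a logistic GLM by Newton-Raphson and return its -2 log-likelihood."""
    X = np.column_stack([np.ones(len(y)), X])        # intercept + predictors
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])    # information matrix
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    return -2.0 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def rank_pairwise_logratios(comp, y, names, forced=None, top=20):
    """(-2logLik, 'a/b') for each pairwise logratio added to the forced covariates."""
    n = len(y)
    logx = np.log(comp)
    base = np.empty((n, 0)) if forced is None else np.asarray(forced, float).reshape(n, -1)
    scores = []
    for i, j in itertools.combinations(range(comp.shape[1]), 2):
        lr = (logx[:, i] - logx[:, j])[:, None]
        scores.append((logistic_neg2loglik(np.hstack([base, lr]), y), f"{names[i]}/{names[j]}"))
    scores.sort()                                    # increasing -2logLik
    return scores[:top]

rng = np.random.default_rng(0)
raw = rng.lognormal(size=(200, 5))
comp = raw / raw.sum(axis=1, keepdims=True)          # hypothetical 5-part composition
eta = 1.5 * np.log(comp[:, 0] / comp[:, 1])          # outcome driven by log(A/B)
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
ranking = rank_pairwise_logratios(comp, y, names=list("ABCDE"))
for score, lr in ranking:
    print(f"{lr}: {score:.1f}")
```

The expert would inspect such a list and may choose a candidate slightly below the top if it has a clearer substantive interpretation.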
... The possibility to take advantage of the user's judgement in order to select meaningful albeit statistically suboptimal LRs has already been developed for unsupervised learning [22,23]. In supervised learning, this can also include forcing non-compositional controls into the model and can be a way out of the limitations of purely data-driven approaches [20,48,55]. The possibility has been shown in the application section by forcing in a part that had been found to be relevant in the previous study by [49]. ...
Article
Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in compositional data analysis, and include the well-known additive logratios as particular cases. When the number of parts is large (sometimes even larger than the number of cases), some form of logratio selection is needed. In this article, we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared for a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method has a complex interpretation if some pairs of parts in the logratios overlap, but it leads to the most accurate predictions. The second method restricts parts to occur only once, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that K−1 selected logratios involve a K-part subcomposition. Our approach allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and various stopping criteria are available based on information measures or statistical significance with the Bonferroni correction. We present an application on a dataset from a study predicting Crohn's disease.
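The three search variants differ in which pairwise logratios are eligible at each step. The sketch below encodes those restrictions; for the additive variant it assumes a rule (one part already in the model, one new part) that guarantees K − 1 logratios on K parts, which is an illustration rather than the published algorithm. The oxide names and the helper `allowed_pairs` are hypothetical.

```python
import itertools

def allowed_pairs(parts, selected, method):
    """Pairwise logratios eligible to enter at the next step.

    "unrestricted": any pair not already selected
    "distinct":     each part may appear in at most one selected logratio
    "additive":     after the first pick, exactly one part must already be in
                    the model (assumed rule), so K - 1 logratios involve K parts
    """
    if method not in ("unrestricted", "distinct", "additive"):
        raise ValueError(f"unknown method: {method}")
    used = {p for pair in selected for p in pair}
    eligible = []
    for a, b in itertools.combinations(parts, 2):
        if (a, b) in selected or (b, a) in selected:
            continue
        if method == "distinct" and (a in used or b in used):
            continue
        if method == "additive" and selected and (a in used) == (b in used):
            continue
        eligible.append((a, b))
    return eligible

parts = ["SiO2", "Na2O", "CaO", "K2O", "MgO"]    # hypothetical oxides
first = [("SiO2", "Na2O")]
for m in ("unrestricted", "distinct", "additive"):
    print(m, len(allowed_pairs(parts, first, m)))  # counts 9, 3 and 6

# always taking the first eligible pair under the additive rule
sel = []
for _ in range(len(parts) - 1):
    sel.append(allowed_pairs(parts, sel, "additive")[0])
print(sel)
```

With this simple "first eligible pair" policy the four selected logratios all share SiO2, so they are the additive logratios of the five-part subcomposition with SiO2 as reference. In a real search the choice at each step would be driven by model fit, not list order.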
... Rather than doing a dimension reduction, where all parts participate, as in Figures 7 and 8, the idea of reducing the number of variables is an alternative [59]. The question of selecting a small set of LRs that satisfy a practical objective has been addressed in [30,41,92]. The idea is very simple and involves choosing LRs in a stepwise fashion, with the objective of explaining the maximum amount of variance in the compositional data set. ...
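The unsupervised stepwise idea can be sketched by measuring the share of the logratio variance explained by regressing the centred log-ratio (clr) coordinates on the selected logratios, and adding at each step the pairwise logratio that increases that share the most. Equally weighted parts are assumed, and the data and function names are hypothetical.

```python
import itertools
import numpy as np

def explained_share(Z, S):
    """Share of the total sum of squares of centred data Z explained by regression on S."""
    coef, *_ = np.linalg.lstsq(S, Z, rcond=None)
    fitted = S @ coef
    return (fitted ** 2).sum() / (Z ** 2).sum()

def stepwise_pairwise_lr(comp, n_steps):
    """Greedy forward selection of pairwise logratios (unweighted parts assumed)."""
    logx = np.log(comp)
    clr = logx - logx.mean(axis=1, keepdims=True)    # exact logratio geometry
    Z = clr - clr.mean(axis=0)                       # column-centred
    chosen, history = [], []
    for _ in range(n_steps):
        best_share, best_pair = -1.0, None
        for pair in itertools.combinations(range(comp.shape[1]), 2):
            if pair in chosen:
                continue
            S = np.column_stack([logx[:, a] - logx[:, b] for a, b in chosen + [pair]])
            S = S - S.mean(axis=0)
            share = explained_share(Z, S)
            if share > best_share:
                best_share, best_pair = share, pair
        chosen.append(best_pair)
        history.append(best_share)
    return chosen, history

rng = np.random.default_rng(5)
raw = rng.lognormal(size=(50, 4))
comp = raw / raw.sum(axis=1, keepdims=True)          # hypothetical 4-part composition
chosen, history = stepwise_pairwise_lr(comp, n_steps=3)
print(chosen, [round(h, 3) for h in history])
```

With J parts, J − 1 suitably chosen pairwise logratios explain 100% of the logratio variance, so the useful output is how quickly `history` approaches 1 after only a few steps.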
... In all pairwise logratio selection strategies, the statistical optimality criteria can be juxtaposed with domain knowledge for the choice of a set of LRs that satisfies both statistical criteria and substantive relevance to the research question. This approach has already been successfully implemented in three studies, two in biochemistry and one in archaeology [30,80,92], where at each step a list of the top 20, say, LRs according to the statistical criteria was consulted by the researcher, who then selected an LR according to expert knowledge. Usually the top LRs are very close to one another in terms of statistical optimality, so very little is sacrificed by choosing a slightly less optimal LR that has a clearer interpretation. ...
... Furthermore, they assist with the zeros problem in CoDA, since some components with data zeros can be incorporated into a summed component, whereas zeros are still problematic in a geometric mean. Fortunately, researchers who use logratio transformations are starting to use amalgamations again in ratios, for example, [42,45,75,92]. ...
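The point about zeros can be made concrete: a logratio of amalgamations (sums of parts) stays defined when one part is zero, whereas a logratio built on geometric means does not. The glaze values below are hypothetical, as is the choice to amalgamate FeO, MnO and CuO.

```python
import math

def amalgamation_logratio(comp, numerator, denominator):
    """Log of the ratio of two amalgamations (sums of parts)."""
    return math.log(sum(comp[p] for p in numerator) / sum(comp[p] for p in denominator))

def balance_logratio(comp, numerator, denominator):
    """Log of the ratio of two geometric means; fails if any part is zero."""
    def gmean(parts):
        vals = [comp[p] for p in parts]
        if any(v <= 0 for v in vals):
            raise ValueError("zero component: the geometric mean would need log(0)")
        return math.exp(sum(math.log(v) for v in vals) / len(vals))
    return math.log(gmean(numerator) / gmean(denominator))

# hypothetical glaze analysis (wt%) with a data zero for FeO
glaze = {"SiO2": 65.0, "Na2O": 15.0, "CaO": 7.0, "FeO": 0.0, "MnO": 0.4, "CuO": 1.2}
amalg = amalgamation_logratio(glaze, ["SiO2"], ["FeO", "MnO", "CuO"])
print(round(amalg, 3))    # defined: log(65 / 1.6)
```

Calling `balance_logratio(glaze, ["SiO2"], ["FeO", "MnO", "CuO"])` raises `ValueError`, because the zero for FeO makes the geometric mean undefined on the log scale.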
... Although 'standard' techniques using the original compositional values are often considered sufficient in archaeology (Baxter & Freestone, 2006), the approach advocated involves logarithmically transformed ratios, or logratios (Aitchison, 1986, 2005; Aitchison & Greenacre, 2002; Baxter, 1989; Buxeda i Garrigós, 1999, 2008; Pawlowsky-Glahn & Buccianti, 2011; Greenacre, 2018, 2021), as it is considered that this approach is more mathematically rigorous than traditional ones. Furthermore, it is considered that the analyses conducted here, which incorporate the expert knowledge of the archaeological scientist during, rather than after, the statistical investigation, can improve transparency of the stages in the analyses (Wood & Greenacre, 2020). ...
Article
Research into ancient Chinese metallurgy has flourished over recent years with the accumulation of analytical data reflecting the needs of so many archaeological finds. However, the relationship between technology and society is unlikely to be revealed simply by analysing more artefacts. This is particularly evident in the debates over the sources of metals used to manufacture the Chinese ritual bronzes of the Shang (c. 1500-1046 BCE), Western Zhou (c. 1046–771 BCE) and Eastern Zhou (c. 771–256 BCE) dynasties. This article recognises that approaches to analytical data often fail to provide robust platforms from which to investigate metallurgical technology within its wider social and cultural contexts. To address this issue, a recently developed multivariate approach is applied to over 300 Chinese ritual bronzes from legacy data sets and nearly 100 unearthed copper-based objects from Anyang and Hanzhong. Unlike previous investigations that have relied predominantly on interpreting lead isotope signatures, the compositional analyses presented here indicate that copper and lead used to manufacture the bronzes are derived from mining progressively deeper ores in the same deposits rather than seeking out new sources. It is proposed that interpretations of social, cultural and technological change predicated on the acquisition of metals from disparate regions during the Chinese Bronze Age may need to be revised.
... Furthermore, they assist with the zeros problem in CoDA, since some components with data zeros can be incorporated into a summed component, whereas zeros are still problematic in a geometric mean. Fortunately, researchers who use logratio transformations are starting to use amalgamations again in ratios, for example [69,43,46,84]. ...
... In all pairwise logratio selection strategies the statistical optimality criteria can be juxtaposed with domain knowledge for the choice of a set of logratios that satisfies both statistical criteria and substantive, research-oriented, relevance. This approach has already been successfully implemented in three studies, two in biochemistry and one in archaeology [29,73,84], where at each step a list of the top 20, say, LRs according to the statistical criteria was consulted by the researcher, who then selected an LR according to expert knowledge. Usually the top LRs are very close to one another in terms of statistical optimality, so very little is sacrificed by choosing a slightly less optimal LR that has a clearer interpretation. ...
Preprint
The development of John Aitchison's approach to compositional data analysis is traced from his paper read to the Royal Statistical Society in 1982. Aitchison's logratio approach, which was proposed to solve the problematic aspects of working with data with a fixed sum constraint, is summarized and reappraised. It is maintained that the principles on which this approach was originally built, the main one being subcompositional coherence, are not required to be satisfied exactly; quasi-coherence is sufficient in practice. This opens up the field to using simpler data transformations with easier interpretations, and also makes variable selection possible, so that results can be parsimonious. The additional principle of exact isometry, which was subsequently introduced and was not part of Aitchison's original conception, imposed the use of isometric logratio transformations, but these have been shown to be problematic to interpret. If this principle is regarded as important, it can be relaxed by showing that simpler transformations are quasi-isometric. It is concluded that the isometric and related logratio transformations such as pivot logratios are not a prerequisite for good practice, and this conclusion is fully supported by a case study in geochemistry provided as an appendix.
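Subcompositional coherence, the principle discussed here, can be illustrated in a few lines: closing a subcomposition changes every proportion but leaves the logratio of any two retained parts unchanged. The oxide values are hypothetical, and the sketch shows only the exact coherence of pairwise logratios, not the quasi-coherence argued for above.

```python
import math

def close(parts):
    """Rescale a composition so that its parts sum to 1."""
    total = sum(parts.values())
    return {k: v / total for k, v in parts.items()}

full = close({"SiO2": 65.0, "Na2O": 15.0, "CaO": 7.0, "K2O": 3.0})   # hypothetical
sub = close({k: full[k] for k in ("Na2O", "CaO", "K2O")})              # drop SiO2

print(round(full["Na2O"], 3), round(sub["Na2O"], 3))       # proportions differ
print(round(math.log(full["Na2O"] / full["CaO"]), 6),
      round(math.log(sub["Na2O"] / sub["CaO"]), 6))         # logratio is identical
```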
... The confidence ellipses were determined by bootstrapping, which allows the computation of a region enclosing a multivariate mean with prescribed confidence, in this case 95%. In effect, these confidence plots show the variability in the mean rather than the variability in individual values; that is, each ellipse is constructed so that, over repeated sampling, it would enclose the true mean 95% of the time (for more details and the R code on how Figure 1 and the confidence ellipses around the group means were constructed, see Wood & Greenacre, 2021). ... (Table A2) from Parthian pottery recovered at the Roman military outpost of Ain Sinu in northern Mesopotamia (modern-day Iraq) excavated in the 1950s by Professor David Oates and Dr. Joan Oates. ...
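A bootstrap confidence ellipse for a group mean can be sketched as follows. It assumes two-dimensional coordinates (for example, scores on the first two axes of a logratio analysis), a normal-theory ellipse fitted to the bootstrap means, and the chi-square quantile with 2 degrees of freedom. This is one common construction, not necessarily the R code in Wood & Greenacre (2021); the data and function names are hypothetical.

```python
import numpy as np

CHI2_2DF_95 = -2.0 * np.log(0.05)   # 95% quantile of chi-square with 2 df, about 5.991

def bootstrap_mean_ellipse(X, n_boot=1000, seed=0):
    """Return the sample mean and the covariance of bootstrap replicates of the mean."""
    rng = np.random.default_rng(seed)
    n = len(X)
    boot_means = np.array([X[rng.integers(0, n, size=n)].mean(axis=0) for _ in range(n_boot)])
    return X.mean(axis=0), np.cov(boot_means, rowvar=False)

def inside(point, centre, cov, q=CHI2_2DF_95):
    """True if point lies in the ellipse {x : (x - c)' cov^-1 (x - c) <= q}."""
    d = np.asarray(point, float) - centre
    return float(d @ np.linalg.solve(cov, d)) <= q

def ellipse_outline(centre, cov, q=CHI2_2DF_95, n_points=100):
    """Points on the ellipse boundary, e.g. for plotting."""
    vals, vecs = np.linalg.eigh(cov)
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    circle = np.vstack([np.cos(t), np.sin(t)])           # unit circle, shape (2, n_points)
    return centre + (vecs @ (np.sqrt(vals * q)[:, None] * circle)).T

rng = np.random.default_rng(7)
group = rng.normal(loc=[1.0, 2.0], scale=[0.5, 0.3], size=(40, 2))   # hypothetical scores
centre, cov = bootstrap_mean_ellipse(group)
outline = ellipse_outline(centre, cov)
```

Because the ellipse is built from the spread of bootstrap means, it shrinks roughly with the square root of the group size, which is why it describes uncertainty in the mean rather than the scatter of individual artefacts.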
Article
Any archaeological artefact made from recyclable material may have been recycled before deposition. Three approaches are presented which have identified recycling in the archaeological record: 1) the application of logratio analyses to investigate compositional data indicates that Roman glass was recycled and reapplied as a glaze on Parthian pottery, thereby suggesting that the paucity of Parthian and Sasanian glass in the archaeological record is due to recycling; 2) linear mixing lines on plots which combine compositional and isotopic data suggest that most silver found in the Iron Age hoards of the southern Levant was mixed, with vertical mixing lines indicating that some of it was melted down hastily in times of unrest; and 3) histograms of compositional data provide evidence of recycling accompanied by dilution of cobalt-blue glass in New Kingdom Egypt, potentially because the colourant was not available in later periods, thereby questioning the accepted provenance of the cobalt source. It is considered that application of these approaches can contribute to a better understanding of the motivations behind recycling in prehistory.
... Several stopping criteria are possible, optimising information measures or ensuring significance of the logratios with the Bonferroni correction. All variants allow the researcher to modify the variables entered at a given step based on his or her expert knowledge, as incorporated into the selection process in [15,42,35]. ...
... With respect to the last of the above-mentioned options, there are already three published studies [15,42,35] where the expert with domain knowledge has interacted with the statistical algorithm to make choices of LRs from a list of those competing to enter. The idea is to present the expert with the "top 20" LRs, say, in decreasing order of importance in the modelling, that is, in increasing order of -2logLik. ...
... The possibility to take advantage of the user's judgement in order to select meaningful albeit statistically suboptimal LRs has already been developed for unsupervised learning [17,18]. In supervised learning this can also include forcing non-compositional controls into the model and can be a way out of the limitations of purely data-driven approaches [15,42,35]. The possibility has been shown in the application section by forcing in a part which had been found to be relevant in the previous study by [36]. ...
Article
The common approach to compositional data analysis is to transform the data by means of logratios. Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in many research problems, and include the well-known additive logratios as particular cases. When the number of parts is large (sometimes even larger than the number of cases), some form of logratio selection is a must, for instance by means of an unsupervised learning method based on a stepwise selection of the pairwise logratios that explain the largest percentage of the logratio variance in the compositional dataset. In this article we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared for a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method has a complex interpretation if some pairs of parts in the logratios overlap, but it leads to the most accurate predictions. The second method restricts parts to occur only once, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that K − 1 selected logratios involve exactly K parts. This method in fact searches for the subcomposition with the highest explanatory power. Once the subcomposition is identified, the researcher's favourite logratio representation may be used in subsequent analyses, not only pairwise logratios. Our methodology allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and various stopping criteria are available based on information measures or statistical significance with the Bonferroni correction. We present an illustration of the three approaches on a dataset from a study predicting Crohn's disease. The first method excels in terms of predictive power, and the other two in interpretability.
Preprint
The common approach to compositional data analysis is to transform the data by means of logratios. Logratios between pairs of compositional parts (pairwise logratios) are the easiest to interpret in many research problems. When the number of parts is large, some form of logratio selection is a must, for instance by means of an unsupervised learning method based on a stepwise selection of the pairwise logratios that explain the largest percentage of the logratio variance in the compositional dataset. In this article we present three alternative stepwise supervised learning methods to select the pairwise logratios that best explain a dependent variable in a generalized linear model, each geared for a specific problem. The first method features unrestricted search, where any pairwise logratio can be selected. This method has a complex interpretation if some pairs of parts in the logratios overlap, but it leads to the most accurate predictions. The second method restricts parts to occur only once, which makes the corresponding logratios intuitively interpretable. The third method uses additive logratios, so that K − 1 selected logratios involve exactly K parts. This method in fact searches for the subcomposition with the highest explanatory power. Once the subcomposition is identified, the researcher's favourite logratio representation may be used in subsequent analyses, not only pairwise logratios. Our methodology allows logratios or non-compositional covariates to be forced into the models based on theoretical knowledge, and various stopping criteria are available based on information measures or statistical significance with the Bonferroni correction. We present an illustration of the three approaches on a dataset from a study predicting Crohn's disease. The first method excels in terms of predictive power, and the other two in interpretability.
... The proponents of these complex transformations take isometry as a type of "gold standard" for the analysis of compositional data, and the strict adherence to this mathematical ideal has been to the detriment of using simpler transformations such as the ALRs, or a subset of pairwise logratios. In a series of papers by Greenacre (2019), Graeve and Greenacre (2020), and Wood and Greenacre (2021) it is shown in a variety of contexts that a set of simple pairwise logratios can satisfactorily approximate the logratio geometry, coming sufficiently close to being isometric for all practical purposes. A tiny loss of isometry is thus traded off in favor of the benefit of the simpler and clearer interpretation of the logratio variables. ...
... In short, the goal is to measure the deviation of the ALR-transformed data from the ideal of isometry. This way of measuring the proximity by the Procrustes correlation between two configurations in multidimensional space has already been used to select a subset of pairwise logratios that engenders a Euclidean geometry close to the exact one (Greenacre, 2019; Graeve and Greenacre, 2020; Wood and Greenacre, 2021). This idea was inspired by the selection of variables in PCA by Krzanowski (1987), and the same idea will be used here to select a reference in order to define a set of ALRs. ...
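The Procrustes correlation between an ALR configuration and the exact logratio geometry can be sketched as below. The sketch assumes equally weighted parts (so that the clr coordinates reproduce the geometry of all pairwise logratios) and uses the symmetric Procrustes correlation, the sum of the singular values of A'B after both configurations are centred and scaled to unit total sum of squares. Data and function names are hypothetical.

```python
import numpy as np

def procrustes_correlation(A, B):
    """Symmetric Procrustes correlation between two configurations of the same rows."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.sqrt((A ** 2).sum())
    B = B / np.sqrt((B ** 2).sum())
    # after optimal rotation, the correlation is the sum of singular values of A'B
    return np.linalg.svd(A.T @ B, compute_uv=False).sum()

def clr(comp):
    """Centred logratios: the exact logratio geometry (equal part weights)."""
    logx = np.log(comp)
    return logx - logx.mean(axis=1, keepdims=True)

def alr(comp, ref):
    """Additive logratios log(x_j / x_ref) for all j != ref."""
    logx = np.log(comp)
    return np.delete(logx - logx[:, [ref]], ref, axis=1)

rng = np.random.default_rng(2)
raw = rng.lognormal(size=(60, 6))
comp = raw / raw.sum(axis=1, keepdims=True)          # hypothetical 6-part composition
exact = clr(comp)
scores = [procrustes_correlation(exact, alr(comp, j)) for j in range(comp.shape[1])]
best_ref = int(np.argmax(scores))
print([round(s, 4) for s in scores], best_ref)
```

`best_ref` is the reference whose ALRs come closest to isometric for this data set; a value of 1 would mean the ALR configuration reproduces the exact geometry up to rotation and scale.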
Article
Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric, that is they do not reproduce the geometry of all pairwise logratios exactly, but their lack of isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the subsequent interpretation of the logratios even easier. On each of three high-dimensional omics datasets the additive logratio transformation was performed, using references that were identified according to the abovementioned criteria. For each dataset the compositional data structure was successfully reproduced, that is the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9974, and 0.9902, respectively. We thus demonstrate, for high-dimensional compositional data, that additive logratios can provide a valid choice as transformed variables, which (a) are subcompositionally coherent, (b) explain 100% of the total logratio variance and (c) come measurably very close to being isometric. 
The interpretation of additive logratios is much simpler than the complex isometric alternatives and, when the variance of the log-transformed reference is very low, it is even simpler since each additive logratio can be identified with a corresponding compositional component.
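The secondary criterion, a reference whose log-transformed values have very low variance, can be illustrated with hypothetical data in which part 0 is nearly constant. Each ALR is then almost a shifted copy of the log of its numerator part, which is why it can be identified with that component. This sketch covers only the secondary criterion and ignores the primary Procrustes criterion; `log_variances` is a hypothetical helper.

```python
import numpy as np

def log_variances(comp):
    """Variance of each part's log-transformed relative abundances."""
    return np.log(comp).var(axis=0, ddof=1)

rng = np.random.default_rng(4)
n = 100
ref = 0.2 * np.exp(rng.normal(scale=0.01, size=n))          # nearly constant part
others = rng.lognormal(size=(n, 4))
others = (1 - ref)[:, None] * others / others.sum(axis=1, keepdims=True)
comp = np.column_stack([ref, others])                        # rows sum to 1

v = log_variances(comp)
ref_idx = int(np.argmin(v))                                  # lowest log-variance part
alr_vals = np.log(comp[:, 1:]) - np.log(comp[:, [ref_idx]])
corrs = [np.corrcoef(alr_vals[:, k], np.log(comp[:, k + 1]))[0, 1] for k in range(4)]
print(ref_idx, [round(c, 5) for c in corrs])
```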
... The proponents of these complex transformations take isometry as a type of "gold standard" for the analysis of compositional data, and the strict adherence to this mathematical ideal has been to the detriment of using simpler transformations such as the ALRs, or a subset of pairwise logratios. In a series of papers (11–13) it is shown in a variety of contexts that a set of simple pairwise logratios can satisfactorily approximate the logratio geometry, coming close to being isometric for all practical purposes. A small loss of isometry is thus traded off in favour of the benefit of a simpler and clearer interpretation of the logratio variables. ...
... In short, the goal is to measure the deviation of the ALR-transformed data from the ideal of isometry. This way of measuring the proximity between two configurations in multidimensional space by the Procrustes correlation has already been used to select a subset of pairwise logratios that engenders a Euclidean geometry close to the exact one (11–13). This idea was inspired by the selection of variables in PCA (26), and the same idea will be used here to select a reference in order to define a set of ALRs. ...
Preprint
Background: Microbiome and omics datasets are, by their intrinsic biological nature, of high dimensionality, characterized by counts of large numbers of components (microbial genes, operational taxonomic units, RNA transcripts, etc.). These data are generally regarded as compositional since the total number of counts identified within a sample is irrelevant. The central concept in compositional data analysis is the logratio transformation, the simplest being the additive logratios with respect to a fixed reference component. A full set of additive logratios is not isometric in the sense of reproducing the geometry of all pairwise logratios exactly, but their lack of isometry can be measured by the Procrustes correlation. The reference component can be chosen to maximize the Procrustes correlation between the additive logratio geometry and the exact logratio geometry, and for high-dimensional data there are many potential references. As a secondary criterion, minimizing the variance of the reference component's log-transformed relative abundance values makes the subsequent interpretation of the logratios even easier. Finally, it is preferable that the reference component not be a rare component but well populated, and substantive biological reasons might also guide the choice if several reference candidates are identified.
Results: On each of three high-dimensional datasets the additive logratio transformation was performed, using references that were identified according to the abovementioned criteria. For each dataset the compositional data structure was successfully reproduced, that is, the additive logratios were very close to being isometric. The Procrustes correlations achieved for these datasets were 0.9991, 0.9977 and 0.9997, respectively. In the third case, where the objective was to distinguish between three groups of samples, the approximation was made to the restricted logratio space of the between-group variance.
Conclusions: We show that, for high-dimensional compositional data, additive logratios can provide a valid choice as transformed variables that (1) are subcompositionally coherent, (2) explain 100% of the total logratio variance and (3) come measurably very close to being isometric, that is, they approximate the exact logratio geometry almost perfectly. The interpretation of additive logratios is simple and, when the variance of the log-transformed reference is very low, it is made even simpler since each additive logratio can be identified with a corresponding compositional component.