Article

The surname structure of Trentino (Italy) and its relationship with dialects and genes

Taylor & Francis
Annals of Human Biology

Abstract

Background. Thanks to the availability of rich surname, linguistic and genetic information, together with its geographic and cultural complexity, Trentino (North-Eastern Italy) is an ideal place to test the relationships between genetic and cultural traits. Aim. We provide a comprehensive study of population structures based on surname and dialect variability and evaluate their relationships with genetic diversity in Trentino. Subjects and methods. Surname data were collected for 363 parishes, linguistic data for 57 dialects and genetic data for different sets of molecular markers (Y-chromosome, mtDNA, autosomal) in 10 populations. Analyses relied on different multivariate methods and correlation tests. Results. Besides the expected isolation-by-distance-like patterns (with few local exceptions, likely related to sociocultural instances), we detected a significant and geography-independent association between dialects and surnames. As for molecular markers, only Y-chromosomal STRs seem to be associated with the dialects, although no significant result was obtained. No evidence for correlation between molecular markers and surnames was observed. Conclusion. Surnames act as cultural markers as do other words, although in this context they cannot be used as reliable proxies for genetic variability at a local scale.
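The abstract does not name the correlation tests it used. A common choice for comparing two distance matrices built on the same localities (for example, surname distances against dialect distances) is a Mantel permutation test, and the sketch below assumes that choice. The function name `mantel`, the toy coordinates and the derived matrices are hypothetical and are not data from the study.

```python
import numpy as np

def mantel(a, b, permutations=999, seed=0):
    """Mantel test: Pearson correlation between two square distance
    matrices, with a one-sided permutation p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    iu = np.triu_indices(n, k=1)           # upper triangle, diagonal excluded
    r_obs = np.corrcoef(a[iu], b[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)
        b_perm = b[np.ix_(p, p)]           # permute rows and columns together
        if np.corrcoef(a[iu], b_perm[iu])[0, 1] >= r_obs:
            hits += 1
    p_value = (hits + 1) / (permutations + 1)
    return r_obs, p_value

# Toy example: eight localities on a line; the second matrix is a rescaled copy.
x = np.array([0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 21.0, 28.0])
geo = np.abs(x[:, None] - x[None, :])
surname_dist = 2.0 * geo                   # hypothetical surname distances
r, p = mantel(geo, surname_dist)
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual cells would break the matrix structure and overstate significance.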


... The networks obtained for the R1b haplogroup in the Castilla and Spaniards groups indicate that with the present results, it is not possible to determine whether this haplogroup is the founding haplotype/haplogroup of the surname. Other studies of the Spanish general population (Martínez-Cadenas et al. 2016) and Catalan surnames (Solé-Morata et al. 2015) seem to be able to define a minimum number of founders in infrequent surnames, although not in Andalusian surnames, where there is a low correlation between surnames and Y chromosome markers (Calderón et al. 2015) or Italian surnames (Boattini et al. 2021). ...
Article
In most Western European societies, surnames pass from generation to generation, and where surnames are passed from fathers to children, the Y chromosome passes down from fathers to male offspring in the same way as surnames do. The aim of this study was to ascertain the patrilineal relationship between individuals with the surname “Castilla” and their respective Y-chromosome haplotypes. The toponymic surname “Castilla” is part of the Spanish royal family. Genealogical studies of this surname have allowed the formulation of different hypotheses about its origin, most of which were centered in Burgos. To shed some light on the origin of the surname Castilla and to investigate the possible co-ancestry behind the living carriers of this surname, markers located in the Y chromosome-specific region were analyzed in a sample of 102 men whose paternal surname was Castilla. The study aimed to establish the minimum number of founders and the expansion time of the lineages from our sample. Two major haplogroups were identified: R1b and E1b1b-M81. The high frequency of the E1b1b-M81 haplogroup in comparison to that of the general Spanish population, its low haplotype diversity, and its young TMRCA (323 ± 255 years CE) are compatible with the historical timing of the obligation to use surnames. However, the coincidence of the most common haplogroup in the Castilla sample and the most frequent haplogroup in the Spanish general population, R1b, makes it difficult to identify founder haplotypes/haplogroups in the history of the Castilla surname.
... The paper by Boattini et al. (2021) performs a comparison between surnames, dialects, and genetics in Trentino, a region in northern Italy that is well-suited to perform this type of study thanks to the large availability of such comparative data. Multivariate methods and correlation tests were implemented and identified a significant association between surnames and dialects in Trentino. ...
Article
As an initial step in more extensive research into the links between biological and cultural diversity in present-day Italy, we reviewed Biocultural Diversity studies that explore the relationship between biological and cultural patterns of diversity to determine whether any direct causal relationships or common drivers could be inferred. We found no significant attempts to quantitatively measure biocultural diversity in the country as a whole. Italy shows a high number of mutual interactions, but common drivers and patterns between biological and cultural diversity were not evident. This could be either a problem of quantification due perhaps to an inherent incommensurability between the two dimensions, or different causative patterns that drive biological and cultural diversity.
Article
Paternity testing using genetic markers has shown that extra-pair paternity (EPP) is common in many pair-bonded species [1, 2]. Evolutionary theory and empirical data show that extra-pair copulations can increase the fitness of males as well as females [3, 4]. This can carry a significant fitness cost for the social father, who then invests in rearing offspring that biologically are not his own [5]. In human populations, the incidence and correlates of extra-pair paternity remain highly contentious [2, 6, 7]. Here, we use a population-level genetic genealogy approach [6, 8] to reconstruct spatiotemporal patterns in human EPP rates. Using patrilineal genealogies from the Low Countries spanning a period of over 500 years and Y chromosome genotyping of living descendants, our analysis reveals that historical EPP rates, while low overall, were strongly impacted by socioeconomic and demographic factors. Specifically, we observe that estimated EPP rates among married couples varied by more than an order of magnitude, from 0.4% to 5.9%, and peaked among families with a low socioeconomic background living in densely populated cities of the late 19th century. Our results support theoretical predictions that social context can strongly affect the outcomes of sexual conflict in human populations by modulating the incentives and opportunities for engaging in extra-pair relationships [9, 10, 11]. These findings show how contemporary genetic data combined with in-depth genealogies open up a new window on the sexual behavior of our ancestors.
Article
The Y-chromosome is a widely studied and useful small part of the genome providing different applications for interdisciplinary research. In many (Western) societies, the Y-chromosome and surnames are paternally co-inherited, suggesting a corresponding Y-haplotype for every namesake. While it has already been observed that this correlation may be disrupted by a false-paternity event, adoption, anonymous sperm donor or the co-founding of surnames, extensive information on the strength of the surname match frequency (SMF) with the Y-chromosome remains largely unknown. For the first time in Belgium and the Netherlands, we were able to study this correlation using 2,401 males genotyped for 46 Y-STRs and 183 Y-SNPs. The SMF was observed to be dependent on the number of Y-STRs analyzed, their mutation rates and the number of Y-STR differences allowed for a kinship. For a perfect match, the Yfiler® Plus and our in-house YForGen kit gave a similarly high SMF of 98%, but for non-perfect matches, the latter could overall be identified as the best kit. The SMF generally increased due to fewer mismatches when encountering (1) deep Y-subhaplogroups, (2) less frequently occurring surnames, and (3) small geographical distances between relatives. This novel information enabled the design of a surname prediction model based on genetic and geographical distances of a kinship. The prediction model has an area under the curve (AUC) of 0.9 and is therefore usable for DNA kinship priority listing in estimation applications like forensic familial searching.
Article
Objective. Evolutionary theory has shown that seeking out extrapair paternity (EPP) can be a viable reproductive strategy for both sexes in pair-bonded species, including humans. As yet, estimates of the contemporary or historical EPP rate in human populations are still rare. In the present study, we estimated the historical EPP rate in the Dutch population over the last 400 years and compared the rate with those obtained for other human populations to determine the evolutionary, cultural, and socio-demographic factors that influence human cuckoldry behavior. Methods. We estimated the historical EPP rate for the Dutch population via the “genealogical pair method”, in which the EPP rate is derived from Y-chromosome mismatches between pairs of individuals that, based on genealogical evidence, share a common paternal ancestor. Results. Based on the analysis of 68 representative genealogical pairs, separated by a total of 1013 fertilization events, we estimated that the historical EPP rate for the Dutch population over the last 400 years was 0.96% per generation (95% confidence interval 0.46%-1.76%). Conclusion. The Dutch EPP rate fits perfectly within the range reported for other contemporary and historical populations in Western Europe and was highly congruent with that estimated for neighboring Flanders, despite the socio-economic and religious differences between both populations. The estimated low EPP rate challenges the “dual mating strategy hypothesis” that states that women could obtain fitness benefits by securing investment from one man while cuckolding him to obtain good genes from an affair partner.
Article
Where did pottery first appear in the Old World? Statistical modelling of radiocarbon dates suggests that ceramic vessel technology had independent origins in two different hunter-gatherer societies. Regression models were used to estimate average rates of spread and geographic dispersal of the new technology. The models confirm independent origins in East Asia (c. 16000 cal BP) and North Africa (c. 12000 cal BP). The North African tradition may have later influenced the emergence of Near Eastern pottery, which then flowed west into Mediterranean Europe as part of a Western Neolithic, closely associated with the uptake of farming.
Article
In most societies, surnames are passed down from fathers to sons, just like the Y chromosome. It follows that, theoretically, men sharing the same surnames would also be expected to share related Y chromosomes. Previous investigations have explored such relationships, but so far, the only detailed studies that have been conducted are on samples from the British Isles. In order to provide additional insights into the correlation between surnames and Y chromosomes, we focused on the Spanish population by analysing Y chromosomes from 2121 male volunteers representing 37 surnames. The results suggest that the degree of coancestry within Spanish surnames is highly dependent on surname frequency, in overall agreement with British but not Irish surname studies. Furthermore, a reanalysis of comparative data for all three populations showed that Irish surnames have much greater and older surname descent clusters than Spanish and British ones, suggesting that Irish surnames may have considerably earlier origins than Spanish or British ones. Overall, despite closer geographical ties between Ireland and Britain, our analysis points to substantial similarities in surname origin and development between Britain and Spain, while possibly hinting at unique demographic or social events shaping Irish surname foundation and development. European Journal of Human Genetics advance online publication, 22 April 2015; doi:10.1038/ejhg.2015.75.
Article
This report describes software written to facilitate the compilation and analysis of fishery data, particularly data referenced by spatial coordinates. Our research stems from experiences with information on Canada's Pacific groundfish fisheries compiled at the Pacific Biological Station (PBS). Despite its origins in fishery data analysis, our software has broad applicability. The library PBS Mapping extends the languages R and S-PLUS to include two-dimensional plotting features similar to those commonly available in a Geographic Information System (GIS). Embedded C code speeds algorithms from computational geometry, such as finding polygons that contain specified point events or converting between longitude-latitude and Universal Transverse Mercator (UTM) coordinates. We also present a number of convenient utilities for the Microsoft Windows operating systems, including commands that support computational geometry outside the framework of R or S-PLUS. Tools to construct most of our software come freely from the Internet, as documented here in a guide to the packages available. Furthermore, we provide quick tutorials that address key technical issues relevant to our work, such as embedding C code into an R package and writing documentation that meets the R standard. Our results, which depend significantly on the work of students, illustrate the convergence of goals between academic training and applied research. CRAN: https://cran.r-project.org/web/packages/PBSmapping/index.html GitHub: https://github.com/pbs-software/pbs-mapping
Article
Social and cultural factors had a critical role in determining the genetic structure of Europe. Therefore, socially stratified populations may help to focus on specific episodes of European demographic history. In this study, we use uniparental markers to analyse the genetic structure of Partecipanza in San Giovanni in Persiceto (Northern Italy), a peculiar institution whose origins date back to the Middle Ages and whose members form the patrilineal descent of a group of founder families. From a maternal point of view (mtDNA), Partecipanza is genetically homogeneous with the rest of the population. However, we observed a significant differentiation for Y-chromosomes. In addition, by comparing 17 Y-STR profiles with deep-rooted paternal pedigrees, we estimated a Y-STR mutation rate equal to 3.90 × 10⁻³ mutations per STR per generation and an average generation duration time of 33.38 years. When we used these values for tentative dating, we estimated 1300-600 years ago for the origins of the Partecipanza. These results, together with a peculiar Y-chromosomal composition and historical evidence, suggest that Germanic populations (Lombards in particular) settled in the area during the Migration Period (400-800 AD, approximately) and may have had an important role in the foundation of this community. Heredity advance online publication, 10 September 2014; doi:10.1038/hdy.2014.77.
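To get a feel for the scale of the figures quoted in this abstract, the sketch below converts the reported date range into generations and computes the expected number of mutations per father-to-son transmission across the 17 loci. It is plain arithmetic on the published values. The abstract does not say how the dating itself was performed, so this does not reproduce that estimate, and the variable and function names are hypothetical.

```python
MUTATION_RATE = 3.90e-3     # mutations per STR per generation (from the abstract)
GENERATION_YEARS = 33.38    # average generation time in years (from the abstract)
N_STRS = 17                 # number of Y-STR loci compared (from the abstract)

def years_to_generations(years):
    """Convert a time span in years to generations at the reported generation time."""
    return years / GENERATION_YEARS

# Expected mutations summed over all loci in a single father-to-son transmission.
expected_mutations = N_STRS * MUTATION_RATE

gens_recent = years_to_generations(600)    # lower bound of the reported date range
gens_old = years_to_generations(1300)      # upper bound of the reported date range

print(f"expected mutations per transmission: {expected_mutations:.4f}")
print(f"date range in generations: {gens_recent:.1f} to {gens_old:.1f}")
```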
Article
The animal and plant biodiversity of the Italian territory is known to be one of the richest in the Mediterranean basin and Europe as a whole, but does the genetic diversity of extant human populations show a comparable pattern? According to a number of studies, the genetic structure of Italian populations retains the signatures of complex peopling processes which took place from the Paleolithic to modern era. Although the observed patterns highlight a remarkable degree of genetic heterogeneity, they do not, however, take into account an important source of variation. In fact, Italy is home to numerous ethnolinguistic minorities which have yet to be studied systematically. Due to their difference in geographical origin and demographic history, such groups not only signal the cultural and social diversity of our country, but they are also potential contributors to its bio-anthropological heterogeneity. To fill this gap, research groups from four Italian Universities (Bologna, Cagliari, Pisa and Roma Sapienza) started a collaborative study in 2007, which was funded by the Italian Ministry of Education, University and Research and received partial support by the Istituto Italiano di Antropologia. In this paper, we present an account of the results obtained in the course of this initiative. Four case-studies relative to linguistic minorities from the Eastern Alps, Sardinia, Apennines and Southern Italy are first described and discussed, focusing on their micro-evolutionary and anthropological implications. Thereafter, we present the results of a systematic analysis of the relations between linguistic, geographic and genetic isolation. Integrating the data obtained in the course of the long-term study with literature and unpublished results on Italian populations, we show that a combination of linguistic and geographic factors is probably responsible for the presence of the most robust signatures of genetic isolation. 
Finally, we evaluate the magnitude of the diversity of Italian populations in the European context. The human genetic diversity of our country was found to be greater than observed throughout the continent at short (0-200 km) and intermediate (700-800km) distances, and accounted for most of the highest values of genetic distances observed at all geographic ranges. Interestingly, an important contribution to this pattern comes from the “linguistic islands” (e.g. German speaking groups of Sappada and Luserna from the Eastern Italian Alps), further proof of the importance of considering social and cultural factors when studying human genetic variation.
Article
Great European mountain ranges have acted as barriers to gene flow for resident populations since prehistory and have offered a place for the settlement of small, and sometimes culturally diverse, communities. Therefore, the human groups that have settled in these areas are worth exploring as an important potential source of diversity in the genetic structure of European populations. In this study, we present new high resolution data concerning Y chromosomal variation in three distinct Alpine ethno-linguistic groups, Italian, Ladin and German. Combining unpublished and literature data on Y chromosome and mitochondrial variation, we were able to detect different genetic patterns. In fact, within and among population diversity values observed vary across linguistic groups, with German and Italian speakers at the two extremes, and seem to reflect their different demographic histories. Using simulations we inferred that the joint effect of continued genetic isolation and reduced founding group size may explain the apportionment of genetic diversity observed in all groups. Extending the analysis to other continental populations, we observed that the genetic differentiation of Ladins and German speakers from Europeans is comparable to or even greater than that observed for well known outliers like Sardinians and Basques. Finally, we found that in south Tyroleans, the social practice of Geschlossener Hof, a hereditary norm which might have favored male dispersal, coincides with a significant intra-group diversity for mtDNA but not for Y chromosome, a genetic pattern which is opposite to those expected among patrilocal populations.
Together with previous evidence regarding the possible effects of "local ethnicity" on the genetic structure of German speakers that have settled in the eastern Italian Alps, this finding suggests that taking socio-cultural factors into account together with geographical variables and linguistic diversity may help unveil some yet to be understood aspects of the genetic structure of European populations.
Article
In the present study, we show how, through time, an ethnic mosaic and a changing social and economic context translated into intrapopulation differentiation and a change in genetic barriers between populations. Surname analysis was applied to a sample drawn from two centuries of marriage records in ten Arbereshe and nine Italian villages of southern Italy to evaluate the evolution of internal differentiation and changes in genetic relationships between populations. Marital isonymy and subdivision into subpopulations were higher in the Arbereshe. Genetic barriers coinciding with ethnic boundaries characterized the 1800s. In the second half of the 1900s, ethnic differentiation disappeared. We hypothesize that socioeconomic changes, such as increased outmigration and regional mobility, were the forces that progressively eliminated the ethnic-related genetic differentiation in the region. This study has important implications for an understanding of the relationship between genetic evolution and the cultural milieu involving enforcement of ethnic differences.
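Marital isonymy is conventionally measured as the share of marriages in which both spouses bear the same surname. A minimal sketch of that calculation follows; the records and the function name `marital_isonymy` are made up for illustration and are not taken from the study's marriage registers.

```python
def marital_isonymy(marriages):
    """Fraction of marriages in which groom and bride share a surname.

    `marriages` is a list of (groom_surname, bride_surname) pairs.
    """
    if not marriages:
        raise ValueError("no marriage records")
    same = sum(
        1
        for groom, bride in marriages
        if groom.strip().lower() == bride.strip().lower()
    )
    return same / len(marriages)

# Hypothetical records: two of the four couples share a surname.
records = [
    ("Rossi", "Rossi"),
    ("Bianchi", "Rossi"),
    ("Greco", "Greco"),
    ("Russo", "Bianchi"),
]
print(marital_isonymy(records))  # 0.5
```

Real registers would also need spelling variants of the same surname to be reconciled before comparison; the case-insensitive match here is only a placeholder for that step.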
Article
We analyze the geographic location of 77,451 different Italian surnames (17,579,891 individuals) obtained from the lists of telephone subscribers of the year 1993. By using a specific neural network analysis (Self-Organizing Maps, SOMs), we automatically identify the geographic origin of 49,117 different surnames. To validate the methodology, we compare the results to a study previously conducted on the same database with accurate supervised methods. By comparing the results, we find an overlap of 97%, meaning that the SOM methodology is highly reliable and traces back well the geographic origin of surnames at the time of their introduction (Late Middle Ages/Renaissance in Italy). SOM results enable one to distinguish monophyletic surnames from polyphyletic ones, that is, surnames having had a single geographic and historic origin from those that started to be in use, with an identical spelling, in different locations (respectively, 76.06% and 21.05% of the total). As we are interested in geographic origins, polyphyletic surnames are excluded from further analyses. By comparing the present location of each monophyletic surname to its inferred geographic origin in the late Middle Ages/Renaissance, we measure the extent of the migrations that have occurred in Italy since that time. We find that the percentage of individuals presently living in the very area where their surname started to be in use centuries ago is extremely variable (ranging from 22.77% to 77.86% according to the province), meaning that self-assessed regional identities seldom correspond to the "autochthony" they imply. For example, the upper part of the Tyrrhenian coast (Northern Latium, Tuscany) has a strong identity but few "autochthonous" inhabitants (∼28%), having been a passageway from the North to the South of Italy.
Article
A recent workshop entitled "The Family Name as Socio-Cultural Feature and Genetic Metaphor: From Concepts to Methods" was held in Paris in December 2010, sponsored by the French National Centre for Scientific Research (CNRS) and by the journal Human Biology. This workshop was intended to foster a debate on questions related to the family names and to compare different multidisciplinary approaches involving geneticists, historians, geographers, sociologists and social anthropologists. This collective paper presents a collection of selected communications.
Article
Our focus in this paper is the analysis of surnames, which have been proven to be reliable genetic markers because in patrilineal systems they are transmitted along generations virtually unchanged, similarly to a genetic locus on the Y chromosome. We compare the distribution of surnames to the distribution of dialect pronunciations, which are clearly culturally transmitted. Because surnames, at the time of their introduction, were words subject to the same linguistic processes that otherwise result in dialect differences, one might expect their geographic distribution to be correlated with dialect pronunciation differences. In this paper we concentrate on the Netherlands, an area of only 40,000 km², where two official languages are spoken, Dutch and Frisian. We analyze 19,910 different surnames, sampled in 226 locations, and 125 different words, whose pronunciation was recorded in 252 sites. We find that, once the collinear effects of geography on both surname and cultural transmission are taken into account, there is no statistically significant association between the two, suggesting that surnames cannot be taken as a proxy for dialect variation, even though they can be safely used as a proxy for Y-chromosome genetic variation. We find the results historically and geographically insightful, hopefully leading to a deeper understanding of the role that local migrations and cultural diffusion play in surname and dialect diversity.
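One standard way to take a collinear third variable into account is a first-order partial correlation, which removes the part of the association between two variables that both share with the third. The sketch below assumes that approach for illustration; the abstract does not state which procedure the authors used, and the vectors are invented so that two variables look strongly related only because both track geography.

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Invented pairwise distances: both variables follow geography plus
# small fluctuations that are unrelated to each other.
geography = np.arange(1.0, 9.0)
surname = geography + 0.5 * np.array([1, -1, 1, -1, 1, -1, 1, -1])
dialect = geography + 0.5 * np.array([1, 1, -1, -1, 1, 1, -1, -1])

raw = np.corrcoef(surname, dialect)[0, 1]
partial = partial_corr(surname, dialect, geography)
print(f"raw r = {raw:.3f}, partial r given geography = {partial:.3f}")
```

In this toy case the raw correlation is high while the partial correlation is close to zero, which is the pattern the abstract describes. With real distance matrices, significance would normally be assessed by permutation, since pairwise distances are not independent observations.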
Article
Historical inference is at its most powerful when independent lines of evidence can be integrated into a coherent account. Dating linguistic and cultural lineages can potentially play a vital role in the integration of evidence from linguistics, anthropology, archaeology and genetics. Unfortunately, although the comparative method in historical linguistics can provide a relative chronology, it cannot provide absolute date estimates and an alternative approach, called glottochronology, is fundamentally flawed. In this paper we outline how computational phylogenetic methods can reliably estimate language divergence dates and thus help resolve long-standing debates about human prehistory ranging from the origin of the Indo-European language family to the peopling of the Pacific.
Article
Most heritable surnames, like Y chromosomes, are passed from father to son. These unique cultural markers of coancestry might therefore have a genetic correlate in shared Y chromosome types among men sharing surnames, although the link could be affected by mutation, multiple foundation for names, nonpaternity, and genetic drift. Here, we demonstrate through an analysis of 1,678 Y-chromosomal haplotypes within 40 British surnames a remarkably high degree of coancestry that generally increases as surnames become rarer. On average, the proportion of haplotypes lying within descent clusters is 62% but ranges from 0% to 87%. The shallow time depth of many descent clusters within names, the lack of a detectable effect of surname derivation on diversity, and simulations of surname descent suggest that genetic drift through variation in reproductive success is important in structuring haplotype diversity. Modern patterns therefore provide little reliable information about the original founders of surnames some 700 years ago. A comparative analysis of published data on Y diversity within Irish surnames demonstrates a relative lack of surname frequency dependence of coancestry, a difference probably mediated through distinct Irish and British demographic histories including even more marked genetic drift in Ireland.
Article
Genome-wide data provide a powerful tool for inferring patterns of genetic variation and structure of human populations. In this study, we analysed almost 250,000 SNPs from a total of 945 samples from Eastern and Western Finland, Sweden, Northern Germany and Great Britain complemented with HapMap data. Small but statistically significant differences were observed between the European populations (F_ST = 0.0040, p < 10⁻⁴), also between Eastern and Western Finland (F_ST = 0.0032, p < 10⁻³). The latter indicated the existence of a relatively strong autosomal substructure within the country, similar to that observed earlier with smaller numbers of markers. The Germans and British were less differentiated than the Swedes, Western Finns and especially the Eastern Finns who also showed other signs of genetic drift. This is likely caused by the later founding of the northern populations, together with subsequent founder and bottleneck effects, and a smaller population size. Furthermore, our data suggest a small eastern contribution among the Finns, consistent with the historical and linguistic background of the population. Our results warn against a priori assumptions of homogeneity among Finns and other seemingly isolated populations. Thus, in association studies in such populations, additional caution for population structure may be necessary. Our results illustrate that population history is often important for patterns of genetic variation, and that the analysis of hundreds of thousands of SNPs provides high resolution also for population genetics.
Article
The genetic information for this work came from a very large collection of gene frequencies for "classical" (non-DNA) polymorphisms of the world aborigines. The data were grouped in 42 populations studied for 120 alleles. The reconstruction of human evolutionary history thus generated was checked with statistical techniques such as "boot-strapping". It changes some earlier conclusions and is in agreement with more recent ones, including published and unpublished DNA-marker results. The first split in the phylogenetic tree separates Africans from non-Africans, and the second separates two major clusters, one corresponding to Caucasoids, East Asians, Arctic populations, and American natives, and the other to Southeast Asians (mainland and insular), Pacific islanders, and New Guineans and Australians. Average genetic distances between the most important clusters are proportional to archaeological separation times. Linguistic families correspond to groups of populations with very few, easily understood overlaps, and their origin can be given a time frame. Linguistic superfamilies show remarkable correspondence with the two major clusters, indicating considerable parallelism between genetic and linguistic evolution. The latest step in language development may have been an important factor determining the rapid expansion that followed the appearance of modern humans and the demise of Neanderthals.
Article
Languages, like genes, provide vital clues about human history. The origin of the Indo-European language family is "the most intensively studied, yet still most recalcitrant, problem of historical linguistics". Numerous genetic studies of Indo-European origins have also produced inconclusive results. Here we analyse linguistic data using computational methods derived from evolutionary biology. We test two theories of Indo-European origin: the 'Kurgan expansion' and the 'Anatolian farming' hypotheses. The Kurgan theory centres on possible archaeological evidence for an expansion into Europe and the Near East by Kurgan horsemen beginning in the sixth millennium BP. In contrast, the Anatolian theory claims that Indo-European languages expanded with the spread of agriculture from Anatolia around 8,000-9,500 years BP. In striking agreement with the Anatolian hypothesis, our analysis of a matrix of 87 languages with 2,449 lexical items produced an estimated age range for the initial Indo-European divergence of between 7,800 and 9,800 years BP. These results were robust to changes in coding procedures, calibration points, rooting of the trees and priors in the Bayesian analysis.
Article
Equilibrium models of isolation by distance predict an increase in genetic differentiation with geographic distance. Here we find a linear relationship between genetic and geographic distance in a worldwide sample of human populations, with major deviations from the fitted line explicable by admixture or extreme isolation. A close relationship is shown to exist between the correlation of geographic distance and genetic differentiation (as measured by F_ST) and the geographic pattern of heterozygosity across populations. Considering a worldwide set of geographic locations as possible sources of the human expansion, we find that heterozygosities in the globally distributed populations of the data set are best explained by an expansion originating in Africa and that no geographic origin outside of Africa accounts as well for the observed patterns of genetic diversity. Although the relationship between F_ST and geographic distance has been interpreted in the past as the result of an equilibrium model of drift and dispersal, simulation shows that the geographic pattern of heterozygosities in this data set is consistent with a model of a serial founder effect starting at a single origin. Given this serial-founder scenario, the relationship between genetic and geographic distance allows us to derive bounds for the effects of drift and natural selection on human genetic variation. Keywords: genetic distance, genetic drift, HGDP-CEPH, human origins, microsatellites.
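The linear relationship between genetic and geographic distance described in this abstract can be illustrated with an ordinary least-squares fit. The numbers below are invented to lie exactly on a line and are not values from the study; the variable names are likewise hypothetical.

```python
import numpy as np

# Invented pairwise values lying exactly on F_ST = 0.002 + 1.6e-5 * distance_km.
geo_km = np.array([500.0, 1000.0, 2000.0, 4000.0, 8000.0])
fst = np.array([0.010, 0.018, 0.034, 0.066, 0.130])

# Degree-1 polynomial fit: coefficients are returned highest power first.
slope, intercept = np.polyfit(geo_km, fst, 1)

# Deviations from the fitted line; in real data, large residuals are the
# population pairs that would be inspected for admixture or isolation.
residuals = fst - (slope * geo_km + intercept)

print(f"slope = {slope:.2e} per km, intercept = {intercept:.4f}")
print("largest absolute residual:", np.abs(residuals).max())
```

Because every population contributes to many pairwise distances, the points in such a regression are not independent, so the fit is descriptive; significance is usually judged with a permutation procedure rather than the standard regression p-value.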
Article
Background. Southern Italy and Sicily played a key role in the peopling history of the Mediterranean. While genetic research showed the remarkable homogeneity of these regions, surname-based studies instead suggested low population mobility, hence potential structuring. Aim. In order to better understand these different patterns, we (1) thoroughly analysed the surname structure of Sicily and Southern Italy, and (2) tested its relationships with a wide set of molecular markers. Subjects and methods. Surname data were collected from 1,213 municipalities and compared to uniparental and autosomal genetic markers typed in ∼300 individuals from 8 to 10 populations. Surname analyses were performed using different multivariate methods, while comparisons with genetic data relied on correlation tests. Results. Surnames were clearly structured according to regional geographic patterns, which likely emerged because of recent isolation-by-distance-like population dynamics. In general, genetic markers, hinting at a pervasive homogeneity, did not correlate with surname distribution. However, long autosomal haplotypes (> 5 cM) that compared to genotypic (SNPs) data identify more ‘recent’ relatedness, showed a clear association with surname patterns. Conclusion. The apparent contradiction between surname structure and genetic homogeneity was resolved by figuring surnames as recent ‘ripples’ deposited on a vast and ancient homogeneous genetic ‘surface’.
Article
Significance This paper presents unprecedented evidence on the transmission mechanism underlying the spread of a broad cross-cultural assemblage of folktales in Eurasia and Africa. State-of-the-art genomic evidence is used to directly assess the relevance of demic diffusion processes, in particular on the distribution of Old World folktales at intermediate geographic scales, and identify individual stories that are more likely to be transmitted through population movement and replacement. The results provide an empirical solution to operate with linguistic barriers and highlight the impossibility of disentangling genetic from geographic relationships at a cross-continental scale, warning against the direct use of extant genetic variability to infer processes of long-range cultural transmission.
Article
It is shown that for allele frequency data a useful measure of the extent of gene flow between a pair of populations is M̂ = (1/FST − 1)/4, which is the estimated level of gene flow in an island model at equilibrium. For DNA sequence data, the same formula can be used if FST is replaced by NST. In a population with restricted dispersal, analytic theory shows that there is a simple relationship between M̂ and geographic distance in both equilibrium and non-equilibrium populations and that this relationship is approximately independent of mutation rate when the mutation rate is small. Simulation results show that with reasonable sample sizes, isolation by distance can indeed be detected and that, at least in some cases, non-equilibrium patterns can be distinguished. This approach to analyzing isolation by distance is used for two allozyme data sets, one from gulls and one from pocket gophers.
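As an illustration of the gene-flow measure quoted in this abstract, (1/FST − 1)/4, the following minimal sketch evaluates it for a few FST values. The function name `gene_flow_estimate` and the sample FST values are assumptions made for this example only; they do not come from the cited paper.

```python
def gene_flow_estimate(fst: float) -> float:
    """Island-model gene-flow estimate M_hat = (1/F_ST - 1) / 4.

    `fst` is a pairwise F_ST (or N_ST for sequence data) in the
    interval (0, 1]. The name and interface are hypothetical.
    """
    if not 0.0 < fst <= 1.0:
        raise ValueError("F_ST must lie in the interval (0, 1]")
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    # Illustrative F_ST values only, not data from any cited study.
    for fst in (0.01, 0.05, 0.2):
        print(f"F_ST = {fst:.2f} -> M_hat = {gene_flow_estimate(fst):.2f}")
```

Smaller FST values give larger estimates, which matches the reading of the measure as a level of gene flow between the two populations.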
Book
A guide to using S environments to perform statistical analyses providing both an introduction to the use of S and a course in modern statistical methods. The emphasis is on presenting practical problems and full analyses of real data sets.
Article
Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual’s DNA can be used to infer their geographic origin with surprising accuracy—often to within a few hundred kilometres.
Chapter
It has always been obvious that organisms vary, even to those pre-Darwinian idealists who saw most individual variation as distorted shadows of an ideal. It has been equally apparent, even to those post-Darwinians for whom variation between individuals is the central fact of evolutionary dynamics, that variation is nodal, that individuals fall in clusters in the space of phenotypic description, and that those clusters, which we call demes, or races, or species, are the outcome of an evolutionary process acting on the individual variation. What has changed during the evolution of scientific thought, and is still changing, is our perception of the relative importance and extent of intragroup as opposed to intergroup variation. These changes have been in part a reflection of the uncovering of new biological facts, but only in part. They have also reflected general sociopolitical biases derived from human social experience and carried over into “scientific” realms. I have discussed elsewhere (Lewontin, 1968) long-term trends in evolutionary doctrine as a reflection of long-term changes in socioeconomic relations, but even in the present era of Darwinism there is considerable diversity of opinion about the amount or importance of intragroup variation as opposed to the variation between races and species. Muller, for example (1950), maintained that for sexually reproducing species, man in particular, there was very little genetic variation within populations and that most men were homozygous for wild-type genes at virtually all their loci.
Article
The biological behavior of the Y chromosome, which is paternally inherited, implies that males sharing the same surname may also share a similar Y chromosome. However, socio-cultural factors, such as polyphyletism, non-paternity, adoption, or matrilineal surname transmission, may prevent the joint transmission of the surname and the Y chromosome. By genotyping 17 Y-STRs and 68 SNPs in ~2500 male samples that each carried one of the 50 selected Catalan surnames, we could determine sets of descendants of a common ancestor, the population of origin of the common ancestor, and the date when such a common ancestor lived. Haplotype diversity was positively correlated with surname frequency, that is, rarer surnames showed the strongest signals of coancestry. Introgression rates of Y chromosomes into a surname by non-paternity, adoption, and transmission of the maternal surname were estimated at 1.5-2.6% per generation, with some local variation. Average ages for the founders of the surnames were estimated at ~500 years, suggesting a delay between the origin of surnames (twelfth and thirteenth centuries) and the systematization of their paternal transmission. We have found that, in general, a foreign etymology for a surname does not often result in a non-indigenous origin of surname founders; however, bearers of some surnames with an Arabic etymology show an excess of North African haplotypes. Finally, we estimate that surname prediction from a Y-chromosome haplotype, which may have interesting forensic applications, has a ~60% sensitivity but a 17% false discovery rate. European Journal of Human Genetics advance online publication, 18 February 2015; doi:10.1038/ejhg.2015.14.
Article
Cluster analysis is the automated search for groups of related observations in a dataset. Most clustering done in practice is based largely on heuristic but intuitively reasonable procedures, and most clustering methods available in commercial software are also of this type. However, there is little systematic guidance associated with these methods for solving important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled. We review a general methodology for model-based clustering that provides a principled statistical approach to these issues. We also show that this can be useful for other problems in multivariate analysis, such as discriminant analysis and multivariate density estimation. We give examples from medical diagnosis, minefield detection, cluster recovery from noisy data, and spatial density estimation. Finally, we mention limitations of the methodology and discuss recent developments in model-based clustering for non-Gaussian data, high-dimensional datasets, large datasets, and Bayesian estimation.
Article
Distance correlation is extended to the problem of testing the independence of random vectors in high dimension. Distance correlation characterizes independence and determines a test of multivariate independence for random vectors in arbitrary dimension. In this work, a modified distance correlation statistic is proposed, such that under independence the distribution of a transformation of the statistic converges to Student t, as dimension tends to infinity. Thus we obtain a distance correlation t-test for independence of random vectors in arbitrarily high dimension, applicable under standard conditions on the coordinates that ensure the validity of certain limit theorems. This new test is based on an unbiased estimator of distance covariance, and the resulting t-test is unbiased for every sample size greater than three and all significance levels. The transformed statistic is approximately normal under independence for sample size greater than nine, providing an informative sample coefficient that is easily interpretable for high dimensional data.
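For orientation, the sketch below computes the classical (biased) sample distance correlation from double-centered pairwise distance matrices. It is not the modified, bias-corrected statistic or the t-test proposed in this abstract. The function names and the toy data are assumptions made for illustration, and `math.dist` requires Python 3.8 or later.

```python
import math


def _double_centered(points):
    """Pairwise Euclidean distances, centered by row, column and grand mean."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_mean = [sum(r) / n for r in d]  # equals the column means (d is symmetric)
    grand_mean = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation(xs, ys):
    """Classical sample distance correlation between paired observations.

    `xs` and `ys` are equal-length sequences of coordinate tuples; the two
    samples may have different dimensions. Hypothetical helper, not an API
    from any cited package.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples with at least 2 observations")
    n = len(xs)
    a, b = _double_centered(xs), _double_centered(ys)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)                 # squared sample distance covariance
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0.0:
        return 0.0                             # one sample is constant
    return math.sqrt(max(dcov2, 0.0) / denom)  # guard against rounding below zero


if __name__ == "__main__":
    x = [(0.0,), (1.0,), (2.0,), (4.0,), (7.0,)]
    y = [(2.0 * v[0], -v[0]) for v in x]       # exact linear image of x in 2D
    print(distance_correlation(x, y))
```

In the toy data the second sample is an exact scaled image of the first, so the coefficient should be 1 up to floating-point rounding; a constant sample is mapped to 0 by convention here.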
Article
MCLUST is a contributed R package for normal mixture modeling and model-based clustering. It provides functions for parameter estimation via the EM algorithm for normal mixture models with a variety of covariance structures, and functions for simulation from these models. Also included are functions that combine model-based hierarchical clustering, EM for mixture estimation and the Bayesian Information Criterion (BIC) in comprehensive strategies for clustering, density estimation and discriminant analysis. There is additional functionality for displaying and visualizing the models along with clustering and classification results. A number of features of the software have been changed in this version, and the functionality has been expanded to include regularization for normal mixture models via a Bayesian prior. MCLUST is licensed by the University of Washington and distributed through
Article
The study of geographically and/or linguistically isolated populations could represent a potential area of interaction between population and forensic genetics. These investigations may be useful to evaluate the suitability of loci which have been selected using forensic criteria for bio-anthropological studies. At the same time, they give us an opportunity to evaluate the efficiency of forensic tools for parentage testing in groups with peculiar allele frequency profiles. Within the frame of a long-term project concerning Italian linguistic isolates, we studied 15 microsatellite loci (Identifiler kit) comprising the CODIS panel in 11 populations from the north-eastern Italian Alps (Veneto, Trentino and Friuli Venezia Giulia regions). All our analyses of inter-population differentiation highlight the genetic distinctiveness of most Alpine populations comparing them either to each other or with large and non-isolated Italian populations. Interestingly, we brought to light some aspects of population genetic structure which cannot be detected using unilinear polymorphisms. In fact, the analysis of genotypic disequilibrium between loci detected signals of population substructure when all the individuals of Alpine populations are pooled in a single group. Furthermore, despite the relatively low number of loci analyzed, genetic differentiation among Alpine populations was detected at individual level using a Bayesian method to cluster multilocus genotypes. Among the various populations studied, the four linguistic minorities (Fassa Valley, Luserna, Sappada and Sauris) showed the most pronounced diversity and signatures of a peculiar genetic ancestry. Finally, we show that database replacement may affect estimates of probability of paternity even when the local database is replaced by another based on populations which share a common genetic background but which differ in their demographic history. 
These findings point to the importance of considering the demographic and cultural profile of populations in forensic applications, even in a context of substantial genetic homogeneity such as that of European populations.
Article
Although essential for the fine-scale reconstruction of genetic structure, only a few micro-geographic studies have been carried out in European populations. This study analyzes mitochondrial variation (651 bp of the hypervariable region plus 17 single-nucleotide polymorphisms) in 393 samples from nine populations from Trentino (Eastern Italian Alps), a small area characterized by a complex geography and high linguistic diversity. A high level of genetic variation, comparable to geographically dispersed European groups, was observed. We found a difference in the intensity of peopling processes between two longitudinal areas, as populations from the west-central part of the region show stronger signatures of expansion, whereas those from the eastern area are closer to the expectations of a stationary demographic state. This may be explained by geomorphological factors and is also supported by archeological data. Finally, our results reveal a striking difference in the way in which the two linguistically isolated populations are genetically related to the neighboring groups. The Ladin speakers were found to be genetically close to the Italian-speaking populations and differentiated from the other Dolomitic Ladins, whereas the German-speaking Cimbri behave as an outlier, showing signatures of founder effects and low growth rate.
Article
This article presents a probability table for the evaluation of stress values generated by multidimensional scaling (MDS) procedures employing stress formula 1. This table is based on the probability distribution of stress values from 587,200 random similarity matrices of different sizes processed to yield results for several dimensions.
Article
The choice of criteria for correct DNA sampling in isolated populations is often affected by ambiguities, despite its importance in medical and anthropological genetics. We propose a novel biodemographic approach to the study of isolates based on surname analysis and migration matrices, and we apply it to a candidate isolated population: the Val di Scalve (Italian Pre-Alps). Kinship matrices and self-organizing maps (SOMs) were applied to information extracted from 2870 marriage records relative to the years 1866-1935. The Val di Scalve shows the typical genetic trademarks of an isolate at least up to the first half of the 20th century. Furthermore, the area was characterized by differential mobility patterns between males and females, consistent with the virilocal migration model. These data suggest reliable criteria for an efficient DNA sampling design by (a) detecting the units of analysis to be investigated (internal population subdivisions); (b) maximizing the number of paternal lineages in the sample for Y-chromosome studies (surnames); and (c) calculating the most convenient sample size. The surname-based sampling procedure can be exported and applied to larger and non-isolated populations.
Article
How do biological, psychological, sociological, and cultural factors combine to change societies over the long run? Boyd and Richerson explore how genetic and cultural factors interact, under the influence of evolutionary forces, to produce the diversity we see in human cultures. Using methods developed by population biologists, they propose a theory of cultural evolution that is an original and fair-minded alternative to the sociobiology debate.
Article
Heritable surnames are highly diverse cultural markers of coancestry in human populations. A patrilineal surname is inherited in the same way as the non-recombining region of the Y chromosome and there should, therefore, be a correlation between the two. Studies of Y haplotypes within surnames, mostly of the British Isles, reveal high levels of coancestry among surname cohorts and the influence of confounding factors, including multiple founders for names, non-paternities and genetic drift. Combining molecular genetics and surname analysis illuminates population structure and history, has potential applications in forensic studies and, in the form of 'genetic genealogy', is an area of rapidly growing interest for the public.
Article
In the fifteenth century, after the Turkish conquest of the Balkan area, Albanian communities migrated to Southern Italy. I investigated temporal trends in isolation from 1820 to 1982 in one of these communities, the population of S. Paolo Albanese, Basilicata, which still uses the original language and religious rites. Marital structure is characterized by a high average frequency of village endogamy (75.2%). Among the exogamous marriages there is a preference for mates from Italo-Albanian settlements, with higher values in the 1800s. The distribution of marital distances reflects the positive assortative mating by ethnic community. The mean frequency of isonymous marriages was 9.01% from 1820 to 1982. These results indicate that total inbreeding from isonymy is a reliable indicator of isolation, showing temporal trends related to changes in endogamy. Fr accounts for the greater percentage of Ft in relation to the small population size and regularly decreases with time. The breakdown of isolation, as documented by the decrease in population size, endogamy, and inbreeding, is a recent feature (since 1960).
Article
Data on recalled month of menarche and month of birth have been collected in a sample of 1505 secondary-school students resident in three provinces of North Italy, to examine the monthly distribution of menarche and the relationship between season of birth, season of menarche and age at menarche. Menarche occurrence showed peaks of frequency in January and July-September, and troughs in October-November and February-May. This pattern was consistent with those reported in other Italian areas. Cases of coincidence between month of menarche and month of birth were significantly more frequent than expected at random. Mean age at menarche varied significantly according to the season at which menarche occurred, and a different pattern in the monthly distribution of menarche has been shown in early- and late-maturing girls. The distribution pattern of menarche occurrence observed in North Italy appears to be related to the rhythm of schoolwork activity. The possible influence of psychosocial stress on the trends in evidence is discussed.
Article
The study is part of a research project on the marital structure of mountain populations from the Eastern Italian Alps. Little is known about marriage patterns in this Alpine area. The aim of the study is to evaluate the extent of reproductive isolation in some communities of the Non Valley (Trentino, Italy) and to investigate its microgeographic and temporal changes over the period 1825-1923. 4518 microfilmed marriage records from registers of seven parishes of the Non Valley were used to analyse the following: endogamy rate, inbreeding calculated both from dispensations and from isonymy, repeating pairs of surnames in marriages, isonymic relationships. The results show notable variability among parishes in the levels of endogamy (40-73%), inbreeding (alpha: 1.9-4.57; Ft: 0.0073-0.019) and subdivision (RPr/RP: 0.5-1.3). The values are relatively stable over the course of a century, apart from a rise in inbreeding indicated by dispensations and a slight decrease of endogamy at the beginning of the 20th century. Isonymic relationships reflect geographic proximity between populations, with minimum changes through time. Variations in the level of reproductive isolation within the Non Valley are consistent with the different geographic characteristics and population sizes of the settlements. Comparison with data obtained from previous studies in the Eastern Italian Alps shows that the values of the investigated biodemographic indicators are in line with the geography and altitude of the area. The slight differences in temporal trend of endogamy and inbreeding can be correlated with different migration patterns.
Article
Marital structure and inbreeding coefficients were analyzed in La Cabrera, an isolated mountain region in northwestern Spain. A total of 5,714 marriages were celebrated from 1880 to 1989 in the 37 parishes of the area. The total frequency of consanguineous marriages (up to the fourth degree) is 23.05%; multiple consanguineous marriages are remarkably common, reaching 5.43% of the total. The first cousin/second cousin ratio (referred to as kinship-type frequencies) is 0.43. The inbreeding values are the highest recorded in Spain and in Europe: alpha3 is 4.82 × 10⁻³ for the whole period and alpha4 is 6.78 × 10⁻³ for 1880-1919. The temporal trend of inbreeding shows high values (alpha3 > 4.5 × 10⁻³) for a particularly long period (1900-1959) and a rapid decline from 1960 onward. This historical inbreeding trend is clearly related to changes in population size. The frequencies of multiple consanguineous marriages and the analysis of isonymy show that the inbreeding structure is related to geographic and demographic factors. Comparing the results at two hierarchical levels (La Cabrera as a whole and the 37 parishes individually), we conclude that the inbreeding values are affected by internal geographic subdivision of the population (Wahlund effect). Social and cultural factors, such as avoidance of or preference for consanguineous marriages, are less important but depend on the kinship type involved.
Article
Several studies showed that surnames are good markers to infer patrilineal genetic structures of populations, both on regional and microregional scales. As a case study, the spatial patterns of the 9,929 most common surnames of the Netherlands were analyzed by a clustering method called self-organizing maps (SOMs). The resulting clusters grouped surnames with a similar geographic distribution and origin. The analysis was shown to be in agreement with already known features of Dutch surnames, such as 1) the geographic distribution of some well-known locative suffixes, 2) historical census data, 3) the distribution of foreign surnames, and 4) polyphyletic surnames. Thus, these results validate the SOM clustering of surnames, and allow for the generalization of the technique. This method can be applied as a new strategy for a better Y-chromosome sampling design in retrospective population genetics studies, since the identification of surnames with a defined geographic origin enables the selection of the living descendants of those families settled, centuries ago, in a given area. In other words, it becomes possible to virtually sample the population as it was when surnames started to be in use. We show that, in a given location, the descendants of those individuals who inhabited the area at the time of origin of surnames can be as low as approximately 20%. This finding suggests 1) the major role played by recent migrations that are likely to have distorted or even defaced ancient genetic patterns, and 2) that standard-designed samplings can hardly portray a reliable picture of the ancient Y-chromosome variability of European populations.
Article
Ireland has one of the oldest systems of patrilineal hereditary surnames in the world. Using the paternal co-inheritance of Y-chromosome DNA and Irish surnames, we examined the extent to which modern surname groups share a common male-line ancestor and the general applicability of Y-chromosomes in uncovering surname origins and histories. DNA samples were collected from 1,125 men, bearing 43 different surnames, and each was genotyped for 17 Y-chromosome short tandem repeat (STR) loci. A highly significant proportion of the observed Y-chromosome diversity was found between surnames demonstrating their demarcation of real and recent patrilineal kinship. On average, a man has a 30-fold increased chance of sharing a 17 STR Y-chromosome haplotype with another man of the same surname but the extent of congruence between the surname and haplotype varies widely between surnames and we attributed this to differences in the number of early founders. Some surnames such as O'Sullivan and Ryan have a single major ancestor, whereas others like Murphy and Kelly have numerous founders probably explaining their high frequency today. Notwithstanding differences in their early origins, all surnames have been extensively affected by later male introgression. None examined showed more than about half of current bearers still descended from one original founder indicating dynamic and continuously evolving kinship groupings. Precisely because of this otherwise cryptic complexity there is a substantial role for the Y-chromosome and a molecular genealogical approach to complement and expand existing sources.
Article
Biodemographic methods are widely used to infer the genetic structure of human populations. In this study, we revise and standardize the procedures required by the migration matrix model of Malécot ([1950] Ann Univ Lyon Sci [A] 13:37-60), testing it in large historical-demographic databases of 85 populations from three mountain valleys with different degrees of isolation: Val di Lima (Italian Apennines, 21 parishes), Val di Sole (Italian Alps, 27 parishes), and La Cabrera (Spain, 37 parishes). An add-on package (Biodem) for the R program is proposed to perform all calculations. Results from migration matrices are compared with those obtained from isonymic relationships. Migration and isonymy matrices are derived from 22,781 marriage records. Matrices are analyzed using a nonlinear isolation-by-distance (IBD) model and multivariate techniques (multidimensional scaling, Procrustes rotation, and cluster analysis). Microdifferentiation levels (FST) from the migration data agree with the observed inbreeding values: higher values are found in La Cabrera (FST = 0.0082), the most isolated population; Val di Lima (FST = 0.0015) and Val di Sole (FST = 0.0012) have lower values due to the larger parish population sizes and greater mobility. Temporal changes of FST and IBD are analyzed using the migration matrix approach. The populations show a marked decline in FST values in time, together with increased population mobility and emigration rates. In all three valleys, marital migration and isonymy yield similar results, suggesting that geographic distance is the most important factor structuring the populations. However, isonymy shows a lower correlation with geographic distance than migration matrices do. This difference can be attributed to the differing sensitivity of the methods for past migration events, and to genetic drift.