Sites linked to a balanced trans-species polymorphism have unusual genealogies. (A) The trees are ordered by the distance from the selected site. Blue and red lines represent lineages from the two allelic classes, respectively. (B) The state of each segment in ten simulation replicates. Each bar represents a 10 kb region centered on a trans-species balanced polymorphism (red dotted line). The color of the bar indicates the genealogical state of each segment (same as in A). Parameters were chosen to be plausible for humans and chimpanzees: N = 10,000, Na = 50,000, T = 160,000, p = 0.5, and r = 1.25 cM/Mb (see Supporting Information). (C) Summary of the coalescent time for a single realization of a segment carrying a balanced trans-species polymorphism. The sample consists of 20 lineages in total, five from each allelic class in each species. Each plot shows the time to the MRCA for a specific subset of the 20 lineages indicated on the top and left. The selected site is located in the center of the segment.
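The simulation parameters in the caption can be converted to the scaled quantities commonly used in coalescent models. The short sketch below does only that arithmetic. It assumes that T is measured in generations, that Na denotes the ancestral population size, and that time is scaled in units of 2N generations; none of these conventions is stated in the caption, so treat the outputs as illustrative rather than as the authors' own values.

```python
# Parameter values quoted in the caption.
N = 10_000            # population size of each species
Na = 50_000           # read here as the ancestral population size (assumption)
T = 160_000           # split time, ASSUMED to be in generations
r_cM_per_Mb = 1.25    # recombination rate in cM/Mb

# 1 cM corresponds to a 1% crossover probability per generation; 1 Mb = 1e6 bp.
r_per_bp = r_cM_per_Mb * 0.01 / 1e6

# Split time in coalescent units of 2N generations (diploid convention, assumed).
T_coal = T / (2 * N)

# Population-scaled recombination rate per base pair, rho = 4 * N * r.
rho_per_bp = 4 * N * r_per_bp

# Per-generation crossover probability across the 10 kb region shown in panel B.
region_bp = 10_000
crossovers_per_gen = r_per_bp * region_bp

print(f"r = {r_per_bp:.3g} per bp per generation")
print(f"T = {T_coal:.3g} coalescent units")
print(f"rho = {rho_per_bp:.3g} per bp")
print(f"crossovers per generation in 10 kb = {crossovers_per_gen:.3g}")
```

Under these assumptions the split is 8 coalescent units old, which gives a sense of why linked neutral sites have had ample opportunity to recombine away from the selected site.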


Source publication
Article
When long-lasting, balancing selection can lead to “trans-species” polymorphisms that are shared by two or more species identical by descent. In such cases, the gene genealogy at the selected site clusters by allele instead of by species, and nearby neutral sites also have unusual genealogies because of linkage. While this scenario is expected to l...

Citations

... The main feature is that haplotypes from different species cluster by allele, rather than species, i.e., they produce allelic trees, and not trees resembling the species tree. These haplotypes include polymorphisms that predate species splits, while neutral recurrent shared polymorphisms are expected to generate species trees 72 . Because of the long-term effects of recombination, we expect this signature to be restricted to a short genomic region around the putative TSP 1 . ...
... A molecular signature of (long-term) balancing selection is high genetic diversity in correspondence with TSPs, since polymorphisms near these loci should be more ancient than the average genome-wide coalescent time 72 . We, therefore, calculated nucleotide diversity (π) for the regions surrounding the TSPs in non-overlapping sliding windows of 5000 bp, 2500 bp, and 1000 bp using VCFTOOLS 0.1.16 ...
Article
Balancing selection is an evolutionary process that maintains genetic polymorphisms at selected loci and strongly reduces the likelihood of allele fixation. When allelic polymorphisms that predate speciation events are maintained independently in the resulting lineages, a pattern of trans-species polymorphisms may occur. Trans-species polymorphisms have been identified for loci related to mating systems and the MHC, but they are generally rare. Trans-species polymorphisms in disease loci are believed to be a consequence of long-term host-parasite coevolution by balancing selection, the so-called Red Queen dynamics. Here we scan the genomes of three crustaceans with a divergence of over 15 million years and identify 11 genes containing identical-by-descent trans-species polymorphisms with the same polymorphisms in all three species. Four of these genes display molecular footprints of balancing selection and have a function related to immunity. Three of them are located in or close to loci involved in resistance to a virulent bacterial pathogen, Pasteuria, with which the Daphnia host is known to coevolve. This provides rare evidence of trans-species polymorphisms for loci known to be functionally relevant in interactions with a widespread and highly specific parasite. These findings support the theory that specific antagonistic coevolution is able to maintain genetic diversity over millions of years.
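The citing passage above mentions computing nucleotide diversity (π) in non-overlapping windows around candidate TSPs. As a rough illustration of what that statistic measures, the sketch below computes per-site π for biallelic SNPs and averages it over fixed windows. It is a plain-Python approximation, not the VCFtools implementation; the function name, the input format, and the assumption that every site in a window is callable are all hypothetical simplifications.

```python
from collections import defaultdict


def windowed_pi(sites, n_chrom, window_bp, region_start, region_end):
    """Return (window_start, window_end, pi_per_bp) for non-overlapping windows.

    sites: iterable of (position, alt_allele_count) for biallelic SNPs.
    n_chrom: number of sampled chromosomes (assumed equal at every site).
    """
    sums = defaultdict(float)
    for pos, k in sites:
        if not (region_start <= pos < region_end):
            continue
        idx = (pos - region_start) // window_bp
        # Expected pairwise differences at one biallelic site.
        sums[idx] += 2 * k * (n_chrom - k) / (n_chrom * (n_chrom - 1))

    n_windows = (region_end - region_start + window_bp - 1) // window_bp
    result = []
    for idx in range(n_windows):
        w_start = region_start + idx * window_bp
        w_end = min(w_start + window_bp, region_end)
        # Assumes every base in the window was callable (a simplification).
        result.append((w_start, w_end, sums[idx] / (w_end - w_start)))
    return result


# Hypothetical toy data: two SNPs among four sampled chromosomes.
example = windowed_pi([(100, 2), (1500, 1)], n_chrom=4,
                      window_bp=1000, region_start=0, region_end=2000)
```

Windows overlapping an old balanced polymorphism would be expected to show elevated values relative to the genome-wide background.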
... tree. However, it is important to note that if only a single trans-specific polymorphism is the target of balancing selection that arose prior to the species split, then recombination could have eroded the signal of linked ancient polymorphism and the allele trees will be concordant with the species tree (Gao et al., 2015). Thus, our analysis cannot accurately separate convergence from trans-specificity in all cases, but can identify genes that have multiple, linked shared trans-specific polymorphisms that could be the target of long-term balancing selection. ...
Preprint
The patterns of genetic variation within and between related taxa represent the genetic history of a species. Shared polymorphisms, loci with identical alleles across species, are of unique interest as they may represent cases of ancient selection maintaining functional variation post-speciation. In this study, we investigate the abundance of shared polymorphism in the Daphnia pulex species complex. We test whether shared mutations are consistent with the action of balancing selection or alternative hypotheses such as hybridization, incomplete lineage sorting, or convergent evolution. We analyzed over 2,000 genomes from North American and European D. pulex and several outgroup species to examine the prevalence and distribution of shared alleles between the focal species pair, North American and European D. pulex . We show that while North American and European D. pulex diverged over ten million years ago, they retained tens of thousands of shared alleles. We found that the number of shared polymorphisms between North American and European D. pulex cannot be explained by hybridization or incomplete lineage sorting alone. Instead, we show that most shared polymorphisms could be the product of convergent evolution, that a limited number appear to be old trans-specific polymorphisms, and that balancing selection is affecting young and ancient mutations alike. Finally, we provide evidence that a blue wavelength opsin gene with trans-specific polymorphisms has functional effects on behavior and fitness in the wild. Ultimately, our findings provide insights into the genetic basis of adaptation and the maintenance of genetic diversity between species.
... Balancing selection on myb114 over a relatively long time in M. normale and short generation time of Melastoma species (two years) can account for this. A small, highly differentiated region is consistent with the observations in humans, where signatures of long-term balancing selection are confined to regions of at most a few kilobases [71]. ...
Article
Background The factors that maintain phenotypic and genetic variation within a population have received long-term attention in evolutionary biology. Here the genetic basis and evolution of the geographically widespread variation in twig trichome color (from red to white) in a shrub Melastoma normale was investigated using Pool-seq and evolutionary analyses. Results The results show that the twig trichome coloration is under selection in different light environments and that a 6-kb region containing an R2R3 MYB transcription factor gene is the major region of divergence between the extreme red and white morphs. This gene has two highly divergent groups of alleles, one of which likely originated from introgression from another species in this genus and has risen to high frequency (> 0.6) within each of the three populations under investigation. In contrast, polymorphisms in other regions of the genome show no sign of differentiation between the two morphs, suggesting that genomic patterns of diversity have been shaped by homogenizing gene flow. Population genetics analysis reveals signals of balancing selection acting on this gene, and it is suggested that spatially varying selection is the most likely mechanism of balancing selection in this case. Conclusions This study demonstrates that polymorphisms on a single transcription factor gene largely confer the twig trichome color variation in M. normale, while also explaining how adaptive divergence can occur and be maintained in the face of gene flow.
... Several studies have used true shared SNPs between humans and chimpanzees and suggested that ULTBS in humans is rare (Leffler et al. 2013; Teixeira et al. 2015; Gao et al. 2015), at least insofar as shared trPolym can be detected. The term "true" is critical here as most polymorphisms shared among species are not inherited from a common ancestor and thus not trPolym but recurrent mutations or sequencing errors. ...
Article
The identification of genomic regions and genes that have evolved under natural selection is a fundamental objective in the field of evolutionary genetics. While various approaches have been established for the detection of targets of positive selection, methods for identifying targets of balancing selection, a form of natural selection that preserves genetic and phenotypic diversity within populations, have yet to be fully developed. Despite this, balancing selection is increasingly acknowledged as a significant driver of diversity within populations, and the identification of its signatures in genomes is essential for understanding its role in evolution. In recent years, a plethora of sophisticated methods have been developed for the detection of patterns of linked variation produced by balancing selection, such as high levels of polymorphism, altered allele frequency distributions, and polymorphism sharing across divergent populations. In this review, we provide a comprehensive overview of classical and contemporary methods, offer guidance on the choice of appropriate methods, and discuss the importance of avoiding artifacts and of considering alternative evolutionary processes. The increasing availability of genome-scale datasets holds the potential to assist in the identification of new targets and the quantification of the prevalence of balancing selection, thus enhancing our understanding of its role in natural populations.
... However, sequencing errors and regions with high mutation rates can create patterns that can be mistaken for LTBS. Further modeling has shown that it is unlikely to observe haplotypes with more than two TSPs in close proximity by chance without balancing selection [2,22]. Despite the importance and prevalence of balancing selection, most of the non-coding haplotypes bearing potential signatures of LTBS (e.g., multiple SPs), have not been functionally characterized. ...
... Based on coalescent theory, this pattern is unlikely to result from neutral processes [4,11], and these SPs are thus candidates for LTBS (Additional file 1: Fig. S1). However, these criteria alone are insufficient to guarantee that the SPs are the result of identity-by-descent and driven by LTBS [22]. ...
... We further filtered the cbSP set to find high-confidence candidate trans-species balanced shared polymorphisms (ctSPs). To achieve this, we first selected all cbSP regions that contain three or more SPs, since this is estimated to substantially reduce the false positive rate [22]. We additionally considered time to most recent common ancestor (TMRCA) predictions for the cbSPs from an ancestral recombination graph method, ARGweaver [23]. ...
Article
Background Long-term balancing selection (LTBS) can maintain allelic variation at a locus over millions of years and through speciation events. Variants shared between species in the state of identity-by-descent, hereafter “trans-species polymorphisms”, can result from LTBS, often due to host–pathogen interactions. For instance, the major histocompatibility complex (MHC) locus contains TSPs present across primates. Several hundred candidate LTBS regions have been identified in humans and chimpanzees; however, because many are in non-protein-coding regions of the genome, the functions and potential adaptive roles for most remain unknown. Results We integrated diverse genomic annotations to explore the functions of 60 previously identified regions with multiple shared polymorphisms (SPs) between humans and chimpanzees, including 19 with strong evidence of LTBS. We analyzed genome-wide functional assays, expression quantitative trait loci (eQTL), genome-wide association studies (GWAS), and phenome-wide association studies (PheWAS) for all the regions. We identify functional annotations for 59 regions, including 58 with evidence of gene regulatory function from GTEx or functional genomics data and 19 with evidence of trait association from GWAS or PheWAS. As expected, the SPs associate in humans with many immune system phenotypes, including response to pathogens, but we also find associations with a range of other phenotypes, including body size, alcohol intake, cognitive performance, risk-taking behavior, and urate levels. Conclusions The diversity of traits associated with non-coding regions with multiple SPs support previous hypotheses that functions beyond the immune system are likely subject to LTBS. Furthermore, several of these trait associations provide support and candidate genetic loci for previous hypothesis about behavioral diversity in human and chimpanzee populations, such as the importance of variation in risk sensitivity.
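One of the citing passages above describes keeping only candidate regions that contain three or more shared polymorphisms (SPs). A minimal sketch of that kind of count-based filter follows. The data structure, region labels, and function name are hypothetical, and the study's actual region definitions and additional TMRCA criteria are not reproduced here.

```python
def regions_with_min_sps(sp_positions_by_region, min_sps=3):
    """Keep regions whose number of distinct SP positions is at least min_sps.

    sp_positions_by_region: dict mapping a region label to a list of SP positions.
    """
    return {
        region: sorted(set(positions))
        for region, positions in sp_positions_by_region.items()
        if len(set(positions)) >= min_sps
    }


# Hypothetical toy input; region_C lists one position twice, so it has only 2 SPs.
candidates = {
    "region_A": [1200, 1350, 1410],
    "region_B": [88_000, 88_120],
    "region_C": [5, 5, 9],
}
kept = regions_with_min_sps(candidates)
```

Counting distinct positions rather than raw entries avoids passing a region on the strength of a duplicated record.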
... Balancing selection produces clusters of linked sites with correlated allele frequencies surrounding balanced polymorphisms 51,70,71 . We quantified allele frequency correlation, ß (refs. ...
... In both C. viridis and C. oreganus, genes with elevated genetic diversity in major venom regions show additional signatures of balancing selection, including local concentrations of intermediate-frequency alleles, extended haplotype lengths and clusters of SNPs with correlated allele frequencies ( Fig. 3 and Extended Data Fig. 9). The latter signature, in particular, is expected when balanced polymorphism has been maintained over moderately long evolutionary time periods (long-term balancing selection 42,70,71 ). Model-based inferences further reject neutrality for multiple genes in the SVMP and SVSP clusters in both species (Fig. 3 and Extended Data Figs. ...
Article
The origin of snake venom involved duplication and recruitment of non-venom genes into venom systems. Several studies have predicted that directional positive selection has governed this process. Venom composition varies substantially across snake species and venom phenotypes are locally adapted to prey, leading to coevolutionary interactions between predator and prey. Venom origins and contemporary snake venom evolution may therefore be driven by fundamentally different selection regimes, yet investigations of population-level patterns of selection have been limited. Here, we use whole-genome data from 68 rattlesnakes to test hypotheses about the factors that drive genomic diversity and differentiation in major venom gene regions. We show that selection has resulted in long-term maintenance of genetic diversity within and between species in multiple venom gene families. Our findings are inconsistent with a dominant role of directional positive selection and instead support a role of long-term balancing selection in shaping venom evolution. We also detect rapid decay of linkage disequilibrium due to high recombination rates in venom regions, suggesting that venom genes have reduced selective interference with nearby loci, including other venom paralogues. Our results provide an example of long-term balancing selection that drives trans-species polymorphism and help to explain how snake venom keeps pace with prey resistance. Analysing whole-genome sequences from 68 rattlesnakes, the authors show a role of long-term balancing selection in maintaining diversity of multiple venom gene families and find reduced selective interference of venom genes with neighbouring loci.
... Numerous methods have been developed to detect the signature of balancing selection [4][5][6][7][8][9][10][11][12][13][14][15]. Application of these methods has identified a number of loci subject to balancing selection, largely in the human genome, in which most of this research has taken place. ...
... Here, we introduce a method that is simple to apply and which generates a direct estimate of the number of polymorphisms subject to balancing selection. One signature of balancing selection that has been utilised in several studies is the sharing of polymorphisms between species [5,8,10]. If the species are sufficiently divergent that they are unlikely to share neutral polymorphisms, then shared genetic variation can be attributed to balancing selection. ...
... Our method is unlikely to have much power to detect balancing selection in single genes, because rather than leveraging the effects of balancing selection on patterns of linked polymorphism, our method simply looks for an excess of shared polymorphism; in fact, linkage confounds the signal of balancing selection in our method. This is in contrast to most other methods, which consider patterns of linked polymorphism and can have considerable power to detect balancing selection on single genes [6,7,[9][10][11][13][14][15]. To investigate whether our method has any power to detect balancing selection in single genes, we simulated a locus with structure conforming to the average human gene, in which an ancestral population was split into 2 descendant populations. ...
Article
The role that balancing selection plays in the maintenance of genetic diversity remains unresolved. Here, we introduce a new test, based on the McDonald–Kreitman test, in which the number of polymorphisms that are shared between populations is contrasted to those that are private at selected and neutral sites. We show that this simple test is robust to a variety of demographic changes, and that it can also give a direct estimate of the number of shared polymorphisms that are directly maintained by balancing selection. We apply our method to population genomic data from humans and provide some evidence that hundreds of nonsynonymous polymorphisms are subject to balancing selection.
... However, sequencing errors and regions with high mutation rates can create patterns that can be mistaken for LTBS. Further modeling has shown that it is unlikely to observe haplotypes with more than two TSPs in close proximity by chance without balancing selection 2,22 . ...
... Based on coalescent theory, this pattern is unlikely to result from neutral processes, 4,11 and are thus candidates for LTBS ( Figure S1). However, these criteria alone are insufficient to guarantee that the SPs are the result of identity-by-descent and driven by LTBS 22 . ...
... We further filtered the cbSP set to find high-confidence candidate trans-species balanced shared polymorphisms (ctSPs). To achieve this, we first selected all cbSP regions that contain three or more SPs, since this is estimated to substantially reduce the false positive rate 22 . We additionally considered time to most recent common ancestor (TMRCA) predictions for the cbSPs from an ancestral recombination graph method, ARGweaver 23 . ...
Preprint
Background Long-term balancing selection (LTBS) can maintain allelic variation at a locus over millions of years and through speciation events. Variants shared between species in the state of identity-by-descent, hereafter “trans-species polymorphisms”, can result from LTBS, often due to host-pathogen interactions. For instance, the major histocompatibility complex (MHC) locus contains TSPs present across primates. Several hundred candidate LTBS regions have been identified in humans and chimpanzees; however, because many are in non-protein-coding regions of the genome, the functions and potential adaptive roles for most remain unknown. Results We integrated diverse genomic annotations to explore the functions of 60 previously identified regions with multiple shared polymorphisms (SPs) between humans and chimpanzees, including 19 with strong evidence of LTBS. We analyzed genome-wide functional assays, expression quantitative trait loci (eQTL), genome-wide association studies (GWAS), and phenome-wide association studies (PheWAS) for all the regions. We identify functional annotations for 59 regions, including 58 with evidence of gene regulatory function from GTEx or functional genomics data and 19 with evidence of trait association from GWAS or PheWAS. As expected, the SPs associate in humans with many immune system phenotypes, including response to pathogens, but we also find associations with a range of other phenotypes, including body size, alcohol intake, cognitive performance, risk-taking behavior, and urate levels. Conclusions The diversity of traits associated with non-coding regions with multiple SPs support previous hypotheses that functions beyond the immune system are likely subject to LTBS. Furthermore, several of these trait associations provide support and candidate genetic loci for previous hypothesis about behavioral diversity in human and chimpanzee populations, such as the importance of variation in risk sensitivity.
... The similarity degree (DS R /DS S ) between R varieties and O. nivara was high in the three regions (Figure 5C), confirming the ancestral polymorphisms retained from O. nivara to R varieties in these regions. One feature associated with ancient balancing selection is the clustering of ancient balanced polymorphisms from different species by allelotype rather than by species (Leffler et al., 2013; Gao et al., 2014). Here, in a neighbor-joining tree based on SNPs from the whole genome, we observed four distinct clusters corresponding to four variety types, that is, indica, japonica, circum-aus, and circum-basmati (Figure 5D). ...
Article
Interactions and co‐evolution between plants and herbivorous insects are critically important in agriculture. Brown planthopper (BPH) is the most severe insect of rice, and the biotypes adapt to feed on different rice genotypes. Here, we present genomics analyses on 1,520 global rice germplasms for resistance to three BPH biotypes. Genome‐wide association studies identified 3,502 single nucleotide polymorphisms (SNPs) and 59 loci associated with BPH resistance in rice. We cloned a previously unidentified gene Bph37 that confers resistance to BPH. The associated loci showed high nucleotide diversity. Genome‐wide scans for trans‐species polymorphisms revealed ancient balancing selection at the loci. The secondarily evolved insect biotypes II and III exhibited significantly higher virulence and overcame more rice varieties than the primary biotype I. In response, more SNPs and loci evolved in rice for resistance to biotypes II and III. Notably, three exceptional large regions with high SNP density and resistance‐associated loci on chromosomes 4 and 6 appear distinct between the resistant and susceptible rice varieties. Surprisingly, these regions in resistant rice might have been retained from wild species Oryza nivara. Our findings expand the understanding of long‐term interactions between rice and BPH and provide resistance genes and germplasm resources for breeding durable BPH‐resistant rice varieties.
... Estimates of linkage disequilibrium (LD) from D. melanogaster autosomes indicate that most LD decays within 200 base pairs, and r² (the correlation between SNPs) decreases to <0.1 within a 1 kb window (Franssen et al. 2015); therefore, the size of the putatively-introgressed window observed here is well outside the range expected under ILS, as discordant ancestry due to sorting from ancestral variation is expected in small blocks (Hudson and Coyne 2002; Gao et al. 2015). Note that inversions could also be responsible for capturing ancestrally segregating variation in larger genomic regions than expected from LD measures, and populations within this group are known to differ in the presence/absence of inversions, including at the specific genomic region containing these genes (Reis et al. 2018). ...
Article
Because sensory signals often evolve rapidly, they could be instrumental in the emergence of reproductive isolation between species. However, pinpointing their specific contribution to isolating barriers, and the mechanisms underlying their divergence, remains challenging. Here, we demonstrate sexual isolation due to divergence in chemical signals between two populations of Drosophila americana (SC and NE) and one population of D. novamexicana, and dissect its underlying phenotypic and genetic mechanisms. Mating trials revealed strong sexual isolation between Drosophila novamexicana males and SC Drosophila americana females, as well as more moderate bi-directional isolation between D. americana populations. Mating behavior data indicate SC D. americana males have the highest courtship efficiency and, unlike males of the other populations, are accepted by females of all species. Quantification of cuticular hydrocarbon (CHC) profiles—chemosensory signals that are used for species recognition and mate finding in Drosophila—shows that the SC D. americana population differs from the other populations primarily on the basis of compound carbon chain-length. Moreover, manipulation of male CHC composition via heterospecific perfuming—specifically perfuming D. novamexicana males with SC D. americana males—abolishes their sexual isolation from these D. americana females. Of a set of candidates, a single gene—elongase CG17821—had patterns of gene expression consistent with a role in CHC differences between species. Sequence comparisons indicate D. novamexicana and our Nebraska (NE) D. americana population share a derived CG17821 truncation mutation that could also contribute to their shared “short” CHC phenotype. Together, these data suggest an evolutionary model for the origin and spread of this allele and its consequences for CHC divergence and sexual isolation in this group.
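The citing passage for this article refers to r², the squared correlation between SNPs, as a measure of how quickly linkage disequilibrium decays. For readers unfamiliar with the statistic, here is a minimal sketch of r² for a pair of biallelic SNPs. It assumes phased haplotypes coded as 0/1 lists of equal length; the function name and the toy data are hypothetical and are not taken from any of the cited studies.

```python
def r_squared(hap_a, hap_b):
    """Squared correlation between two biallelic SNPs across phased haplotypes.

    hap_a, hap_b: equal-length sequences of 0/1 alleles, one entry per haplotype.
    Returns NaN when either SNP is monomorphic in the sample.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype lists must be non-empty and of equal length")

    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b                      # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")
    return d * d / denom


# Toy data: perfectly associated SNPs versus statistically independent SNPs.
linked = r_squared([0, 0, 1, 1], [0, 0, 1, 1])
unlinked = r_squared([0, 0, 1, 1], [0, 1, 0, 1])
```

Values near 1 indicate that the two sites are inherited together, while values near 0, as reported beyond roughly 1 kb in the passage above, indicate that recombination has largely decoupled them.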