Article

Hidden copy number variation in the HapMap population

July 2008
Proceedings of the National Academy of Sciences 105(29):10067-72

DOI:10.1073/pnas.0711252105

Source
PubMed

Authors:

Michael White

Institut Pasteur International Network

Simon Tavaré

University of Cambridge

Recently, the extent of copy number variation (CNV) throughout the genome has been shown to be far greater than previously thought. Further, it has been demonstrated that specific copy number variable regions (CNVRs) are associated with particular diseases, suggesting that these genetic variations may have an important biological role. Hence, calling CNVRs and subsequently classifying samples as "losses" or "gains" is of great interest. A number of papers have been published containing classifications of CNVs, and here we show how the presence of pedigree information can be used for assessing the performance of those classification methods. In this article, by examining CNV classifications made in the HapMap samples, we show that estimates of the number of false-positive classifications per individual made by current approaches can be determined. Moreover, commonplace technologies for determining the locations of CNVRs aggregate information across the maternal and paternal chromosomes at the locus of interest. Here, we show that copy number variation on each chromosome can be inferred and, in particular, we discuss the existence of a class of CNVs that are inevitably misclassified and give an estimate of their prevalence. Although our focus is not on the development of calling algorithms per se, we describe and provide an example of how our model might be incorporated into the initial classification procedure to produce more robust results. Finally, we discuss how this methodology might be applied to future studies to obtain better estimates of the extent of CNV across the genome.

Keywords: array CGH, classification, copy number variation, HapMap Project, pedigree information
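Because array intensities measure only the total copy number summed over the maternal and paternal chromosomes, a parent with a deletion on one chromosome and a duplication on the other (0 + 2) looks the same as a normal parent (1 + 1). The sketch below is a hypothetical illustration of that idea, not the authors' model. It assumes exact, error-free totals, no de novo events, and at most three copies per chromosome (the `MAX_HAP_CN` cap and the function names are assumptions). It enumerates the parental per-chromosome configurations that can explain a child's total copy number.

```python
from itertools import product

MAX_HAP_CN = 3  # assumed upper bound on copies carried by a single chromosome


def haplotype_pairs(total, max_hap=MAX_HAP_CN):
    """Unordered per-chromosome copy numbers (a, b) with a + b == total."""
    return [(a, total - a) for a in range(total // 2 + 1) if total - a <= max_hap]


def consistent_configs(mother, father, child, max_hap=MAX_HAP_CN):
    """Parental haplotype pairs under which the child's total copy number can
    arise by inheriting one chromosome from each parent (no de novo events)."""
    configs = []
    for m, f in product(haplotype_pairs(mother, max_hap),
                        haplotype_pairs(father, max_hap)):
        if any(mi + fj == child for mi in m for fj in f):
            configs.append((m, f))
    return configs


# Two parents who both look "normal" (total 2) and a child with a gain (total 3):
print(consistent_configs(2, 2, 3))  # → [((0, 2), (1, 1)), ((1, 1), (0, 2))]
```

Under these assumptions, every explanation of the trio requires one parent to carry the hidden 0 + 2 configuration. A caller that looks only at each parent's total copy number would classify that parent as normal.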

A Bayesian segmentation approach to ascertain copy number variations at the population level

Article

May 2009
BIOINFORMATICS

Efficient and accurate ascertainment of copy number variations (CNVs) at the population level is essential to understand the evolutionary process and population genetics, and to apply CNVs in population-based genome-wide association studies for complex human diseases. We propose a novel Bayesian segmentation approach to identify CNVs in a defined population of any size. It is computationally efficient and provides statistical evidence for the detected CNVs through the Bayes factor. This approach has the unique feature of carrying out segmentation and assigning copy number status simultaneously, a desirable property that current segmentation methods do not share. In comparisons with popular two-step segmentation methods for a single individual using benchmark simulation studies, we find the new approach to perform competitively with respect to false discovery rate and sensitivity in breakpoint detection. In a simulation study of multiple samples with recurrent copy numbers, the new approach outperforms two leading single-sample methods. We further demonstrate the effectiveness of our approach in population-level analysis of previously published HapMap data. We also apply our approach in studying population genetics of CNVs. R programs are available at http://www.mshri.on.ca/mitacs/software/SOFTWARE.HTML
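The abstract reports evidence for each detected CNV as a Bayes factor. The sketch below is a textbook illustration, not the paper's model, which segments the data and assigns copy-number states jointly. It computes the log Bayes factor for a single, already-delimited segment of log ratios. H0 says the segment mean is 0 (normal copy number). H1 places a normal prior with standard deviation `prior_sd` on the mean. The noise level `noise_sd` is assumed known, and both default values are arbitrary assumptions. Because the sample mean is a sufficient statistic, the Bayes factor reduces to a ratio of two normal densities evaluated at it.

```python
import math


def log_normal_pdf(x, var):
    """Log density of Normal(0, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)


def log_bayes_factor(segment, noise_sd=0.2, prior_sd=0.5):
    """log BF(H1 : H0) for one segment of log ratios.

    H0: mean = 0.  H1: mean ~ Normal(0, prior_sd**2).
    Observations ~ Normal(mean, noise_sd**2), independent, noise_sd known.
    """
    n = len(segment)
    ybar = sum(segment) / n
    se2 = noise_sd ** 2 / n  # variance of the sample mean
    # Marginal density of ybar under H1 versus its density under H0.
    return log_normal_pdf(ybar, prior_sd ** 2 + se2) - log_normal_pdf(ybar, se2)


print(log_bayes_factor([0.02, -0.01, 0.0, 0.01]))          # negative: favours H0
print(log_bayes_factor([-0.5, -0.42, -0.47, -0.44, -0.46]))  # positive: favours a change
```

Positive values favour a copy-number change and negative values favour normal copy number. Longer segments with the same shift give stronger evidence.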

Algorithms and applications of next-generation DNA sequencing: ChIP-Seq, database of human variations, and analysis of mammary ductal carcinomas

Article

Jan 2012

Anthony Peter Fejes

Innovation of prenatal genetic diagnostics in relation to improvement of care

Article

Angelique J A Kooper

Inheritance Model Introduces Differential Bias in CNV Calls Between Parents and Offspring

Article

Jul 2012
GENET EPIDEMIOL

Copy Number Variation (CNV) is increasingly implicated in disease pathogenesis. CNVs are often identified by statistical models applied to data from single nucleotide polymorphism panels. Family information for samples provides additional information for CNV inference. Two modes of PennCNV (the Joint-call and Posterior-call), which are among the most well-developed family-based CNV calling methods, use a "Joint-model" as a main component. This model treats all family members' CNV states together under Mendelian inheritance. Methods based on the Joint-model are used to infer CNV calls of cases and controls in a pedigree, which may be compared to each other to test an association. Although benefits of the Joint-model have been shown elsewhere, equality of call rates in parents and offspring has not been evaluated previously. This can affect downstream analyses in studies that compare CNV rates in cases vs. controls in pedigrees. In this paper, we show that the Joint-model can introduce different CNV call rates among family members in the absence of a true difference. We show analytically that the Joint-model may introduce differential CNV calls because of asymmetry in the model. We demonstrate these differential call rates using single-marker simulations. We show that call rates using the two modes of PennCNV also differ between parents and offspring in one multimarker simulated dataset and two real datasets. Our results indicate a need for caution in the use of Joint-model calls in CNV association studies with family-based datasets.

Genomic characteristics of cattle copy number variations

Article

Feb 2011
BMC GENOMICS

Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results is becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits. We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated, and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap of this new dataset and other published CNV studies was less than 40%; however, our discovery of large, high-frequency (> 5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms. We present a comprehensive genomic analysis of cattle CNVs derived from SNP data, which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.

Identification of Genome-wide Copy Number Variations and a Family-based Association Study of Avellino Corneal Dystrophy

Article

Mar 2010

This case-control study aimed to determine the association of copy number variations (CNVs) identified across the whole genome with the risk of Avellino corneal dystrophy (ACD) in a Korean population. Participants were 146 patients with ACD and 226 control subjects. A total of 193 trios were genotyped with the Illumina HumanHapCNV370-Duo BeadChip (370,404 markers) (Illumina, Inc., San Diego, CA). The intensity signal (log R ratio) and allelic intensity ratio (B allele frequency) of each marker in all individuals were obtained with Illumina BeadStudio software (Illumina, Inc.). To obtain authentic CNVs, we performed family-based CNV validation and family-based boundary mapping using the PennCNV algorithm, which incorporates multiple factors, including total log R ratio, B allele frequency, and family information, in an integrated hidden Markov model. The main outcome measures were the statistical comparison and identification of CNVs between cases and controls using family information. We identified 27,267 individual trio CNVs with a median size of 16.2 kb, aggregated into 2245 CNV regions. Most of the identified trio CNVs showed well-defined CNV boundaries and overlapped with those in the Database of Genomic Variants (DGV) (83.4% in number and 79.2% in length). Using the common CNV regions (264 regions with frequency >5%), we performed a family-based association test with the risk of ACD. Two CNV regions (chr6:29978470-29987783 and chr14:59896944-59916129) were significantly associated with the risk of ACD (P=0.05-0.003 and P=0.008, respectively). This study describes the first results of a genome-wide association analysis of individual CNVs with the risk of ACD and shows that 2 novel CNV loci may be involved in the risk of ACD. The author(s) have no proprietary or commercial interest in any materials discussed in this article.
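The abstract reports overlap with the DGV both "in number" and "in length." This sketch assumes one plausible reading of those figures; the paper may define them differently. Under that reading, (a) is the fraction of calls that touch at least one reference interval, and (b) is the fraction of called base pairs covered by the reference. The sketch works on a single chromosome with half-open `(start, end)` coordinates and assumes the calls do not overlap one another. The function names and the toy intervals are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_bp(start, end, reference):
    """Base pairs of [start, end) covered by the (merged) reference intervals."""
    return sum(max(0, min(end, r_end) - max(start, r_start))
               for r_start, r_end in reference)


def overlap_summary(calls, reference):
    """Return (fraction of calls overlapping the reference,
    fraction of called base pairs covered by the reference)."""
    ref = merge_intervals(reference)  # avoid double-counting overlapping entries
    n_hit = sum(1 for s, e in calls if covered_bp(s, e, ref) > 0)
    total_bp = sum(e - s for s, e in calls)
    hit_bp = sum(covered_bp(s, e, ref) for s, e in calls)
    return n_hit / len(calls), hit_bp / total_bp


calls = [(0, 100), (200, 300), (500, 600)]
reference = [(50, 150), (250, 260), (255, 280)]
print(overlap_summary(calls, reference))  # 2 of 3 calls overlap; 80 of 300 bp covered
```

Merging the reference first matters. Databases such as DGV contain many overlapping entries, and summing overlaps against unmerged entries would overstate the covered length.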

Modeling genetic inheritance of copy number variations

Article

Nov 2008
NUCLEIC ACIDS RES

Copy number variations (CNVs) are being used as genetic markers or functional candidates in gene-mapping studies. However, unlike single nucleotide polymorphism or microsatellite genotyping techniques, most CNV detection methods are limited to detecting total copy numbers, rather than copy number in each of the two homologous chromosomes. To address this issue, we developed a statistical framework for intensity-based CNV detection platforms using family data. Our algorithm identifies CNVs for a family simultaneously, thus avoiding the generation of calls with Mendelian inconsistency while maintaining the ability to detect de novo CNVs. Applications to simulated data and real data indicate that our method significantly improves both call rates and accuracy of boundary inference, compared to existing approaches. We further illustrate the use of Mendelian inheritance to infer SNP allele compositions in each of the two homologous chromosomes in CNV regions using real data. Finally, we applied our method to a set of families genotyped using both the Illumina HumanHap550 and Affymetrix genome-wide 5.0 arrays to demonstrate its performance on both inherited and de novo CNVs. In conclusion, our method produces accurate CNV calls, gives probabilistic estimates of CNV transmission and builds a solid foundation for the development of linkage and association tests utilizing CNVs.
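Modelling inheritance at the level of each homologous chromosome is what makes transmission probabilities possible. The sketch below is a minimal, hypothetical illustration, not the authors' HMM. It assumes each parent passes on either of its two chromosomes with probability 1/2 and ignores de novo events and measurement error. Given each parent's per-chromosome copy numbers, it returns the distribution of the child's total copy number. The function name is an assumption.

```python
from collections import Counter
from fractions import Fraction


def child_copy_number_distribution(mother_haps, father_haps):
    """Distribution of the child's total copy number, given each parent's
    (chromosome 1, chromosome 2) copy numbers and Mendelian transmission."""
    dist = Counter()
    for m in mother_haps:          # each maternal chromosome, probability 1/2
        for f in father_haps:      # each paternal chromosome, probability 1/2
            dist[m + f] += Fraction(1, 4)
    return dict(sorted(dist.items()))


# A mother carrying a deletion (1 + 0) and a father with a duplication on one
# chromosome (2 + 1):
print(child_copy_number_distribution((1, 0), (2, 1)))
# → {1: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
```

Exact fractions keep the probabilities readable. The same function also shows why a (0, 2) parent and a (1, 1) parent are distinguishable in offspring even though both have total copy number 2.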

APP locus duplication causes autosomal dominant early-onset Alzheimer disease with cerebral amyloid angiopathy

Article

Feb 2006
Nat Genet

We report duplication of the APP locus on chromosome 21 in five families with autosomal dominant early-onset Alzheimer disease (ADEOAD) and cerebral amyloid angiopathy (CAA). Among these families, the duplicated segments had a minimal size ranging from 0.58 to 6.37 Mb. Brains from individuals with APP duplication showed abundant parenchymal and vascular deposits of amyloid-beta peptides. Duplication of the APP locus, resulting in accumulation of amyloid-beta peptides, causes ADEOAD with CAA.

Global variation in copy number in the human genome

Article

Dec 2006
NATURE

Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
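The abstract defines a CNVR as a region that can encompass overlapping or adjacent gains or losses. Here is a hypothetical sketch of that aggregation step for one chromosome; it is not the consortium's pipeline, which combined two platforms. Individual calls are given as `(start, end, sample, kind)` tuples with half-open coordinates. Calls that overlap or abut are merged, and each resulting region records which samples and which event types (gain or loss) it contains. Sample IDs and coordinates are made up.

```python
def build_cnvrs(calls):
    """Merge per-sample CNV calls on one chromosome into copy number variable
    regions (CNVRs). Calls that overlap or touch are placed in the same region."""
    regions = []
    for start, end, sample, kind in sorted(calls):
        if regions and start <= regions[-1]["end"]:
            region = regions[-1]
            region["end"] = max(region["end"], end)
        else:
            region = {"start": start, "end": end, "samples": set(), "kinds": set()}
            regions.append(region)
        region["samples"].add(sample)
        region["kinds"].add(kind)
    return regions


calls = [
    (100, 200, "NA1", "gain"),
    (150, 300, "NA2", "loss"),
    (300, 350, "NA3", "gain"),   # adjacent to the previous call
    (1000, 1100, "NA1", "loss"),
]
for r in build_cnvrs(calls):
    print(r["start"], r["end"], sorted(r["samples"]), sorted(r["kinds"]))
```

The first region contains both a gain and a loss. That is why a CNVR is a region of variation rather than a single event type.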

The International HapMap Project

Article

Dec 2003

The goal of the International HapMap Project is to determine the common patterns of DNA sequence variation in the human genome and to make this information freely available in the public domain. An international consortium is developing a map of these patterns across the genome by determining the genotypes of one million or more sequence variants, their frequencies and the degree of association between them, in DNA samples from populations with ancestry from parts of Africa, Asia and Europe. The HapMap will allow the discovery of sequence variants that affect common disease, will facilitate development of diagnostic tools, and will enhance our ability to choose targets for therapeutic intervention.

Kallioniemi A, Kallioniemi OP, Piper J, Tanner M, Stokke T, Chen L, Smith HS, Pinkel D, Gray JW, Waldman FM. Detection and mapping of amplified DNA sequences in breast cancer by comparative genomic hybridization. Proc Natl Acad Sci USA 91:2156-2160

Article

Apr 1994

Comparative genomic hybridization was applied to 5 breast cancer cell lines and 33 primary tumors to discover and map regions of the genome with increased DNA-sequence copy-number. Two-thirds of primary tumors and almost all cell lines showed increased DNA-sequence copy-number affecting a total of 26 chromosomal subregions. Most of these loci were distinct from those of currently known amplified genes in breast cancer, with sequences originating from 17q22-q24 and 20q13 showing the highest frequency of amplification. The results indicate that these chromosomal regions may contain previously unknown genes whose increased expression contributes to breast cancer progression. Chromosomal regions with increased copy-number often spanned tens of Mb, suggesting involvement of more than one gene in each region.

The influence of CCL3L1 gene-containing segmental duplications on HIV-1/AIDS susceptibility

Article

Apr 2005
SCIENCE

Segmental duplications in the human genome are selectively enriched for genes involved in immunity, although the phenotypic consequences for host defense are unknown. We show that there are significant interindividual and interpopulation differences in the copy number of a segmental duplication encompassing the gene encoding CCL3L1 (MIP-1alphaP), a potent human immunodeficiency virus-1 (HIV-1)-suppressive chemokine and ligand for the HIV coreceptor CCR5. Possession of a CCL3L1 copy number lower than the population average is associated with markedly enhanced HIV/acquired immunodeficiency syndrome (AIDS) susceptibility. This susceptibility is even greater in individuals who also possess disease-accelerating CCR5 genotypes. This relationship between CCL3L1 dose and altered HIV/AIDS susceptibility points to a central role for CCL3L1 in HIV/AIDS pathogenesis and indicates that differences in the dose of immune response genes may constitute a genetic basis for variable responses to infectious diseases.

Wang K, Li M, Hadley D, Liu R, Glessner J, Grant SF, Hakonarson H, Bucan M. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res 17:1665-1674

Article

Dec 2007
GENOME RES

Comprehensive identification and cataloging of copy number variations (CNVs) is required to provide a complete view of human genetic variation. The resolution of CNV detection in previous experimental designs has been limited to tens or hundreds of kilobases. Here we present PennCNV, a hidden Markov model (HMM) based approach, for kilobase-resolution detection of CNVs from Illumina high-density SNP genotyping data. This algorithm incorporates multiple sources of information, including total signal intensity and allelic intensity ratio at each SNP marker, the distance between neighboring SNPs, the allele frequency of SNPs, and the pedigree information where available. We applied PennCNV to genotyping data generated for 112 HapMap individuals; on average, we detected approximately 27 CNVs for each individual with a median size of approximately 12 kb. Excluding common rearrangements in lymphoblastoid cell lines, the fraction of CNVs in offspring not detected in parents (CNV-NDPs) was 3.3%. Our results demonstrate the feasibility of whole-genome fine-mapping of CNVs via high-density SNP genotyping.
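PennCNV is described as a hidden Markov model over copy-number states. The sketch below shows only the core idea: Viterbi decoding of total copy number from log R ratio (LRR) along a run of markers. It is not PennCNV. It ignores B allele frequency, inter-marker distance, SNP allele frequency, and pedigree information. Every parameter (`LRR_MEAN`, `LRR_SD`, `P_STAY`) is an illustrative assumption rather than a PennCNV default.

```python
import math

# Hypothetical parameters: total copy-number states and the mean LRR assumed
# for each. PennCNV's real emission model differs and also uses BAF.
STATES = [0, 1, 2, 3, 4]
LRR_MEAN = [-3.5, -0.45, 0.0, 0.3, 0.55]
LRR_SD = 0.2
P_STAY = 0.99  # probability that adjacent markers share a copy-number state


def log_emission(x, state):
    """Gaussian log-density of an observed LRR value under a given state."""
    z = (x - LRR_MEAN[state]) / LRR_SD
    return -0.5 * z * z - math.log(LRR_SD * math.sqrt(2 * math.pi))


def viterbi_copy_number(lrr):
    """Most probable copy-number path for a sequence of LRR values."""
    k = len(STATES)
    log_stay = math.log(P_STAY)
    log_move = math.log((1 - P_STAY) / (k - 1))
    trans = [[log_stay if i == j else log_move for j in range(k)] for i in range(k)]
    # Uniform initial distribution over states.
    score = [math.log(1 / k) + log_emission(lrr[0], s) for s in range(k)]
    backpointers = []
    for x in lrr[1:]:
        new_score, pointers = [], []
        for j in range(k):
            best_i = max(range(k), key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + log_emission(x, j))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(k), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


# Ten normal markers, a ten-marker hemizygous deletion, ten normal markers.
print(viterbi_copy_number([0.0] * 10 + [-0.45] * 10 + [0.0] * 10))
```

The transition penalty `P_STAY` is what stops isolated noisy markers from being called as CNVs. A run must accumulate enough emission evidence to pay for entering and leaving a non-normal state.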

Breaking the waves: Improved detection of copy number variation from microarray-based comparative genomic hybridization

Article

Feb 2007

Large-scale high-throughput studies using microarray technology have established that copy number variation (CNV) throughout the genome is more frequent than previously thought. Such variation is known to play an important role in the presence and development of phenotypes such as HIV-1 infection and Alzheimer's disease. However, methods for analyzing the complex data produced and identifying regions of CNV are still being refined. We describe the presence of a genome-wide technical artifact, spatial autocorrelation or 'wave', which occurs in a large dataset used to determine the location of CNV across the genome. By removing this artifact we are able to obtain both a more biologically meaningful clustering of the data and an increase in the number of CNVs identified by current calling methods without a major increase in the number of false positives detected. Moreover, removing this artifact is critical for the development of a novel model-based CNV calling algorithm, CNVmix, that uses cross-sample information to identify regions of the genome where CNVs occur. For regions of CNV that are identified by both CNVmix and current methods, we demonstrate that CNVmix is better able to categorize samples into groups that represent copy number gains or losses. Removing artifactual 'waves' (which appear to be a general feature of array comparative genomic hybridization (aCGH) datasets) and using cross-sample information when identifying CNVs enables more biological information to be extracted from aCGH experiments designed to investigate copy number variation in normal individuals.
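The 'wave' artifact is described as spatial autocorrelation of log ratios along the genome. The sketch below is a generic illustration and not the correction used in the paper. It measures lag-1 autocorrelation between neighbouring probes and removes a smooth trend by subtracting a centred running median. Probes are assumed to be sorted by position on one chromosome; the window size and the synthetic data are arbitrary assumptions.

```python
import math
import statistics


def lag1_autocorrelation(x):
    """Correlation between each probe's log ratio and its neighbour's."""
    mean = statistics.fmean(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x, x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


def subtract_running_median(x, window=11):
    """Remove a smooth trend by subtracting a centred running median.

    A run of probes shorter than about half the window barely moves the
    median, so it survives; longer runs (including real CNVs) are removed
    along with the wave.
    """
    half = window // 2
    out = []
    for i, value in enumerate(x):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(value - statistics.median(x[lo:hi]))
    return out


# Synthetic probe-level log ratios: a slow "wave" plus alternating noise.
wave = [0.3 * math.sin(2 * math.pi * i / 200) + 0.1 * (-1) ** i for i in range(1000)]
print(round(lag1_autocorrelation(wave), 2))
print(round(lag1_autocorrelation(subtract_running_median(wave)), 2))
```

The raw series is strongly positively autocorrelated because of the wave. After detrending, the residual is dominated by the alternating noise. In practice, the window must be more than twice as long, in probes, as the CNVs one wants to keep.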

A Haplotype Map of the Human Genome

Article

Oct 2005

The International HapMap Consortium

Inherited genetic variation has a critical but as yet largely uncharacterized role in human disease. Here we report a public database of common variation in the human genome: more than one million single nucleotide polymorphisms (SNPs) for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted. These data document the generality of recombination hotspots, a block-like structure of linkage disequilibrium and low haplotype diversity, leading to substantial correlations of SNPs with many of their neighbours. We show how the HapMap resource can guide the design and analysis of genetic association studies, shed light on structural variation and recombination, and identify loci that may have been subject to natural selection during human evolution.

A high-resolution survey of deletion polymorphism in the human genome

Article

Feb 2006

Recent work has shown that copy number polymorphism is an important class of genetic variation in human genomes. Here we report a new method that uses SNP genotype data from parent-offspring trios to identify polymorphic deletions. We applied this method to data from the International HapMap Project to produce the first high-resolution population surveys of deletion polymorphism. Approximately 100 of these deletions have been experimentally validated using comparative genome hybridization on tiling-resolution oligonucleotide microarrays. Our analysis identifies a total of 586 distinct regions that harbor deletion polymorphisms in one or more of the families. Notably, we estimate that typical individuals are hemizygous for roughly 30-50 deletions larger than 5 kb, totaling around 550-750 kb of euchromatic sequence across their genomes. The detected deletions span a total of 267 known and predicted genes. Overall, however, the deleted regions are relatively gene-poor, consistent with the action of purifying selection against deletions. Deletion polymorphisms may well have an important role in the genetics of complex traits; however, they are not directly observed in most current gene mapping studies. Our new method will permit the identification of deletion polymorphisms in high-density SNP surveys of trio or other family data.
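Trio SNP data can reveal hemizygous deletions because an array reports a genotype such as A/- as "AA", which can produce apparent Mendelian inconsistencies. The sketch below is a simplified, hypothetical illustration of that signal at a single SNP, not the published method. Genotypes are strings such as "AB" with alleles in sorted order, and '-' denotes the null (deleted) allele. A homozygous parental call is allowed to hide a deletion when testing whether the trio becomes consistent. All function names are assumptions.

```python
def transmissible_alleles(genotype, may_carry_deletion):
    """Alleles a parent could pass on. A homozygous call such as "AA" might
    really be hemizygous A/-, in which case the null allele '-' can be passed on."""
    alleles = set(genotype)
    if may_carry_deletion and genotype[0] == genotype[1]:
        alleles.add("-")
    return alleles


def observed_call(allele1, allele2):
    """Genotype an array would report: the null allele is invisible."""
    present = sorted(a for a in (allele1, allele2) if a != "-")
    if not present:
        return "--"                 # homozygous deletion: no call
    if len(present) == 1:
        return present[0] * 2       # A/- is reported as "AA"
    return "".join(present)


def explains(mother, father, child, deletion_in=None):
    """Can the child's call arise from the parents, optionally allowing a
    deletion in one named parent ("mother" or "father")?"""
    m = transmissible_alleles(mother, deletion_in == "mother")
    f = transmissible_alleles(father, deletion_in == "father")
    return any(observed_call(a, b) == child for a in m for b in f)


def classify_trio(mother, father, child):
    """Label a trio genotype pattern at one SNP."""
    if explains(mother, father, child):
        return "consistent"
    carriers = [p for p in ("mother", "father") if explains(mother, father, child, p)]
    return "possible deletion in " + " or ".join(carriers) if carriers else "unexplained"


print(classify_trio("AA", "BB", "BB"))  # possible deletion in mother
print(classify_trio("AB", "AB", "AB"))  # consistent
```

A single such SNP is weak evidence because genotyping errors produce the same pattern. A real method would look for runs of neighbouring SNPs that consistently point to the same parent.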

Diabetes and obesity: The twin epidemics

Article

Feb 2006

Structural Variation in the Human Genome

Article

Mar 2006

The first wave of information from the analysis of the human genome revealed SNPs to be the main source of genetic and phenotypic human variation. However, the advent of genome-scanning technologies has now uncovered an unexpectedly large extent of what we term 'structural variation' in the human genome. This comprises microscopic and, more commonly, submicroscopic variants, which include deletions, duplications and large-scale copy-number variants (collectively termed copy-number variants or copy-number polymorphisms), as well as insertions, inversions and translocations. Rapidly accumulating evidence indicates that structural variants can comprise millions of nucleotides of heterogeneity within every genome, and are likely to make an important contribution to human diversity and disease susceptibility.

Linkage Disequilibrium and Heritability of Copy-Number Polymorphisms within Duplicated Regions of the Human Genome

Article

Sep 2006

Studies of copy-number variation and linkage disequilibrium (LD) have typically excluded complex regions of the genome that are rich in duplications and prone to rearrangement. In an attempt to assess the heritability and LD of copy-number polymorphisms (CNPs) in duplication-rich regions of the genome, we profiled copy-number variation in 130 putative "rearrangement hotspot regions" among 269 individuals of European, Yoruba, Chinese, and Japanese ancestry analyzed by the International HapMap Consortium. Eighty-four hotspot regions, corresponding to 257 bacterial artificial chromosome (BAC) probes, showed evidence of copy-number differences. Despite a predisposing genetic architecture, no polymorphism was ever observed in the remaining 46 "rearrangement hotspots," and we suggest these represent excellent candidate sites for pathogenic rearrangements. We used a combination of BAC-based and high-density customized oligonucleotide arrays to resolve the molecular basis of structural rearrangements. For common variants (frequency >10%), we observed a distinct bias against copy-number losses, suggesting that deletions are subject to purifying selection. Heritability estimates did not differ significantly from 1.0 among the majority (30 of 34) of loci analyzed, consistent with normal Mendelian inheritance. Some of the CNPs in duplication-rich regions showed strong LD with nearby single-nucleotide polymorphisms (SNPs) and were observed to segregate on ancestral SNP haplotypes. However, LD with the best available SNP markers was weaker than has been reported for deletion polymorphisms in less complex regions of the genome. These observations may be accounted for by a low density of SNP data in duplicated regions, challenges in mapping and typing the CNPs, and the possibility that CNPs in these regions have rearranged on multiple haplotype backgrounds. Our results underscore the need for complete maps of genetic variation in duplication-rich regions of the genome.
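The abstract reports heritability estimates close to 1.0. One standard estimator from trios is the slope of a regression of offspring values on midparent values. It is assumed here, not confirmed, to be what the authors used. Under purely Mendelian inheritance, a child's expected total copy number equals the mean of the parents' totals, so the slope is 1. The sketch computes that slope on made-up copy numbers.

```python
from statistics import fmean


def midparent_offspring_slope(mothers, fathers, children):
    """Least-squares slope of offspring values on midparent values."""
    midparent = [(m + f) / 2 for m, f in zip(mothers, fathers)]
    mx, my = fmean(midparent), fmean(children)
    sxy = sum((x - mx) * (y - my) for x, y in zip(midparent, children))
    sxx = sum((x - mx) ** 2 for x in midparent)
    return sxy / sxx


# Made-up total copy numbers for four trios.
mothers = [1, 2, 3, 2]
fathers = [3, 2, 3, 4]
children = [2, 2, 3, 3]
print(midparent_offspring_slope(mothers, fathers, children))  # 1.0
```

Real copy-number estimates are noisy, so measurement error in the parents' values biases this slope towards zero. Slopes well below 1 can therefore reflect assay noise as well as non-Mendelian behaviour.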

A Bayesian Approach to Copy-Number–Polymorphism Analysis in Nuclear Pedigrees

Article

Nov 2007

Segmental copy-number polymorphisms (CNPs) represent a significant component of human genetic variation and are likely to contribute to disease susceptibility. These potentially multiallelic and highly polymorphic systems present new challenges to family-based genetic-analysis tools that commonly assume codominant markers and allow for no genotyping error. The copy-number quantitation (CNP phenotype) represents the total number of segmental copies present in an individual and provides a means to infer, rather than to observe, the underlying allele segregation. We present an integrated approach to meet these challenges, in the form of a graphical model in which we infer the underlying CNP phenotype from the (single or replicate) quantitative measure within the analysis while assuming an allele-based system segregating through the pedigree. This approach can be readily applied to the study of any form of genetic measure, and the construction permits extension to a wide variety of hypothesis tests. We have implemented the basic model for use with nuclear families, and we illustrate its application through an analysis of the CNP located in gene CCL3L1 in 201 families with asthma.
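The abstract treats the CNP phenotype as something to infer from a noisy quantitative measure rather than observe directly. The sketch below shows only that single inference step, under stated assumptions. It is not the authors' graphical model, which also propagates alleles through the pedigree. It assumes a prior over integer copy numbers and one or more replicate measurements, each normally distributed around the true copy number with a known standard deviation `sd`. Bayes' rule then gives a posterior over copy number. The prior range, `sd`, and the data are hypothetical.

```python
import math


def copy_number_posterior(measurements, prior, sd=0.3):
    """Posterior P(copy number = k | replicate measurements), assuming each
    measurement ~ Normal(k, sd**2) independently. `prior` maps k -> P(k)."""
    log_post = {}
    for k, p in prior.items():
        if p > 0:
            log_lik = sum(-0.5 * ((y - k) / sd) ** 2 for y in measurements)
            log_post[k] = math.log(p) + log_lik  # constants shared by all k dropped
    top = max(log_post.values())  # subtract the maximum for numerical stability
    unnorm = {k: math.exp(v - top) for k, v in log_post.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


uniform = {k: 1 / 7 for k in range(7)}  # copy numbers 0..6, assumed equally likely
one = copy_number_posterior([2.1], uniform)
three = copy_number_posterior([2.1, 1.9, 2.05], uniform)
print(max(one, key=one.get), round(one[2], 3))
print(max(three, key=three.get), round(three[2], 3))
```

Replicate measurements multiply the likelihoods, so the posterior concentrates more sharply on one copy number than a single measurement does.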

Breakthrough of the year: Human genetic variation

Article

Jan 2008
SCIENCE

Elizabeth Pennisi

Equipped with faster, cheaper technologies for sequencing DNA and assessing variation in genomes on scales ranging from one to millions of bases, researchers are finding out how truly different we are from one another.

Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK (2006) A high-resolution survey of deletion polymorphism in the human genome. Nat Genet 38:75–81.

Marioni JC, et al. (2007) Breaking the waves: improved detection of copy number variation from microarray-based comparative genomic hybridization. Genome Biol 8:R228.

Kallioniemi A, et al. (1994) Detection and mapping of amplified DNA sequences in breast cancer by comparative genomic hybridization. Proc Natl Acad Sci USA 91:2156–2160.
