Fig 1 - uploaded by Kui Zhang
Main effect model: estimated 68 and 95% coverages of the 'true' values (indicated by bold and thin horizontal lines in the left column, respectively) and empirical powers or type I error rates [for α = 0.001 (!), α = 0.01 (o), and α = 0.05 (+)] for each of 4 haplotypes based on the four methods with sample sizes of 250 (top), 500 (middle), and 1,000 (bottom). B = BayesGLM; C = GLM; R = rGLM; S = ScoreGLM.


Source publication
Article
Full-text available
Genetic association studies based on haplotypes are powerful in the discovery and characterization of the genetic basis of complex human diseases. However, statistical methods for detecting haplotype-haplotype and haplotype-environment interactions have not yet been fully developed owing to the difficulties encountered: large numbers of potential h...

Contexts in source publication

Context 1
... this and the following subsections, we did not consider rGLM in the evaluation of empirical power as well as type I errors because, as mentioned before, its omnibus test does not produce p values for individual effects. Therefore, the empirical powers were calculated based only on ScoreGLM, GLM, and BayesGLM for haplo4.1 and haplo4.3, from which we tried to evaluate the ability of these methods to detect any disease-predisposing haplotypes. Under the sample size of 250, BayesGLM demonstrated higher probabilities for detecting genetic effects compared to both ScoreGLM and GLM (top right corner of fig. 1). Although the advantage of BayesGLM in the statistical validity was diminishing with the increase in sample size, it still persisted, especially for the rare haplotype, haplo4.3, and for the powers under α = 0.001 and 0.01 (middle right and bottom right corner of fig. 1). For all of the three methods, a sample size of 500 was sufficient to detect a common haplotype with a power of 90% approximately, and a sample size of 1,000 was sufficient to identify a rare haplotype with a power of 85% approx- ...
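The empirical powers and type I error rates discussed in this context are, in a simulation study, typically the share of replicates whose p value falls at or below the chosen α. The sketch below illustrates that calculation only; the function name and the simulated p values are hypothetical and are not taken from the source publication.

```python
import random


def empirical_rejection_rate(p_values, alpha):
    """Fraction of simulated replicates with a p value at or below alpha.

    For replicates simulated under the null this estimates the type I
    error rate; for replicates simulated under an effect it estimates power.
    """
    if not p_values:
        raise ValueError("need at least one p value")
    return sum(p <= alpha for p in p_values) / len(p_values)


# Hypothetical p values for 1,000 null replicates (uniform on [0, 1]);
# a real study would take them from the fitted models instead.
rng = random.Random(0)
null_p = [rng.random() for _ in range(1000)]

for alpha in (0.001, 0.01, 0.05):
    print(alpha, empirical_rejection_rate(null_p, alpha))
```

With a well-calibrated test, the rates printed for the null replicates should sit close to the nominal α values.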
Context 3
... scenario 1, only 4 haplotypes in the 4-SNP haplotype block were modeled as main effects for the disease (table 2). The 'true' values prespecified for these 4 haplotypes were first compared to their corresponding estimated coefficients based on the four methods (left column of fig. 1). Under the sample size of 250, wider estimated 68 and 95% intervals that covered the 'true' values calculated based on BayesGLM were observed for each of 4 haplotypes compared to those calculated based on the other three methods, with the only exception that rGLM had little wider estimated intervals than BayesGLM for haplo4.1 (top left corner of fig. 1). With the increase in sample sizes, however, the superiority of reliability of BayesGLM faded for all of the haplotypes except haplo4.3 (middle left and bottom left corner of fig. 1), although its two coverage rates maintained a low growth rate. For all of the four methods, haplo4.3 had lower coverage than the other haplotypes, no matter what sample sizes were ...
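One common way to obtain coverage rates like those described in this context is to count how often an interval around each replicate's estimate contains the prespecified 'true' value. The sketch below assumes symmetric Wald-type intervals (estimate ± z × standard error, with z = 1 for roughly 68% and z = 1.96 for roughly 95%); the source does not state how its intervals were constructed, and the function name and simulated inputs are hypothetical.

```python
import random


def coverage_rate(estimates, std_errors, true_value, z):
    """Share of replicates whose interval estimate +/- z * SE contains true_value."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("estimates and std_errors must be non-empty and equal in length")
    hits = sum(
        abs(est - true_value) <= z * se
        for est, se in zip(estimates, std_errors)
    )
    return hits / len(estimates)


# Hypothetical replicates: estimates scattered around a true coefficient of 1.0
# with a known standard error of 0.3 (assumed values for illustration only).
rng = random.Random(1)
true_beta, se = 1.0, 0.3
est = [rng.gauss(true_beta, se) for _ in range(2000)]
ses = [se] * len(est)

print("68% interval coverage:", coverage_rate(est, ses, true_beta, 1.0))
print("95% interval coverage:", coverage_rate(est, ses, true_beta, 1.96))
```

For a Bayesian fit, posterior intervals could be substituted for the Wald-type intervals without changing the counting step.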


Citations

... This endeavor can pose considerable difficulties and requires meticulous study design and statistical analysis. Li et al 94 proposed an efficient Bayesian, hierarchical, generalized linear model that surpasses existing methods in detecting haplotype interactions and promises to enrich understanding of complex diseases and inform prevention and treatment strategies. ...
Article
Full-text available
Noncommunicable diseases (NCDs) are influenced by the interplay between genetics and environmental exposures, particularly diet. However, many healthcare professionals, including nutritionists and dietitians, have limited genetic background and, therefore, they may lack understanding of gene–environment interactions (GxEs) studies. Even researchers deeply involved in nutrition studies, but with a focus elsewhere, can struggle to interpret, evaluate, and conduct GxE studies. There is an urgent need to study African populations that bear a heavy burden of NCDs, demonstrate unique genetic variability, and have cultural practices resulting in distinctive environmental exposures compared with Europeans or Americans, who are studied more. Although diverse and rapidly changing environments, as well as the high genetic variability of Africans and difference in linkage disequilibrium (ie, certain gene variants are inherited together more often than expected by chance), provide unparalleled potential to investigate the omics fields, only a small percentage of studies come from Africa. Furthermore, research evidence lags behind the practices of companies offering genetic testing for personalized medicine and nutrition. We need to generate more evidence on GxEs that also considers continental African populations to be able to prevent unethical practices and enable tailored treatments. This review aims to introduce nutrition professionals to genetics terms and valid methods to investigate GxEs and their challenges, and proposes ways to improve quality and reproducibility. The review also provides insight into the potential contributions of nutrigenetics and nutrigenomics to the healthcare sphere, addresses direct-to-consumer genetic testing, and concludes by offering insights into the field’s future, including advanced technologies like artificial intelligence and machine learning.
... Thus, rare variants can also be investigated using GWAS data through haplotype-based tests, allowing the use of data from much larger sample sizes than those of NGS. Several tests have been proposed to investigate the CDRV hypothesis through haplotype-based tests (Guo and Lin, 2009;Li et al., 2010;Li et al., 2011;Biswas and Lin, 2012;Lin et al., 2013), among which logistic Bayesian LASSO (LBL) is a well-studied and powerful method (Biswas and Lin, 2012;Biswas and Papachristou, 2014;Papachristou and Biswas, 2020). LBL was extended to incorporate gene-environment interactions (Zhang et al., 2017a;Zhang et al., 2017b;Papachristou and Biswas, 2020), data generated using complex sampling designs (Zhang et al., 2017a), and family data (Wang and Lin, 2014;Datta et al., 2018). ...
Article
Full-text available
In genetic association studies, the multivariate analysis of correlated phenotypes offers statistical and biological advantages compared to analyzing one phenotype at a time. The joint analysis utilizes additional information contained in the correlation and avoids multiple testing. It also provides an opportunity to investigate and understand shared genetic mechanisms of multiple phenotypes. Bivariate logistic Bayesian LASSO (LBL) was proposed earlier to detect rare haplotypes associated with two binary phenotypes or one binary and one continuous phenotype jointly. There is currently no haplotype association test available that can handle multiple continuous phenotypes. In this study, by employing the framework of bivariate LBL, we propose bivariate quantitative Bayesian LASSO (QBL) to detect rare haplotypes associated with two continuous phenotypes. Bivariate QBL removes unassociated haplotypes by regularizing the regression coefficients and utilizing a latent variable to model correlation between two phenotypes. We carry out extensive simulations to investigate the performance of bivariate QBL and compare it with that of a standard (univariate) haplotype association test, Haplo.score (applied twice to two phenotypes individually). Bivariate QBL performs better than Haplo.score in all simulations with varying degrees of power gain. We analyze Genetic Analysis Workshop 19 exome sequencing data on systolic and diastolic blood pressures and detect several rare haplotypes associated with the two phenotypes.
... Natural products and derivatives, with potential biological functions and special molecular structures, play an important role in today's drug market and have made important contributions to the design and improvement of drugs [15][16][17]. In this study, a series of structural biological and chemical methods, such as molecular docking and molecular dynamics simulation, were used to screen and identify compounds with potential inhibitory functions related to PDK1. ...
Preprint
Full-text available
Objective To screen ideal lead compounds with potential inhibition of 3-phosphoinositi-dependent protein kinase 1 (PDK1) from ZINC15 database, which is beneficial to drug design and improvement. Methods The Discovery Studio 4.5 computer-aided virtual screening technique was used to screen potential inhibitors of PDK1. Libdock was used for virtual screening and scoring of candidate compounds, ADME module was used for physical and chemical properties and toxicity analysis, and CDOCKER module was used for molecular docking analysis. The binding affinity of ligand-PDK1 was studied through molecular docking, and the stability of ligand-PDK1 in the natural environment was analyzed through molecular dynamics simulation. Results Two natural compounds ZINC00000157721 and ZINC000034189841 were screened from ZINC15 database. These two compounds have no CYP2D6 inhibition, easy to pass the blood-brain barrier, no hepatotoxicity, high binding affinity with PDK1, higher stability in the natural environment than positive drug BX-795, and stable existence. Conclusions The results show that ZINC00000157721 and ZINC000034189841 are ideal and safe lead compounds and have a potential inhibitory effect on PDK1. These compounds are safe candidates and may provide the basis and premise for the design and optimization of specific PDK1 inhibitors.
... Another category of rare variant association methods is haplotype-based tests. Even though less common than the collapsing tests, several haplotype-based methods have been proposed in the past few years [5][6][7][8][9][10]. The haplotype tests complement the collapsing tests and have certain advantages [12,13]. ...
... Our goal is to fill this gap, and, to this end, we consider five methods: haplo.glm [22], hapassoc [20,21], HapReg [18,19], Bayesian hierarchical generalized linear model (BhGLM) [9] and logistic Bayesian LASSO (LBL) [8,24,25,26]. HapReg has two versions while LBL has three versions depending on whether gene-environment (G-E) independence assumption is made or not; we consider all versions in our comparison. ...
Article
Dissecting the genetic mechanism underlying a complex disease hinges on discovering gene-environment interactions (GXE). However, detecting GXE is a challenging problem especially when the genetic variants under study are rare. Haplotype-based tests have several advantages over the so-called collapsing tests for detecting rare variants as highlighted in recent literature. Thus, it is of practical interest to compare haplotype-based tests for detecting GXE including the recent ones developed specifically for rare haplotypes. We compare the following methods: haplo.glm, hapassoc, HapReg, Bayesian hierarchical generalized linear model (BhGLM) and logistic Bayesian LASSO (LBL). We simulate data under different types of association scenarios and levels of gene-environment dependence. We find that when the type I error rates are controlled to be the same for all methods, LBL is the most powerful method for detecting GXE. We applied the methods to a lung cancer data set, in particular, in region 15q25.1 as it has been suggested in the literature that it interacts with smoking to affect the lung cancer susceptibility and that it is associated with smoking behavior. LBL and BhGLM were able to detect a rare haplotype-smoking interaction in this region. We also analyzed the sequence data from the Dallas Heart Study, a population-based multi-ethnic study. Specifically, we considered haplotype blocks in the gene ANGPTL4 for association with trait serum triglyceride and used ethnicity as a covariate. Only LBL found interactions of haplotypes with race (Hispanic). Thus, in general, LBL seems to be the best method for detecting GXE among the ones we studied here. Nonetheless, it requires the most computation time.
... To overcome the critical barrier in interaction analysis for rare variants, instead of testing each pair of variants individually, group interaction tests that evaluate cumulative interaction effects of multiple genetic variants in a region or gene have recently been developed. Regression-based methods [2][3][4][5][6][7][8], haplotype-based methods [9][10][11][12][13][14][15], and machine learning-based methods [16][17][18][19][20] are proposed for epistasis analysis. ...
Article
Full-text available
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power and improve our understanding of the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remains fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large-scale simulations to calculate Type I error rates for testing interaction between two genes with multiple phenotypes and to compare the power with multivariate pairwise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate performance, the MFRG for epistasis analysis is applied to five phenotypes of exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 267 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has a much higher power to detect interaction than the interaction analysis of a single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.
... SNP-based methods test for all pairwise interactions between SNPs, while group-based methods detect interactions between groups of SNPs. Regression-based methods [2][3][4][5][6][7][8], haplotype-based methods [9][10][11][12][13][14][15], machine learning-based methods [16][17][18][19][20] are widely used for epistasis analysis. ...
Article
Full-text available
To date, most genetic analyses of phenotypes have focused on analyzing single traits or analyzing each phenotype independently. However, joint epistasis analysis of multiple complementary traits will increase statistical power, and hold the key to understanding the complicated genetic structure of the complex diseases. Despite their importance in uncovering the genetic structure of complex traits, the statistical methods for identifying epistasis in multiple phenotypes remain fundamentally unexplored. To fill this gap, we formulate a test for interaction between two genes in multiple quantitative trait analysis as a multiple functional regression (MFRG) in which the genotype functions (genetic variant profiles) are defined as a function of the genomic position of the genetic variants. We use large scale simulations to calculate its type I error rates for testing interaction between two genes with multiple phenotypes and to compare its power with multivariate pair-wise interaction analysis and single trait interaction analysis by a single variate functional regression model. To further evaluate its performance, the MFRG for epistasis analysis is applied to five phenotypes and exome sequence data from the NHLBI Exome Sequencing Project (ESP) to detect pleiotropic epistasis. A total of 136 pairs of genes that formed a genetic interaction network showed significant evidence of epistasis influencing five traits. The results demonstrate that the joint interaction analysis of multiple phenotypes has much higher power to detect interaction than the interaction analysis of single trait and may open a new direction to fully uncovering the genetic structure of multiple phenotypes.
... In this article, our goal is to compare several recently proposed haplotype-based rare variant association methods of both types. These include regularized Generalized Linear Model (rGLM) [13], Logistic Bayesian LASSO (LBL) [14][15][16], Bayesian hierarchical GLM (BhGLM) [17], wei-SIMc-matching [10], Weighted Haplotype and Imputation-based tests (WHaIT) [9] and Haplotype Kernel Association Test (HKAT) [18]. We will compare these with more standard haplotype association methods, in particular, Hapassoc [19,20], Haplo.score ...
Article
Full-text available
Recent literature has highlighted the advantages of haplotype association methods for detecting rare variants associated with common diseases. As several new haplotype association methods have been proposed in the past few years, a comparison of new and standard methods is important and timely for guidance to the practitioners. We consider nine methods-Haplo.score, Haplo.glm, Hapassoc, Bayesian hierarchical Generalized Linear Model (BhGLM), Logistic Bayesian LASSO (LBL), regularized GLM (rGLM), Haplotype Kernel Association Test, wei-SIMc-matching and Weighted Haplotype and Imputation-based Tests. These can be divided into two types-individual haplotype-specific tests and global tests depending on whether there is just one overall test for a haplotype region (global) or there is an individual test for each haplotype in the region. Haplo.score is the only method that tests for both; Haplo.glm, Hapassoc, BhGLM and LBL are individual haplotype-specific, while the rest are global tests. For comparison, we also apply a popular collapsing method-Sequence Kernel Association Test (SKAT) and its two variants-SKAT-O (Optimal) and SKAT-C (Combined). We carry out an extensive comparison on our simulated data sets as well as on the Genetic Analysis Workshop (GAW) 18 simulated data. Further, we apply the methods to GAW18 real hypertension data and Dallas Heart Study sequence data. We find that LBL, Haplo.score (global test) and rGLM perform well over the scenarios considered here. Also, haplotype methods are more powerful (albeit more computationally intensive) than SKAT and its variants in scenarios where multiple causal variants act interactively to produce haplotype effects. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.
... However, there is a continuing realization that rare haplotype variants resulting from common SNVs may also have an important role in understanding complex disease etiology. [2][3][4][5][6][7][8][9][10] The interest in detecting rare haplotype association with common diseases is further fueled by the recognition that rare haplotype may tag rare causal SNVs. [7][8][9][10] There are advantages pursuing rare haplotypes instead of rare SNVs. ...
... 11 For example, generalized linear model (GLM)-based methods [12][13][14] may encounter non-convergence in its expectation-maximization (EM) estimates when challenged with rare haplotypes. Among such new approaches, the majority use likelihood-based regularization methods (eg, Lasso 15 ) to weed out unassociated haplotypes [3][4][5]7,8 so that those that are associated with the disease, especially the rare ones, can be more precisely estimated for their effects on the trait. However, owing to the difficulty in evaluating the effect of the uncertainty of regularization parameters on assessing association, the Bayesian counterpart of Lasso has been proposed for studying rare haplotype association, 6,9,10 as well as the Bayesian hierarchical GLM approach. ...
... However, owing to the difficulty in evaluating the effect of the uncertainty of regularization parameters on assessing association, the Bayesian counterpart of Lasso has been proposed for studying rare haplotype association, 6,9,10 as well as the Bayesian hierarchical GLM approach. 5 Regardless of whether a method is likelihood based or Bayesian formulated, such a method relies on assuming an underlying model connecting the haplotypes to the disease, which, unfortunately, is unknown. Theoretically, it is possible to consider multiple hypothesized models, and then either choose the most likely one or perform model averaging, but such an approach will increase their computational intensity. ...
Article
Full-text available
Rare haplotypes may tag rare causal variants of common diseases; hence, detection of such rare haplotypes may also contribute to our understanding of complex disease etiology. Because rare haplotypes frequently result from common single-nucleotide polymorphisms (SNPs), focusing on rare haplotypes is much more economical compared with using rare single-nucleotide variants (SNVs) from sequencing, as SNPs are available and 'free' from already amassed genome-wide studies. Further, associated haplotypes may shed light on the underlying disease causal mechanism, a feat unmatched by SNV-based collapsing methods. In recent years, data mining approaches have been adapted to detect rare haplotype association. However, as they rely on an assumed underlying disease model and require the specification of a null haplotype, results can be erroneous if such assumptions are violated. In this paper, we present a haplotype association method based on Kullback-Leibler divergence (hapKL) for case-control samples. The idea is to compare haplotype frequencies for the cases versus the controls by computing symmetrical divergence measures. An important property of such measures is that both the frequencies and logarithms of the frequencies contribute in parallel, thus balancing the contributions from rare and common, and accommodating both deleterious and protective, haplotypes. A simulation study under various scenarios shows that hapKL has well-controlled type I error rates and good power compared with existing data mining methods. Application of hapKL to age-related macular degeneration (AMD) shows a strong association of the complement factor H (CFH) gene with AMD, identifying several individual rare haplotypes with strong signals. European Journal of Human Genetics advance online publication, 4 March 2015; doi:10.1038/ejhg.2015.25.
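The idea of comparing case and control haplotype frequencies with a symmetrical divergence can be illustrated with the textbook symmetrised Kullback-Leibler (Jeffreys) divergence, the sum over haplotypes of (p − q) · log(p / q). The sketch below is not the hapKL statistic itself, whose exact form is not given in this abstract; the function name, the pseudocount used to avoid zero frequencies, and the example counts are all assumptions.

```python
import math


def symmetric_kl(case_counts, control_counts, pseudocount=0.5):
    """Symmetrised KL (Jeffreys) divergence between two haplotype count vectors.

    A positive pseudocount is added to every haplotype so that a haplotype
    seen in only one group does not produce log(0); this smoothing is an
    illustrative assumption, not part of any published method.
    """
    if not case_counts or len(case_counts) != len(control_counts):
        raise ValueError("count vectors must be non-empty and equal in length")
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    k = len(case_counts)
    case_total = sum(case_counts) + pseudocount * k
    control_total = sum(control_counts) + pseudocount * k
    divergence = 0.0
    for a, b in zip(case_counts, control_counts):
        p = (a + pseudocount) / case_total
        q = (b + pseudocount) / control_total
        # Each term multiplies a frequency difference by a log-frequency
        # difference, so both scales contribute to the total.
        divergence += (p - q) * math.log(p / q)
    return divergence


# Hypothetical haplotype counts; the third haplotype is rare in controls.
cases = [120, 60, 20]
controls = [130, 68, 2]
print(symmetric_kl(cases, controls))
```

Because each term is a product of a difference in frequencies and a difference in log frequencies, every term is non-negative, and the total is zero only when the two smoothed frequency vectors coincide.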
... Furthermore, some studies suggest that, if the epistatic variance is larger than the additive variance, more power can be achieved to detect SNPs by searching for epistasis between SNPs rather than evaluating only the main effects. 10 A variety of tools have been used to detect epistasis, such as regression, [11][12][13][14] Bayesian methods, [15][16][17][18][19][20] and artificial intelligence algorithms. [21][22][23][24][25][26][27] For higher order interactions, where regression methods are not suitable, several machine learning methods such as multifactor dimensionality reduction, 28 tree-based methods, 25 and entropy-based methods 23,29 have been proposed, as they use classifiers and feature selection to reduce the computational burden. ...
Article
Full-text available
Epistasis helps to explain how multiple single-nucleotide polymorphisms (SNPs) interact to cause disease. A variety of tools have been developed to detect epistasis. In this article, we explore the strengths and weaknesses of an information theory approach for detecting epistasis and compare it to the logistic regression approach through simulations. We consider several scenarios to simulate the involvement of SNPs in an epistasis network with respect to linkage disequilibrium patterns among them and the presence or absence of main and interaction effects. We conclude that the information theory approach more efficiently detects interaction effects when main effects are absent, whereas, in general, the logistic regression approach is appropriate in all scenarios but results in higher false positives. We compute epistasis networks for SNPs in the FSD1L gene using a two-phase head and neck cancer genome-wide association study involving 2,185 cases and 4,507 controls to demonstrate the practical application of the methods.
... Other regularization methods, such as Bayesian adaptive LASSO and iterative adaptive LASSO [17], may also be adopted for haplotype-based analysis. Non-regularization-based methods have also been proposed to detect rare haplotype association using common SNP data in recent years [18][19][20]. ...
Article
In recent years, a myriad of new statistical methods have been proposed for detecting associations of rare single-nucleotide variants (SNVs) with common diseases. These methods can be generally classified as 'collapsing' or 'haplotyping' based. The former is the predominant class, composed of most of the rare variant association methods proposed to date. However, recent works have suggested that haplotyping-based methods may offer advantages and can even be more powerful than collapsing methods in certain situations. In this article, we review and compare collapsing- versus haplotyping-based methods/software in terms of both power and type I error. For collapsing methods, we consider three approaches: Combined Multivariate and Collapsing, Sequence Kernel Association Test and Family-Based Association Test (FBAT): the first two are population based and are among the most popular; the last test is family based, a modification from the popular FBAT to accommodate rare SNVs. For haplotyping-based methods, we include Logistic Bayesian Lasso (LBL) for population data and family-based LBL (famLBL) for family (trio) data. These two methods are selected, as they can be used to test association for specific rare and common haplotypes. Our results show that haplotype methods can be more powerful than collapsing methods if there are interacting SNVs leading to larger haplotype effects. Even if only common SNVs are genotyped, haplotype methods can still detect specific rare haplotypes that tag rare causal SNVs. As expected, family-based methods are robust, whereas population-based methods are susceptible, to population substructure. However, the population-based haplotype approach appears to have smaller inflation of type I error than its collapsing counterparts. © The Author 2015. Published by Oxford University Press. For Permissions, please email: journals.permissions@oup.com.