About
Publications: 70
Reads: 6,947
Citations: 1,336
Publications (70)
Background: The -α3.7I-thalassaemia deletion is very common throughout Africa because it protects against malaria. When undertaking studies to investigate human genetic adaptations to malaria or other diseases, it is important to account for any confounding effects of α-thalassaemia to rule out spurious associations.
Methods: In this study, we ha...
Background: The -α3.7I-thalassaemia deletion is very common throughout Africa because it protects against malaria. When undertaking studies to investigate human genetic adaptations to malaria or other diseases, it is important to account for any confounding effects of α-thalassaemia to rule out spurious associations.
Methods: In this study we hav...
The malaria parasite Plasmodium falciparum invades human red blood cells via interactions between host and parasite surface proteins. By analyzing genome sequence data from human populations, including 1269 individuals from sub-Saharan Africa, we identify a diverse array of large copy number variants affecting the host invasion receptor genes GYPA...
Plasmodium falciparum invades human red blood cells by a series of interactions between host and parasite surface proteins. Here we analyse whole genome sequence data from worldwide human populations, including 765 new genomes from across sub-Saharan Africa, and identify a diverse array of large copy number variants affecting the host invasion rece...
Single SNP association test results with adjustment for additive effect of G6PD+202. DOI: http://dx.doi.org/10.7554/eLife.15085.007
G6PDd score association test results. DOI: http://dx.doi.org/10.7554/eLife.15085.015
(A) Summary of study designs of contributing partner studies to MalariaGEN Consortial Project 1 (CP1). (B) Genotyped sample distribution. (C) Summary of 65 SNPs selected for analysis and successfully genotyped. (D) G6PD+202 female association test results. (E) G6PD+202 male association test results. (F) G6PD+202 all individuals association test res...
(A) SNP selection across G6PD region for genotyping. (B) SpectroDESIGNER assay design file for 135 G6PD locus SNPs in four multiplexes. (C) SpectroDESIGNER assay design file for 107 G6PD locus SNPs in four multiplexes. (D) SpectroDESIGNER assay design file for 68 G6PD locus SNPs in three multiplexes.
DOI: http://dx.doi.org/10.7554/eLife.15085.020
Single SNP association test results. DOI: http://dx.doi.org/10.7554/eLife.15085.006
Glucose-6-phosphate dehydrogenase (G6PD) deficiency is believed to confer protection against Plasmodium falciparum malaria, but the precise nature of the protective effect has proved difficult to define as G6PD deficiency has multiple allelic variants with different effects in males and females, and it has heterogeneous effects on the clinical outco...
eLife digest
Our genomes contain a record of historical events. This is because when groups of people are separated for generations, the DNA sequence in the two groups’ genomes will change in different ways. Looking at the differences in the genomes of people from the same population can help researchers to understand and reconstruct the historical...
Similarity between two individuals in the combination of genetic markers along their chromosomes indicates shared ancestry and can be used to identify historical connections between different population groups due to admixture. We use a genome-wide, haplotype-based analysis to characterise the structure of genetic diversity and gene-flow in a coll...
Understanding patterns of genetic diversity is a crucial component of medical research in Africa. Here we use haplotype-based population genetics inference to describe gene-flow and admixture in a collection of 48 African groups with a focus on the major populations of the sub-Sahara. Our analysis presents a framework for interpreting haplotype div...
The high prevalence of sickle haemoglobin in Africa shows that malaria has been a major force for human evolutionary selection, but surprisingly few other polymorphisms have been proven to confer resistance to malaria in large epidemiological studies. To address this problem, we conducted a multi-centre genome-wide association study (GWAS) of life-...
Many human genetic associations with resistance to malaria have been reported, but few have been reliably replicated. We collected data on 11,890 cases of severe malaria due to Plasmodium falciparum and 17,441 controls from 12 locations in Africa, Asia and Oceania. We tested 55 SNPs in 27 loci previously reported to associate with severe malaria. T...
Background
The vast majority of deaths in the Kilifi study area are not recorded through official systems of vital registration. As a result, few data are available regarding causes of death in this population.
Objective
To describe the causes of death (CODs) among residents of all ages within the Kilifi Health and Demographic Surveillance System...
Sickle cell disease (SCD) is common in many parts of sub-Saharan Africa (SSA), where it is associated with high early mortality. In the absence of newborn screening, most deaths among children with SCD go unrecognized and unrecorded. As a result, SCD does not receive the attention it deserves as a leading cause of death among children in SSA. In th...
Combining data from genome-wide association studies (GWAS) conducted at different locations, using genotype imputation and fixed-effects meta-analysis, has been a powerful approach for dissecting complex disease genetics in populations of European ancestry. Here we investigate the feasibility of applying the same approach in Africa, where genetic d...
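As a rough illustration of the fixed-effects meta-analysis mentioned in this abstract, the sketch below combines per-cohort effect estimates by inverse-variance weighting. It is not the authors' pipeline: the function name `fixed_effect_meta` and the example numbers are hypothetical, and the p-value assumes a normal approximation for the combined estimate.

```python
import math


def fixed_effect_meta(betas, ses):
    """Combine per-cohort effect estimates with inverse-variance weights.

    betas: per-cohort effect sizes (e.g. log odds ratios).
    ses:   the matching standard errors.
    Returns (combined_beta, combined_se, two_sided_p).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    # Each cohort is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in ses]
    total_weight = sum(weights)
    combined_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    combined_se = math.sqrt(1.0 / total_weight)
    # Two-sided p-value under a standard normal approximation.
    z = combined_beta / combined_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return combined_beta, combined_se, p_value


if __name__ == "__main__":
    # Hypothetical estimates from two cohorts with equal standard errors.
    beta, se, p = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
    print(beta, se, p)
```

With equal standard errors the combined estimate is simply the mean of the two inputs (approximately 0.3 here). A fixed-effects model assumes one shared true effect across cohorts, which is the assumption the abstract questions for genetically diverse African populations.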
Example of cluster plot from Malawi cohort with outlying sets of individuals.
(TIF)
Distribution of relatedness between most-related pairs.
(TIF)
Comparison of logistic regression (SNPTEST) and mixed model (MMM) P values.
(TIF)
SNPs showing highly divergent P values between logistic regression and mixed model scans.
(TIF)
Comparison of meta-analysis P values versus Bayes factors under the fixed-effect model.
(TIF)
Quantile-quantile plots of the region-based test in the three cohorts and in the meta-analysis. The genomic control inflation factor is given in the title of each plot.
(TIF)
Manhattan plot showing –log10 P values (thresholded at 10) for additive, dominant, heterozygote, recessive, and general models, and additive model conditional on the genotype at the sickle locus rs334, across all imputed SNPs. Meta-analysis P values for all three cohorts and for the East African cohorts are also shown for additive, dominant, recess...
The distribution of ethnic groups in Kenyan samples that were imputed with higher or lower quality (as defined by the red line in Figure S12). The difference in the two distributions is highly significant (Fisher's exact test, P = 4×10⁻⁴), suggesting that ethnic differences contribute to the bimodal distribution of imputation quality seen in Figure...
–log10(P values) for test of association using the mixed model.
(TIF)
The distribution of imputation quality (measured by type2 r2) across imputed Kenyan samples. The red line is at r2 = 0.909, and is the minimum between the two peaks.
(TIF)
Pre-imputation individual QC.
(DOCX)
Top: signal of association in the HBB region after conditioning on the genotype at the known causal locus rs334. Bottom: signal of association in the ABO region after conditioning on the genotype at rs8176719.
(TIF)
Example output from the imputation quality control pipeline for the Kenya imputation. a) per-SNP certainty (mean maximum posterior genotype call); b) per-SNP accuracy (type2 r2); c) per-individual type2 r2, averaged across segments; d) per-segment heterozygous call accuracy (proportion of true heterozygous calls that are correctly imputed with high...
Population-specific PCA analysis of Kenyan samples.
(TIF)
a) Empirical distribution, across approximately 20,000 gene regions, of the maximum likelihood estimate of the eta parameter (see Text S2), for the region-based test. Overlaid (red line) is the assumed prior distribution under the alternative used to calculate Bayes factors in the region-based analysis. b) Scatter plot of the log10 combined Bayes F...
Post-imputation sample exclusions.
(DOCX)
Genomic Inflation factors (λ) for logistic regression and mixed-model scans.
(DOCX)
Enrichment of low region-based test P values in three previously defined sets of regions. Each P value in the table results from a one-sided binomial test for an enrichment in the number of regions with empirical P value below the given threshold. The bottom row gives a summary of the distribution of the number of SNPs in each region. Note that the...
Manhattan plot showing –log10 P values (thresholded at 10) for additive, dominant, heterozygote, recessive, and general models, and additive model conditional on the genotype at the sickle locus rs334, across all non-excluded genotyped SNPs. Meta-analysis P values for all three cohorts and for the East African cohorts are also shown for additive, d...
Population-specific PCA analysis of Gambian samples.
(TIF)
Population-specific PCA analysis of Malawian samples.
(TIF)
Comparison of fixed, structured, correlated and independent-effect models at the ABO and HBB loci. The height of each bar represents the posterior probability that the corresponding model is true, under the assumption that one of the models is true.
(TIF)
Details on the 3 study sites and genotyping platforms.
(DOCX)
P values for correlation between the first 5 PCs and case/control status.
(DOCX)
Supplementary statistical details.
(PDF)
ROC curve showing empirical true positive rate (y-axis) against false positive rate (x-axis) for each method used to detect regional association (regional test with Fisher meta-analysis, regional test with Bayesian meta-analysis, best single-SNP frequentist meta-analysis in region, best single-SNP Bayes factor for each of the four choices of correl...
Regions showing most association in single-SNP and regional association test analyses.
(XLSX)
Malawi is one of the countries in sub-Saharan Africa with a high prevalence of HIV/AIDS. This paper analyzes socio-demographic effects using estimates and projections by the United Nations Population Division. It compares estimates and projections for both the short term (2005-2020) and the long term (1980-2050), with the reality of HIV/AIDS and w...
This chapter considers the problem of matching configurations of biological macromolecules when both alignment and superposition transformations are unknown. Alignment denotes correspondence – a bijection or mapping – between points in different structures according to some objectives or constraints. Superposition denotes rigid-body transformations...
One of the key ingredients in drug discovery is the derivation of conceptual templates called pharmacophores. A pharmacophore model characterizes the physicochemical properties common to all active molecules, called ligands, bound to a particular protein receptor, together with their relative spatial arrangement. Motivated by this important applica...
Large-scale studies of genomic variation could assist efforts to eliminate malaria. But there are scientific, ethical and practical challenges to carrying out such studies in developing countries, where the burden of disease is greatest. The Malaria Genomic Epidemiology Network (MalariaGEN) is now working to overcome these obstacles, using a consor...
We propose a simple procedure for generating virtual protein Cα traces. One of the key ingredients of our method, to build a three-dimensional structure from a random sequence of amino acids, is to work directly on torsional angles of the chain which we sample from a von Mises distribution. With simple modeling of the hydrophobic effect in pr...
In the conduct and reporting of medical research, there are some common pitfalls in using statistical methodology which may result in invalid inferences being made. This paper aims to highlight, for inexperienced statisticians and non-statisticians, some of the common statistical pitfalls encountered when using statistics to interpret data in medical...
This paper deals with the problem of estimating fracture planes, given only the data at borehole intersections with fractures. We formulate an appropriate model for the problem and give a solution to fitting the planes using a Markov chain Monte Carlo (MCMC) implementation. The basics of MCMC are presented, with particular emphasis given to reversi...
Case 1 Results. Results for alcohol dehydrogenase (1hdx_1) matching against its own SCOP family. Tables 1–2: Without amino acid property. Tables 3–4: With amino acid property.
Case 4 Results. Results for alcohol dehydrogenase and FAD/NAD(P)-binding domain. Tables 1–5: Without physico-chemistry. Tables 5–10: With physico-chemistry.
Case 2 Results. Results for 17-β hydroxysteroid dehydrogenase and family. Tables 1–5: Without amino acid property. Tables 6–10: With amino acid property.
Case 3 Results. Results for alcohol dehydrogenase (1hdx_1) and superfamily. Tables 1–14: Without physico-chemistry. Tables 14–28: With physico-chemistry.
Matching functional sites is a key problem for the understanding of protein function and evolution. The commonly used graph theoretic approach, and other related approaches, require adjustment of a matching distance threshold a priori according to the noise in atomic positions. This is difficult to pre-determine when matching sites related by varyi...
The paper deals with a stochastic stereologic problem of estimating fracture lines, given only the data at boreholes. We formulate an appropriate model. The problem is challenging since neither the lines (slope, intercept) are known, nor their number. We give an MCMC implementation where all the parameters are allowed to vary. We examine sensitivit...
The explosion in the volume of protein structural information prior to any knowledge of protein biochemical function has made the characterisation of protein functional sites an area of huge interest. Structural similarity of functional sites from proteins with unknown function to those with known functions can be used to infer the function of...
Protein structure simulations are important for understanding and exploring properties of proteins and evaluating algorithms in bioinformatics. For example, decoys, computer-generated protein structures designed to mimic a real protein, can be used to test the validity of a protein model. The model is considered correct only if it is able to identify...
Thesis (Ph.D.) -- University of Leeds (Department of Statistics), 2006.