Population genetic structure of variable drug response

Abstract

Geographic patterns of genetic variation, including variation at drug metabolizing enzyme (DME) loci and drug targets, indicate that geographic structuring of inter-individual variation in drug response may occur frequently. This raises two questions: how to represent human population genetic structure in the evaluation of drug safety and efficacy, and how to relate this structure to drug response. We address these by (i) inferring the genetic structure present in a heterogeneous sample and (ii) comparing the distribution of DME variants across the inferred genetic clusters of individuals. We find that commonly used ethnic labels are both insufficient and inaccurate representations of the inferred genetic clusters, and that drug-metabolizing profiles, defined by the distribution of DME variants, differ significantly among the clusters. We note, however, that the complexity of human demographic history means that there is no obvious natural clustering scheme, nor an obvious appropriate degree of resolution. Our comparison of drug-metabolizing profiles across the inferred clusters establishes a framework for assessing the appropriate level of resolution in relating genetic structure to drug response.
Article • Nature Genetics • volume 29 • November 2001
James F. Wilson¹,², Michael E. Weale³,⁴, Alice C. Smith¹, Fiona Gratrix¹, Benjamin Fletcher³,
Mark G. Thomas³, Neil Bradman³ & David B. Goldstein¹
Published online: 29 October 2001, DOI: 10.1038/ng761
¹Galton Laboratory, Department of Biology, University College London, London, UK.
²Department of Zoology, University of Oxford, Oxford, UK.
³The Centre for Genetic Anthropology, Departments of Biology, University College London, London, UK.
⁴Genostics Ltd, 28/30 Little Russell Street, London WC1A 2HN, UK.
Correspondence should be addressed to D.B.G. (e-mail: d.goldstein@ucl.ac.uk).
Introduction
Many drugs that show therapeutic potential never reach the mar-
ket because of adverse reactions in some individuals, whereas
other drugs in common use are effective for only a fraction of the
population in which they are prescribed. This variation in drug
response depends on many factors, such as sex, age and the
environment, as well as genetic determinants. Since the 1950s,
pharmacogenetic studies have systematically identified allelic variants
at genes that influence drug response, including those of both
drug-metabolizing enzymes (DMEs)¹ and drug targets², such as
the cytochrome P450 monooxygenase CYP2D6 (refs. 3,4) and
the N-acetyl transferase NAT2 (ref. 5) genes. Detailed functional
analysis of variants at genes such as these has clearly shown the
importance of genetic variation in drug responses. For example,
analysis of NAT2 alleles has identified amino acid–replacement
mutations that reduce activity and a noncoding mutation that
reduces translation, lowering the concentration of the enzyme⁵.
In the case of CYP2D6, common variants include a frameshift
leading to a truncated, nonfunctional protein and a splice-site
mutation resulting in the absence of the protein³,⁴. These and
other examples indicate that genetic tests might predict an indi-
vidual’s response to specific drugs, allowing medicines to be tai-
lored to specific genetic makeups. Because of the potential
commercial and clinical significance of such personalized medi-
cines, an understanding of the genetic role of variable drug
response is an important goal of biomedical research.
In addition to concerns surrounding individual variation in
drug response, the geographic distribution of certain variants has
highlighted the possible importance of average differences in
drug response across populations. Genetic polymorphisms in
DMEs, which probably contribute significantly to phenotypic
variation in drug response, all vary in frequency among
populations², some by as much as twelvefold¹. For example, the
well-known poor-metabolizer phenotype of debrisoquine oxidation is
due to variant alleles of CYP2D6. Between 5% and 10% of Euro-
peans, but only 1% of Japanese, have loss-of-function variants
at this locus that affect the metabolism of more than 40 drugs,
including such commonly used agents as β-blockers, codeine and
tricyclic antidepressants. The CYP2D6 ultra-rapid metabolizer
alleles also vary in frequency, even within Europe, from 10% in
Northern Spain to 1–2% in Sweden⁶. Polymorphisms in DMEs
can lead to acute toxic responses and unwanted drug–drug
interactions or to therapeutic failure from augmented drug
metabolism (as in the case of CYP2D6 duplications)¹,⁷.
These observations show that for some drugs, the tradeoffs
between efficacy and adverse drug reaction not only will differ
between individuals but also will show differences in average
effects across different populations⁸. Genetically structured
populations may be composed of two or more subpopulations with
distinct drug-reaction profiles and thus may be better considered
separately in some contexts. This raises the questions of the
appropriate way to infer human population genetic structure in
the context of the evaluation of drug safety and efficacy, and of
how to relate this inferred genetic structure to drug response. To
address this problem, we have used presumably neutral
microsatellite markers to infer genetic clusters for a heteroge-
neous population, such as may be used in drug trials large
enough to allow detection of both genetic and environmental
effects (for instance, Phase III trials). We compared the frequen-
cies of functionally significant alleles at DME loci across the
inferred clusters as an easily defined surrogate for drug response.
Using this approach, we (i) show that there is considerable scope
for population-genetic structuring in drug response in diverse
metropolitan populations, because of the differences in DME allele
frequencies that they harbor among identifiable genetic clusters;
(ii) establish a framework for determining the appropriate level of
resolution (that is, the number of inferred clusters that should be
used) in relating this population-genetic structuring to drug
response; and (iii) show that commonly used ethnic labels (such as
Black, Caucasian and Asian) are insufficient and inaccurate
descriptions of human genetic structure.
Results
We genotyped 16 chromosome 1 microsatellites from the ABI Prism
panel 1 (an average of 17 cM apart) and 23 X-linked microsatellites
(2 cM apart)⁹ in each of eight populations (number of individuals in
parentheses): South African Bantu speakers (46), Amharic- and
Oromo-speaking Ethiopians from Shewa and Wollo provinces collected
in Addis Ababa (48), Ashkenazi Jews (48), Armenians (48), Norwegian
speakers from Oslo (47), Chinese from Sichuan in southwestern
China (39), Papua New Guineans from Madang (48) and
Afro-Caribbeans collected in London (30).
Genetic structure
We used a model-based clustering method implemented by the program
STRUCTURE¹⁰ to assign individuals to subclusters on the basis
of these genetic data, ignoring their actual population affiliations.
This mimics a scenario in which there is cryptic population structure,
or no information as to the ethnic origin of the individuals. Briefly,
the model implemented in STRUCTURE assumes K clusters, each
characterized by a set of allele frequencies at each locus; the
admixture model then estimates the proportion of each individual’s
genome having ancestry in each cluster. We estimated Pr(X|K),
where X represents the data, using a model allowing admixture,
for K between 1 and 6. From this and a uniform prior on
K between 1 and 6, we estimated Pr(K|X) using Bayes’ theorem
(Table 1)¹⁰. Virtually all of the posterior probability density
is on K=4.
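The step from Pr(X|K) to Pr(K|X) can be illustrated with a short calculation. The following is a minimal sketch in Python, not the authors' code: it takes the ln Pr(X|K) values reported in Table 1, applies a uniform prior over K = 1 to 6, and normalizes. The function name `posterior_over_k` is an illustrative choice.

```python
import math

# ln Pr(X|K) values copied from Table 1 (K = 1..6).
LOG_LIKELIHOODS = {
    1: -33680.97,
    2: -32650.80,
    3: -32046.80,
    4: -31943.23,
    5: -31972.33,
    6: -31987.10,
}


def posterior_over_k(log_likelihoods):
    """Return Pr(K|X) under a uniform prior on the listed values of K."""
    # Subtract the maximum before exponentiating (log-sum-exp trick);
    # exp(-33680.97) would otherwise underflow to zero for every K.
    peak = max(log_likelihoods.values())
    weights = {k: math.exp(v - peak) for k, v in log_likelihoods.items()}
    total = sum(weights.values())
    # The uniform prior is the same constant for every K, so it cancels.
    return {k: w / total for k, w in weights.items()}


posterior = posterior_over_k(LOG_LIKELIHOODS)
for k in sorted(posterior):
    print(f"K={k}  Pr(K|X)={posterior[k]:.3f}")
```

Because ln Pr(X|K) for K=4 exceeds the next-best value (K=5) by about 29 log units, the normalized posterior rounds to 1.000 for K=4 and to 0 elsewhere, in agreement with Table 1.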
The apportionment of individuals (the average per-individual
proportion of ancestry) from each of the eight populations into the
four STRUCTURE-defined clusters (Table 2) broadly corresponds
to four geographical areas: Western Eurasia, Sub-Saharan Africa,
China and New Guinea. Notably, 62% of the Ethiopians fall in the
first cluster, which encompasses the majority of the Jews, Norwe-
gians and Armenians, indicating that placement of these individu-
als in a ‘Black’ cluster would be an inaccurate reflection of the
genetic structure. Only 24% of the Ethiopians are placed in the
cluster with the Bantu and most of the Afro-Caribbeans; however,
Table 1 • Inferring the number of clusters
K    ln Pr(X|K)    Pr(K|X)
1    –33680.97     0
2    –32650.80     0
3    –32046.80     0
4    –31943.23     1.000
5    –31972.33     0
6    –31987.10     0
Fig. 1 Allele frequencies at each DME gene in the STRUCTURE-defined clusters. In all but the last two, black
indicates wildtype and white, mutant; for CYP2D6, all mutant alleles are pooled as white, and for NAT2 both tested
mutant alleles (*5 and *6) are pooled as white. Cytochrome P450 1A2 (CYP1A2) metabolizes several drugs
and carcinogens, including the analgesic acetaminophen (Tylenol) and probably antipsychotic drugs¹⁸. CYP2C19
metabolizes diazepam, barbiturates and antidepressants, and a polymorphic variant is responsible for the classical
mephenytoin poor-metabolizer phenotype¹⁹. The classical debrisoquine poor-metabolizer phenotype is due to a
variant of CYP2D6⁷, and NAT2 is responsible for the classical isoniazid polymorphism⁵. NAD(P)H:quinone
oxidoreductase (DIA4) converts quinones to stable hydroxyquinones and bioactivates antitumor quinones and
nitrobenzenes¹⁵. Glutathione-S-transferase M1 (GSTM1) conjugates various electrophilic compounds, including
potent environmental carcinogens such as aflatoxin B₁ epoxides¹. The two NAT2 polymorphisms we genotyped both
result in slow acetylator alleles, which lead to increased risks of drug toxicity and of certain cancers¹,⁵. Of the
CYP2D6 alleles we assayed, CYP2D6*1 is wildtype, *3 and *4 have no activity (which can lead to an acute toxic
response to some drugs) and *2, *9 and *10 have reduced activity¹⁷,²⁰. The CYP1A2 variant genotyped leads to
increased enzyme inducibility in smokers²¹. We genotyped the major polymorphism in CYP2C19 responsible for
the mephenytoin poor-metabolizer trait. After the administration of various drugs, this variant can lead to bone
marrow toxicity, fatal blood dyscrasias and other adverse responses¹. Increased susceptibilities to various cancers
are associated with the deletion polymorphism in GSTM1 genotyped here, dramatically so for smokers¹,¹⁴. The
mutation in DIA4 leads to a complete absence of the protein and thus loss of protection against the toxic and
carcinogenic effects of quinones¹⁵. Frequencies are shown for groupings corresponding to those shown in Table 2.
[Fig. 1: pie charts of allele frequencies by cluster (A, B, C, D) and gene (CYP1A2, CYP2D6, DIA4, NAT2, GSTM1, CYP2C19).]
21% of the Afro-Caribbeans are placed in a cluster with the West
Eurasians (presumably reflecting genetic exchange with Euro-
peans). Finally, China and New Guinea are placed almost entirely
in separate clusters, indicating that the ethnic label ‘Asian’ is also an
inaccurate description of population structure.
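The membership proportions in Table 2 are averages, within each sampled population, of the per-individual ancestry proportions estimated by STRUCTURE. A minimal Python sketch of that averaging step follows; the input layout, the function name `mean_membership` and the example values are all hypothetical and are not the study data.

```python
from collections import defaultdict


def mean_membership(individuals):
    """Average per-individual ancestry proportions within each population.

    `individuals` is a list of (population_label, proportions) pairs, where
    `proportions` holds one ancestry fraction per inferred cluster.
    """
    sums = {}
    counts = defaultdict(int)
    for population, proportions in individuals:
        if population not in sums:
            sums[population] = [0.0] * len(proportions)
        for i, q in enumerate(proportions):
            sums[population][i] += q
        counts[population] += 1
    return {
        pop: [s / counts[pop] for s in totals]
        for pop, totals in sums.items()
    }


# Hypothetical ancestry proportions for clusters A-D (not the study data).
example = [
    ("pop1", [0.90, 0.05, 0.03, 0.02]),
    ("pop1", [0.70, 0.10, 0.15, 0.05]),
    ("pop2", [0.10, 0.80, 0.05, 0.05]),
]
membership = mean_membership(example)
for pop, row in membership.items():
    print(pop, [round(q, 3) for q in row])
```

Each output row sums to 1 (up to rounding), as the rows of Table 2 do.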
Consideration of only the X-linked microsatellites for the pur-
poses of clustering supports K=3 with a clustering very similar to
that for the entire dataset, except that the Chinese and New
Guinean clusters are combined into one. When only the chromo-
some 1 microsatellites are used, the clustering is essentially the
same as for the whole dataset. This discrepancy may be explained
by one of two factors: (i) a lack of resolution in the X chromosome
microsatellites or (ii) a biological factor such as the different num-
ber of X chromosomes and autosomes carried by males and
females. To test these hypotheses, we carried out STRUCTURE runs on
the chromosome 1 data using an amount of information equal to
that available from the X chromosome (22 alleles). The chromo-
some 1 microsatellites continued to support K=4, indicating that
a lack of resolution in the X chromosome microsatellites may not
have been the explanation. Perhaps, because the X chromosome
spends more time in the female germline than does chromosome
1 and because females have a higher migration rate than males¹¹,
the X-linked loci have less genetic structure. Smaller random
subsets of the loci support a variety of values for K and do not agree
on the clustering scheme (data not shown). This is probably
because there are no natural clusters, as there has not been a his-
tory of bifurcation in human populations. Our results indicate
that a reasonably high number of loci should be used to obtain
consistency in clustering; one approach would be to use one
marker from each chromosome arm. All of the analyses we pre-
sent use the full dataset, resulting in four clusters (Table 2).
Drug-metabolizing enzymes
Our selection of DMEs includes representatives of both phase I
(oxidation or reduction) and phase II (conjugation) drug metab-
olism. We included three enzymes of the phase I cytochrome
P450 family: CYP1A2, CYP2C19 and CYP2D6. We also included
three conjugating or phase II metabolism enzymes: NAT2,
NAD(P)H:quinone oxidoreductase (DIA4) and glutathione-S-transferase
M1 (GSTM1). We determined allele frequencies at 11
variants in the genes encoding these six DMEs, all of which are
known to be functionally significant (Fig. 1).
There are notable differences in the allele frequencies of DME-
encoding genes between the genetically identified clusters (Fig. 1)
for five of six reported loci. To assess differentiation across
clusters, we counted allele frequencies in each of the clusters and
calculated χ²; we also tested for differences in allele frequencies using
logistic regression. Using both methods, and correcting for
multiple comparisons, the allele frequency distributions are
significantly different for four of the six loci (significant for NAT2,
CYP2C19, DIA4 and CYP2D6). The pattern is particularly striking
at CYP2C19, where the frequency of the mutant allele (the
mephenytoin polymorphism) in cluster B is more than four times
that of cluster A (P<0.0001). We also observed extreme differenti-
ation between clusters B and D for DIA4, for which the frequency
of the mutant allele (which provides no protection against the
toxic effects of quinones) differs by almost five-fold (P<0.0001).
This is a notable difference, as clusters B and D would be com-
bined as ‘Asian’ in current drug evaluation using ethnic labels.
NAT2 also shows significant differentiation between these two
clusters, as well as among the others. We observed strong to mod-
est differences in allele frequencies for the other DME genes
between at least two pairs of the clusters in each case. To further
explore cluster differentiation we counted the number of loci for
which there are significant allele frequency differences (using χ²)
for each of the pairs of clusters. Without correcting for multiple
comparisons, this number varied from 2 (of 6 loci) for B versus D,
to 5 (of 6) for B versus C. Given the important differences in drug
response determined by these variants, the scope for genetic struc-
turing in drug response clearly is high. For some drugs, therefore,
the trade-off between therapeutic response and adverse drug reac-
tions will differ between the clusters identified here, making this
kind of genetic analysis important in checking for such effects in
any phase III clinical trial.
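The χ² comparison of allele counts across clusters, with a correction for multiple comparisons, can be sketched as follows. This is an illustration in Python under stated assumptions, not the authors' analysis: the allele counts are hypothetical, the locus names `locusX` and `locusY` are placeholders, and a Bonferroni correction is assumed because the text does not name the correction that was used.

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    `table` has one row per cluster and one column per allele class,
    e.g. [wildtype_count, mutant_count]. Every row and column total must
    be positive, otherwise an expected count of zero is divided by.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi_square_p_value(statistic, dof):
    """Upper-tail probability of a chi-square variable (integer dof >= 1)."""
    half = statistic / 2.0
    if dof % 2 == 0:
        # Even dof: closed-form finite sum.
        terms = sum(half ** i / math.factorial(i) for i in range(dof // 2))
        return math.exp(-half) * terms
    # Odd dof: complementary error function plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    for i in range(1, (dof - 1) // 2 + 1):
        tail += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return tail


def bonferroni_significant(p_values, alpha=0.05):
    """Flag loci whose p-value passes a Bonferroni-corrected threshold."""
    threshold = alpha / len(p_values)
    return {locus: p < threshold for locus, p in p_values.items()}


# Hypothetical [wildtype, mutant] allele counts in four clusters.
counts = {
    "locusX": [[60, 40], [20, 80], [55, 45], [50, 50]],
    "locusY": [[50, 50], [48, 52], [51, 49], [50, 50]],
}
p_values = {}
for locus, table in counts.items():
    stat, dof = chi_square_statistic(table)
    p_values[locus] = chi_square_p_value(stat, dof)
significant = bonferroni_significant(p_values)
print(significant)
```

The p-value is computed with the standard library only, so the sketch has no dependency on a statistics package; with one available, a library routine for contingency tables would be the more usual choice.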
We compared the predictive value of the genetic clusters to that
of commonly used ethnic labels by counting the DME allele fre-
quencies in the grouping resulting from those labels: Caucasian
[Fig. 2: pie charts of allele frequencies by ethnically labeled group (A, B, C) and gene (CYP1A2, CYP2D6, DIA4, NAT2, GSTM1, CYP2C19).]
Fig. 2 Allele frequencies at each of the DME variants in the ethnically labeled
groups. See Fig. 1 legend for details. A, Bantu, Ethiopian and Afro-Caribbean
frequencies; B, those for Norwegians, Ashkenazi Jews and Armenians; C, those
for Chinese and New Guineans.
Table 2 • Proportion of membership of each sampled population in STRUCTURE-defined subclusters
Population          A      B      C      D
Bantu               0.04   0.02   0.93   0.02
Ashkenazi           0.96   0.01   0.01   0.02
Ethiopia            0.62   0.08   0.24   0.06
Norway              0.96   0.02   0.01   0.01
Armenia             0.90   0.04   0.02   0.05
China               0.09   0.05   0.01   0.84
Papua New Guinea    0.02   0.95   0.01   0.02
Afro-Caribbean      0.21   0.03   0.73   0.03
(Norwegian, Ashkenazi Jew, Armenian), Black (Bantu,
Ethiopian, Afro-Caribbean) and Asian (Chinese, New Guinean;
Fig. 2). Notably, for DIA4, the large frequency difference between
clusters B and D (driven by the differentiation between China
and New Guinea) is averaged when both populations are
lumped; the mutant allele frequency is thus only one and a half
times as high as that in the other two groups. Indeed, the overall
differentiation for the ethnic groups is not significant after cor-
rection for multiple comparisons. Note that in no case did we
observe the reverse in our data: that is, the ethnic labels never
show sharp differentiation that is not observed in the clusters. In
addition, only in the case of CYP2D6 are the allele frequency dif-
ferentials as high as they are for genetically defined clusters.
Although there is some DME allele frequency differentiation
between ethnically labeled groups, in most cases it is less than
that seen for the genetic clusters. To confirm this, we fitted logis-
tic regression models to the allele data using membership in the
genetic clusters as the explanatory variables, and tested for the
increase in goodness of fit obtained by adding the ethnic labels as
explanatory variables. We then compared this to the increase in
goodness of fit obtained by adding the genetic cluster informa-
tion to the ethnic group information. Of those DME loci (NAT2,
CYP2C19, DIA4 and CYP2D6) that showed significant differenti-
ation in either the clusters or the ethnic groups, in three of four
cases, adding genetic cluster information to ethnic labels was
more significant than adding ethnic labels to genetic clusters. For
CYP2D6, the opposite was true.
Multilocus interactions
Undesirable drug reactions or interactions, as well as environ-
mental sensitivities, may also be due to the existence of variants at
two (or more) loci. An example of this may be the case of the
increased susceptibility to colorectal cancer in individuals with a
rapid/rapid metabolizer phenotype at CYP1A2 and NAT2,
especially for those who prefer well-cooked meat¹². It is important to
consider not only differences in allele frequency between the
inferred clusters but also differences in frequency for multilocus
genotypes. There are large frequency differentials between the
clusters we have identified for multilocus genotypes, which may
give rise to phenotypic combinations like this; in fact, the fre-
quency of the combination CYP1A2-A/A, NAT2*4/– observed in
cluster B (47%) is more than twice that seen in clusters A (19%)
or C (22%; P<0.01 for overall differentiation). When such inter-
actions are important, they may be apparent in the genetic analy-
sis described here, from the distribution of drug response across
inferred clusters.
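Frequencies of a multilocus genotype combination per cluster can be tabulated as in the Python sketch below. It is only an illustration: the records are hypothetical, individuals are assumed to be assigned to a single cluster, and "NAT2*4/–" is read here as carrying at least one NAT2*4 allele, which is an interpretation rather than something the text states.

```python
from collections import Counter


def combination_frequency(individuals, predicate):
    """Fraction of individuals in each cluster whose genotypes satisfy `predicate`."""
    totals = Counter()
    hits = Counter()
    for cluster, genotypes in individuals:
        totals[cluster] += 1
        if predicate(genotypes):
            hits[cluster] += 1
    return {cluster: hits[cluster] / totals[cluster] for cluster in totals}


def is_a_a_with_nat2_4(genotypes):
    """True for CYP1A2 A/A together with at least one NAT2*4 allele."""
    return genotypes["CYP1A2"] == ("A", "A") and "*4" in genotypes["NAT2"]


# Hypothetical records: (cluster, genotypes at the two loci).
records = [
    ("A", {"CYP1A2": ("A", "A"), "NAT2": ("*4", "*5")}),
    ("A", {"CYP1A2": ("C", "A"), "NAT2": ("*4", "*4")}),
    ("A", {"CYP1A2": ("A", "A"), "NAT2": ("*5", "*6")}),
    ("A", {"CYP1A2": ("C", "C"), "NAT2": ("*5", "*5")}),
    ("B", {"CYP1A2": ("A", "A"), "NAT2": ("*4", "*4")}),
    ("B", {"CYP1A2": ("A", "A"), "NAT2": ("*4", "*6")}),
]
frequencies = combination_frequency(records, is_a_a_with_nat2_4)
print(frequencies)
```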
Discussion
By carrying out the clustering analysis with the number of clus-
ters set to different values, we can compare the extent of differen-
tiation among the clusters to assess the appropriate level of
resolution. In the context of a Phase III trial, the appropriate
benchmark would reflect the amount of the total variation in
drug response explained by the genetic clusters. A surrogate test
would be to carry out exact tests of differentiation¹³ on relevant
functional polymorphisms, stopping when an increase in the
number of clusters does not appreciably increase the degree of
differentiation. The clustering properties of STRUCTURE, how-
ever, can be unstable across different values of K, which compli-
cates the implementation of such an analysis.
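The stopping rule described above can be written down as a small procedure. The Python sketch below is one possible reading, not a method given in the paper: the differentiation score per K is left abstract (any non-negative measure in which higher means more differentiated), and `min_gain`, the relative increase counted as "appreciable", is a hypothetical parameter, as are the example scores.

```python
def choose_resolution(differentiation_by_k, min_gain=0.05):
    """Pick the largest K reached before the gain in differentiation stalls.

    `differentiation_by_k` maps each K to a non-negative differentiation
    score; `min_gain` is the relative increase regarded as appreciable.
    """
    ks = sorted(differentiation_by_k)
    chosen = ks[0]
    for previous, current in zip(ks, ks[1:]):
        base = differentiation_by_k[previous]
        gain = differentiation_by_k[current] - base
        if gain <= min_gain * base:
            # Adding one more cluster no longer helps appreciably: stop.
            break
        chosen = current
    return chosen


# Hypothetical differentiation scores for K = 1..6.
scores = {1: 0.0, 2: 0.10, 3: 0.18, 4: 0.25, 5: 0.255, 6: 0.256}
best_k = choose_resolution(scores)
print(best_k)
```

With these made-up scores the rule stops at K=4, because moving to K=5 raises the score by only 2%. Instability of the clustering across values of K, noted above, would show up as a non-monotonic score and would need separate handling.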
It is well known that there are inter-ethnic differences in DME
allele frequencies and thus in drug response. Our focus here, how-
ever, has been to assess the scope for average difference in drug
response across genetically inferred clusters. Not only can these
clusters be derived in the absence of knowledge about ethnicity
(or geographic origin), but they are also more informative than
commonly used ethnic labels. Because of the potential clinical sig-
nificance of average differences in drug response, we conclude that
it is not only feasible but a clinical priority to assess genetic struc-
ture as a routine part of drug evaluation.
When the most important genes influencing response to a par-
ticular drug or group of drugs have been identified, it should be
possible to personalize medicine on the basis of an individual’s
genotype, assuming that routine individual genotyping is com-
mercially and technically feasible. Short of such detailed knowl-
edge, however, it is important to assess whether drugs work
similarly in different genetic subgroups. The appropriate level of
clustering may be evaluated empirically by assessing the amount
of variation in response explained by the inferred clusters. In
addition, we have shown that the common ethnic labels currently
available to regulatory authorities show a poor correspondence
with genetically inferred clusters.
Analysis of population structure in biomedical research
Our implementation of STRUCTURE is primarily meant to
show that familiar ethnic labels are not accurate guides to genetic
structure. We have not attempted to provide a definitive descrip-
tion of human population structure. The results of STRUCTURE
can, in fact, be quite difficult to interpret. Notably, statistical
difficulties may arise when assessing convergence, and the
assessment of the appropriate value of K is currently not rigorous¹⁰.
These and other issues can lead to anomalous outcomes; for
example, an implausible value of K may be supported where one
of the clusters is more or less empty. In addition, results may vary
for biological reasons, such as when markers are affected differ-
entially by forces acting on the genome, such as gene flow.
Detailed analysis of STRUCTURE output and other clustering
schemes, using a standard battery of markers in a global sample
of human populations, will be needed to arrive at a canonical
clustering scheme for use in biomedical research. Such an evalua-
tion would need to be geographically exhaustive and to include a
sufficient number of markers throughout the genome to ensure
that the resulting clustering scheme is robust; consistent results
should be obtained with different marker and sample sets.
Methods
Microsatellite markers and structure inference. All subjects were
unrelated males. We genotyped⁹ the following X-linked microsatellites: DXS984,
996, 1036, 1053, 1062, 1203, 1204, 1205, 1206, 1211, 1212, 1220, 1223, 7103,
8014, 8061, 8068, 8073, 8085, 8086, 8087 and 8099. We genotyped the
following chromosome 1 microsatellites: D1S196, 206, 213, 249, 255, 450, 484,
2667, 2726, 2785, 2797, 2800, 2836, 2842, 2878 and 2890. The chromosome 1
markers form part of the ABI Prism linkage mapping panel 1 and were
amplified according to the manufacturer’s instructions. We assigned
individuals to clusters using the admixture model in the program
STRUCTURE¹⁰, with no correlation in allele frequencies among populations and a
burn-in time of at least 1 million steps, followed by another 1 million steps
of the Markov chain for data collection. We carried out multiple runs for
each set of conditions to be sure that the chain had converged; in total, we
carried out more than 500 runs.
DME genotyping. We sequenced the intronic C734A transversion in CYP1A2
and two SNPs in NAT2: C481T, defining allele *5 (in complete allelic
association with Ile113Thr) and G590A (giving Arg197Gln), defining allele *6. We
classified all other alleles as *4, and combined the two mutant allele
frequencies for the purpose of binary analysis. We genotyped the deletion allele of
glutathione-S-transferase M1 (GSTM1) using GSTM4 amplification as an
internal control¹⁴. We genotyped the C191T transition (giving Pro187Ser) in
DIA4 (ref. 15) and the G117A transition (leading to a truncated protein) in
CYP2C19 (ref. 16) using polymerase chain reaction–restriction fragment
length polymorphism (PCR–RFLP). We labeled GSTM1 and RFLP
amplicons fluorescently and determined sizes on an ABI 3100 automated sequencer
(Applied Biosystems). We typed CYP2D6 SNPs by gene-specific PCR,
followed by nested multiplex reamplification–RFLP detection of the following
‘key’ mutations¹⁷: C100T (Pro34Ser; alleles *10 and *4), G1846A (splicing
defect; allele *4), A2549del (frameshift; allele *3), 2613–2615AGAdel
(Lys281del; allele *9) and C2850T (Arg296Cys; allele *2). All other
chromosomes were denoted *1 (thus, this category includes some non-wildtype
alleles). For the binary analyses, we considered CYP2D6*1 as having normal
activity and all other alleles as having reduced activity. We labeled CYP2D6
amplicons using fluorescent primers and sized them on an ABI 377
automated sequencer (Applied Biosystems; genotyping details available from B.F.). In
the case of GSTM1, the assay does not allow differentiation between
homozygous and heterozygous presence of the nondeletion allele. For this case, we
carried out calculations on genotype frequencies, comparing homozygous deletion
with homozygous or heterozygous presence of the nondeletion allele. We estimated
the accuracy of our genotyping by retesting a number of samples from each
population. Error rates varied from 0 to 7% for the DME SNPs and from 0 to
5% for the microsatellites.
DME differentiation across clusters. We calculated DME allele
frequencies in the clusters by distributing an individual’s genotype among the
clusters, according to the proportion of ancestry that the individual had in each
cluster, as determined by STRUCTURE output. When individuals were
placed in the cluster in which they had the most ancestry, the results
changed very little (data not shown). To meet the assumption of a
multinomial distribution, we evaluated χ² tables after placing individuals in the
clusters in which they had most ancestry.
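The two ways of assigning genotypes to clusters described here can be sketched in Python. This is an illustration under assumptions, not the authors' code: each individual is represented as a hypothetical tuple of mutant-allele count, total allele count and per-cluster ancestry proportions, and the function names are illustrative.

```python
def weighted_allele_frequencies(individuals, n_clusters):
    """Mutant-allele frequency per cluster, weighting by ancestry proportion.

    Each individual is (mutant_allele_count, total_allele_count, proportions);
    the individual's alleles are split among clusters in proportion to ancestry.
    """
    mutant = [0.0] * n_clusters
    total = [0.0] * n_clusters
    for n_mutant, n_alleles, proportions in individuals:
        for k in range(n_clusters):
            mutant[k] += proportions[k] * n_mutant
            total[k] += proportions[k] * n_alleles
    return [m / t if t > 0 else float("nan") for m, t in zip(mutant, total)]


def hard_assignment_counts(individuals, n_clusters):
    """Integer [wildtype, mutant] allele counts per cluster after placing
    each individual in the cluster in which they have the most ancestry."""
    counts = [[0, 0] for _ in range(n_clusters)]
    for n_mutant, n_alleles, proportions in individuals:
        # Ties go to the lowest-numbered cluster.
        best = max(range(n_clusters), key=lambda k: proportions[k])
        counts[best][0] += n_alleles - n_mutant
        counts[best][1] += n_mutant
    return counts


# Hypothetical individuals with two clusters (not the study data).
sample = [
    (2, 2, [0.9, 0.1]),
    (0, 2, [0.2, 0.8]),
    (1, 2, [0.5, 0.5]),
]
soft = weighted_allele_frequencies(sample, 2)
hard = hard_assignment_counts(sample, 2)
print(soft, hard)
```

The weighted version yields fractional counts, which is why whole-individual assignment is used when integer count tables are needed for the χ² test.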
Acknowledgments
D.B.G. is a Royal Society/Wolfson Research Merit Award holder.
Received 30 July; accepted 4 October 2001.
1. Weber, W.W. Pharmacogenetics (Oxford University Press, Oxford, 1997).
2. Evans, W.E. & Relling, M.V. Pharmacogenomics: translating functional genomics
into rational therapeutics. Science 286, 487–491 (1999).
3. Gough, A.C. et al. Identification of the primary gene defect at the cytochrome
P450 CYP2D locus. Nature 347, 773–776 (1990).
4. Kagimoto, M., Heim, M., Kagimoto, K., Zeugin, T. & Meyer, U.A. Multiple
mutations of the human cytochrome P450IID6 gene (CYP2D6) in poor
metabolizers of debrisoquine. Study of the functional significance of individual
mutations by expression of chimeric genes. J. Biol. Chem. 265, 17209–17214 (1990).
5. Blum, M., Demierre, A., Grant, D.M., Heim, M. & Meyer, U.A. Molecular
mechanism of slow acetylation of drugs and carcinogens in humans. Proc. Natl
Acad. Sci. USA 88, 5237–5241 (1991).
6. Bernal, M.L. et al. Ten percent of North Spanish individuals carry duplicated or
triplicated CYP2D6 genes associated with ultrarapid metabolism of debrisoquine.
Pharmacogenetics 9, 657–660 (1999).
7. Meyer, U.A. & Zanger, U.M. Molecular mechanisms of genetic polymorphisms of
drug metabolism. Annu. Rev. Pharmacol. Toxicol. 37, 269–296 (1997).
8. International Conference on Harmonisation. Ethnic Factors in the Acceptability of
Foreign Clinical Data. (International Conference on Harmonisation, 1998).
9. Wilson, J.F. & Goldstein, D.B. Consistent long-range linkage disequilibrium
generated by admixture in a Bantu–Semitic hybrid population. Am. J. Hum.
Genet. 67, 926–935 (2000).
10. Pritchard, J.K., Stephens, M. & Donnelly, P. Inference of population structure
using multilocus genotype data. Genetics 155, 945–959 (2000).
11. Seielstad, M.T., Minch, E. & Cavalli-Sforza, L.L. Genetic evidence for a higher
female migration rate in humans. Nature Genet. 20, 278–280 (1998).
12. Kohlmeier, L., DeMarini, D. & Piegorsch, W. Gene-nutrient interactions in
nutritional epidemiology. in Design Concepts in Nutritional Epidemiology (eds.
Margetts, B. & Nelson, M.) 312–337 (Oxford University Press, Oxford, 1997).
13. Raymond, M. & Rousset, F. An exact test for population differentiation. Evolution
49, 1280–1283 (1995).
14. Krajinovic, M., Labuda, D., Richer, C., Karimi, S. & Sinnett, D. Susceptibility to
childhood acute lymphoblastic leukemia: influence of CYP1A1, CYP2D6, GSTM1,
and GSTT1 genetic polymorphisms. Blood 93, 1496–1501 (1999).
15. Gaedigk, A. et al. NAD(P)H:quinone oxidoreductase: polymorphisms and allele
frequencies in Caucasian, Chinese and Canadian Native Indian and Inuit
populations. Pharmacogenetics 8, 305–313 (1998).
16. Goldstein, J.A. & Blaisdell, J. Genetic tests which identify the principal defects in
CYP2C19 responsible for the polymorphism in mephenytoin metabolism.
Methods Enzymol. 272, 210–218 (1996).
17. Gaedigk, A. et al. Optimization of cytochrome P4502D6 (CYP2D6) phenotype
assignment using a genotyping algorithm based on allele frequency data.
Pharmacogenetics 9, 669–682 (1999).
18. Basile, V.S. et al. A functional polymorphism of the cytochrome P450 1A2
(CYP1A2) gene: association with tardive dyskinesia in schizophrenia. Mol.
Psychiatry 5, 410–417 (2000).
19. Ferguson, R.J. et al. A new genetic defect in human CYP2C19: mutation of the
initiation codon is responsible for poor metabolism of S-mephenytoin. J.
Pharmacol. Exp. Ther. 284, 356–361 (1998).
20. Daly, A.K. et al. Nomenclature for human CYP2D6 alleles. Pharmacogenetics 6,
193–201 (1996).
21. Sachse, C., Brockmoller, J., Bauer, S. & Roots, I. Functional significance of a C→A
polymorphism in intron 1 of the cytochrome P450 CYP1A2 gene tested with
caffeine. Br. J. Clin. Pharmacol. 47, 445–449 (1999).
© 2001 Nature Publishing Group http://genetics.nature.com