Metabolic Engineering 74 (2022) 139–149
Available online 29 October 2022
1096-7176/© 2022 International Metabolic Engineering Society. Published by Elsevier Inc. All rights reserved.
Machine-learning guided elucidation of contribution of individual steps in
the mevalonate pathway and construction of a yeast platform strain for
terpenoid production
Minakshi Mukherjee a, Rachael Hageman Blair b, Zhen Q. Wang a,*
a Department of Biological Sciences, University at Buffalo, State University of New York, Buffalo, NY 14260, USA
b Department of Biostatistics, University at Buffalo, State University of New York, Buffalo, NY 14260, USA
ARTICLE INFO
Keywords:
Terpene
Saccharomyces cerevisiae
Random forest
Mevalonate kinase
Metabolic engineering
ABSTRACT
The production of terpenoids from engineered microbes contributes markedly to the bioeconomy by providing
essential medicines, sustainable materials, and renewable fuels. The mevalonate pathway leading to the synthesis
of terpenoid precursors has been extensively targeted for engineering. Nevertheless, the importance of individual
pathway enzymes to the overall pathway flux and final terpenoid yield is less well understood, especially for enzymes that are
thought to be non-rate-limiting. To investigate the individual contribution of the five non-rate-limiting enzymes
in the mevalonate pathway, we created a combinatorial library of 243 Saccharomyces cerevisiae strains, each
having an extra copy of the mevalonate pathway integrated into the genome and expressing the non-rate-limiting
enzymes from a unique combination of promoters. High-throughput screening combined with machine learning
algorithms revealed that the mevalonate kinase, Erg12p, stands out as the critical enzyme that influences product
titer. ERG12 is ideally expressed from a medium-strength promoter, which is the ‘sweet spot’ resulting in high
product yield. Additionally, a platform strain was created by targeting the mevalonate pathway to both the
cytosol and peroxisomes. The dual localization synergistically increased terpenoid production and implied that
some mevalonate pathway intermediates, such as mevalonate, isopentenyl pyrophosphate (IPP), and dimethylallyl
pyrophosphate (DMAPP), are diffusible across peroxisome membranes. The platform strain resulted in 94-fold,
60-fold, and 35-fold improved titer of monoterpene geraniol, sesquiterpene α-humulene, and triterpene squalene,
respectively. The terpenoid platform strain will serve as a chassis for producing any terpenoids and terpene
derivatives.
1. Introduction
Terpenoids are five-carbon isoprene derivatives that constitute the
largest class of natural products and are widely used as fuels, medicines,
and fragrances (Christianson, 2017; Belcher et al., 2020). However,
terpenoid yields from natural biological sources are often low, and
chemical synthesis is challenging due to their structural complexity.
Engineering microbes, especially bakers’ yeast, for sustainable terpenoid
production has achieved considerable success in the past decade
(Ro et al., 2006; Engels et al., 2008). Terpenoid biosynthesis in yeast
relies on the mevalonate (MVA) pathway, which produces the universal
terpenoid precursors isopentenyl pyrophosphate (IPP) and dimethylallyl
pyrophosphate (DMAPP) (Fig. 1A).
Engineered yeast strains for terpenoid production usually
overexpress MVA pathway genes to provide sufficient IPP and DMAPP
for producing a wide range of terpenoids in the yeast Saccharomyces cerevisiae
(Navale et al., 2021). In recent works, all seven genes of the MVA
pathway were overexpressed from the yeast genome to increase the concentrations
of IPP and DMAPP and subsequently increase the titer of
specific terpenoids (Guo et al., 2018; Yuan and Ching, 2014; Yee et al.,
2019; Lv et al., 2016; Jiang et al., 2021; Westfall et al., 2012; Peng et al.,
2017; Li et al., 2020; Liu et al., 2020; Zhang et al., 2020). The seven
genes were usually expressed from strong promoters, and there has been
limited attention to balancing the expression of each gene. Unbalanced
expression of pathway genes may lead to the accumulation of in-
termediates that inhibit enzyme activities through feedback regulations
(Sauro, 2017). Combinatorial screening of the MVA pathway genes
expressed from promoters with various strengths can help identify the
* Corresponding author. Department of Biological Sciences, University at Buffalo, 653 Cooke Hall, Buffalo, NY14260, USA.
E-mail address: zhenw@buffalo.edu (Z.Q. Wang).
Contents lists available at ScienceDirect
Metabolic Engineering
journal homepage: www.elsevier.com/locate/meteng
https://doi.org/10.1016/j.ymben.2022.10.004
Received 26 April 2022; Received in revised form 16 October 2022; Accepted 23 October 2022
optimal expression of each enzyme for maximized pathway flux and
terpenoid production. Such effort can also reveal the in vivo contribution
of each gene in the MVA pathway, especially the five non-rate-limiting
enzymes. While there is a consensus that HMG-CoA reductase Hmg1p
and IPP isomerase Idi1p are bottlenecks (Han et al., 2018; Zhao et al.,
2017; Jiang et al., 2017; Xie et al., 2015; Verwaal et al., 2007), reports
vary regarding the relative contribution of the other five
MVA pathway genes (Kwak et al., 2020; Zhou et al., 2018; McClory
et al., 2019; Hu et al., 2020; Madsen et al., 2011; Yao et al., 2018;
Redding-Johanson et al., 2011; Alonso-Gutierrez et al., 2015). Thus, an
exhaustive study elucidating the relative importance of the five
non-rate-limiting enzymes in the MVA pathway will deepen our
fundamental knowledge of the pathway enzymes and guide future engineering
to increase terpenoid titers.
Moreover, creating a yeast platform strain with increased terpenoid
precursors can shorten the strain development process to support the
high-titer production of terpenoids. A platform strain is a genetically
engineered microbe that provides abundant precursors for producing
various products (Nielsen, 2015). Developing a platform strain elimi-
nates repetitive engineering of the same precursor pathway for different
target molecules. Several yeast platform strains have been developed to
access precursors for alkaloids and aromatics (Chen et al., 2013;
Rodriguez et al., 2015; Gold et al., 2015; Campbell et al., 2016; Pyne
et al., 2020), but no such platform strain exists for terpenoids. Therefore,
we aim to build a yeast platform strain that can be used to produce any
terpenoid once compound-specific downstream modifications are
incorporated.
In this study, we created a combinatorial library of 243 stable
transgenic strains with each of the five non-rate-limiting MVA pathway
genes under three different promoters. Machine learning algorithms
revealed that ERG12, encoding the mevalonate kinase, is the most critical
gene, apart from HMG1 and IDI1, that contributes significantly to the
productivity of the MVA pathway. We have also created a universal
yeast platform strain for producing any terpenoids by dual-targeting the
MVA pathway to both the cytosol and peroxisomes. The dual-targeting
experiment revealed that some MVA pathway intermediates, including
mevalonate and IPP/DMAPP, are diffusible between cytosol and peroxisomes.
The platform strain produced 94-fold more monoterpene
geraniol, 60-fold more sesquiterpene α-humulene, and 35-fold more
triterpene squalene than the wild-type control.
2. Materials and methods
2.1. Strains and growth media
S. cerevisiae strains used to construct the engineered strains, CEN.PK2-1C
(MATa; his3D1; leu2-3_112; ura3-52; trp1-289; MAL2-8c; SUC2),
CEN.PK2-1D (MATα; his3D1; leu2-3_112; ura3-52; trp1-289; MAL2-8c;
SUC2) and CEN.PK2 (MATa/α; his3D1/his3D1; leu2-3_112/leu2-3_112;
ura3-52/ura3-52; trp1-289/trp1-289; MAL2-8c/MAL2-8c; SUC2/SUC2),
were acquired from Euroscarf, Germany. E. coli strain DH5α was used for
cloning and plasmid propagation.
E. coli cells were grown on Luria-Bertani (LB) plates with appropriate
antibiotics. Yeast synthetic dropout media used for integrations, mating,
and culturing contained 0.67% (w/v) yeast nitrogen base without amino
acids (Difco, Franklin Lakes, NJ), 2% (w/v) dextrose (Fisher Scientific,
Waltham, MA), and 0.07% (w/v) synthetic complete amino acid mix (CSM)
without certain amino acids (Sunrise Science, Knoxville, TN). SD + 400
μg/ml G418 (pH = 7) (Goldbio, St. Louis, MO), which selects for the
plasmid, was used for seed culture preparation. YPD (1% yeast extract,
2% peptone, and 2% dextrose) without antibiotic selection was used for
preparing the growth curves in Fig. 4B. YPD + 200 μg/ml G418 was used
for compound production (Vickers et al., 2013).
2.2. Gene synthesis, PCR, and cloning
The ERG20ww, tObGES, ZSS1, and CdGeDH genes were codon-optimized
and synthesized by IDT (Newark, NJ). PCR amplification
was performed using the Phusion High Fidelity DNA Polymerase (NEB,
Ipswich, MA) according to the manufacturer’s protocol. Gibson assembly
(Gibson, 2011) was used to clone the sgRNAs into the pCAS (Fernandes
et al., 2007) plasmid for CRISPR-guided genomic integration.
Golden Gate assembly (Mukherjee et al., 2021) was performed to
assemble all the other constructs. The sequences of all part plasmids
were confirmed using Sanger sequencing (GeneWiz, South Plainfield,
NJ). A schematic outlining the general strategy for cloning the
multi-gene plasmids is included in Fig. S1. All the constructs created and
primers used are listed in Tables S1–S8.
Fig. 1. Overexpressing the complete MVA pathway led to increased geraniol production. (A) The MVA
pathway leads to geraniol production. Proteins in blue were overexpressed MVA enzymes. Erg10p:
acetoacetyl-CoA thiolase; Erg13p: 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) synthase; tHmg1p:
truncated HMG-CoA reductase without the regulatory domain; Erg12p: mevalonate kinase; Erg8p:
phosphomevalonate kinase; Erg19p: mevalonate pyrophosphate decarboxylase; Idi1p:
isopentenyl-diphosphate isomerase; Erg20wwp: Erg20p (F96W, N127W) mutant acting as a geranyl
pyrophosphate (GPP) synthase; tObGES: truncated geraniol synthase from Ocimum basilicum. IPP:
isopentenyl pyrophosphate; DMAPP: dimethylallyl pyrophosphate. (B) Schematic showing the genomic
integration of seven MVA pathway genes and the tObGES-Erg20wwp fusion protein expressed episomally
from a strong constitutive promoter (pPYK001). The two proteins are fused together with a “GSG”
linker. (C) Geraniol yield in engineered strains (MVAc1, MVAc2, MVAc3, and MVAc4). “c” indicates
that genes are localized to the cytosol. Fold increase compared to the wild type at each time point
is noted at the top of each bar. Data represent the average ± SD of three independent biological
replicates.
2.3. Strain construction
Yeast competent cells were co-transformed with the NotI-digested
and linearized multi-gene (Lee et al., 2015) and pCAS-sgRNA (Ryan
et al., 2014) plasmids using the Frozen-EZ yeast transformation II kit
(Zymo Research, Irvine, CA) according to the manufacturer’s protocol.
The transformed cells were plated on appropriate dropout media for
selection and incubated at 30 °C for two days and 37 °C for an additional
day to facilitate genomic integration (Ryan et al., 2014). Two pairs of
diagnostic primers were used to confirm each integration by
polymerase chain reaction (PCR) using the GoTaq Green DNA polymerase
(Promega, Madison, WI). For further confirmation of each gene
in two-gene inserts at the ROX1 and GAL80 loci, primers were designed such
that the forward and reverse primers bind to the first and the second
gene, respectively. For three-gene inserts at the GAL1 locus, an additional
pair of forward and reverse primers binds to the second and third
genes, respectively. All the primers used are listed in Table S8.
2.4. Mating of yeast strains
243 library strains: One colony was picked from each of the 27 GAL1Δ
and 9 ROX1ΔGAL80Δ + tObGES-ERG20ww strains from their respective
dropout plates (SD-Leu and SD-Ura-Trp-His) and streaked out in vertical
and horizontal lines, respectively, on an SD-Leu-Ura-Trp-His plate, followed
by incubation at 30 °C for two days (see schematic in Fig. S2).
Colonies growing at the intersection of the streaks were further streaked
out on a fresh SD-Leu-Ura-Trp-His plate and incubated at 30 °C overnight.
They were then screened with diagnostic and gene-specific
primers to confirm the integration. For the MVA platform strain, one
colony each from MVAc4 and MVAp4 was streaked out as above on an
SD-Leu-Ura-Trp + 200 μg/ml Hygromycin (Goldbio, St. Louis, MO) plate
and incubated and screened as mentioned above.
2.5. Geraniol production and quantification
2.5.1 Geraniol production: For geraniol production from strains CEN.PK2-1C
and MVAc1-MVAc4, yeast colonies transformed with the
pPYK1-tObGES-ERG20ww plasmids were grown overnight in 5 ml SD-His
at 30 °C with shaking at 200 rpm. The overnight culture was inoculated
at an initial OD600 of 0.1 into fresh SD-His and grown at 30 °C with
shaking at 200 rpm for 48 h. 1 ml of the culture was collected at 12, 24,
and 48 h and pelleted at 16,000×g for 1 min, and 50 μl of the supernatant
was used to quantify geraniol using the geraniol dehydrogenase
(GeDH) assay (Lin et al., 2018).
For library screening, seed cultures were set up with three replicates
of the wildtype CEN.PK2 and each of the 243 strains by inoculating three colonies
of each strain into 200 μl SD-Leu-Ura-Trp-His media in 96-well plates.
The overnight culture was inoculated at an initial OD600 of ~0.1 into
fresh SD-Leu-Ura-Trp-His media in 96-deep-well plates, with 500 μl of
culture per well. The deep-well plates were incubated at 30 °C with
shaking at 400 rpm for 12 h. The plates were centrifuged at 3,220×g for
5 min, and 50 μl of the supernatant was used for the GeDH assay.
For geraniol production from the wildtype CEN.PK2-1C, MVAc4,
MVAp4, and MVA platform strains, yeast colonies transformed with
either pGAL1-tObGES-ERG20ww or tObGES-ERG20ww-SKL were grown
overnight in 5 ml SD + 400 μg/ml G418 (pH = 7). The overnight culture
was inoculated at an initial OD600 of 0.1 into fresh YPD + 200 μg/ml
G418 and grown at 30 °C with shaking at 200 rpm for 24 h. 1 ml of the
culture was collected and pelleted at 16,000×g for 1 min, and 50 μl of
the supernatant was used to quantify geraniol using the GeDH assay.
2.5.2 Geraniol dehydrogenase assay: The CdGeDH gene from Castellaniella
defragrans, encoding the geraniol dehydrogenase, was cloned into the
pET-24 vector by Gibson assembly. Protein purification and the assay
were performed with slight modifications from the protocol described in
Lin et al. (2018). Briefly, pET-24_CdGeDH with a C-terminal His-tag was
transformed into E. coli (BL21). A single colony was inoculated for an
overnight seed culture, which was diluted 50-fold into a scaled-up culture and grown at
37 °C to an OD600 of 0.6; 0.1 mM IPTG (Goldbio, St. Louis, MO) was then
added, followed by growth at 16 °C for 24 h. The culture was centrifuged
at 3,220×g for 20 min, the supernatant was discarded, and the pellet was
resuspended in lysis buffer (50 mM Tris pH 7.5, 5 mM imidazole, and
1 mM phenylmethylsulfonyl fluoride) with 1 mg/ml lysozyme (Sigma
Aldrich, St. Louis, MO). Cells were lysed with a sonicator (Misonix,
Farmingdale, NY) for 2 min with 10 s pulses. Proteins were purified
using a Ni-NTA column (Qiagen, Germantown, MD). Unbound proteins
were eliminated with wash buffer (50 mM Tris pH 7.5, 40 mM imidazole),
and GeDH protein was eluted with elution buffer (50 mM Tris
pH 7.5, 250 mM imidazole). The purity of the resulting CdGeDH enzyme
was routinely examined by protein gel electrophoresis.
For the GeDH assay, 50 μl of the spent media was mixed with 50 μl of
a prepared reaction mix such that the final mixture contained 100 mM
Tris-HCl (pH 8.0), 2 mM nicotinamide adenine dinucleotide (NAD+)
(Goldbio, St. Louis, MO), 2 mM resazurin sodium salt (Acros Organics,
Belgium), 0.002 U purified geraniol dehydrogenase, and 1 U diaphorase
(Sigma Aldrich, St. Louis, MO). To prepare the geraniol standard curve, a 10X
stock of each geraniol concentration was prepared by dissolving the authentic
geraniol standard (Acros Organics, Belgium) in acetone. Next, the 10X
stocks were diluted and added to the reaction mix such that the
final geraniol concentration was 1X. The geraniol standard curves used for
Figs. 1C, 2B and 4C are shown in Fig. S3. Each reaction was incubated at
room temperature for 45 min, and fluorescence was recorded at
excitation and emission wavelengths of 530 nm and 590 nm, respectively, using a
Tecan Spark microplate reader (Morrisville, NC). The geraniol concentrations
of MVA platform + tObGES-ERG20ww were confirmed using gas
chromatography coupled with mass spectrometry (GC-MS) (Fig. S4).
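Converting a fluorescence reading into a geraniol concentration through a standard curve can be sketched as follows. This is a minimal illustration that assumes a linear relationship between fluorescence and geraniol concentration over the working range; the text does not state the form of the fit, and the standard readings below are hypothetical placeholders, not measured values.

```python
# Hypothetical standard-curve readings: (geraniol in mg/L, fluorescence units).
# These numbers are illustrative only and are not taken from the paper.
standards = [(0.0, 100.0), (5.0, 600.0), (10.0, 1100.0), (20.0, 2100.0)]


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fluorescence_to_geraniol(reading, slope, intercept):
    """Invert the fitted standard curve to estimate geraniol concentration."""
    return (reading - intercept) / slope


slope, intercept = fit_line(standards)
# Assumed sample reading of 850 fluorescence units (hypothetical).
sample_mg_per_l = fluorescence_to_geraniol(850.0, slope, intercept)
```

Readings outside the range spanned by the standards should not be extrapolated with a fit like this one.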
2.6. Terpene quantification using GC-MS
For geraniol, citronellol, and geranyl acetate extraction, 1 ml culture
was centrifuged at 16,000×g for 1 min, and 500 μl of the supernatant was
mixed with 500 μl hexane and shaken in a plate shaker at the highest
speed for 10 min, followed by centrifugation at 16,000×g for 2 min. 500
μl of the hexane layer was diluted five-fold in hexane and used for GC-MS.
For α-humulene extraction, 1 ml culture was centrifuged at
16,000×g for 1 min, and 500 μl of the supernatant was mixed with 500
μl ethyl acetate and shaken in a plate shaker at the highest speed for 10
min, followed by centrifugation at 16,000×g for 2 min. 500 μl of the ethyl
acetate layer was collected for GC-MS. For squalene extraction, 1 ml
culture was centrifuged at 16,000×g for 1 min. The supernatant was
discarded, and the pellet was resuspended in 200 μl ethyl acetate, followed
by homogenizing with 100 mg of 0.5 mm glass beads in a Bullet
Blender® tissue homogenizer at the highest setting for 10 min at 4 °C.
300 μl ethyl acetate was then added to the sample, and the sample was
further vortexed and centrifuged at 16,000×g for 2 min. 500 μl of the
ethyl acetate layer was collected for GC-MS.
Terpenes were detected using a Thermo Trace 1300 Gas Chromatograph
and Thermo Q-exactive™ Orbitrap Mass Spectrometer (Waltham,
MA). 5 μL of geraniol-containing samples or 2 μL of α-humulene- or
squalene-containing samples were injected into a Thermo Scientific TraceGOLD
TG-5SILMS column (30 m long, 0.25 mm inner diameter, 0.25 μm film
thickness) using helium as the carrier gas (1 ml/min). The injector was
held at 200 °C. For geraniol, citronellol, and geranyl acetate analysis, the
oven was held at 40 °C for 4 min, followed by ramping up to 280 °C at a
rate of 20 °C/min and then holding at 280 °C for 2 min. The mass range
monitored was 39–200 m/z in the positive ion mode. Geraniol eluted at
10.24 min, citronellol at 9.93 min, and geranyl acetate at 10.99 min. For
α-humulene, the oven was held at 80 °C for 3 min, followed by ramping
up to 180 °C at a rate of 15 °C/min and further ramping to 240 °C at a
rate of 10 °C/min, holding for 1 min. The mass range monitored was
50–250 m/z in the positive ion mode. α-Humulene eluted at 9.7 min. For
squalene, the oven was held at 80 °C for 3 min, followed by ramping up
to 180 °C at a rate of 15 °C/min and further ramping to 310 °C at 20 °C/min
and then holding at 280 °C for 1 min. The mass range monitored was
50–450 m/z in the positive ion mode. Squalene eluted at 16.8 min. The
MS transfer line was at 250 °C, and the source temperature was 200 °C.
The resolution was set to 60,000. The MS was set to monitor total ion
counts.
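As a quick consistency check on the oven programs above, the total run time of each program can be computed from its holds and ramps and compared with the reported retention times. The sketch below is illustrative arithmetic only; the helper `program_minutes` is a hypothetical name, and the calculation assumes ideal linear ramps with no equilibration or cool-down time. The squalene program is left out because the final hold temperature as printed differs from the ramp target.

```python
def program_minutes(initial_hold, ramps, start_temp):
    """Total oven time: an initial hold plus (target_temp, rate_per_min, hold) steps."""
    total = initial_hold
    temp = start_temp
    for target, rate_per_min, hold in ramps:
        total += (target - temp) / rate_per_min + hold
        temp = target
    return total


# Geraniol, citronellol, geranyl acetate: 40 C for 4 min, 20 C/min to 280 C, hold 2 min.
geraniol_run = program_minutes(4, [(280, 20, 2)], start_temp=40)

# alpha-Humulene: 80 C for 3 min, 15 C/min to 180 C, 10 C/min to 240 C, hold 1 min.
humulene_run = program_minutes(3, [(180, 15, 0), (240, 10, 1)], start_temp=80)

# Reported retention times (min) should fall inside the corresponding run.
monoterpene_peaks = [10.24, 9.93, 10.99]
humulene_peak = 9.7
```

Under these assumptions the first program lasts 18 min and the second about 16.7 min, so all of the reported retention times fall within their runs.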
Peak areas for geraniol, α-humulene, and squalene were quantified
using the Xcalibur™ software (Thermo Fisher, Waltham, MA). Absolute
sample concentrations were calculated from standard curves of
authentic geraniol (Acros Organics, Belgium), citronellol (Acros Organics,
Belgium), geranyl acetate (Thermo Scientific, Waltham, MA),
α-humulene (Millipore Sigma, Burlington, MA), and squalene (TCI
America, Portland, OR) standards. To prepare standard curves, geraniol,
citronellol, and geranyl acetate were diluted in hexane, and squalene and
α-humulene standards in ethyl acetate. Geraniol and squalene standards
were diluted over a range of 1.56–25 mg/L, citronellol 1.06–6.25 mg/L,
and α-humulene 0.531–12.5 mg/L. Ions of m/z values 123.1168 ± 5
ppm, 138.1403 ± 5 ppm, 136.1247 ± 5 ppm, 93.0698 ± 5 ppm, and
121.1012 ± 5 ppm were used for quantifying the peak area for geraniol,
citronellol, geranyl acetate, α-humulene, and squalene, respectively.
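The ± 5 ppm tolerance translates into a narrow absolute m/z window around each quantifier ion. The following sketch shows the arithmetic; `ppm_window` is a hypothetical helper name, and the extraction windows actually applied in the vendor software may be defined differently.

```python
def ppm_window(mz, ppm=5.0):
    """Return the (low, high) m/z bounds for a tolerance given in parts per million."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


# Quantifier ions as listed in the text.
quant_ions = {
    "geraniol": 123.1168,
    "citronellol": 138.1403,
    "geranyl acetate": 136.1247,
    "alpha-humulene": 93.0698,
    "squalene": 121.1012,
}

windows = {name: ppm_window(mz) for name, mz in quant_ions.items()}
```

For geraniol, for example, 5 ppm of 123.1168 is roughly 0.0006 m/z units on either side of the target value.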
2.7. Statistical methods
A random forest (RF) (Breiman, 2001) was used to fit predictive
models for geraniol production. Briefly, RFs construct ensembles of
Classification and Regression Trees (CART) (Breiman et al., 2017) from
bootstrap replications of the data. Each CART model is a decision tree
that creates a prediction of geraniol, and the final prediction is based on
aggregation over the ensemble. Models were fit based on out-of-bag
estimation (Breiman, 1996), which guards against overfitting.
Tree-based models such as RFs are particularly useful when interactions
are expected between variables, in this case, the MVA
pathway enzymes, and for delineating the role and importance of the
individual variables (Breiman, 1996) in the prediction of the outcome,
geraniol titer. Another strength of the RF is that it implements bootstrap
resampling of the data (Efron and LePage, 1992), accounting for uncertainty
in the population, and is well suited to a smaller sample size of this
type. The bootstrap replication datasets are generated by resampling the
observations (strains) with replacement and are the same size as the
original dataset. The output is an ensemble of prediction models
aggregated to produce a prediction for each observation. The accuracy
of the RF was estimated using a simple residual sum of squares (RSS) loss
function averaged over out-of-bag (OOB) samples (Friedman JTibshirani,
2001) in the ensemble to produce a mean squared error (MSE).
Using the OOB error estimate eliminates the requirement for a set-aside
test set (Breiman, 2001). Notably, by nature of the resampling, not all
the observations are present in each bootstrap replication. OOB error
leverages this for estimation by aggregating only over the predictors in
the ensemble for which an observation was not randomly selected in the
bootstrap, which inherently avoids overfitting (Breiman, 2001). OOB
estimation is an effective alternative for smaller datasets that may be
sensitive to training and testing splits or fold assignments in
cross-validation.
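The bootstrap and out-of-bag logic described above can be illustrated with a deliberately simplified ensemble. The sketch below bags depth-one regression trees (stumps) rather than fitting a full random forest, and it runs on synthetic data generated in the code, not on the study's measurements; the function names and the made-up response are assumptions introduced for illustration only.

```python
import itertools
import random

rng = random.Random(0)

# Synthetic stand-in data (NOT the paper's measurements): 243 promoter
# combinations coded 0/1/2, with a made-up response that peaks when
# feature 1 is at its middle level.
X = [list(combo) for combo in itertools.product(range(3), repeat=5)]
y = [2.0 * (row[1] == 1) + 0.5 * row[0] + rng.gauss(0, 0.1) for row in X]


def mean(values):
    return sum(values) / len(values)


def fit_stump(rows, targets):
    """Depth-1 regression tree: pick the single split with the lowest RSS."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in (0, 1):
            left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            rss = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or rss < best[0]:
                best = (rss, feature, threshold, ml, mr)
    if best is None:  # degenerate resample: fall back to the overall mean
        overall = mean(targets)
        return lambda row: overall
    _, feature, threshold, ml, mr = best
    return lambda row: ml if row[feature] <= threshold else mr


def bagged_oob_mse(X, y, n_trees=50):
    """Bag stumps on bootstrap resamples and score each row only on trees that never saw it."""
    n = len(X)
    oob_sum, oob_count, oob_fractions = [0.0] * n, [0] * n, []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        in_bag = set(idx)
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        oob_fractions.append(1 - len(in_bag) / n)
        for i in range(n):
            if i not in in_bag:
                oob_sum[i] += stump(X[i])
                oob_count[i] += 1
    scored = [i for i in range(n) if oob_count[i] > 0]
    mse = mean([(y[i] - oob_sum[i] / oob_count[i]) ** 2 for i in scored])
    return mse, mean(oob_fractions)


oob_mse, oob_fraction = bagged_oob_mse(X, y)
```

Each bootstrap resample leaves out roughly a third of the observations, which is why every strain accumulates out-of-bag predictions without a separate held-out set.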
Variable importance (Breiman, 2001; Friedman JTibshirani, 2001)
measures were used to prioritize the enzymes according to their
contribution to the predictive accuracy of the outcome. Importance is
measured by increases in node purity that serves as a surrogate for the
performance of the random forest. High increases in node purity indicate
that the predictive strength of the model shows high levels of
improvement when the enzyme is included in the random forest, and its
elimination from the data set would considerably degrade the predictive
strength (Fig. 3A).
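For a regression tree, the increase in node purity from a split is the drop in residual sum of squares that the split achieves. The sketch below computes that quantity for a single split on toy data; it is a one-node simplification, whereas a random forest importance score accumulates such gains over all splits on a variable across the trees. The data and the helper names are hypothetical.

```python
def rss(values):
    """Residual sum of squares around the mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def purity_gain(rows, targets, feature, threshold):
    """Decrease in RSS when a node is split on `feature` at `threshold`."""
    left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
    right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
    return rss(targets) - rss(left) - rss(right)


# Toy data (hypothetical): feature 0 drives the response, feature 1 does not.
rows = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [1.0, 1.0, 3.0, 3.0]
gains = {f: purity_gain(rows, targets, f, threshold=0) for f in (0, 1)}
```

In this toy case, splitting on feature 0 removes all of the residual variation, while splitting on feature 1 removes none, so feature 0 would rank as the more important variable.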
Partial Dependence Plots (PDP) are a popular technique for visualizing
the contribution of variables to an outcome and the relationships
between pairs of variables and an outcome (Cutler et al., 2007; Greenwell,
2017). Using the variable importance measure as a prioritization,
we examined the impact of the five MVA pathway enzymes on geraniol
production and their interactions. PDP profiles were computed using
grids of ten equally spaced values over the support region for
each enzyme. Linear interpolation was used to estimate geraniol production
in between data points.
Individual Conditional Expectation (ICE) curves (Goldstein et al.,
2015) were also examined for the highest and lowest-producing strains.
ICE curves enable the visualization of the functional relationships be-
tween the predicted values of geraniol production and enzyme levels for
individual strains and are useful for assessing sensitivity (Fig. S4).
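The relationship between ICE curves and partial dependence can be made concrete with a short sketch: an ICE curve sweeps one variable over a grid for a single observation, and the partial dependence profile is the pointwise average of those curves. The code below uses a hypothetical stand-in function `toy_model` in place of a fitted random forest, and toy rows in place of strain data.

```python
def ice_curves(predict, rows, feature, grid):
    """One curve per observation: predictions as `feature` sweeps the grid."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = list(row)      # copy so the original row is untouched
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves


def partial_dependence(predict, rows, feature, grid):
    """Partial dependence profile: pointwise average of the ICE curves."""
    curves = ice_curves(predict, rows, feature, grid)
    return [sum(c[k] for c in curves) / len(curves) for k in range(len(grid))]


def toy_model(row):
    """Hypothetical fitted model: output peaks when feature 0 is at 1.0."""
    return 1.0 - (row[0] - 1.0) ** 2 + 0.5 * row[1]


rows = [[0, 0], [1, 2], [2, 1]]           # toy observations
grid = [i * 2 / 9 for i in range(10)]     # ten equally spaced values on [0, 2]
ice = ice_curves(toy_model, rows, feature=0, grid=grid)
pdp = partial_dependence(toy_model, rows, feature=0, grid=grid)
```

Because the toy model is additive, its ICE curves are parallel; with a model that contains interactions, the curves would diverge, which is the kind of strain-to-strain sensitivity that ICE plots are meant to expose.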
Analysis was performed in the R programming language with the
“randomForest” (Breiman, 2001), “PDP” (Greenwell, 2017), and “vivo”
packages.
Fig. 2. Construction and screening of the combinatorial yeast MVA library with varying promoter
strengths. (A) A diploid library of 243 strains, each having tHMG1 and IDI1 under strong promoters and
ERG13, ERG12, ERG19, ERG10, and ERG8 under a unique combination of strong, medium, or weak
promoters integrated into the genome. The tObGES-ERG20ww fusion protein was expressed from a
plasmid. Color intensity represents promoter strength. The strains were cultured in 96-deep-well
plates, and the geraniol produced was quantified using a fluorescence-based assay. (B) Heat map
showing relative promoter strengths and the corresponding fluorescence normalized to OD600 of the
wild type and the 243 strains. The top ten strains with the highest fluorescence readings are marked
with an asterisk. Data represent the average of three independent biological replicates.
3. Results
3.1. Sequential integration of the complete MVA pathway into the yeast
genome
We have focused on genomic integration instead of a plasmid-based
system because an ideal platform strain should be genetically stable and
not require selective markers during fermentation. An additional copy of
all seven MVA pathway genes was integrated sequentially into the yeast
genome under the rationale that overexpression of the complete MVA
pathway would increase IPP and DMAPP levels. The MVA pathway
genes were inserted into three genomic loci, GAL80, GAL1, and ROX1
(Fig. 1B, Table 1). GAL80 and GAL1 deletions allowed gene expression
under galactose-inducible promoters when glucose was the sole carbon
source (Westfall et al., 2012). ROX1 was disrupted to boost the MVA
pathway by alleviating transcriptional repression (Trikka et al., 2015).
Each MVA pathway gene was expressed from a unique, strong consti-
tutive promoter to minimize potential homologous recombination
(Orr-Weaver et al., 1981). The sequentially engineered MVA strains
(MVAc1-4) were transformed with a plasmid enabling the production of
geraniol, a fragrant monoterpenoid and a precursor for medicinally
important indole alkaloids (Chen and Viljoen, 2010; Brown et al., 2015).
The fusion protein tObGES-ERG20ww (Wang et al., 2021; Ignea et al.,
2014) was used for geraniol biosynthesis, as fusing geranyl diphosphate
synthase (ERG20ww) with geraniol synthase (tObGES) resulted in higher
geraniol production than when the two were separately expressed
(Fig. S5).
Geraniol yield increased with the number of overexpressed
MVA pathway genes (Fig. 1C). Strain MVAc1, with ERG10 and
tHMG1 overexpressed, had an over 2.5-fold increase in geraniol yield after
12 h of shake-flask cultivation. Strain MVAc2 only showed a marginal
increase compared with MVAc1, likely because the excessive mevalonate
generated by tHMG1 overexpression was not channeled into the
MVA pathway due to the lack of the mevalonate kinase ERG12 in the
heterologous pathway. Strain MVAc3, overexpressing five of the
seven MVA pathway genes, further increased geraniol yield. MVAc4, with
the complete MVA pathway overexpressed, had the highest geraniol
yield, which was 7.5-fold that of the wild type at 12 h. Geraniol titer was
maximum at 24 h (Fig. S6). Therefore, in addition to the two rate-limiting
enzymes, the other five enzymes also play important roles in
increasing the MVA pathway productivity.
3.2. Creating a combinatorial strain library to survey the promoter space
of MVA pathway genes
When integrating the complete MVA pathway into the genome,
strong yeast promoters are usually used. However, they may not be the
ideal set of promoters that maximize pathway productivity. To find the
optimal promoter combination of pathway genes and to delineate the
contribution of each gene to MVA pathway productivity, we created a
combinatorial strain library of 243 diploid strains with varying promoter
strengths. The rate-limiting genes tHMG1 and IDI1 were always
expressed from a strong promoter since their essentiality to the pathway
is well-documented (Han et al., 2018; Zhao et al., 2017; Jiang et al.,
2017; Xie et al., 2015; Verwaal et al., 2007; Zhou et al., 2012). Each of
the remaining five genes was expressed from a strong, medium, or weak
promoter, with every strain carrying a unique combination, creating 3^5 = 243 strains (Fig. 2A).
The choice of promoters and their relative expression strengths were
based on the extensive characterization of yeast promoters by Lee et al.
(2015) (Table S10).
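The size of the design space follows directly from enumerating every assignment of three promoter strengths to five genes. A minimal sketch of that enumeration is shown below; the strength labels are generic placeholders and do not name the specific promoters used.

```python
import itertools

genes = ["ERG13", "ERG12", "ERG19", "ERG10", "ERG8"]
strengths = ["strong", "medium", "weak"]

# Every assignment of one promoter strength to each of the five genes.
library = [
    dict(zip(genes, combo))
    for combo in itertools.product(strengths, repeat=len(genes))
]

# Number of designs in which ERG12 sits on the medium-strength promoter.
erg12_medium = sum(1 for design in library if design["ERG12"] == "medium")
```

One third of the designs, 81 of 243, place any given gene at a given strength, which gives a baseline for judging whether a strength is over-represented among top producers.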
The construction of the combinatorial library was streamlined by
mating engineered haploids of opposite mating types. Haploid strains of
mating-type MATa overexpressed ERG13, ERG12, and ERG19, each
under three different promoters, in the GAL1 locus. 3^3 = 27 such
Fig. 3. Random Forests were used to assess the importance and dependence of the MVA enzymes. (A) Variable importance from a random forest predicting readout.
Enzymes are ranked according to increases in node purity, a measure of performance. (B–F) Partial dependence plots show the predicted geraniol readout values as a
function of enzyme expression for ERG19, ERG13, ERG12, ERG10, and ERG8. The blue tick marks represent the promoter strengths within the data, and the
remaining curve was generated through interpolation. (G–L) Two-way partial dependence plots for the interactions between ERG12 and the other four pathway
enzymes, as well as the interactions between ERG19 and ERG13, and ERG8 and ERG10.
Table 1
List of strains generated for creating the MVA platform strain.
Strain | Description | Source
MVAc1 | CEN-PK2-1C; rox1Δ::pHHF2-ERG10-tENO1, pTDH3-tHMG1-tTDH1, URA3 | This study
MVAc2 | MVAc1; gal80Δ::pTEF1-ERG8-tSSA1, pCCW12-IDI1-tENO2, TRP1 | This study
MVAc3 | MVAc1; gal1Δ::pPGK1-ERG13-tPGK1, pTEF2-ERG12-tADH1, pHHF1-ERG19-tCYC1, LEU2 | This study
MVAc4 | MVAc3; gal80Δ::pTEF1-ERG8-tSSA1, pCCW12-IDI1-tENO2, TRP1 | This study
MVAp1 | CEN-PK2-1D; rox1Δ::pHHF2-ERG10-SKL-tENO1, pTDH3-tHMG1-SKL-tTDH1, URA3 | This study
MVAp2 | MVAp1; gal80Δ::pTEF1-ERG8-SKL-tSSA1, pCCW12-IDI1-SKL-tENO2, pTEF1-HygR-tTEF1 | This study
MVAp3 | MVAp1; gal1Δ::pPGK1-ERG13-SKL-tPGK1, pTEF2-ERG12-SKL-tADH1, pHHF1-ERG19-SKL-tCYC1, LEU2 | This study
MVAp4 | MVAp3; gal80Δ::pTEF1-ERG8-SKL-tSSA1, pCCW12-IDI1-SKL-tENO2, pTEF1-HygR-tTEF1 | This study
MVA platform | CEN-PK2; rox1Δ::pHHF2-ERG10-tENO1, pTDH3-tHMG1-tTDH1, URA3; gal1Δ::pPGK1-ERG13-tPGK1, pTEF2-ERG12-tADH1, pHHF1-ERG19-tCYC1, LEU2; gal80Δ::pTEF1-ERG8-tSSA1, pCCW12-IDI1-tENO2, TRP1; rox1Δ::pHHF2-ERG10-SKL-tENO1, pTDH3-tHMG1-SKL-tTDH1, URA3; gal1Δ::pPGK1-ERG13-SKL-tPGK1, pTEF2-ERG12-SKL-tADH1, pHHF1-ERG19-SKL-tCYC1, LEU2, gal80Δ::pTEF2-ERG8-SKL-tSSA1, pCCW12-IDI1-SKL-tENO2, pTEF1-HygR-tTEF1 | This study
MATa strains were created (Table S10). Similarly, haploid strains of the
opposite mating type (MATα) overexpressed the other four MVA pathway
genes, with ERG10 and ERG8 under three different promoters, generating
3² = 9 strains (Table S11). These nine strains were also transformed
with a plasmid bearing the tObGES-ERG20ʷʷ fusion gene for geraniol
production. Mating the engineered haploid strains with the opposite
mating type generated 3³ × 3² = 243 diploid strains, each containing an
extra copy of the seven MVA pathway genes and capable of producing
geraniol. The strain library was cultivated in 96-deep-well plates,
followed by geraniol quantification using a high-throughput
fluorescence-based assay (Lin et al., 2018). A heat map of the promoter
strengths and fluorescence readings of all strains revealed a distinct
pattern: strains expressing ERG12 from a medium-strength promoter
produced some of the highest amounts of geraniol. Eight of the top ten
geraniol-producing strains expressed ERG12 from the medium-strength
promoter (Fig. 2B). Quantitative real-time PCR verified that transcript
levels of the overexpressed MVA pathway genes correlated positively with
promoter strength (Fig. S7, Table S9). Quantification of intracellular
mevalonate, a critical pathway intermediate, in the strains with all
strong promoters (α1), all medium promoters (β5), and all weak promoters
(γ9) showed a progressive decrease, as expected (Table S12).
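The size of the library follows directly from the mating design. The sketch below enumerates it under two assumptions: each varied gene takes one of three promoter levels, and the three genes varied in the MATa parents are ERG13, ERG12, and ERG19 (inferred, since the text names only ERG10 and ERG8 for the MATα parents). The level labels and variable names are illustrative and do not come from the study.

```python
from itertools import product

# Illustrative labels for the three promoter strengths per gene.
LEVELS = ("strong", "medium", "weak")

# Assumed split of the five varied genes between the two haploid parents.
MATA_GENES = ("ERG13", "ERG12", "ERG19")  # inferred from the text
MATALPHA_GENES = ("ERG10", "ERG8")        # stated in the text


def haploid_designs(genes):
    """Return every assignment of a promoter level to each gene."""
    return [dict(zip(genes, combo)) for combo in product(LEVELS, repeat=len(genes))]


mata = haploid_designs(MATA_GENES)          # 3^3 designs
matalpha = haploid_designs(MATALPHA_GENES)  # 3^2 designs

# Mating each MATa design with each MATalpha design gives one diploid design.
diploids = [{**a, **b} for a, b in product(mata, matalpha)]

print(len(mata), len(matalpha), len(diploids))
```

Each diploid design is a dictionary mapping the five genes to a promoter level, which is also a convenient input format for the modeling steps that follow.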
3.3. Applying machine learning to the combinatorial strain library
Machine learning was used to investigate the combinatorial library with
the primary objective of understanding the impact of each of the five
enzymes on the productivity of the MVA pathway. Random forest models
(Breiman, 2001) were fitted to the combinatorial library data with
geraniol production as the outcome variable. Variable importance
measures indicate that the top three enzymes for predicting geraniol
production are Erg19p, the mevalonate pyrophosphate decarboxylase;
Erg13p, the HMG-CoA synthase; and Erg12p, the mevalonate kinase
(Fig. 3A). In addition to the ranking, the drops in importance are also
informative, especially the one between Erg12p and Erg10p. This large
gap supports the role of the top three enzymes as critical for the
predictive accuracy of geraniol production in the 243 strains.
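To make the idea of variable importance concrete, the following is a minimal sketch of permutation-based importance: shuffle one feature and measure how much the prediction error grows. This section does not state which importance measure was used, so the choice of permutation importance is an assumption. The function `toy_model` is a hypothetical stand-in for a fitted random forest, and the coded levels (0, 1, 2) and coefficients are invented for illustration only.

```python
import random
from itertools import product

GENES = ("ERG10", "ERG13", "ERG12", "ERG8", "ERG19")


def toy_model(strain):
    """Hypothetical predictor; NOT the fitted model from the study."""
    # ERG12 has an optimum at the medium level (1); ERG13 and ERG19 help
    # monotonically; ERG10 and ERG8 are ignored by this toy response.
    return (100.0 - 40.0 * (strain["ERG12"] - 1) ** 2
            + 10.0 * strain["ERG13"] + 5.0 * strain["ERG19"])


def mse(model, rows, targets):
    """Mean squared error of the model on the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, feature, rng, repeats=20):
    """Average increase in MSE after shuffling one feature column."""
    base = mse(model, rows, targets)
    increases = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [{**r, feature: v} for r, v in zip(rows, column)]
        increases.append(mse(model, permuted, targets) - base)
    return sum(increases) / repeats


# Full 3^5 design with coded levels 0 = weak, 1 = medium, 2 = strong.
rows = [dict(zip(GENES, combo)) for combo in product((0, 1, 2), repeat=5)]
targets = [toy_model(r) for r in rows]  # noise-free toy readout

rng = random.Random(0)
importance = {g: permutation_importance(toy_model, rows, targets, g, rng)
              for g in GENES}
for gene, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{gene}: {score:.1f}")
```

A feature the model never uses receives an importance of zero, while features that drive the prediction receive larger scores. Gaps between consecutive scores can be read in the same spirit as the drops in importance discussed above.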
Next, we took a closer look at the measures of variable importance using
Partial Dependence Plots (PDPs) (Greenwell, 2017) to visualize the
contribution of the enzyme levels to geraniol output. The PDPs of the
five enzymes show the predicted geraniol production when an enzyme is
set at a given promoter strength (Fig. 3B–F). Erg13p, Erg19p, and Erg8p
showed increased geraniol production as their promoter strengths
increased, eventually leveling off at saturation (Fig. 3B, C&F), as
expected. However, a unique role of mevalonate kinase (Erg12p) was
apparent from the PDP of ERG12 (Fig. 3D), which showed that maximum
geraniol production was reached within our data when its expression
level was moderately low and that production decreased at higher
promoter strength. Erg10p did not show saturation over the promoter
strengths tested.
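A partial dependence curve can be computed by forcing one feature to a fixed value in every observation and averaging the model's predictions. The sketch below shows that calculation in plain Python. The predictor `toy_model` and the coded levels are hypothetical placeholders rather than the study's fitted random forest or measured promoter strengths; the toy response is built so that ERG12 peaks at the medium level, mimicking the qualitative shape described above.

```python
from itertools import product

GENES = ("ERG10", "ERG13", "ERG12", "ERG8", "ERG19")
LEVELS = (0, 1, 2)  # coded levels: 0 = weak, 1 = medium, 2 = strong


def toy_model(strain):
    """Hypothetical predictor standing in for a fitted model."""
    # ERG12 peaks at the medium level; ERG13 rises monotonically.
    return 100.0 - 40.0 * (strain["ERG12"] - 1) ** 2 + 10.0 * strain["ERG13"]


def partial_dependence(model, rows, feature, grid):
    """Mean prediction over all rows with `feature` forced to each grid value."""
    curve = {}
    for value in grid:
        predictions = [model({**row, feature: value}) for row in rows]
        curve[value] = sum(predictions) / len(predictions)
    return curve


rows = [dict(zip(GENES, combo)) for combo in product(LEVELS, repeat=len(GENES))]
pdp_erg12 = partial_dependence(toy_model, rows, "ERG12", LEVELS)
pdp_erg13 = partial_dependence(toy_model, rows, "ERG13", LEVELS)

print("ERG12:", pdp_erg12)
print("ERG13:", pdp_erg13)
```

In this toy setting the ERG12 curve has an interior maximum while the ERG13 curve increases with promoter level, the two shapes contrasted in the paragraph above.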
In the two-enzyme interaction plots (Fig. 3G-L), the role of Erg12p is
even more apparent. When the value of ERG12 was in the moderate range,
the predicted geraniol output was the highest. This could be due to
several reasons, such as feedback inhibition of Erg12p by pathway
intermediates (Anthony et al., 2009; Garcia and Keasling, 2014; Primak
et al., 2011; Kazieva et al., 2017) and metabolic burden leading to
protein aggregation. Therefore, moderate expression of ERG12 most
likely strikes the right balance for higher flux through the pathway.
The two-enzyme interaction plot between ERG19 and ERG13
(Fig. 3K) showed the highest geraniol production when the expression of
ERG19 was low and ERG13 was high. In the same plot, we also see
relatively high predicted readout values when the expression of ERG19
was high and ERG13 was moderate. This reverse balance is likely
because when Erg13p is expressed highly, Erg12p might be feedback
inhibited due to the increased intermediates downstream of Erg19p
(Anthony et al., 2009; Garcia and Keasling, 2014; Primak et al., 2011;
Kazieva et al., 2017), and lower expression of ERG19 would be more
desirable. However, when Erg13p is expressed at a low level, Erg19p must
be expressed more highly to maximize pathway productivity, since it
catalyzes the irreversible step that releases CO₂ to produce IPP. The
remaining two-enzyme interaction plots resemble the ERG10 and ERG8
interaction plot (Fig. 3L), in which expression of both enzymes led to
the highest amount of product, as expected, and are included in Fig. S8.
While the global analysis, which includes data from the entire
combinatorial library, provides information for predicting geraniol
output, the local analysis focuses on the top ten producers. By
examining the enzyme profiles and variable importance of the ten highest
geraniol-producing strains, we can gain insights into the role of the
individual enzymes in predicting high geraniol levels. The local
importance of pathway enzymes in the top ten strains supplements the
PDPs and shows a clear pattern in which Erg12p emerges as the most
important enzyme in seven of the ten strains (Table 2, Fig. S9). In
Table 2, there are two instances in which ERG12 expression is high
(promoter strength = 7.77). In both cases, the expression of ERG8,
ERG13, and ERG19 is also high. This is also supported by the Individual
Conditional Expectation (ICE) curves (Goldstein et al., 2015)
(Fig. S10), which show that if ERG12 expression is high, the expression
of the other pathway enzymes must also be high to maximize geraniol
production. Of the top ten geraniol-producing strains, eight express
ERG12 in a moderately low range (promoter strength = 1.69), which we
found to be a ‘sweet spot.’ When ERG12 is expressed moderately, a
variety of scenarios can give rise to high geraniol production. Indeed,
of the eight strains expressing ERG12 in the moderately low range, seven
have Erg12p as the most important enzyme for determining final
productivity (Table 2, Fig. S9). In addition, Erg19p has consistently
moderately low abundance across the top ten strains when Erg12p is in
the sweet spot. Taken together, Erg12p is clearly the most critical
enzyme for maximum geraniol production among the five non-rate-limiting
enzymes.
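The counts quoted in this paragraph can be checked directly against Table 2. The short sketch below transcribes the promoter strengths and critical-enzyme calls from Table 2 and tallies them. Strain labels are copied as printed, the geraniol readouts are omitted, and the tuple layout and helper names are illustrative choices.

```python
from collections import Counter

# (strain, ERG10, ERG13, ERG12, ERG8, ERG19, critical enzyme), from Table 2.
TOP_TEN = [
    ("α1", 9.01, 11.01, 7.77, 8.85, 4.81, "Erg8p"),
    ("β2", 9.01, 2.85, 1.69, 2.28, 1.53, "Erg12p"),
    ("α4", 3.00, 11.01, 7.77, 8.85, 4.81, "Erg8p"),
    ("N3", 9.01, 1.06, 1.69, 0.91, 1.53, "Erg10p"),
    ("N2", 9.01, 1.06, 1.69, 2.28, 1.53, "Erg12p"),
    ("β4", 3.00, 2.85, 1.69, 8.85, 1.53, "Erg12p"),
    ("β5", 3.00, 2.85, 1.69, 2.28, 1.53, "Erg12p"),
    ("β7", 1.06, 2.85, 1.69, 8.85, 1.53, "Erg12p"),
    ("β3", 9.01, 2.85, 1.69, 0.91, 1.53, "Erg12p"),
    ("β1", 9.01, 2.85, 1.69, 8.85, 1.53, "Erg12p"),
]

ERG12_MEDIUM = 1.69  # the 'sweet spot' promoter strength
ERG12_HIGH = 7.77

# Strains grouped by ERG12 promoter strength (column index 3).
sweet_spot = [row for row in TOP_TEN if row[3] == ERG12_MEDIUM]
high_erg12 = [row for row in TOP_TEN if row[3] == ERG12_HIGH]

# How often each enzyme is called the critical one.
critical_counts = Counter(row[6] for row in TOP_TEN)
sweet_spot_erg12_critical = sum(1 for row in sweet_spot if row[6] == "Erg12p")

print("ERG12 at 1.69:", len(sweet_spot))
print("ERG12 at 7.77:", len(high_erg12))
print("Critical enzyme tally:", dict(critical_counts))
print("Sweet-spot strains with Erg12p critical:", sweet_spot_erg12_critical)
```

The tallies agree with the text: eight strains carry ERG12 at the medium strength, two at the high strength, and Erg12p is the critical enzyme in seven strains, all of them in the sweet-spot group.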
These local and global measures of variable importance provide
complementary information. While the global analysis focuses overall
on the variables that are important for predicting readouts of all ranges,
the local importance allows us to zoom in on the patterns that give rise to
high geraniol production. Not surprisingly, they tell somewhat different
stories. Although ranked third in global variable importance, Erg12p is
the control point that limits production in the entire pathway and is the
most important enzyme when it comes to maximization of geraniol
production. The prominent role of Erg12p is likely due to feedback
regulations by pathway intermediates (Hinson et al., 1997; Chen et al.,
2018; Fu et al., 2008; Ma et al., 2011), reduced protein expression, or
protein aggregation.
3.4. Dual localization of the MVA pathway to both the cytosol and
peroxisomes
To further increase geraniol production, we localized the MVA
pathway into both the cytosol and peroxisomes. Peroxisomes are an
excellent choice for metabolic compartmentalization as they are not
essential for cell survival (Sibirny, 2016). Additionally, fatty acid
β-oxidation inside peroxisomes generates a pool of acetyl-CoA, which is
the substrate for the MVA pathway (Dusseaux et al., 2020). A haploid
peroxisome strain (MVAp4) was generated by tagging all seven MVA
genes with a C-terminal -SKL tripeptide. Similar to the MVAc4 strain, the
MVAp4 strain has seven MVA genes integrated into the genome.
Next, MVAc4 and MVAp4 strains were mated to obtain a diploid strain,
creating the MVA platform strain (Fig. 4A). The growth curves showed
that the engineered strains had no growth defect and, in fact, grew
significantly faster than the wild-type strains in rich media (Fig. 4B).
When transformed with a plasmid bearing tObGES-ERG20ʷʷ, the MVA platform
strain doubled geraniol titers compared to the haploid strains,
indicating that the dual targeting of the MVA pathway significantly
increased geraniol production (Fig. 4C). We also generated two control
strains, MVA cyto*2 and MVA per*2, in which two copies of the entire MVA
pathway were targeted to either the cytosol or peroxisomes (Fig. S13).
The MVA platform strain produced an amount of geraniol comparable to
that of the MVA cyto*2 strain but higher than that of the MVA per*2
strain. This could be due to insufficient NADPH inside peroxisomes
limiting MVA pathway productivity. There was no difference in geraniol
titers between the strains expressing the MVA pathway in the cytosol
(MVAc4) and peroxisomes (MVAp4) (Fig. 4C and D). Similar results were
observed when the same strains were cultured in minimal media
(Fig. S12). Expressing tObGES-ERG20ʷʷ in the peroxisome of the cytosolic
strain MVAc4 caused only a small drop in geraniol titer compared to the
strain with both the fusion protein and the additional MVA pathway
localized to the cytosol. Furthermore, when tObGES-ERG20ʷʷ was localized
to the cytosol of the peroxisomal strain MVAp4, there was no significant
drop in geraniol titer compared to the strain with both the fusion
protein and the additional MVA pathway localized to the peroxisome.
These data indicate that IPP/DMAPP may diffuse somewhat freely between
the cytosol and the peroxisome.
To check if the pathway intermediate, mevalonate, is diffusible, two
more strains, MVAp-c and MVAc-p were constructed. MVAp-c has the
top half of the pathway, from ERG10 to tHMG1, localized to the
peroxisome, and the bottom half of the pathway, from ERG12 to IDI1, in
the cytosol. Conversely, MVAc-p has the top half of the pathway local-
ized to the cytosol and the bottom half of the pathway in the peroxisome
(Fig. S11). There was no difference in geraniol titer among the strains
MVAc4 and MVAp-c or MVAp4 and MVAc-p; thus, mevalonate diffuses
readily between the cytosol and peroxisome.
The growth of the engineered strains showed an inverse relationship with
geraniol titer, possibly caused by geraniol toxicity to yeast at higher
concentrations (Denby et al., 2018). When normalized by OD₆₀₀, there is
an over two-fold increase in geraniol production in the MVA platform
strain compared to the haploids (Fig. 4D). When the culturing time was
extended from 24 to 48 h, geraniol production decreased significantly
(Fig. S14). The decrease in geraniol titer could be due to the
compound’s volatility or to reduced expression of the heterologous MVA
pathway genes once glucose has been exhausted during the stationary
phase (Peng et al., 2015). We also detected a minor product,
citronellol, which is produced from geraniol by reduction by yeast’s
native enzymes (Fig. S14), whereas another common geraniol derivative,
geranyl acetate, was not detected. In an attempt to increase geraniol
production, MVAp4 and MVA platform strains were grown in a
fatty-acid-based medium (YPO) (Gerke et al., 2020). However, geraniol
production in YPO decreased 2-fold compared to that in YPD (Fig. S15).
This was likely due to the low activity of the promoters used for
expressing MVA genes in fatty-acid-based media, since most of these
promoters are from the glycolysis pathway.
3.5. Producing diverse terpenoids from the MVA platform strain
The MVA platform strain can be conveniently leveraged to jumpstart the
production of a wide range of terpenoids, since users only need to
transform it with a plasmid carrying the desired prenyltransferase and
terpene synthase. To demonstrate the versatility of the MVA platform
strain, we next utilized it to produce a sesquiterpene, α-humulene, and
a triterpene, squalene, in addition to the monoterpene geraniol.
α-Humulene has potential anti-inflammatory properties and acts as a
precursor for the anti-cancer drug zerumbone (Fernandes et al., 2007;
Zhang et al., 2018), while squalene is used as an emollient in personal
care products due to its skin-compatible properties (Popa et al., 2015).
For α-humulene production, the MVA platform strain transformed with a
plasmid carrying ERG20, encoding the FPP synthase, and ZSS1, encoding an
α-humulene synthase from Zingiber zerumbet (Alemdar et al., 2016),
produced ~60-fold more α-humulene than the wild type in 24 h
(Fig. 5A–C). Fusion constructs with ERG20-ZSS1 produced about half of
the amount obtained with the non-fused counterpart, suggesting that the
fused enzymes have unfavorable conformational properties. OD₆₀₀
increased with the increase of α-humulene, which is likely due to a
parallel increase in squalene, the precursor for ergosterol (Li et al.,
2020). For squalene production, the MVA platform strain was transformed
with a plasmid carrying ERG20 and ERG9, which encodes a squalene
synthase. The resulting strain yielded ~35-fold more squalene than the
wild type when grown in the presence of terbinafine, an anti-fungal
agent that inhibits Erg1p, which metabolizes squalene to
2,3-oxidosqualene (Garaiova et al., 2014) (Fig. 5A, D&E). Fusion
constructs of ERG20 and ERG9 produced approximately half the amount of
squalene, potentially due to unfavorable protein conformation. The
growth of these strains was positively correlated with the amount of
squalene produced, since squalene is the substrate for ergosterol
biosynthesis.
Table 2
Top ten strains with the highest level of geraniol. The numbers under each enzyme are the relative promoter strengths quantified by Lee et al. (2015).
Strains ERG10 ERG13 ERG12 ERG8 ERG19 Geraniol (a.u.) Critical enzymes
α1 9.01 11.01 7.77 8.85 4.81 518.85 ± 0.54 Erg8p
β2 9.01 2.85 1.69 2.28 1.53 517.94 ± 13.96 Erg12p
α4 3.00 11.01 7.77 8.85 4.81 516.19 ± 87.54 Erg8p
N3 9.01 1.06 1.69 0.91 1.53 513.53 ± 42.87 Erg10p
N2 9.01 1.06 1.69 2.28 1.53 510.49 ± 11.46 Erg12p
β4 3.00 2.85 1.69 8.85 1.53 509.51 ± 21.59 Erg12p
β5 3.00 2.85 1.69 2.28 1.53 505.28 ± 10.16 Erg12p
β7 1.06 2.85 1.69 8.85 1.53 502.44 ± 15.87 Erg12p
β3 9.01 2.85 1.69 0.91 1.53 502.34 ± 12.10 Erg12p
β1 9.01 2.85 1.69 8.85 1.53 501.19 ± 1.77 Erg12p
Fig. 4. Creating the MVA platform strain by overexpressing the MVA pathway in both cytosol and peroxisomes. (A) The diploid strain (MVA platform) was created by mating the haploid MVAc4 and haploid MVAp4. (B) Growth (OD₆₀₀) of the engineered MVAc4, MVAp4, and MVA platform strains and their wild-type counterparts. (C) Geraniol titer and OD₆₀₀ of engineered MVAc4, MVAp4, and MVA platform strains with tObGES-ERG20ʷʷ in either the cytosol (‘C’) or peroxisomes (‘P’). (D) Geraniol yield in the above strains. Data represent the average ± SD of three independent biological replicates.
4. Discussion
In this study, we investigated the contribution of individual enzymes to
the MVA pathway, which is widely utilized to improve titers of
terpenoids. Previous studies have highlighted the importance of HMG1 and
IDI1 as rate-limiting enzymes (Han et al., 2018; Zhao et al., 2017;
Jiang et al., 2017; Xie et al., 2015; Verwaal et al., 2007; Zhou et al.,
2012); however, there is a lack of consensus about the role of the other
five enzymes in the pathway (Kwak et al., 2020; Zhou et al., 2018;
McClory et al., 2019; Hu et al., 2020; Madsen et al., 2011; Yao et al.,
2018; Redding-Johanson et al., 2011; Alonso-Gutierrez et al., 2015;
Anthony et al., 2009; Garcia and Keasling, 2014; Chen et al., 2018; Ma
et al., 2011; Pojer et al., 2006). To clarify the importance of
non-rate-limiting enzymes in the MVA pathway, we created a combinatorial
yeast library for a comprehensive exploration of the promoter space of
each of the five enzymes. Machine learning-guided modeling
quantitatively revealed the contribution of each enzyme to product titer
and found Erg19p, Erg13p, and Erg12p to be crucial enzymes in
determining product yield. Note that the importance of each enzyme in a
given pathway cannot be inferred from the Gibbs free energy (ΔG) of the
reaction it catalyzes, since enzymes act by decreasing the activation
energy necessary for reactions to proceed but do not change the overall
ΔG of the reactions (Nelson and Cox, 2004). While the monoterpene
geraniol was employed as a readout of the MVA pathway, the modeling
results are likely extendable to terpenoids with longer chain lengths
because all these terpenoids require an IPP:DMAPP ratio equal to or
above one, whereas the product ratio of IDI1 at equilibrium is
IPP:DMAPP = 1:2.2 (Street et al., 1990).
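The ratio argument above can be written out as a short calculation. The sketch below assumes the standard head-to-tail stoichiometry of prenyl diphosphate synthesis (one DMAPP starter unit plus one, two, or three IPP extender units for GPP, FPP, and GGPP, and two FPP per squalene) and compares each required IPP:DMAPP ratio with the 1:2.2 equilibrium ratio quoted in the text. The dictionary and function names are illustrative.

```python
from fractions import Fraction

# (IPP units, DMAPP units) consumed per molecule of product,
# assuming standard head-to-tail prenyl chain elongation.
C5_UNITS = {
    "GPP (C10)": (1, 1),
    "FPP (C15)": (2, 1),
    "GGPP (C20)": (3, 1),
    "squalene (C30, from 2 FPP)": (4, 2),
}

# IPP:DMAPP = 1:2.2 at isomerase equilibrium, as quoted in the text.
EQUILIBRIUM_RATIO = Fraction(10, 22)


def required_ratio(ipp, dmapp):
    """IPP:DMAPP ratio consumed when making one molecule of product."""
    return Fraction(ipp, dmapp)


for name, (ipp, dmapp) in C5_UNITS.items():
    need = required_ratio(ipp, dmapp)
    fold = need / EQUILIBRIUM_RATIO
    print(f"{name}: needs IPP:DMAPP = {float(need):.1f}, "
          f"{float(fold):.1f}x the equilibrium ratio "
          f"({float(EQUILIBRIUM_RATIO):.2f})")
```

Every product listed consumes IPP and DMAPP at a ratio of at least one, while the equilibrium ratio is below one, which is the common requirement the paragraph above relies on when extending the geraniol results to longer-chain terpenoids.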
We identified the medium expression of Erg12p as the ‘sweet spot’ for
optimal terpenoid yield. Indeed, previous research showed that
mevalonate kinase is feedback inhibited by multiple terpenoid
intermediates, including mevalonate, IPP, DMAPP, farnesyl pyrophosphate
(FPP), geranyl pyrophosphate (GPP), and geranylgeranyl pyrophosphate
(GGPP) (Hinson et al., 1997; Chen et al., 2018; Fu et al., 2008; Ma
et al., 2011). A feedback-resistant mevalonate kinase from archaea
(Primak et al., 2011; Kazieva et al., 2017) may be used instead of the
native enzyme to further enhance pathway productivity. Further, our
analysis of the top ten geraniol-producing strains (Table 2) shows that
the strongest combination, α1, expressing all seven MVA pathway genes
under strong promoters, indeed maximizes geraniol production, but
several pathway genes can be expressed from relatively weaker promoters
without significantly reducing the product titer. Seven of the top ten
producers have at least four genes expressed from medium or weak
promoters yet produced geraniol titers comparable to that of the top
strain α1. Note that these conclusions may only apply to the MVA pathway
during the exponential phase of growth.
Fig. 5. Production of α-humulene and squalene using the MVA platform strain. (A) Pathway for α-humulene and squalene production. ZSS1 encodes an α-humulene synthase from Zingiber zerumbet; ERG9 encodes a squalene synthase in S. cerevisiae; ERG1 encodes a squalene epoxidase in S. cerevisiae. (B) Episomal constructs express ERG20 and ZSS1 either separately or as a fusion protein with a ‘GSG’ linker. (C) α-Humulene production and growth (OD₆₀₀) of the wild type (WT) and the engineered MVA platform expressing ERG20 and ZSS1. (D) Episomal constructs express ERG20 and ERG9 separately or as a fusion gene with a ‘GSG’ linker. (E) Squalene production and growth (OD₆₀₀) of WT and the engineered MVA platform with ERG20 and ERG9. Data represent the average ± SD of three independent biological replicates.
The dual localization of the MVA pathway to both the cytosol and
peroxisomes significantly increased geraniol titers (Fig. 4), most
likely due to the high abundance of acetyl-CoA and NADPH in the
peroxisomes and cytosol, respectively. Interestingly, targeting the MVA
pathway to the peroxisome but the prenyltransferase and terpenoid
synthase to the cytosol yielded similar amounts of geraniol. The same
observations were made when switching the localization of the
overexpressed MVA pathway and that of the prenyltransferase and geraniol
synthase. These results indicate that IPP/DMAPP are diffusible across
the peroxisome membrane. Similarly, we constructed strains MVAc-p and
MVAp-c to show that mevalonate can diffuse readily across peroxisome
membranes (Fig. S11). Since the peroxisome is bounded by a single
membrane, small molecules can travel across either passively or with the
help of transporters (Antonenkov et al., 2009). Furthermore, multiple
MVA enzymes have been reported to be localized in peroxisomes of plants
and animals (Guirimand et al., 2012; Simkin et al., 2011; Breitling and
Krisans, 2002; Sapir-Mir et al., 2008), which also supports the
diffusion of MVA intermediates between peroxisomes and cytosol. The
faster growth of the engineered strains overexpressing the MVA pathway
is likely due to the increased demand for acetyl-CoA, ATP, and NADPH,
which results in accelerated turnover of sugar, lipids, and amino acids
in the rich media.
We used the dual localization strategy to create a platform strain as a
starting point for the production of terpenoids. Although plasmid-based
expression of peroxisome-localized genes resulted in a much higher
monoterpene production (Dusseaux et al., 2020), we focused on genomic
integration as it is known to be more stable than the plasmid-based
system (Ryan et al., 2014). Users only need to transfer a plasmid
carrying the particular prenyltransferase and terpenoid synthase into
the platform strain to produce target terpenoids. To demonstrate the
versatility of our platform strain, we used it to produce geraniol,
α-humulene, and squalene as representatives of three classes of
terpenes: mono-, sesqui-, and triterpenes. The highest titers in
shake-flask culture reported so far for geraniol, α-humulene, and
squalene are 523.96 mg/L (Jiang et al., 2017), 160 mg/L (Zhang et al.,
2020), and 1.3 g/L (Liu et al., 2020), respectively. These titers were
achieved by introducing compound-specific genetic modifications and
optimizing culturing conditions. We did not introduce any additional
compound-specific genomic modifications in the platform strain, since
such modifications would narrow the product scope of the platform. As a
result, the terpene titers from off-the-shelf usage of the platform
strain were not expected to be the highest. Future compound-specific
genomic modifications therefore hold promise to increase the titers of a
particular terpenoid. For example, genes such as ATF1 and OYE2 may be
deleted to increase geraniol titer by preventing its metabolism (Brown
et al., 2015). To increase α-humulene and squalene production, genes
encoding non-specific phosphatases such as LPP1 and DPP1 (Faulkner
et al., 1999; Albertsen et al., 2011; Scalcinati et al., 2012) may be
deleted to prevent the diversion of farnesyl pyrophosphate (FPP) to
farnesol. Expressing ERG9 from a weak promoter (Zhang et al., 2018) or
tagging it for degradation (Zhang et al., 2020) can lead to higher
α-humulene accumulation. Expressing ERG1 under a weak promoter (Liu
et al., 2020) can improve the production of squalene.
5. Conclusions
This study elucidated the detailed contribution of the five
non-rate-limiting enzymes of the MVA pathway in S. cerevisiae by creating a
combinatorial yeast library. Analysis using machine learning algorithms
revealed the critical role of Erg12p in determining MVA pathway pro-
ductivity. A platform strain with dual localization of the MVA pathway
into both the cytosol and peroxisomes was created. This strain can be
leveraged to produce diverse terpenoids. The insights gained regarding
the contribution of individual MVA pathway enzymes and the MVA
yeast platform created will guide the future design and engineering to
produce high titers of any terpenoid.
Funding sources
This project was supported by the Research Foundation for the State
University of New York [71272] to Z. Q. Wang and the National Science
Foundation [CHE-1919594] to the University at Buffalo Chemistry In-
strument Center.
Declaration of competing interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to influence
the work reported in this paper.
Author contributions
Minakshi Mukherjee: Investigation, Methodology, Formal analysis,
Validation, Visualization, Writing – Original Draft, Writing – Review &
Editing. Rachael Hageman Blair: Software, Methodology, Formal
analysis, Visualization, Writing – Original Draft, Writing – Review &
Editing. Zhen Q. Wang: Conceptualization, Resources, Supervision,
Project administration, Funding acquisition, Writing – Original Draft,
Writing – Review & Editing.
Data availability
Data will be made available on request.
Acknowledgment
The authors are grateful to Dr. John Dueber for providing the raw
data for the relative promoter strengths of characterized yeast pro-
moters. We thank Dr. Sarah Walker for providing access to the Tecan
Spark microplate reader. We also thank Dr. Valerie Freichs for assistance
with developing the chromatography methods.
Appendix A. Supplementary data
Supplementary data to this article can be found online at
https://doi.org/10.1016/j.ymben.2022.10.004.
References
Albertsen, L., et al., 2011. Diversion of flux toward sesquiterpene production in
Saccharomyces cerevisiae by fusion of host and heterologous enzymes. Appl. Environ.
Microbiol. 77, 1033–1040.
Alemdar, S., et al., 2016. Heterologous expression, purification, and biochemical
characterization of alpha-Humulene Synthase from Zingiber zerumbet Smith. Appl.
Biochem. Biotechnol. 178, 474–489.
Alonso-Gutierrez, J., et al., 2015. Principal component analysis of proteomics (PCAP) as
a tool to direct metabolic engineering. Metab. Eng. 28, 123–133.
Anthony, J.R., et al., 2009. Optimization of the mevalonate-based isoprenoid
biosynthetic pathway in Escherichia coli for production of the anti-malarial drug
precursor amorpha-4,11-diene. Metab. Eng. 11, 13–19.
Antonenkov, V.D., Mindthoff, S., Grunau, S., Erdmann, R., Hiltunen, J.K., 2009. An
involvement of yeast peroxisomal channels in transmembrane transfer of glyoxylate
cycle intermediates. Int. J. Biochem. Cell Biol. 41, 2546–2554.
Belcher, M.S., Mahinthakumar, J., Keasling, J.D., 2020. New frontiers: harnessing pivotal
advances in microbial engineering for the biosynthesis of plant-derived terpenoids.
Curr. Opin. Biotechnol. 65, 88–93.
Breiman, L., 1996. Out-of-bag Estimation.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 2017. Classification and
Regression Trees. Routledge.
Breitling, R., Krisans, S.K., 2002. A second gene for peroxisomal HMG-CoA reductase? A
genomic reassessment. J. Lipid Res. 43, 2031–2036.
Brown, S., Clastre, M., Courdavault, V., O’Connor, S.E., 2015. De novo production of the
plant-derived alkaloid strictosidine in yeast. Proc. Natl. Acad. Sci. U. S. A. 112,
3205–3210.
Campbell, A., et al., 2016. Engineering of a nepetalactol-producing platform strain of
Saccharomyces cerevisiae for the production of plant seco-iridoids. ACS Synth. Biol. 5,
405–414.
Chen, W., Viljoen, A.M., 2010. Geraniol — a review of a commercially important
fragrance material. South Afr. J. Bot. 76, 643–651.
Chen, Y., Daviet, L., Schalk, M., Siewers, V., Nielsen, J., 2013. Establishing a platform
cell factory through engineering of yeast acetyl-CoA metabolism. Metab. Eng. 15,
48–54.
Chen, H., et al., 2018. Directed evolution of mevalonate kinase in Escherichia coli by
random mutagenesis for improved lycopene. RSC Adv. 8, 15021–15028.
Christianson, D.W., 2017. Structural and chemical biology of terpenoid cyclases. Chem.
Rev. 117, 11570–11648.
Cutler, D.R., et al., 2007. Random forests for classification in ecology. Ecology 88,
2783–2792.
Denby, C.M., et al., 2018. Industrial brewing yeast engineered for the production of
primary flavor determinants in hopped beer. Nat. Commun. 9, 965.
Dusseaux, S., Wajn, W.T., Liu, Y., Ignea, C., Kampranis, S.C., 2020. Transforming yeast
peroxisomes into microfactories for the efficient production of high-value
isoprenoids. Proc. Natl. Acad. Sci. U. S. A. 117, 31789–31799.
Efron, B., LePage, R., 1992. Introduction to Bootstrap. Wiley & Sons, New York.
Engels, B., Dahm, P., Jennewein, S., 2008. Metabolic engineering of taxadiene
biosynthesis in yeast as a first step towards taxol (paclitaxel) production. Metab. Eng.
10, 201–206.
Faulkner, A., et al., 1999. The LPP1 and DPP1 gene products account for most of the
isoprenoid phosphate phosphatase activities in Saccharomyces cerevisiae. J. Biol.
Chem. 274, 14831–14837.
Fernandes, E.S., et al., 2007. Anti-inflammatory effects of compounds alpha-humulene
and (-)-trans-caryophyllene isolated from the essential oil of Cordia verbenacea. Eur.
J. Pharmacol. 569, 228–236.
Friedman, J., Hastie, T., Tibshirani, R., 2001. The Elements of Statistical Learning. Springer
series in statistics, New York.
Fu, Z., Voynova, N.E., Herdendorf, T.J., Miziorko, H.M., Kim, J.J., 2008. Biochemical
and structural basis for feedback inhibition of mevalonate kinase and isoprenoid
metabolism. Biochemistry 47, 3715–3724.
Garaiova, M., Zambojova, V., Simova, Z., Griac, P., Hapala, I., 2014. Squalene epoxidase
as a target for manipulation of squalene levels in the yeast Saccharomyces cerevisiae.
FEMS Yeast Res. 14, 310–323.
Garcia, D.E., Keasling, J.D., 2014. Kinetics of phosphomevalonate kinase from
Saccharomyces cerevisiae. PLoS One 9, e87112.
Gerke, J., et al., 2020. Production of the fragrance geraniol in peroxisomes of a product-
tolerant baker’s yeast. Front. Bioeng. Biotechnol. 8, 582052.
Gibson, D.G., 2011. Enzymatic assembly of overlapping DNA fragments. Methods
Enzymol. 498, 349–361.
Gold, N.D., et al., 2015. Metabolic engineering of a tyrosine-overproducing yeast
platform using targeted metabolomics. Microb. Cell Factories 14, 73.
Goldstein, A., Kapelner, A., Bleich, J., Pitkin, E., 2015. Peeking inside the black box:
visualizing statistical learning with plots of individual conditional expectation.
J. Comput. Graph Stat. 24, 44–65.
Greenwell, B.M., 2017. pdp: an R package for constructing partial dependence plots. R J.
9, 421.
Guirimand, G., et al., 2012. A single gene encodes isopentenyl diphosphate isomerase
isoforms targeted to plastids, mitochondria and peroxisomes in Catharanthus roseus.
Plant Mol. Biol. 79, 443–459.
Guo, X.J., et al., 2018. Metabolic engineering of Saccharomyces cerevisiae for 7-dehy-
drocholesterol overproduction. Biotechnol. Biofuels 11, 192.
Han, J.Y., Seo, S.H., Song, J.M., Lee, H., Choi, E.S., 2018. High-level recombinant
production of squalene using selected Saccharomyces cerevisiae strains. J. Ind.
Microbiol. Biotechnol. 45, 239–251.
Hinson, D.D., Chambliss, K.L., Toth, M.J., Tanaka, R.D., Gibson, K.M., 1997. Post-
translational regulation of mevalonate kinase by intermediates of the cholesterol and
nonsterol isoprene biosynthetic pathways. J. Lipid Res. 38, 2216–2223.
Hu, Z., et al., 2020. Improve the production of D-limonene by regulating the mevalonate
pathway of Saccharomyces cerevisiae during alcoholic beverage fermentation. J. Ind.
Microbiol. Biotechnol. 47, 1083–1097.
Ignea, C., Pontini, M., Maffei, M.E., Makris, A.M., Kampranis, S.C., 2014. Engineering
monoterpene production in yeast using a synthetic dominant negative geranyl
diphosphate synthase. ACS Synth. Biol. 3, 298–306.
Jiang, G.Z., et al., 2017. Manipulation of GES and ERG20 for geraniol overproduction in
Saccharomyces cerevisiae. Metab. Eng. 41, 57–66.
Jiang, L., et al., 2021. Improved functional expression of cytochrome P450s in
Saccharomyces cerevisiae through screening a cDNA library from Arabidopsis thaliana.
Front. Bioeng. Biotechnol. 9, 764851.
Kazieva, E., et al., 2017. Characterization of feedback-resistant mevalonate kinases from
the methanogenic archaeons Methanosaeta concilii and Methanocella paludicola.
Microbiology (Read.) 163, 1283–1291.
Kwak, S., et al., 2020. Redirection of the glycolytic flux enhances isoprenoid production
in Saccharomyces cerevisiae. Biotechnol. J. 15, e1900173.
Lee, M.E., DeLoache, W.C., Cervantes, B., Dueber, J.E., 2015. A highly characterized
yeast toolkit for modular, multipart assembly. ACS Synth. Biol. 4, 975–986.
Li, T., et al., 2020. Metabolic Engineering of Saccharomyces cerevisiae to overproduce
squalene. J. Agric. Food Chem. 68, 2132–2138.
Lin, J.-L., Ekas, H., Markham, K., Alper, H.S., 2018. An enzyme-coupled assay enables
rapid protein engineering for geraniol production in yeast. Biochem. Eng. J. 139,
95–100.
Liu, G.S., et al., 2020. The yeast peroxisome: a dynamic storage depot and subcellular
factory for squalene overproduction. Metab. Eng. 57, 151–161.
Lv, X., et al., 2016. Dual regulation of cytoplasmic and mitochondrial acetyl-CoA
utilization for improved isoprene production in Saccharomyces cerevisiae. Nat.
Commun. 7, 12851.
Ma, S.M., et al., 2011. Optimization of a heterologous mevalonate pathway through the
use of variant HMG-CoA reductases. Metab. Eng. 13, 588–597.
Madsen, K.M., et al., 2011. Linking genotype and phenotype of Saccharomyces cerevisiae
strains reveals metabolic engineering targets and leads to triterpene hyper-
producers. PLoS One 6, e14763.
McClory, J., Lin, J.T., Timson, D.J., Zhang, J., Huang, M., 2019. Catalytic mechanism of
mevalonate kinase revisited, a QM/MM study. Org. Biomol. Chem. 17, 2423–2431.
Mukherjee, M., Caroll, E., Wang, Z.Q., 2021. Rapid assembly of multi-gene constructs
using modular Golden Gate cloning. JoVE 168, e61993.
Navale, G.R., Dharne, M.S., Shinde, S.S., 2021. Metabolic engineering and synthetic
biology for isoprenoid production in Escherichia coli and Saccharomyces cerevisiae.
Appl. Microbiol. Biotechnol. 105, 457–475.
Nelson, D.L., Cox, M.M., 2004. Lehninger Principles of Biochemistry.
Nielsen, J., 2015. Bioengineering. Yeast cell factories on the horizon. Science 349,
1050–1051.
Orr-Weaver, T.L., Szostak, J.W., Rothstein, R.J., 1981. Yeast transformation: a model
system for the study of recombination. Proc. Natl. Acad. Sci. U. S. A. 78, 6354–6358.
Peng, B., Williams, T.C., Henry, M., Nielsen, L.K., Vickers, C.E., 2015. Controlling
heterologous gene expression in yeast cell factories on different carbon substrates
and across the diauxic shift: a comparison of yeast promoter activities. Microb. Cell
Factories 14, 91.
Peng, B., et al., 2017. A squalene synthase protein degradation method for improved
sesquiterpene production in Saccharomyces cerevisiae. Metab. Eng. 39, 209–219.
Pojer, F., et al., 2006. Structural basis for the design of potent and species-specific
inhibitors of 3-hydroxy-3-methylglutaryl CoA synthases. Proc. Natl. Acad. Sci. U. S.
A. 103, 11491–11496.
Popa, O., Babeanu, N.E., Popa, I., Nita, S., Dinu-Parvu, C.E., 2015. Methods for obtaining
and determination of squalene from natural sources. BioMed Res. Int. 2015, 367202.
Primak, Y.A., et al., 2011. Characterization of a feedback-resistant mevalonate kinase
from the archaeon Methanosarcina mazei. Appl. Environ. Microbiol. 77, 7772–7778.
Pyne, M.E., et al., 2020. A yeast platform for high-level synthesis of
tetrahydroisoquinoline alkaloids. Nat. Commun. 11, 3337.
Redding-Johanson, A.M., et al., 2011. Targeted proteomics for metabolic pathway
optimization: application to terpene production. Metab. Eng. 13, 194–203.
Ro, D.K., et al., 2006. Production of the antimalarial drug precursor artemisinic acid in
engineered yeast. Nature 440, 940–943.
Rodriguez, A., Kildegaard, K.R., Li, M., Borodina, I., Nielsen, J., 2015. Establishment of a
yeast platform strain for production of p-coumaric acid through metabolic
engineering of aromatic amino acid biosynthesis. Metab. Eng. 31, 181–188.
Ryan, O.W., et al., 2014. Selection of chromosomal DNA libraries using a multiplex
CRISPR system. Elife 3, e03703.
Sapir-Mir, M., et al., 2008. Peroxisomal localization of Arabidopsis isopentenyl
diphosphate isomerases suggests that part of the plant isoprenoid mevalonic acid
pathway is compartmentalized to peroxisomes. Plant Physiol. 148, 1219–1228.
Sauro, H.M., 2017. Control and regulation of pathways via negative feedback. J. R. Soc.
Interface 14.
Scalcinati, G., et al., 2012. Dynamic control of gene expression in Saccharomyces
cerevisiae engineered for the production of plant sesquitepene alpha-santalene in a
fed-batch mode. Metab. Eng. 14, 91–103.
Sibirny, A.A., 2016. Yeast peroxisomes: structure, functions and biotechnological
opportunities. FEMS Yeast Res. 16.
Simkin, A.J., et al., 2011. Peroxisomal localisation of the final steps of the mevalonic acid
pathway in planta. Planta 234, 903–914.
Street, I.P., Christensen, D.J., Poulter, C.D., 1990. Hydrogen exchange during the
enzyme-catalyzed isomerization of isopentenyl diphosphate and dimethylallyl
diphosphate. J. Am. Chem. Soc. 112, 8577–8578.
Trikka, F.A., et al., 2015. Iterative carotenogenic screens identify combinations of yeast
gene deletions that enhance sclareol production. Microb. Cell Factories 14, 60.
Verwaal, R., et al., 2007. High-level production of beta-carotene in Saccharomyces
cerevisiae by successive transformation with carotenogenic genes from
Xanthophyllomyces dendrorhous. Appl. Environ. Microbiol. 73, 4342–4350.
Vickers, C.E., Bydder, S.F., Zhou, Y., Nielsen, L.K., 2013. Dual gene expression cassette
vectors with antibiotic selection markers for engineering in Saccharomyces cerevisiae.
Microb. Cell Factories 12, 96.
Wang, X., et al., 2021. Engineering Escherichia coli for production of geraniol by
systematic synthetic biology approaches and laboratory-evolved fusion tags. Metab.
Eng. 66, 60–67.
Westfall, P.J., et al., 2012. Production of amorphadiene in yeast, and its conversion to
dihydroartemisinic acid, precursor to the antimalarial agent artemisinin. Proc. Natl.
Acad. Sci. U. S. A. 109, E111–E118.
Xie, W., Lv, X., Ye, L., Zhou, P., Yu, H., 2015. Construction of lycopene-overproducing
Saccharomyces cerevisiae by combining directed evolution and metabolic
engineering. Metab. Eng. 30, 69–78.
Yao, Z., et al., 2018. Enhanced isoprene production by reconstruction of metabolic
balance between strengthened precursor supply and improved isoprene synthase in
Saccharomyces cerevisiae. ACS Synth. Biol. 7, 2308–2316.
Yee, D.A., et al., 2019. Engineered mitochondrial production of monoterpenes in
Saccharomyces cerevisiae. Metab. Eng. 55, 76–84.
Yuan, J., Ching, C.B., 2014. Combinatorial engineering of mevalonate pathway for
improved amorpha-4,11-diene production in budding yeast. Biotechnol. Bioeng.
111, 608–617.
Zhang, C., et al., 2018. Production of sesquiterpenoid zerumbone from metabolic
engineered Saccharomyces cerevisiae. Metab. Eng. 49, 28–35.
Zhang, C., Li, M., Zhao, G.R., Lu, W., 2020. Harnessing yeast peroxisomes and cytosol
acetyl-CoA for sesquiterpene alpha-humulene production. J. Agric. Food Chem. 68,
1382–1389.
Zhao, J., et al., 2017. Dynamic control of ERG20 expression combined with minimized
endogenous downstream metabolism contributes to the improvement of geraniol
production in Saccharomyces cerevisiae. Microb. Cell Factories 16, 17.
Zhou, Y.J., et al., 2012. Modular pathway engineering of diterpenoid synthases and the
mevalonic acid pathway for miltiradiene production. J. Am. Chem. Soc. 134,
3234–3241.
Zhou, P., et al., 2018. Crystal structure of cytoplasmic acetoacetyl-CoA thiolase from
Saccharomyces cerevisiae. Acta Crystallogr. F Struct. Biol. Commun. 74, 6–13.