Quantitative analysis of TALE–DNA interactions suggests polarity effects
Joshua F. Meckler¹, Mital S. Bhakta¹, Moon-Soo Kim¹, Robert Ovadia², Chris H. Habrian², Artem Zykovich¹, Abigail Yu¹, Sarah H. Lockwood¹, Robert Morbitzer³, Janett Elsäesser³, Thomas Lahaye³, David J. Segal¹,* and Enoch P. Baldwin²,*
¹Genome Center and Department of Biochemistry and Molecular Medicine, University of California, Davis, CA 95616, USA, ²Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA and ³Department of Biology, Institute of Genetics, Ludwig-Maximilians-University Munich, 82152 Martinsried, Germany
Received November 26, 2012; Revised January 18, 2013; Accepted January 23, 2013
ABSTRACT
Transcription activator-like effectors (TALEs) have revolutionized the field of genome engineering. We present here a systematic assessment of TALE DNA recognition, using quantitative electrophoretic mobility shift assays and reporter gene activation assays. Within TALE proteins, tandem 34-amino acid repeats recognize one base pair each and direct sequence-specific DNA binding through repeat variable di-residues (RVDs). We found that RVD choice can affect affinity by four orders of magnitude, with the relative RVD contribution in the order NG > HD ∼ NN ≫ NI > NK. The NN repeat preferred the base G over A, whereas the NK repeat bound G with 10³-fold lower affinity. We compared AvrBs3, a naturally occurring TALE that recognizes its target using some atypical RVD-base combinations, with a designed TALE that precisely matches ‘standard’ RVDs with the target bases. This comparison revealed unexpected differences in sensitivity to substitutions of the invariant 5′-T. Another surprising observation was that base mismatches at the 5′ end of the target site had more disruptive effects on affinity than those at the 3′ end, particularly in designed TALEs. These results provide evidence that TALE–DNA recognition exhibits a hitherto undescribed polarity effect, in which the N-terminal repeats contribute more to affinity than C-terminal ones.
INTRODUCTION
Transcription activator-like effectors (TALEs) are sequence-specific DNA-binding proteins that the bacterial pathogen Xanthomonas injects into plant cells. Inside the plant cell, they bind to and activate specific host promoters (1). Their promoter specificity is conferred by a series of tandem protein repeats, typically 34 amino acids in length. Unlike any previously described DNA-binding domain, each repeat recognizes a single DNA base pair. Amino acids at positions 12 and 13, known as repeat variable di-residues (RVDs), determine the base preferences of a repeat. Deciphering the correspondence between RVD composition and target DNA bases created the ‘TALE DNA binding code’, making TALEs the first DNA-binding protein class for which robust and comprehensive rules of DNA recognition are known (2,3). Sequence-specific DNA binding is achieved by simple assembly of individual repeats with desired base specificities.
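The one-repeat-to-one-base rule described above can be sketched in a few lines of Python. This is only an illustration: the mapping covers the five RVDs discussed in this paper (NI with A, HD with C, NG with T, NN with G or A, NK with G), and the function names `predicted_target` and `matches` are hypothetical helpers, not part of any published tool.

```python
# Commonly cited base preferences for the five RVDs studied in this paper.
# NN lists G first because the paper reports a preference for G over A.
RVD_TO_BASES = {
    "NI": "A",
    "HD": "C",
    "NG": "T",
    "NN": "GA",
    "NK": "G",
}


def predicted_target(rvds, five_prime_base="T"):
    """Return the most preferred target sequence for a list of RVDs.

    The 5' base (T for most natural targets) is prepended, and only the
    first-listed base is used for each repeat.
    """
    bases = []
    for rvd in rvds:
        if rvd not in RVD_TO_BASES:
            raise ValueError(f"no base preference recorded for RVD {rvd!r}")
        bases.append(RVD_TO_BASES[rvd][0])
    return five_prime_base + "".join(bases)


def matches(rvds, site):
    """Check that every base after the 5' position is accepted by its RVD."""
    if len(site) != len(rvds) + 1:
        return False
    return all(base in RVD_TO_BASES.get(rvd, "")
               for rvd, base in zip(rvds, site[1:]))


print(predicted_target(["HD", "NG", "NI", "NN"]))  # TCTAG
```

A lookup table is enough here because, as the text notes, each repeat reads one base largely independently; context effects such as those reported later in the Results are not modeled.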
Recent crystallographic work revealed the structural basis for TALE–DNA recognition (4,5). Each repeat consists of two alpha helices connected by a three-residue loop that contains the RVDs (the ‘RVD loop’). Sequential repeats interact to form a solenoid that binds to one DNA strand, with the TALE N-terminal to C-terminal direction aligned with the DNA 5′ to 3′ direction. Position 13 contacts the target base in the major groove through hydrogen bonds or van der Waals interactions, while position 12 stabilizes the RVD loop structure. Thus, repeat sequence preferences are essentially determined by a single amino acid–base interaction.
*To whom correspondence should be addressed. Tel: +1 530 754 9134; Fax: +1 530 754 9658; Email: djsegal@ucdavis.edu
Correspondence may also be addressed to Enoch P. Baldwin. Tel: +1 530 752 1108; Fax: +1 530 752 3085; Email: epbaldwin@ucdavis.edu
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Present address:
Moon-Soo Kim, Department of Chemistry, 1906 College Heights Boulevard, Western Kentucky University, Bowling Green, KY 42101, USA.
Nucleic Acids Research, 2013, 1–11
doi:10.1093/nar/gkt085
© The Author(s) 2013. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Nucleic Acids Research Advance Access published February 13, 2013
Downloaded from http://nar.oxfordjournals.org/ at University of California, Davis - Library on February 26, 2013
Because of their modularity, easy programmability and reliability, TALE proteins have become the preferred DNA-binding domain to create artificial transcription factors (ATFs) and nucleases (TALENs), and have rapidly transformed the field of genome engineering (6–10). These properties also provide a special opportunity for synthetic biology, in which quantitative and predictable interactions of modular transcription factor and promoter parts are required to engineer gene regulatory circuits (11). However, few direct quantitative assessments of TALE–DNA affinities and specificities have been reported and, as of yet, there is no predictive framework in place. Extant data largely consist of cell-based transcription factor reporter assays in which the readout can be complicated by numerous factors in addition to TALE DNA-binding affinity. To understand and predict TALE repeat affinities, specificities and functionalities, quantitative binding data are required for members of this important new class of DNA-binding domain.
We present here the first systematic quantitative assessment of TALE DNA affinity and recognition. We combined quantitative gel shift and transcription reporter assays to explore the relative affinities of individual RVDs in vitro and their relation to activity in vivo. We also examined the specificity of two G-recognition RVDs, NN and NK, and the distribution of binding affinity over the length of the repeat region. Our data provide physical explanations and quantification of previously reported trends and suggest some complexities in this seemingly simple mode of DNA recognition. In particular, we demonstrate that the N-terminal TALE repeats interact more strongly with DNA than the C-terminal repeats, suggesting a polarity effect of TALE binding.
MATERIALS AND METHODS
Designed TALE construction
Designed TALE (dTALE) repeat arrays were modularly assembled using the Golden Gate cloning reagents described in (12), with slight modifications to the procedures. The 17.5-repeat arrays were assembled in two steps of cut-ligation reactions. The first reaction assembled two five-repeat arrays and one seven-repeat array. Each cut-ligation reaction used 75 ng of appropriate plasmids with BsaI-HF (New England Biolabs) and T4 ligase (New England Biolabs) that were incubated at 37°C for 5 h. On sequence verification, the three segments were assembled in a second cut-ligation reaction using a vector containing the last half-repeat to form a complete 17.5-repeat array (5+5+7+0.5 = 17.5). Final 17.5-repeat arrays were cloned by StuI/AatII digestion into pPreTALE₁₁₁₋₄₂ and pPreTALE₉₄₋₄₂, which contained truncated N- and C-termini of the naturally occurring TALE PthXo1 in pAH103 (5), generated by polymerase chain reaction (PCR) using primers listed in Supplementary Table S1. XhoI and AgeI sites incorporated at the termini allowed subcloning of entire dTALEs into expression vectors.
In this work, the RVD-containing repeat region is taken
as starting at the beginning of the ‘0 repeat’ (residue 255
for AvrBs3, LTDGQ ...) and ending at the end of the
complete ‘0.5 repeat’ (residue 897, ...SRPDP). Thus, the
111-42 truncation refers to a variant that retains 111
N-terminal and 42 C-terminal residues from the full-
length TALE, appended to the RVD repeat region
(Supplementary Figure S1A).
Protein preparations
AvrBs3₂₅₄₋₁₈₀ and dTALEs were cloned using BamHI/AgeI and XhoI/AgeI, respectively, into pMAL-TEV, a prokaryotic expression plasmid derived from pMAL-c5x (New England Biolabs) that contained a site for the Tobacco Etch Virus (TEV) protease. TALE reading frames were bounded by an N-terminal maltose-binding protein (MBP) tag, a TEV protease cleavage site and a C-terminal His₆ tag (Supplementary Figure S1B). Tandem affinity purification allowed isolation of homogeneous full-length MBP-TALE–His₆ fusion proteins (Supplementary Figure S2B). BL21 cells (Novagen) were transformed and grown overnight on Luria Broth agar containing 100 µg/ml carbenicillin. Single colonies were inoculated into 25 ml of Luria Broth containing 100 µg/ml carbenicillin, and grown with vigorous shaking at 37°C. At an OD600 of 0.4, incubation was continued at 30°C to an OD600 of 0.6–0.8. Isopropyl-β-D-thiogalactopyranoside (IPTG) (0.1 mM final) was added and the cultures were shaken at 30°C for 3–4 h. Cells were pelleted (10 min, 2000 × g) and stored at −80°C. Purification was carried out at 4°C, and all buffers contained 2 mM sodium azide. Cells were resuspended in 40 ml lysis/wash buffer (500 mM NaCl, 5 mM imidazole, 20 mM Tris-Cl, pH 7.9) and lysed using a microfluidizer (Microfluidics Corp., Model M100-Y). The resulting lysate, including washes (100 ml total), was clarified by centrifugation (40 min, 15 000 × g), and the supernatant was passed through a 2-ml column bed of Ni-IDA resin (2–3 ml/min, Novagen). The column was washed with 100 ml of lysis/wash buffer, 100 ml of high-salt wash buffer (2 M NaCl, 5 mM imidazole, 20 mM Tris-Cl, pH 7.9) to completely remove bound nucleic acids and another 100 ml of lysis/wash buffer. The MBP–TALE fusion proteins were eluted in five 2-ml fractions using His elution buffer (500 mM NaCl, 500 mM imidazole, 20 mM Tris-Cl, pH 7.9). Fractions containing more than 0.1 OD280 were passed through a 1-ml Luer lock syringe column containing 0.75 ml of amylose resin (New England Biolabs) (0.3 ml/min). The columns were then washed with 20 ml of TALE storage buffer (480 mM KCl, 1.6 mM ethylenediaminetetraacetic acid (EDTA), 2 mM dithiothreitol (DTT), 12 mM Tris-Cl, pH 7.5). The highly purified fusion protein was eluted in 0.5-ml aliquots with TALE storage buffer containing 10 mM maltose. The most concentrated fractions (1 to 4 OD280) were dialyzed against 2 × 300 ml of TALE storage buffer, quantified by ultraviolet absorbance and flash-frozen at −80°C in 50-µL aliquots. The zinc finger DNA-binding domain of Zif268 (13) was subcloned into pMAL-TEV and purified as described for dTALEs, except buffers contained 100 µM ZnCl₂. The molar extinction coefficients at 280 nm were 81820, 81820, 92820 and 69330 for the MBP-TALE₁₁₁₋₄₂, MBP-TALE₉₄₋₄₂, MBP-AvrBs3₂₅₄₋₁₈₀ and MBP-Zif268 proteins, respectively (EXPASY). Typical
final concentrations were 10–20 µM, with an overall yield of 1–5 mg protein (4–20 mg/l media). These proteins maintained binding activity for at least 1 week at 4°C. For cases in which the MBP tag was removed, quantitative cleavage was achieved by incubation of 10–20 µM dTALE with TEV proteinase (20 µg/ml final, a gift from Chris Fraser, UC Davis) and 5 mM DTT overnight at 4°C.
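The ultraviolet quantification step can be illustrated with the Beer-Lambert relation, c = A / (ε · l). The sketch below assumes the extinction coefficients quoted above are in M⁻¹ cm⁻¹ and that absorbance was read with a 1-cm path length; neither detail is stated in the text, and the function name is hypothetical.

```python
# Extinction coefficients at 280 nm as listed in the text
# (units of M^-1 cm^-1 are an assumption).
EXTINCTION_280 = {
    "MBP-TALE 111-42": 81820,
    "MBP-TALE 94-42": 81820,
    "MBP-AvrBs3 254-180": 92820,
    "MBP-Zif268": 69330,
}


def molar_concentration(a280, extinction_coefficient, path_length_cm=1.0):
    """Beer-Lambert law: concentration (mol/l) = A / (epsilon * path length)."""
    if extinction_coefficient <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 / (extinction_coefficient * path_length_cm)


# The amylose-column fractions are described as 1 to 4 OD280.
for a280 in (1.0, 4.0):
    conc = molar_concentration(a280, EXTINCTION_280["MBP-TALE 111-42"])
    print(f"A280 = {a280:.1f} -> {conc * 1e6:.1f} uM")
```

Under these assumptions, 1 to 4 OD280 corresponds to roughly 12 to 49 µM before dialysis, which is the same order as the 10–20 µM final concentrations reported.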
Electrophoretic mobility shift assay
Biotin-labeled DNA targets were generated by PCR amplification using a 5′ biotinylated forward primer of 69-mer oligonucleotides containing 19-base pair TALE target sites or the Zif268 site (Supplementary Table S2). PCR reactions contained unlabeled reverse primer in a 4:1 ratio over the biotinylated primer. Amplified targets were column purified (Qiagen). Binding reactions were mixed on ice and placed in the dark for 1 h at room temperature (22°C) in 1× TALE electrophoretic mobility shift assay (EMSA) buffer (12 mM Tris-Cl, pH 7.5, 60 mM KCl, 2 mM DTT, 0.05% NP-40, 50 ng/µL double-stranded poly(deoxyinosine-deoxycytosine)ₙ (dIdC), 0.1 mg/ml bovine serum albumin (BSA), 5% glycerol, 5 mM MgCl₂, 0.2 mM EDTA). As indicated, zinc finger binding reactions were performed in 1× TALE EMSA buffer supplemented with 100 µM ZnCl₂, or in Zinc Buffer A (ZBA: 100 mM Tris, 90 mM KCl, 1 mM MgCl₂ and 90 µM ZnCl₂, pH 7.5) with 5% glycerol, 0.1 mg/ml BSA, 0.05% NP-40. All binding reactions contained 25–55 pM target DNA and purified proteins at concentrations of 0.1–2500 nM. After the room-temperature binding reaction, samples were placed at 4°C for 30 min. For all experiments except the ‘polarity’ assays, gel electrophoresis was performed on a 1.3% agarose gel using Amresco Biotechnology Grade Agarose I in 0.5× tris-borate-EDTA (TBE) buffer (Bio-Rad). Gels were pre-run at 105 V in 0.5× TBE buffer at 4°C for 30 min before loading. Binding reactions were loaded onto the gel while the current was on, and run for 20–30 min. Using a wet-transfer apparatus (Bio-Rad), the DNA was blotted onto a Biodyne B nylon membrane (Pierce) for 20 min at 100 V at 4°C. The DNA was cross-linked to the membrane with an ultraviolet cross-linker (Stratagene) for 4 min. The biotinylated DNA was visualized using the LightShift Chemiluminescent EMSA Kit (Pierce) according to the manufacturer’s protocol. Equilibrium binding constants (apparent K_D) were calculated from protein titration experiments. Gel images on X-ray film (Denville Scientific) were scanned and then quantitated using ImageJ. All reported EMSA measurements were averages of at least three experiments performed with independent protein dilutions. For the ‘polarity’ gel shifts, the protocol was the same, except that tris-borate (TB) buffer was substituted for TBE buffer (in the gel and running buffer) at identical concentrations. Representative EMSA data are shown in Supplementary Figure S3.
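How an apparent K_D might be extracted from a protein titration can be sketched as follows. The text does not state the fitting model, so this example assumes a simple single-site isotherm, fraction bound = [P] / (K_D + [P]), which is reasonable only when the DNA concentration is well below K_D. The function names and the synthetic titration are illustrative and are not data from the paper.

```python
import math


def fraction_bound(protein_nM, kd_nM):
    """Single-site binding isotherm (assumes [DNA] << K_D)."""
    return protein_nM / (kd_nM + protein_nM)


def fit_apparent_kd(protein_nM, observed_fraction,
                    log10_min=-2.0, log10_max=4.0, steps=6001):
    """Least-squares estimate of K_D (nM) by scanning a log-spaced grid."""
    best_kd, best_sse = None, math.inf
    for i in range(steps):
        log_kd = log10_min + (log10_max - log10_min) * i / (steps - 1)
        kd = 10.0 ** log_kd
        sse = sum((fraction_bound(p, kd) - f) ** 2
                  for p, f in zip(protein_nM, observed_fraction))
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd


# Synthetic titration generated from the model itself, NOT measured data.
titration_nM = [0.1, 0.5, 2.5, 12.5, 62.5, 312.5, 2500.0]
synthetic = [fraction_bound(p, 27.0) for p in titration_nM]
print(f"fitted K_D ~ {fit_apparent_kd(titration_nM, synthetic):.1f} nM")
```

A grid scan over log10(K_D) is used instead of a gradient-based optimizer so the sketch needs only the standard library; in practice a nonlinear least-squares routine would give the same estimate plus uncertainty.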
ATF Assay
dTALEs were cloned using XhoI and AgeI into the phosphoglycerate kinase (PGK) promoter-driven mammalian expression vector pPGK-VP64 (14), which appended an N-terminal HA epitope tag and nuclear localization sequence, and a C-terminal VP64 transcriptional activation domain (15) (Supplementary Figure S1C). Target sites for the dTALEs were cloned between NotI and XhoI sites upstream of the SV40 promoter in pGL3-control plasmids (Promega), using primers listed in Supplementary Table S4. In 24-well plates, HEK293T cells at 80% confluency in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 10% fetal calf serum, 1 U/ml of penicillin and 1 µg/ml of streptomycin were co-transfected with 100 ng of dTALE ATF expression plasmid, 25 ng of modified pGL3-control firefly luciferase reporter plasmid containing a dTALE target site and 25 ng of pRL-TK-Renilla Luciferase plasmid (as a transfection control, Promega), using Lipofectamine 2000 (Invitrogen). Cells were harvested 48 h post-transfection by removing media and washing with 500 µL of 1× Dulbecco’s Phosphate-Buffered Saline (DPBS), followed by lysis in 100 µL of 1× passive lysis buffer (Promega) with 1× complete protease inhibitors (Roche). Clarified cell lysates (20 µL) were used to determine luciferase activity using DualGlo reagents (40 µL, Promega) in a Veritas microplate luminometer (Turner Biosystems). All experiments were performed in duplicate and repeated on two different days.
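The dual-luciferase readout described above is usually reduced to a fold-activation value by normalizing firefly to Renilla signal and dividing by a background condition. The sketch below assumes that convention; the exact background control and averaging scheme are not specified in the text, and the readings shown are made-up placeholders.

```python
from statistics import mean


def normalized_signal(firefly, renilla):
    """Firefly signal corrected for transfection efficiency via Renilla."""
    if renilla <= 0:
        raise ValueError("Renilla reading must be positive")
    return firefly / renilla


def fold_activation(sample_wells, background_wells):
    """Ratio of mean normalized signal in sample wells to background wells.

    Each argument is a list of (firefly, renilla) luminometer readings.
    """
    sample = mean([normalized_signal(f, r) for f, r in sample_wells])
    background = mean([normalized_signal(f, r) for f, r in background_wells])
    return sample / background


# Hypothetical duplicate wells (arbitrary luminescence units).
with_dtale = [(4000, 100), (4400, 110)]
reporter_only = [(1000, 100), (900, 90)]
print(fold_activation(with_dtale, reporter_only))  # 4.0
```

Normalizing each well before averaging keeps a single poorly transfected well from dominating the mean, which is the usual reason for including the Renilla control.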
Binding site specificity assay using massively parallel
sequencing (Bind-n-Seq)
Bind-n-Seq was performed essentially as described (13) using a full-length MBP–AvrBs3 fusion protein (AvrBs3₂₅₄₋₂₆₇) as bait. AvrBs3₂₅₄₋₂₆₇ was purified from induced cells using amylose affinity resin (New England Biolabs) according to manufacturer’s instructions. Barcoded 93-mer double-stranded oligonucleotide targets containing Illumina primer binding sites and a 21-nt random region were incubated with 450 nM AvrBs3₂₅₄₋₂₆₇ in 1× TALE EMSA buffer. Bound complexes were precipitated using amylose resin and enriched by six wash steps in the corresponding salt buffer. Eluted DNA was sequenced on an Illumina sequencer. Sequencing reads were filtered and sorted using custom Perl scripts found in the MERMADE package, an updated version of the Bind-n-Seq data analysis pipeline. MERMADE is freely available with user documentation at http://korflab.ucdavis.edu/Datasets/BindNSeq. Briefly, high-quality reads (composed only of A, C, T or G, with a valid constant region [‘AA’] and unique random region) were retained and split into separate files based on their unique 3-nt barcode (MERMADE scripts: sequence_converter.pl, debarcode.pl). For motif analysis, recovered sequences were analyzed relative to a file of unenriched background 21-mer sequences using a sliding window of 6–12 bp (MERMADE scripts: kmer_counter.pl, kmer_selector.pl). Sequences showing 2-fold enrichment relative to background were then analyzed by MERMADE using an iterative motif searching approach (MERMADE scripts: mermade.pl, motif_expander.pl). The graphical representation of the sequence motif was rendered using WebLogo.
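The filtering and k-mer enrichment steps can be illustrated with a short Python sketch. This is not the MERMADE code (which is written in Perl) and does not reproduce its behavior in detail. In particular, the read layout used here (3-nt barcode, then the ‘AA’ constant region, then the 21-nt random region) is an assumption made for the example, and all function names are hypothetical.

```python
from collections import Counter, defaultdict

BARCODE_LEN = 3      # 3-nt barcode, as described in the text
CONSTANT = "AA"      # constant region, as described in the text
RANDOM_LEN = 21      # 21-nt random region
# ASSUMED read layout: barcode + constant + random region.


def split_reads(reads):
    """Group unique random regions by barcode, dropping unusable reads."""
    by_barcode = defaultdict(set)
    start = BARCODE_LEN + len(CONSTANT)
    for read in reads:
        read = read.strip().upper()
        if len(read) < start + RANDOM_LEN or set(read) - set("ACGT"):
            continue  # too short, or contains N / other symbols
        if read[BARCODE_LEN:start] != CONSTANT:
            continue  # constant region not found where expected
        by_barcode[read[:BARCODE_LEN]].add(read[start:start + RANDOM_LEN])
    return by_barcode


def kmer_frequencies(sequences, k):
    """Relative frequency of every k-mer seen in the sequences."""
    counts = Counter(seq[i:i + k]
                     for seq in sequences
                     for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def enriched_kmers(selected, background, k=6, min_fold=2.0):
    """k-mers at least `min_fold` more frequent in selected than background.

    k-mers absent from the background are skipped to avoid dividing by zero.
    """
    sel = kmer_frequencies(selected, k)
    bg = kmer_frequencies(background, k)
    return {kmer: freq / bg[kmer]
            for kmer, freq in sel.items()
            if kmer in bg and freq / bg[kmer] >= min_fold}
```

Storing random regions in a set per barcode implements the "unique random region" filter directly; the 2-fold default for `min_fold` mirrors the threshold quoted in the text.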
RESULTS
Generating a scaffold
Natural TALE proteins consist of a series of DNA-binding repeats flanked by 250-residue N-terminal and C-terminal extensions that direct transcriptional regulation and protein localization (Figure 1A). To develop a biochemically tractable scaffold suitable for DNA affinity measurements, we truncated the N- and C-termini of a TALE composed of an AvrBs3 StuI/AatII fragment central core bounded by PthXo1 flanking sequences (Supplementary Figure S1A). Using Jpred secondary-structure predictions (17) to indicate ordered boundaries in the flanking sequences, we designed two N-terminal truncations containing 111 or 94 residues upstream of the 0 repeat and one C-terminal truncation containing 42 residues downstream of the terminal repeat, AvrBs3₁₁₁₋₄₂ and AvrBs3₉₄₋₄₂, respectively (Figure 1A and Supplementary Figure S1A). Significantly, digestion of an MBP–AvrBs3 fusion protein with Factor Xa, a site-specific protease that also cleaves after Arg residues in unstructured regions, yielded a 77-kD fragment with full DNA-binding activity (Supplementary Figure S2A). Edman sequencing indicated that Factor Xa cleaved 114 residues N-terminal to the 0 repeat, whereas fragment size and Factor Xa arginine specificity suggested that the C-terminal flank was cleaved 37 or 39 residues after the terminal repeat. As a reference, we also produced a nearly full-length BamHI/AgeI fragment of the natural AvrBs3 (18), which contained 254- and 180-residue native N- and C-terminal extensions, respectively (AvrBs3₂₅₄₋₁₈₀). The proteins were expressed as fusions with an N-terminal MBP affinity tag and a TEV protease-cleavable linker, as well as a C-terminal His₆ affinity tag. A two-column affinity purification scheme yielded milligram quantities of homogeneous, full-length, soluble MBP–His₆ fusion proteins (Supplementary Figure S2B).
EMSAs (Supplementary Figure S3) were performed with a DNA target, Bs3, which contained the 19-bp ‘Bs3 box’ bound by AvrBs3 (3) (Figure 1B, Supplementary Table S2). The presence of the MBP tag did not affect binding affinity, as its removal by TEV protease cleavage had no significant effect on apparent dissociation constant (K_D) values (Figure 1C). AvrBs3₂₅₄₋₁₈₀ and AvrBs3₁₁₁₋₄₂ had nearly identical K_D values for the Bs3 box site, 3–4 nM. In contrast, AvrBs3₉₄₋₄₂ bound Bs3 poorly, with a K_D of 220 nM. To compare the functionality of the two scaffolds in cells, we developed an ATF reporter assay, in which a TALE–VP64 activation domain fusion protein drove expression of a luciferase reporter gene through an SV40 promoter with an upstream TALE target site. In agreement with the affinity data, the 111-42 framework showed a nearly 4-fold activation over background, whereas the 94-42 framework did not activate, with the promoter containing the Bs3 box target site (Figure 1C). Interestingly, AvrBs3₂₅₄₋₁₈₀ produced 3-fold more gene activation than AvrBs3₁₁₁₋₄₂. This result was unexpected because in vitro AvrBs3₁₁₁₋₄₂ bound as well as AvrBs3₂₅₄₋₁₈₀. Xanthomonas-delivered AvrBs3 has been shown to activate multiple host plant promoter sequences. Alignment of these sequences resulted in the identification of an AvrBs3 consensus target sequence known as the UPA box (2) (Figure 1B, Supplementary Table S3). We inserted the UPA box in place of the Bs3 box and repeated the ATF assay (Figure 1C). Interestingly, all three proteins performed better on the UPA box as compared with the Bs3 box-containing promoter. However, unlike with the Bs3 box, the 254-180 and 111-42 frameworks produced a similar 13-fold activation. Again, the 94-42 framework yielded 3- to 4-fold lower activation, indicating that >94 N-terminal flanking residues are required for high-affinity binding comparable with AvrBs3₂₅₄₋₁₈₀. The 111-42 framework was used for all subsequent experiments.
Figure 1. Affinity and transcriptional activation data for several AvrBs3 variants. (A) Schematic of a TALE polypeptide showing the 18 RVD-containing repeats with N- and C-terminal flanking regions. The ‘0 repeat’ is shown in white. The numbers indicate the lengths of the N- and C-terminal extensions outside the repeat region used in the different constructs described in this work. A comprehensive survey of N- and C-terminal boundaries used in previous TALE studies is given in Supplementary Figure S1. (B) RVD amino acid composition of AvrBs3 (first row), along with the sequence of a natural DNA target, Bs3 (third row), and the consensus AvrBs3 site, UPA (2) (fourth row). The RVD composition of the dAvrBs3 variant, which contains only the standard NI, HD and NG RVDs and no mismatches to the Bs3 box target site, is also shown (second row). AvrBs3 RVDs that are ‘non-standard’ or mismatched to Bs3, and the corresponding RVDs in dAvrBs3, are underlined. The UPA site bases that differ from Bs3 are also underlined. (C) EMSA and ATF activation data were obtained as described in Materials and Methods. Target site sequences and RVD compositions are listed in Supplementary Tables S2 and S3. The affinity of Zif268 was measured in 1× TALE binding buffer. Zif268 affinity measured in a standard zinc-finger binding buffer (16) was more typical, 11 ± 4 nM.
Using dTALEs to interrogate RVD relative affinities
To compare the binding affinities of the five RVDs that have been widely used to program dTALE specificities (HD, NI, NG, NN and NK), we used a ‘host-guest’ design in which 10 ‘guest’ positions containing the RVDs to be tested were interspersed with eight constant ‘host’ RVDs in a largely alternating pattern. The base 5′ to the target site was kept constant as T. Host contexts I–III (Table 1) sampled all base-step and base-triple combinations to examine potential context effects. Importantly, this setup avoided the structural peculiarities of homopolymeric runs in the target DNA sequences. Because the host repeats remained constant in each of the three contexts, we reasoned that their contribution to binding could be accounted for, allowing us to make direct comparisons based on only the guest repeat identity.
Fifteen dTALEs were constructed and matched with corresponding DNA targets (Table 1). dTALE I-NIp refers to a protein with host context I and NI RVD-containing repeats at the guest positions, whereas the cognate DNA target is referred to as I-A. dTALEs containing G-recognizing NN and NK guests were compared against identical G-containing target sites (e.g. I-G, II-G and III-G). The proteins were expressed in E. coli and purified to homogeneity (Supplementary Figure S2B).
EMSA revealed that their apparent K_D values spanned four orders of magnitude, from 160 pM to 1.8 µM (Figure 2A, Table 1). Several trends became immediately apparent. The repeat type was the largest factor in affinity differences. The dTALEs with NG, HD and NN guest RVDs bound their targets with high affinity (160 pM–2.4 nM). The strongest affinities were for the three NG guest proteins, whereas III-NNp and III-HDp also had picomolar affinity. The NK guest dTALEs bound least well in all three contexts. Although consistently better than NK, the NI guest dTALEs also bound poorly but with more variation. I-NIp had a K_D of 240 nM, but III-NIp bound with a K_D of 27 nM. This 9-fold difference in affinity, despite the two proteins sharing the same overall distribution of repeat types, clearly demonstrates the potential for significant contextual effects. Excluding the NK-containing proteins, context III was the most favorable setting for NI, NN and HD proteins, with a 3- to 9-fold advantage over contexts I and II. Taken in whole, the gel shift data suggested that the relative affinities of individual repeats can be ordered as NG > HD ∼ NN ≫ NI > NK.
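The fold differences quoted in this paragraph follow directly from the reported K_D values, as the short calculation below shows. It uses only numbers stated in the text; the helper name is hypothetical.

```python
import math


def fold_difference(kd_weak, kd_tight):
    """Ratio of two dissociation constants given in the same units."""
    return kd_weak / kd_tight


# Weakest (1.8 uM) versus tightest (160 pM) dTALE, both in mol/l.
span = fold_difference(1.8e-6, 160e-12)
print(f"{span:.0f}-fold, {math.log10(span):.2f} orders of magnitude")  # 11250-fold, 4.05

# I-NIp (240 nM) versus III-NIp (27 nM): the context effect for NI guests.
context = fold_difference(240e-9, 27e-9)
print(f"{context:.1f}-fold")  # 8.9-fold
```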
In the ATF assay, reporter gene activation by the dTALE series ranged from 1.4- to 19-fold (Figure 2B, Table 1). All three HD guest proteins were strong activators, with levels at least 10-fold over background. As predicted from the binding data, the three NK guest proteins were the poorest activators. The correlation of affinity to activation was not linear. A simple log-log model produced a reasonable fit of the data (R² = 0.68, P = 0.0002, Supplementary Figure S4), but several dTALEs displayed considerable deviation. For example, the tight-binding III-NGp produced relatively low activation (4-fold) compared with the two other NG guest proteins (>10-fold activation levels). Conversely, moderate-binding II-NNp showed the highest activation of the entire set (19-fold), higher than the other NN guests. There was an apparent demarcation in activation between 2.4 nM and 27 nM affinities (P = 2 × 10⁻¹², based on a comparison of 60 individual non-averaged fold activation values in the two categories using a two-tailed heteroscedastic Student t-test). The six dTALEs with apparent K_D ≥ 27 nM had an average fold activation of 2.7, whereas those with K_D ≤ 2.4 nM had an average fold activation of 10.2. Considering fold activation alone,
the effectiveness of the five repeat types studied here would
be HD NG NN NI >NK.
The NN RVD prefers G over A
According to correlations of natural TALE RVDs and their target sites (2,3), the NN repeat has a similar preference for G and A, and there are cases where this has been shown in dTALEs (2,19). However, other reports (8,20) have shown instances where NN displays an apparent preference for G. We compared NN repeat binding to G and A by measuring I-NNp, II-NNp and III-NNp affinities to the corresponding hosts containing G or A guests (Figure 2C and Supplementary Table S4). When the NN guest repeats were paired with A rather than G in contexts I and II, binding was reduced 49- and 41-fold, respectively. In context III, the reduction was less severe, but was still >17-fold. The same trend was apparent in the ATF assay, with an 8-fold reduction in activation in context II, and reduction to background levels in contexts I and III. These data indicate that NN RVDs prefer binding G over A, although on a per-repeat basis, the binding energy differences are relatively small (see Discussion).
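The remark that per-repeat energy differences are small can be made concrete with ΔΔG = RT ln(fold change). The sketch below is an illustration only: it assumes the 10 guest repeats contribute equally and independently, and it uses 22°C (the binding-reaction temperature given in Materials and Methods) as the reference temperature. It is not taken from the paper's Discussion.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_total(fold_change, temp_c=22.0):
    """Free energy difference (kcal/mol) for a given fold change in K_D."""
    return R_KCAL * (temp_c + 273.15) * math.log(fold_change)


def ddg_per_repeat(fold_change, n_repeats=10, temp_c=22.0):
    """Average contribution per repeat, ASSUMING equal, independent repeats."""
    return ddg_total(fold_change, temp_c) / n_repeats


# Fold reductions reported for NN guests paired with A instead of G.
# The context III value is a lower bound (reported as >17-fold).
for context, fold in (("I", 49), ("II", 41), ("III", 17)):
    print(f"context {context}: {ddg_total(fold):.2f} kcal/mol total, "
          f"{ddg_per_repeat(fold):.2f} kcal/mol per repeat")
```

Under these assumptions a 49-fold change spread over 10 repeats amounts to only about 0.2 kcal/mol per repeat, which is why a large overall effect is compatible with weak discrimination at each position.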
Natural and designed versions of AvrBs3 differ in their requirement for a 5′ T
Most naturally occurring TALEs bind to DNA sequences
beginning with a T (2,3), but dTALEs have been reported
to recognize targets that have bases other than T in the 5′
position (8,21) and in one case, the 5′-T requirement was
dependent on the N-terminal extension length (21). One
major difference between natural TALEs and dTALEs
produced by available assembly kits is that the artificial
proteins (22,23) are predominantly constructed using just
the HD, NG, NI, NN and NK RVDs. In contrast, most
natural TALEs contain one or more ‘non-standard’
RVDs, such as NS, N* (residue 13 deleted), HG and
others (1,2). In addition, ‘mismatches’ between RVDs
and their consensus bases are rather typical. In fact,
there has not been a single documented case of a naturally
occurring TALE and a plant target promoter with a
perfect code-predicted match. For example, AvrBs3,
when bound to its Bs3 box target, has two HD-A
mismatches and an NG-C mismatch. Additionally,
AvrBs3 contains three ‘non-standard’ NS RVDs.
To assess the influence of the non-standard and mismatched RVDs in AvrBs3, we used the TALE code to create dAvrBs3₁₁₁₋₄₂, composed of standard RVDs that match perfectly to the Bs3 box (Figure 1B). Both dAvrBs3₁₁₁₋₄₂ and AvrBs3₁₁₁₋₄₂ performed similarly in EMSA and ATF reporter assays (Figure 1C), suggesting that the non-standard RVDs did not confer an obvious binding advantage to AvrBs3₁₁₁₋₄₂. Surprisingly, the two proteins displayed markedly different behavior against target sites substituted at the 5′-T (Figure 3 and Supplementary Table S5). Substitution with A, C or G reduced AvrBs3₁₁₁₋₄₂ binding affinity by 13- to 20-fold (Figure 3A), and reduced ATF reporter activity to background levels (Figure 3B). Thus, for AvrBs3₁₁₁₋₄₂, a 5′-T is essential. In contrast, 5′ A, C or G reduced affinity for dAvrBs3₁₁₁₋₄₂ by only 2- to 3-fold, and activated gene expression only slightly less than with a 5′-T. These data suggest that the non-standard and/or mismatched repeats in naturally occurring proteins, which are generally not included in engineered dTALEs, may play an important role in binding specificity.
TALE proteins display a binding polarity, favoring the target sequence 5′ end
The N- to C-terminal directionality of the TALE–DNA interaction gives
rise to the question of whether DNA affinity and specificity are evenly
distributed over the length of the repeats, or concentrated in
particular regions. We used the Bind-n-Seq assay (13) to probe the
sequence binding preferences of AvrBs3_254-267. Bind-n-Seq is an
in vitro target site selection assay that presents proteins with a
21-bp randomized DNA library; the bound oligonucleotides are then
analyzed by high-throughput sequencing. DNAs bound to AvrBs3_254-267
were enriched for bases on the 5′ end of the consensus UPA target
(Figure 4A, top). In contrast, no enrichment was seen for the specified
bases at the 3′ end. To ensure the apparent enrichment was not an
artifact of the motif-finding method, a sliding window was used to
identify 6-mer segments of the Bs3 target site in the sequencing reads
from the library selected with AvrBs3_254-267.
Again, the library was enriched with only 6-mers from the 5′ end of the
binding site but not the 3′ end (Figure 4A, bottom). These data
suggested that N-terminal repeats might contribute more to the overall
affinity than C-terminal repeats.
Table 1. RVD composition, DNA target sites and affinity and transcriptional activation data for a series of 15 dTALEs
Three host contexts, I, II and III, with 10 guest RVDs indicated (shaded). The RVD types are color-coded (NI, green; NK, gray; NN, black; HD,
blue; and NG, red). Host RVD compositions are the same but differently ordered in the three contexts. The DNA target sequences match the
corresponding RVD pattern given above them, including the invariant 5′-T base that is present in all targets. EMSA dissociation constants (K_D
values) and ATF fold activation measurements were made as described in the ‘Materials and Methods’ section.
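
The sliding-window check on the Bind-n-Seq reads can be illustrated with a minimal sketch. It assumes reads are plain strings and that a window counts as present when the 6-mer occurs anywhere in a read; the function names, the toy site and the toy reads are hypothetical and are not taken from the study. Comparing the resulting profile between a protein-bound pool and an unselected pool would be one way to express enrichment, which is also an assumption here.

```python
from collections import Counter


def site_windows(site, k=6):
    """List (offset, k-mer) pairs for every window of length k along the site."""
    return [(i, site[i:i + k]) for i in range(len(site) - k + 1)]


def window_read_fraction(reads, site, k=6):
    """Fraction of reads containing each k-mer window of the target site."""
    windows = site_windows(site, k)
    hits = Counter()
    for read in reads:
        for offset, kmer in windows:
            if kmer in read:
                hits[offset] += 1
    total = len(reads)  # assumes at least one read
    return {offset: hits[offset] / total for offset, _ in windows}


# Hypothetical 11-nt site and toy reads, for illustration only.
toy_site = "TACGTTAGCCA"
toy_bound = ["GGTACGTTGG", "AATACGTTCC", "CCCCCCCCCC"]
profile = window_read_fraction(toy_bound, toy_site)
```

In this toy pool only the window at offset 0 (the 5′-most 6-mer) is found in the reads, which is the kind of 5′-biased profile the text describes.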
To explore this bias quantitatively, we tested dTALE binding on a
series of target sites shortened on either the 5′ or 3′ ends by
substitutions of three, six and nine terminal bases (Figure 4B). To
avoid biases inherent in making specific replacements, a random mixture
of the three non-target bases replaced the target bases at each
substituted position. For example, if a binding site position was T,
the modified target site would contain an evenly distributed mixture of
A, G or C. In our naming convention, target variant ‘3m3’ has three
bases on the 3′ end randomized, whereas ‘5m6’ has six bases randomized
on the 5′ end. An invariant 5′-T was maintained, reasoning that
changing this base might unfairly bias the outcome because of its known
importance. We first examined two high-affinity proteins, III-HDp and
III-NGp (Figure 4C and D and Supplementary Table S6). Substitutions at
the 5′ end of the target site reduced affinity considerably more than
the 3′ substitutions. In the most dramatic case of III-NGp,
substitutions of three or six 3′ bases had essentially no effect,
whereas substitutions of the first three or six bases after the 5′-T
reduced binding 15-fold and 370-fold, respectively. Put another way,
mismatches of the first three, six and nine bases reduced affinity 15-,
180- and 150-fold more, respectively, than equivalent mismatches at the
3′ end. Importantly, III-NGp still bound tightly to a site with nine 3′
mismatches (K_D = 2.5 nM). III-HDp showed a similar trend.
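
The naming convention for the truncated targets can be sketched in code. The example below writes each substituted position with the IUPAC code for "any base except the original" (B, D, H or V), mirroring the even mixture of three non-target bases described above. The function name and the example site are hypothetical; the sketch assumes the site string begins with the invariant 5′-T, which is always left unchanged.

```python
# IUPAC ambiguity codes for "any base except X":
# B = C/G/T (not A), D = A/G/T (not C), H = A/C/T (not G), V = A/C/G (not T)
NOT_BASE = {"A": "B", "C": "D", "G": "H", "T": "V"}


def truncated_target(site, name):
    """Return the degenerate target for a variant name such as '3m3' or '5m6'.

    `site` includes the leading invariant 5'-T, which is never substituted.
    """
    end, marker, count = name[0], name[1], int(name[2:])
    if end not in "35" or marker != "m":
        raise ValueError(f"unrecognized variant name: {name}")
    core = site[1:]
    if count > len(core):
        raise ValueError("more substitutions than bases in the site")
    if end == "5":
        # Randomize the first `count` bases after the 5'-T.
        core = "".join(NOT_BASE[b] for b in core[:count]) + core[count:]
    else:
        # Randomize the last `count` bases at the 3' end.
        keep = len(core) - count
        core = core[:keep] + "".join(NOT_BASE[b] for b in core[keep:])
    return site[0] + core


example_site = "TACGTACGTTT"  # hypothetical site, not a sequence from the study
five_prime = truncated_target(example_site, "5m3")
three_prime = truncated_target(example_site, "3m3")
```

For this made-up site, '5m3' yields TBDHTACGTTT and '3m3' yields TACGTACGVVV.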
Interestingly, the polarity effect for AvrBs3_111-42 was much smaller
than for III-HDp or III-NGp. Truncating substitutions at either the 3′
or the 5′ end caused strong reductions in affinity (Figure 4F),
indicating that binding affinity and/or specificity were more equally
distributed across the repeats. Nonetheless, the 5′-end mismatches had
greater effect (1.3- to 3.1-fold). In contrast, as with the 5′-T
preferences, dAvrBs3_111-42 differed from AvrBs3_111-42 in displaying
marked polarity effects (Figure 4H). Mismatches at the 5′ end reduced
binding 26-fold more than at the 3′ end, but substitutions beyond the
first 3′ and 5′ triplets had little additional effect.
Figure 2. Affinity and transcriptional activation data of 15 dTALEs.
Comparison of (A) EMSA affinity constants for 15 dTALEs for their
cognate DNA targets (K_A = 1/K_D, vertical axis, logarithmic scale),
and (B) fold activation in an ATF assay. Guest RVD types (horizontal
axis) and host contexts (I, white; II, gray; III, black bars) are indicated.
Data were taken from Table 1. (C) Comparison of NN RVD interaction
with G and A. EMSA affinities (left panel) and ATF fold activation
(right panel) are shown for NN RVD TALE proteins with
corresponding G and A DNA targets. Numerical data are given in
Supplementary Table S4.
Figure 3. Natural and designed TALEs show differential dependence
on 5′-T. The 5′ base of the Bs3 box target site affects the (A) affinity
(K_A = 1/K_D, linear scale) and (B) fold activation of the natural
AvrBs3_111-42 (black bars) more dramatically than for a designed
dAvrBs3_111-42 (white bars) (see Figure 1B for RVD compositions).
To determine if binding polarity is corroborated by in vivo activation,
we tested III-NGp, AvrBs3_111-42, and dAvrBs3_111-42 against mutated
target sites in the ATF reporter assay. Unlike the EMSA experiments,
the activation assay required a specific target sequence for each
protein. The base least represented in the TALE binding code for the
particular RVD was used (Supplementary Table S3). For AvrBs3_111-42 and
dAvrBs3_111-42, no activation was observed for any of the truncated
sites (Figures 4E, G and I). This was not surprising, because the
in vitro dissociation constants were higher than 46 nM and, as
described previously, K_D values above 27 nM generally correlated to
weak or no reporter gene activation. In contrast, III-NGp showed little
reduction or increased activation levels for the three- and six-base
truncations on the 3′ end, but similar truncations on the 5′ end
dramatically reduced activity (Figure 4E).
DISCUSSION
The discovery of the TALE DNA-binding code was one of
the most exciting recent developments in the field of
engineered DNA-binding proteins. The modular nature
of TALE specificity, the accessibility of materials and
protocols for assembly and their comparatively robust
programmability are significant advantages over other
technologies such as zinc fingers or meganucleases (24).
TALE technology was rapidly incorporated into
designed site-specific nucleases, transcription factors and
recombinases (6–10). Although the targeting activities are
generally reliable, efficiencies vary widely and are near
background in 10–15% of the cases (23). In one case, a
single base change in the TALE-target pair can elicit
5-fold changes in activity (25). Plausible reasons include
low protein expression, inefficient dTALE folding, target
DNA modifications, chromatin structure variations and
the affinity of the dTALE for its target (23). However,
cell-based measurements make it difficult to disentangle
the intrinsic properties of dTALE-DNA interactions
from the multitude of other influences. Here, we directly
quantified dTALE DNA affinity with purified proteins
and well-defined target substrates. In parallel, we
assessed dTALE activity in vivo using an ATF reporter
assay.
A central conclusion of our work is that repeat compos-
ition significantly influences affinity. The NG RVD
contributed most strongly. NN and HD repeats were
also strong binders, whereas NI and NK RVDs
contributed much less to affinity than the other three
repeat types. Overall, based on the averaged affinities in
the three host contexts, the relative contributions are NG
(1) > NN (0.18) ∼ HD (0.16) ≫ NI (0.0016) > NK
(0.00016). On a per-repeat basis, the free-energy differences
vary from 0.4 to 2.2 kJ/mol, relative to the NG
repeat. These values, in conjunction with the affinities,
suggest that the average binding contribution for a
single repeat is also small, 1–4 kJ/mol. However, the
modest correlation between dTALE DNA affinity and
ATF activity supports the idea that cell-based assays are
complex and likely dependent on several factors, of which
affinity is only one. For example, the synthetic DNA
targets and plasmids used in the activation assay may
contain cryptic target sites or sequences that influence
binding and/or promoter activity. Nonetheless, our quan-
titative studies should provide a good baseline for building
a better model of transcriptional responses.
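
The per-repeat free-energy differences quoted above can be recovered from the relative affinities with the standard relation ΔΔG = -RT ln(K_A,rel). The sketch below is illustrative only: it assumes a temperature of 25 °C and that the affinity difference is spread evenly over the 10 guest repeats, neither of which is stated explicitly in this section, and the names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3   # gas constant, kJ/(mol*K)
TEMP_K = 298.15         # assumed temperature (25 C); not given in this section
N_GUEST = 10            # guest RVDs per dTALE in the host-guest design

# Relative affinities (NG = 1) averaged over the three host contexts.
RELATIVE_AFFINITY = {"NG": 1.0, "NN": 0.18, "HD": 0.16, "NI": 0.0016, "NK": 0.00016}


def ddg_per_repeat(relative_affinity, n_repeats=N_GUEST, temp_k=TEMP_K):
    """Free-energy penalty per repeat (kJ/mol) relative to NG.

    Assumes the total difference is shared equally among the guest repeats.
    """
    return -R_KJ * temp_k * math.log(relative_affinity) / n_repeats


per_repeat = {rvd: ddg_per_repeat(rel) for rvd, rel in RELATIVE_AFFINITY.items()}
```

Under these assumptions the values run from about 0.4 kJ/mol (NN) to about 2.2 kJ/mol (NK), matching the range given in the text.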
Figure 4. Polarity effects of truncating substitutions at the 5′ and 3′
ends of the target site. (A) The most frequent binding motif produced
by the Bind-n-Seq target site selection assay for AvrBs3_254-267 is aligned
below the expected UPA target site. The graph below the motif shows
the enrichment of 6-mers corresponding to the 5′ end of the target site,
but no enrichment of 3′ 6-mers, in the protein-bound DNA pool.
(B) EMSA target sets used to test truncations (underlined) of the Bs3
box binding site are shown using IUPAC nomenclature. Target sets for
the III-HDp and III-NGp are provided in Supplementary Table S2.
(C, D, F and H) EMSA data are expressed as a percentage of
affinity retained, compared with the 19-bp substrate, when the indicated
protein binds the corresponding site with the indicated number
(horizontal axis) of either 5′ (blue) or 3′ (yellow) truncated bases.
(E, G and I) ATF reporter assay data are expressed as fold activation when the
indicated protein binds the indicated truncated site. The activation level
using the wild-type target site is shown (dashed line). Numerical data
are provided in Supplementary Table S6.
Our results indicating that the NG RVD confers the
highest affinity are in disagreement with conclusions by
Streubel et al. (20), who suggested that NG-containing
repeats are ‘weak’. A possible explanation for this discrep-
ancy could be the use of uninterrupted runs of six identical
repeats in their assays. Transcriptional activation was
reduced concomitant with increasing numbers of adjacent
NG and NI repeats, and for most other RVDs, a run of six
had the lowest activations [Supplementary Data from ref-
erence (20)]. When just two repeats were used, the relative
dTALE efficacies correlated more closely with our DNA-
binding data, NG being the strongest.
Our dTALEs containing NN guest RVDs had a marked
preference for G over A guest bases. The TALE DNA-
binding code suggests that NN RVDs are paired just as
often with A as G (2,3). Cell-based assays suggest that NN
RVDs tolerate A bases but prefer G (19,20), and
Systematic Evolution of Ligands by Exponential
Enrichment (SELEX) results (8) further support this G
preference, although A is preferred in some contexts.
However, the energetic difference for individual repeats
appears to be small. As 10 guest repeats in the three
host contexts, NN RVD dTALEs discriminated G from
A substrates by 17- to 49-fold, but the average discrimin-
ation per repeat is on the order of one kilojoule per mole,
a factor of 2-3 in equilibrium constant. Further, NN RVD
dTALEs bound A substrates more tightly than the corres-
ponding ‘A-specific’ NI RVD dTALEs, 2.6- to 3.8-fold,
suggesting that ‘off-target’ A recognition by NN RVDs is
a significant concern.
Another clear result was the poor performance of the
NK repeat. The NK guest proteins displayed the lowest
affinities and activation levels of all the dTALEs. Our data
provide a biochemical rationale for the low activities of
NK-rich dTALEs in cell-based studies (19,20,26,27),
strongly indicating that NN is a better choice than NK
for G recognition. Recently, the novel NH RVD has
shown promise for superior G specificity (19,20),
although quantitative affinity measurements have yet to
be reported.
The weak affinity of NI repeats for A agrees with the
activation data of others (20,25), although the quantity
and density of NI RVDs did not always correlate with
activation levels (25). In at least one context, NI appar-
ently encoded G specificity (8). Thus, it seems that the
contribution of NI repeats may vary more by context
than other RVDs. Given its higher affinity, the NN
RVD might be a better choice for A recognition in cases
where G discrimination is not critical.
What are the structural rationales for the different
repeat affinity contributions? In the crystal structures
(4,5,28), only the second RVD residue (position 13)
contacts the recognition base in the major groove. Tight
binding specified by NN and HD RVDs can be
rationalized by the direct H-bonds from Asn13 and
Asp13 to major groove base atoms. The NN preference
for G over A might be due to subtle differences between
the Asn-amide-purine-N7 H-bonds. Perhaps these differ-
ences could be modulated by context, leading to the occa-
sional NN preference for A over G (8).
The high affinity of NG repeats for T reveals something
unexpected about specificity. Crystal structures (4,5,28)
depict the thymine 5-methyl contacting Gly13 alpha
carbon through a van der Waals interaction, whereas the
non-glycine residues of other RVDs would be expected to
clash with this base. The RVD comparisons here suggest
that the NG RVD not only provides a void for the
5-methyl, but the T 5-methyl-Gly alpha carbon interaction
actually enhances binding over NN and HD repeats.
Although this van der Waals interaction may be more fa-
vorable than the NN or HD hydrogen bonds, another idea
is that other TALE repeat residues flanking the RVD
provide the binding energy rather than the RVD itself.
A prime candidate is adjacent Gln 17, the only repeat
residue that directly contacts the phosphate backbone.
In this light, the phosphate contact mediates DNA
affinity, modulated by the RVD loop. Cognate inter-
actions would fix and orient the loop position favorably
for the Gln-phosphate hydrogen bond, whereas
non-cognate interactions exert a disruptive effect.
A significant novel insight of this study is the clear affinity bias of
TALEs for the 5′ end of the target sequence. The dTALEs dAvrBs3_111-42,
III-HDp and III-NGp bound targets with blocks of 3′ substitutions 10-
to 180-fold better than targets with analogous 5′ substitutions,
sometimes retaining sufficient affinity for potent activity. For
example, III-NGp bound a target with just 10 matching 5′-end repeats
with a K_D of 2.5 nM (Supplementary Table S6). As such relatively short
10-mer sites might occur thousands of times in a eukaryotic genome,
this polarized promiscuity indicates large potential for off-target
events. This result suggests that the 5′ end of the dTALE target
sequences should be selected for their uniqueness in particular.
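
The estimate that a 10-mer site may occur thousands of times can be checked with a back-of-the-envelope calculation. This is a rough sketch, not the authors' calculation: it assumes a genome of 3 × 10^9 bp with independent, equally frequent bases and counts both strands, all of which are simplifications introduced here.

```python
def expected_sites(genome_bp, site_len, both_strands=True):
    """Expected number of exact matches to one site in a random-sequence genome.

    Assumes independent positions with each base at frequency 0.25.
    """
    positions = genome_bp - site_len + 1
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** site_len


ASSUMED_GENOME_BP = 3_000_000_000  # illustrative mammalian-scale genome size

ten_mer = expected_sites(ASSUMED_GENOME_BP, 10)      # 10 matching bases
with_5prime_t = expected_sites(ASSUMED_GENOME_BP, 11)  # plus the invariant 5'-T
```

With these assumptions the expectation is on the order of several thousand sites for a 10-mer, and still above one thousand when the invariant 5′-T is required as an eleventh base. Real genomes deviate from uniform base composition, so the numbers are only indicative.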
In contrast to dTALEs, the differences between 5′ and 3′ target
modifications in the natural TALE AvrBs3_111-42 were small, 1- to
3-fold (Figure 4F, Supplementary Table S6). Furthermore, unlike the
dTALEs described here, AvrBs3 variants with substituted or additional
repeats at the C-terminus showed high specificity for the cognate 3′
target modifications in cell-based reporter assays (7). Why is this?
One reason may be the asymmetric base composition of the target site.
The 3′ side, rich in T and C, might be expected to contribute more to
affinity than the A- and T-rich 5′ side, with the greater relative
effect of 3′-side substitutions offsetting the polarity effect.
However, this explanation does not rationalize the polarized behavior
of dAvrBs3_111-42, which is designed to recognize the same target. Even
more curiously, AvrBs3_111-42 was much more sensitive to substitutions
of the invariant 5′-T than dAvrBs3_111-42 (13- to 20-fold vs 2- to
3-fold, respectively), but dAvrBs3_111-42 was much more sensitive to
the 5m3 modification than AvrBs3_111-42 (345-fold vs 29-fold,
respectively, Supplementary Table S6). These results may suggest that
natural features like the non-standard RVDs and non-canonical RVD-base
alignments play a yet-to-be-understood role in distributing affinity
contributions more evenly over the target site (see later in the text).
The polarization of the TALE–DNA interaction was
not apparent in earlier SELEX studies (8). Perhaps, the
SELEX procedure was optimized to probe specificity
rather than relative affinities of different positions,
which would require collecting sequence data at every
SELEX cycle (29). In contrast, the single enrichment
cycle used in Bind-n-Seq is ideal for revealing relative
affinity contributions of different RVD positions.
DNA binding by dTALEs was sensitive to the N-terminal
and C-terminal scaffold boundaries. The 111-42 framework
is stable enough to allow production of soluble dTALEs in
milligram amounts and provided binding and activation
behavior comparable with AvrBs3_254-180. Using the
dHAX3/DNA co-crystal structure (4) as a guide, TALE–
DNA contacts involve at least 27 C-terminal residues.
However, another AvrBs3 deletion variant retaining only
20 C-terminal residues exhibited full DNA-binding activity
(data not shown), and a TALEN construct containing just
two C-terminal residues was fully functional (30). Thus, the
C-terminus seems tolerant to deletions.
The N-terminal extension seems more critical. The PthXo1-DNA (3UGM)
crystal structure depicts one additional repeat-like module, the
‘-1 repeat’, contacting the conserved 5′-T, but no well-ordered
structures or DNA contacts are observed N-terminal to that (5).
Nonetheless, a 111-residue N-terminal extension was required for full
DNA-binding and ATF activity, whereas 94 residues reduced DNA binding
by 50-fold. Miller et al. (8) used ‘optimized’ TALENs with a
102-residue extension, suggesting that just eight extra residues confer
optimal DNA binding. What is the function of the N-terminal extension?
Secondary structure predictions and proteolysis by Factor Xa, which
cleaves after arginine residues in unstructured peptides, suggest that
this region is a boundary between ordered and disordered segments.
Comprising potentially one or two additional 34-residue repeats, the
N-terminus may be an organizing center for DNA binding. One idea is
that this region, including the -1 repeat, forms a folding unit that
contacts the 5′-T (5). This productive DNA-binding interaction then
initiates nucleation of the repeat superhelical filament that wraps
around the target. Recent work by Gao et al. (31) provides support for
this idea by demonstrating that the TALE N-terminal region 148-288
autonomously binds DNA non-specifically.
Our finding that the 5′-repeat interactions contribute more to DNA
affinity is consistent with the model of an N-terminal organizing
center. If the N-terminus serves as an anchor, then the diminishing
contribution of the more distant repeats could result from registration
mismatch between the repeat and DNA helices. Variations in helical
pitch and geometries of adjacent-repeat and DNA base-step transitions
would compound with increasing distance from the 5′-T, de-phasing the
protein–DNA interaction and degrading the quality of the contacts.
Indeed, in the PthXo1-DNA structure, the last three RVD repeats do not
contact the DNA, even though the DNA sites are available for binding.
Extending this idea further, the local geometry of par-
ticular DNA sequences might be more or less compatible
with the TALE superhelix geometry, leading to an add-
itional level of TALE DNA discrimination through
indirect readout. Runs of the same nucleotides, in particu-
lar polydA/polydT (32,33), have characteristic helical
parameters and deformability that differ from those for
‘average’ B-DNA. Registration mismatches could explain
the dramatic reductions in activation with increasing run
lengths described previously (20). We attempted to
minimize homopolymeric runs in our host-guest system
out of concern about this possibility. The insensitivity of
III-NGp binding to the 3m3 target truncation may be a
manifestation of this phenomenon, as the last four target
residues are T. This idea finds structural support in the dHAX3/DNA
co-crystal structure (3VPT), which contains a T_3 run. While, in the
first NG-T interaction, the T 5-methyl and Gly13 C-alpha contact each
other closely, 3.7 Å, either of the subsequent NG repeats take on
unusual RVD loop conformations that increase the 5-methyl-C-alpha
separation by 2 Å, out of van der Waals contact distance. Perhaps these
RVD conformational variations serve to re-phase the downstream contacts
but sacrifice affinity to do so. Utilization of non-standard repeats
like NS and N*, and non-canonical ‘mismatched’ RVD-base combinations
may carry out the same function, by loosening local structural
constraints to realign the TAL and DNA helices. Taken in this light, it
is perhaps not surprising that dAvrBs3_111-42 and AvrBs3_111-42 differ
in their sensitivity to substitutions at the 5′ and 3′ ends. It may be
that a number of apparent dTALE specificity anomalies are rooted in the
registration differences between TALEs and DNA.
The overall implication is that for some proportion of
dTALEs, we do not yet have rules that reliably predict
their interaction with their targets. However, from surveys
of TALE efficacy, it should be possible to deduce rules for
the more complex behavior. The discrepancies between
DNA-binding affinity and transcriptional activation
suggest that affinity is a necessary, but not sufficient,
property for highly active synthetic dTALEs. High specifi-
city for the target is another desirable trait. It may be that, as
with zinc fingers (34), some dTALEs with very high affinities
may prove to be problematic. Consider that III-NGp, which
binds with low nanomolar affinity to targets containing only
the first 10 of 19 bases, could occupy thousands of perfect
10-base-pair matches expected in a mammalian genome.
Such an example underscores the potential for significant
off-target activity by some dTALEs.
Overall, these first systematic biochemical measure-
ments of TALE DNA affinities reveal the potential for
both high affinity and high variability, as well as
complexities underlying the TALE DNA-binding code.
These measurements will also provide calibration for bio-
engineering of regulatory networks that are constructed
using the TALE–DNA platform.
SUPPLEMENTARY DATA
Supplementary Data are available at NAR online: Supple-
mentary Tables 1–6 and Supplementary Figures 1–4.
ACKNOWLEDGEMENTS
We kindly thank Adam Bogdanove, Iowa State
University, for the PthXo1 clone in pAH103, and the
Chris Fraser Lab, MCB, UC Davis, for providing purified
TEV protease. We thank Jose Zarasoga for assistance in
protein production.
FUNDING
National Institutes of Health (NIH) [GM097073 to D.J.S.
and E.P.B.]. Funding for open access charge: NIH
[GM097073].
Conflict of interest statement. None declared.
REFERENCES
1. Boch,J. and Bonas,U. (2010) Xanthomonas AvrBs3 Family-Type III Effectors: Discovery and Function. Annu. Rev. Phytopathol., 48, 419–436.
2. Boch,J., Scholze,H., Schornack,S., Landgraf,A., Hahn,S., Kay,S., Lahaye,T., Nickstadt,A. and Bonas,U. (2009) Breaking the code of DNA binding specificity of TAL-type III effectors. Science, 326, 1509–1512.
3. Moscou,M.J. and Bogdanove,A.J. (2009) A simple cipher governs DNA recognition by TAL effectors. Science, 326, 1501.
4. Deng,D., Yan,C., Pan,X., Mahfouz,M., Wang,J., Zhu,J.K., Shi,Y. and Yan,N. (2012) Structural basis for sequence-specific recognition of DNA by TAL effectors. Science, 335, 720–723.
5. Mak,A.N., Bradley,P., Cernadas,R.A., Bogdanove,A.J. and Stoddard,B.L. (2012) The crystal structure of TAL effector PthXo1 bound to its DNA target. Science, 335, 716–719.
6. Christian,M., Cermak,T., Doyle,E.L., Schmidt,C., Zhang,F., Hummel,A., Bogdanove,A.J. and Voytas,D.F. (2010) Targeting DNA double-strand breaks with TAL effector nucleases. Genetics, 186, 757–761.
7. Morbitzer,R., Romer,P., Boch,J. and Lahaye,T. (2010) Regulation of selected genome loci using de novo-engineered transcription activator-like effector (TALE)-type transcription factors. Proc. Natl Acad. Sci. USA, 107, 21617–21622.
8. Miller,J.C., Tan,S., Qiao,G., Barlow,K.A., Wang,J., Xia,D.F., Meng,X., Paschon,D.E., Leung,E., Hinkley,S.J. et al. (2011) A TALE nuclease architecture for efficient genome editing. Nat. Biotechnol., 29, 143–148.
9. Mercer,A.C., Gaj,T., Fuller,R.P. and Barbas,C.F. 3rd (2012) Chimeric TALE recombinases with programmable DNA sequence specificity. Nucleic Acids Res., 40, 11163–11172.
10. Perez-Pinera,P., Ousterout,D.G. and Gersbach,C.A. (2012) Advances in targeted genome editing. Curr. Opin. Chem. Biol., 16, 268–277.
11. Garg,A., Lohmueller,J.J., Silver,P.A. and Armel,T.Z. (2012) Engineering synthetic TAL effectors with orthogonal target sites. Nucleic Acids Res., 40, 7584–7595.
12. Morbitzer,R., Elsaesser,J., Hausner,J. and Lahaye,T. (2011) Assembly of custom TALE-type DNA binding domains by modular cloning. Nucleic Acids Res., 39, 5790–5799.
13. Zykovich,A., Korf,I. and Segal,D.J. (2009) Bind-n-Seq: high-throughput analysis of in vitro protein-DNA interactions using massively parallel sequencing. Nucleic Acids Res., 37, e151.
14. Szczepek,M., Brondani,V., Buchel,J., Serrano,L., Segal,D.J. and Cathomen,T. (2007) Structure-based redesign of the dimerization interface reduces the toxicity of zinc-finger nucleases. Nat. Biotechnol., 25, 786–793.
15. Beerli,R.R., Segal,D.J., Dreier,B. and Barbas,C.F. 3rd (1998) Toward controlling gene expression at will: specific regulation of the erbB-2/HER-2 promoter by using polydactyl zinc finger proteins constructed from modular building blocks. Proc. Natl Acad. Sci. USA, 95, 14628–14633.
16. Kim,M.S., Stybayeva,G., Lee,J.Y., Revzin,A. and Segal,D.J. (2011) A zinc finger protein array for the visual detection of specific DNA sequences for diagnostic applications. Nucleic Acids Res., 39, e29.
17. Cuff,J.A., Clamp,M.E., Siddiqui,A.S., Finlay,M. and Barton,G.J. (1998) JPred: a consensus secondary structure prediction server. Bioinformatics, 14, 892–893.
18. Schornack,S., Meyer,A., Romer,P., Jordan,T. and Lahaye,T. (2006) Gene-for-gene-mediated recognition of nuclear-targeted AvrBs3-like bacterial effector proteins. J. Plant Physiol., 163, 256–272.
19. Cong,L., Zhou,R., Kuo,Y.C., Cunniff,M. and Zhang,F. (2012) Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nat. Commun., 3, 968.
20. Streubel,J., Blucher,C., Landgraf,A. and Boch,J. (2012) TAL effector RVD specificities and efficiencies. Nat. Biotechnol., 30, 593–595.
21. Sun,N., Liang,J., Abil,Z. and Zhao,H. (2012) Optimized TAL effector nucleases (TALENs) for use in treatment of sickle cell disease. Mol. Biosyst., 8, 1255–1263.
22. Cermak,T., Doyle,E.L., Christian,M., Wang,L., Zhang,Y., Schmidt,C., Baller,J.A., Somia,N.V., Bogdanove,A.J. and Voytas,D.F. (2011) Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res., 39, e82.
23. Reyon,D., Tsai,S.Q., Khayter,C., Foden,J.A., Sander,J.D. and Joung,J.K. (2012) FLASH assembly of TALENs for high-throughput genome editing. Nat. Biotechnol., 30, 460–465.
24. Silva,G., Poirot,L., Galetto,R., Smith,J., Montoya,G., Duchateau,P. and Paques,F. (2011) Meganucleases and other tools for targeted genome engineering: perspectives and challenges for gene therapy. Curr. Gene Ther., 11, 11–27.
25. Zhang,F., Cong,L., Lodato,S., Kosuri,S., Church,G.M. and Arlotta,P. (2011) Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription. Nat. Biotechnol., 29, 149–153.
26. Huang,P., Xiao,A., Zhou,M., Zhu,Z., Lin,S. and Zhang,B. (2011) Heritable gene targeting in zebrafish using customized TALENs. Nat. Biotechnol., 29, 699–700.
27. Christian,M.L., Demorest,Z.L., Starker,C.G., Osborn,M.J., Nyquist,M.D., Zhang,Y., Carlson,D.F., Bradley,P., Bogdanove,A.J. and Voytas,D.F. (2012) Targeting G with TAL Effectors: A Comparison of Activities of TALENs Constructed with NN and NK Repeat Variable Di-Residues. PLoS One, 7, e45383.
28. Deng,D., Yin,P., Yan,C., Pan,X., Gong,X., Qi,S., Xie,T., Mahfouz,M., Zhu,J.K., Yan,N. et al. (2012) Recognition of methylated DNA by TAL effectors. Cell Res., 22, 1502–1504.
29. Roulet,E., Busso,S., Camargo,A.A., Simpson,A.J., Mermod,N. and Bucher,P. (2002) High-throughput SELEX SAGE method for quantitative modeling of transcription-factor binding sites. Nat. Biotechnol., 20, 831–835.
30. Mussolino,C., Morbitzer,R., Lutge,F., Dannemann,N., Lahaye,T. and Cathomen,T. (2011) A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic Acids Res., 39, 9283–9293.
31. Gao,H., Wu,X., Chai,J. and Han,Z. (2012) Crystal structure of a TALE protein reveals an extended N-terminal DNA binding region. Cell Res., 22, 1716–1720.
32. Nelson,H.C., Finch,J.T., Luisi,B.F. and Klug,A. (1987) The structure of an oligo(dA).oligo(dT) tract and its biological implications. Nature, 330, 221–226.
33. Peck,L.J. and Wang,J.C. (1981) Sequence dependence of the helical repeat of DNA in solution. Nature, 292, 375–378.
34. Pattanayak,V., Ramirez,C.L., Joung,J.K. and Liu,D.R. (2011) Revealing off-target cleavage specificities of zinc-finger nucleases by in vitro selection. Nat. Methods, 8, 765–770.
SUPPLEMENTARY DATA
Quantitative Analysis of TALE-DNA Interactions Suggests Polarity Effects
Joshua F. Meckler1†, Mital S. Bhakta1†, Moon-Soo Kim, Robert Ovadia2, Chris H. Habrian2,
Artem Zykovich1, Abigail Yu1, Sarah H. Lockwood1, Robert Morbitzer3, Janett Elsäesser3,
Thomas Lahaye3, David J. Segal1* and Enoch P. Baldwin2*
1Genome Center and Department of Biochemistry and Molecular Medicine, 2Department of
Molecular and Cellular Biology, University of California, Davis, CA 95616, 3Department of
Biology, Institute of Genetics, Ludwig-Maximilians-University Munich, 82152 Martinsried,
Germany
§Current address: Department of Chemistry, 1906 College Heights Blvd., Western Kentucky
University, Bowling Green, KY 42101.
†These authors contributed equally to this work
*Address correspondence to: djsegal@ucdavis.edu and epbaldwin@ucdavis.edu
Table of Contents
Figure S1: Sequences of dTALEs used in this study.
A) N- and C-terminal truncations reported by several studies.
B) Construct sequences for bacterial expression for in vitro assays.
C) Construct sequences for eukaryotic expression for reporter assays.
Figure S2: Coomassie-stained protein gel of representative dTALEs.
Figure S3. Representative EMSA data.
Figure S4: Correlation of affinity with reporter gene activation.
Table S1: Primers used to create N- and C-terminal truncations.
Table S2: Primers used in EMSA assays.
Table S3: Primers for construction of ATF reporter plasmids.
Table S4: EMSA and ATF data for G/A specificity of the NN RVD.
Table S5: EMSA and ATF data probing the requirement of a 5’T.
Table S6: EMSA and ATF data for polarity effects.
References
Figure S1: Sequences of dTALEs used in this study.
A) N- and C-terminal truncations reported by several studies.
this study
Mussolino et al. (1) Miller et al. (2)
Sun et al. (3) PthXo1 structure (4)
Zhang et al. (5) dHax3 structure (6)
BamHI
254-this study
|287-Mussolino et al, Sun et al
||
MDPI
RPRRPSPARELLPGPQPDRVQPTADRGVSAPAGSPLDGLPARRTVSRTRL 201
207 Sun et al
PSPPAPSPAFSAGSFSDLLRPFDPSLLDTSLLDSMPAVGTPHTAAAPAEW 151
N2-Zhang et al
Factor Xa| 111-this study
153-Mussolino et al || | Δ152-Miller et al
| || | |135-Sun et al
DEAQSALRAADDPPPTVRVAVTAARPPRAKPAPRRRAAQPSDASPAAQVD 101
N3-Zhang et al N5-Zhang et al
| 94 this study | N6-Zhang et al
| | 125-Sun et al | | N7-Zhang et al
| | | PthXo1 Struct | | |
| | | | N4-Zhang et al | | |
LRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALG 051
dHax3 Struct
| N8-Zhang et al
| | 50-Sun et al
| | |49-Mussolino et al
TVAVTYQHIITALPEATHEDIVGVGKQWSGARALEALLTDAGELRGPPLQ 001
LDTGQ… --- TALE REPEATS --- …SRPDP
+28-Miller et al 47-Mussolino et al
| 31-Sun et al | Factor Xa
| |C4-Zhang et al | | 42 this study
| || Factor Xa| | |
17-Mussolino et al | | | Factor Xa
| | || | | | |
ALAALTNDHLVALACLGGRPAMDAVKKGLPHAPELIRRVNRRIGERTSHR 050
+63-Miller et al, Sun et al
|C3-Zhang et al
VADYAQVVRVLEFFQCHSHPAYAFDEAMTQFGMSRNGLVQLFRRVGVTEL 100
117-Sun et al
EARGGTLPPASQRWDRILQASGMKRAKPSPTSAQTPDQASLHAFADSLER 150
163-Sun et al (imprecise due to low conservation with AvrXa10)
| 180 this study
| | 200-Sun et al
DLDAPSPMHEGDQTRASSRKRSRSDRAVTGPSAQQAVEVRVPEQRDALHL 200
BamHI
231-Mussolino et al
PLSWRVKRPRTRIWGGLPDPGTPMAADLAASSTVMWEQDADPFAGAADDF 250
PAFNEEELAWLMELLPQ*
B) Construct sequences for bacterial expression for in vitro assays
Factor Xa site: IEGR^
TEV protease site: ENLYFQ^S
XhoI: CTCGAG
BamHI: GGATCC
StuI: AGGCCT
AatII: GACGTC
AgeI: ACCGGT
HindIII: AAGCTT
His6-purification tag: HHHHHH
Construct Schema:
MBP(not shown)-Factor Xa-TEV-XhoI-BamHI-StuI-TALE_repeats-AatII-AgeI-His6-Stop-HindIII
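As a reading aid, the restriction and protease sites listed above can be checked directly against a construct sequence. The following is a minimal sketch and not part of the original methods: the function names `translate` and `find_sites` are hypothetical, the codon table is the standard genetic code, and the 48-nucleotide test string is the opening of the AvrBs3 254-180 construct sequence listed in this section.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Recognition sequences exactly as listed in Figure S1B.
SITES = {
    "XhoI": "CTCGAG", "BamHI": "GGATCC", "StuI": "AGGCCT",
    "AatII": "GACGTC", "AgeI": "ACCGGT", "HindIII": "AAGCTT",
}


def translate(dna):
    """Translate DNA in frame 0; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def find_sites(dna):
    """Return {enzyme: [0-based start positions]} for each site found."""
    hits = {}
    for name, site in SITES.items():
        positions = []
        start = dna.find(site)
        while start != -1:
            positions.append(start)
            start = dna.find(site, start + 1)
        if positions:
            hits[name] = positions
    return hits


# Opening 48 nt of the AvrBs3 254-180 construct (copied from the listing).
head = "ATCGAGGGAAGGCTCGAAAATCTTTATTTTCAGTCTCTCGAGGATCCC"
print(translate(head))
print(find_sites(head))
```

Translating this fragment gives IEGRLENLYFQSLEDP: the Factor Xa site (IEGR), the TEV site (ENLYFQS), and the residues encoded by the overlapping XhoI and BamHI sites, in agreement with the start of the protein sequence given for this construct.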
AvrBs3_254-180 in pMal-TEV
ATCGAGGGAAGGCTCGAAAATCTTTATTTTCAGTCTCTCGAGGATCCCATTCGTTCGCGCACACCAAGTCCTGCCCG
CGAGCTTCTGCCCGGACCCCAACCCGATGGGGTTCAGCCGACTGCAGATCGTGGGGTGTCTCCGCCTGCCGGCGGCC
CCCTGGATGGCTTGCCCGCTCGGCGGACGATGTCCCGGACCCGGCTGCCATCTCCCCCTGCCCCCTCACCTGCGTTC
TCGGCGGGCAGCTTCAGTGACCTGTTACGTCAGTTCGATCCGTCACTTTTTAATACATCGCTTTTTGATTCATTGCC
TCCCTTCGGCGCTCACCATACAGAGGCTGCCACAGGCGAGTGGGATGAGGTGCAATCGGGTCTGCGGGCAGCCGACG
CCCCCCCACCCACCATGCGCGTGGCTGTCACTGCCGCGCGGCCGCCGCGCGCCAAGCCGGCGCCGCGACGACGTGCT
GCGCAACCCTCCGACGCTTCGCCGGCCGCGCAGGTGGATCTACGCACGCTCGGCTACAGCCAGCAGCAACAGGAGAA
GATCAAACCGAAGGTTCGTTCGACAGTGGCGCAGCACCACGAGGCACTGGTCGGCCATGGGTTTACACACGCGCACA
TCGTTGCGCTCAGCCAACACCCGGCAGCGTTAGGGACCGTCGCTGTCAAGTATCAGGACATGATCGCAGCGTTGCCA
GAGGCGACACACGAAGCGATCGTTGGCGTCGGCAAACAGTGGTCCGGCGCACGCGCTCTGGAGGCCTTGCTCACGGT
GGCGGGAGAGTTGAGAGGTCCACCGTTACAGTTGGACACAGGCCAACTTCTCAAGATTGCAAAACGTGGCGGCGTGA
CCGCAGTGGAGGCAGTGCATGCATGGCGCAATGCACTGACGGGTGCCCCCCTGAACCTGACCCCGGAGCAGGTGGTG
GCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCA
TGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGC
TGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATAGCGGTGGCAAG
CAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGC
CATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATG
GCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTG
TTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATATTGGTGGCAAGCA
GGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCA
TCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGC
CTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTT
GCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGG
CGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATC
GCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCT
GACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATAGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGC
CGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATAGCGGTGGCAAGCAGGCG
CTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGC
CAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGA
CCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCG
GTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCT
GGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCA
GCAATGGCGGCGGCAGGCCGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACC
CCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGT
GCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGCGGCAGGCCGGCGCTGG
AGAGCATTGTTGCCCAGTTATCTCGCCCTGATCCGGCGTTGGCCGCGTTGACCAACGACCACCTCGTCGCCTTGGCC
TGCCTCGGCGGACGTCCTGCGCTGGATGCAGTGAAAAAGGGATTGCCGCACGCGCCGGCCTTGATCAAAAGAACCAA
TCGCCGTATTCCCGAACGCACATCCCATCGCGTTGCCGACCACGCGCAAGTGGTTCGCGTGCTGGGTTTTTTCCAGT
GCCACTCCCACCCAGCGCAAGCATTTGATGACGCCATGACGCAGTTCGGGATGAGCAGGCACGGGTTGTTACAGCTC
TTTCGCAGAGTGGGCGTCACCGAACTCGAAGCCCGCAGTGGAACGCTCCCCCCAGCCTCGCAGCGTTGGGACCGTAT
CCTCCAGGCATCAGGGATGAAAAGGGCCAAACCGTCCCCTACTTCAACTCAAACGCCGGATCAGGCGTCTTTGCATG
CATTCGCCGATTCGCTGGAGCGTGACCTTGATGCGCCTAGCCCAATGCACGAGGGAGATCAGACGCGGGCAAGCAGC
CGTAAACGGTCCCGATCGGATCGTGCTGTCACCGGTCATCACCATCACCATCACTGAAAGCTT
IEGRLENLYFQSLEDPIRSRTPSPARELLPGPQPDGVQPTADRGVSPPAGGPLDGLPARR
TMSRTRLPSPPAPSPAFSAGSFSDLLRQFDPSLFNTSLFDSLPPFGAHHTEAATGEWDEV
QSGLRAADAPPPTMRVAVTAARPPRAKPAPRRRAAQPSDASPAAQVDLRTLGYSQQQQEK
IKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVKYQDMIAALPEATHEAIVG
VGKQWSGARALEALLTVAGELRGPPLQ
LDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNSGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNSGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNSGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGRPALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGRPALESIVAQLSRPDPALAALTNDHLVALACLGGRPALDAVKKGLPHAPALIKRTNRRIPER
TSHRVADHAQVVRVLGFFQCHSHPAQAFDDAMTQFGMSRHGLLQLFRRVGVTELEARSGTLPPA
SQRWDRILQASGMKRAKPSPTSTQTPDQASLHAFADSLERDLDAPSPMHEGDQTRASSRK
RSRSDRAVTGHHHHHH*KL
AvrBs3_111-42 in pMal-TEV
ATCGAGGGAAGGCTCGAAAATCTTTATTTTCAGTCTCTCGAGATGGATCCCTCCGACGCTTCGCCGGCCGCGCAGGT
GGATCTACGCACGCTCGGCTACAGTCAGCAGCAGCAAGAGAAGATCAAACCGAAGGTGCGTTCGACAGTGGCGCAGC
ACCACGAGGCACTGGTGGGCCATGGGTTTACACACGCGCACATCGTTGCGCTCAGCCAACACCCGGCAGCGTTAGGG
ACCGTCGCTGTCACGTATCAGCACATAATCACGGCGTTGCCAGAGGCGACACACGAAGACATCGTTGGCGTCGGCAA
ACAGTGGTCCGGCGCACGCGCCCTGGAGGCCTTGCTCACGGTGGCGGGAGAGTTGAGAGGTCCACCGTTACAGTTGG
ACACAGGCCAACTTCTCAAGATTGCAAAACGTGGCGGCGTGACCGCAGTGGAGGCAGTGCATGCATGGCGCAATGCA
CTGACGGGTGCCCCCCTGAACCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCT
GGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCA
GCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACC
CCGCAGCAGGTGGTGGCCATCGCCAGCAATAGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGT
GCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGG
AGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGC
AATATTGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCC
GGAGCAGGTGGTGGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGC
TGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAG
ACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCA
CGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGG
AGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTG
TGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGAC
GGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATA
GCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAG
CAGGTGGTGGCCATCGCCAGCAATAGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTG
CCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGG
TGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGAT
GGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCA
GGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCC
AGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGCGGCAGGCCGGCGCTGGAGACGGTG
CAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGG
CGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGG
TGGTGGCCATCGCCAGCAATGGCGGCGGCAGGCCGGCGCTGGAGAGCATTGTTGCCCAGTTATCTCGCCCTGATCCG
GCGTTGGCCGCGTTGACCAACGACCACCTCGTCGCCTTGGCCTGCCTCGGCGGACGTCCTGCCATGGATGCAGTGAA
AAAGGGACTGCCGCACGCGCCGGAATTGATCAGAAGAGTCAATCGCCGTCCGGATCCTACCGGTCATCACCATCACC
ATCACTGAAAGCTT
IEGRLENLYFQSLEMDPSDASPAAQVDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHG
FTHAHIVALSQHPAALGTVAVTYQHIITALPEATHEDIVGVGKQWSGARALEALLTVAGE
LRGPPLQ
LDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNSGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNSGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNSGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGRPALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGRPALESIVAQLSRPDPALAALTNDHLVALACLGGRPAMDAVKKGLPHAPELIRRVNRRPDPT
GHHHHHH*KL
AvrBs3_94-42 in pMal-TEV
ATCGAGGGAAGGCTCGAAAATCTTTATTTTCAGTCTCTCGAGATGGATCCCAGTCAGCAGCAGCAAGAGAAGATCAA
ACCGAAGGTGCGTTCGACAGTGGCGCAGCACCACGAGGCACTGGTGGGCCATGGGTTTACACACGCGCACATCGTTG
CGCTCAGCCAACACCCGGCAGCGTTAGGGACCGTCGCTGTCACGTATCAGCACATAATCACGGCGTTGCCAGAGGCG
ACACACGAAGACATCGTTGGCGTCGGCAAACAGTGGTCCGGCGCACGCGCCCTGGAGGCCTTGCTCACGGTGGCGGG
AGAGTTGAGAGGTCCACCGTTACAGTTGGACACAGGCCAACTTCTCAAGATTGCAAAACGTGGCGGCGTGACCGCAG
TGGAGGCAGTGCATGCATGGCGCAATGCACTGACGGGTGCCCCCCTGAACCTGACCCCGGAGCAGGTGGTGGCCATC
GCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCT
GACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGC
CGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATAGCGGTGGCAAGCAGGCG
CTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGC
CAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGA
CCCCGGAGCAGGTGGTGGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCG
GTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCT
GGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCA
GCAATATTGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACC
CCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGT
GCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGG
AGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGC
AATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCC
GGAGCAGGTGGTGGCCATCGCCAGCAATAGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGC
TGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATAGCGGTGGCAAGCAGGCGCTGGAG
ACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCA
CGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGG
AGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTG
TGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGAC
GGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATG
GCGGCGGCAGGCCGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAG
CAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTG
CCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGCGGCAGGCCGGCGCTGGAGAGCA
TTGTTGCCCAGTTATCTCGCCCTGATCCGGCGTTGGCCGCGTTGACCAACGACCACCTCGTCGCCTTGGCCTGCCTC
GGCGGACGTCCTGCCATGGATGCAGTGAAAAAGGGACTGCCGCACGCGCCGGAATTGATCAGAAGAGTCAATCGCCG
TCCGGATCCTACCGGTCATCACCATCACCATCACTGAAAGCTT
IEGRLENLYFQSLEMDPSQQQQEKIKPKVRSTVAQHHEALVGHGFTHAHIVALSQHPAAL
GTVAVTYQHIITALPEATHEDIVGVGKQWSGARALEALLTVAGELRGPPLQ
LDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNSGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQALLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNSGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNSGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGRPALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGRPALESIVAQLSRPDPALAALTNDHLVALACLGGRPAMDAVKKGLPHAPELIRRVNRRPDPT
GHHHHHH*KL
II-NGp_111-42 in pMal-TEV
ATCGAGGGAAGGCTCGAAAATCTTTATTTTCAGTCTCTCGAGATGGATCCCTCCGACGCTTCGCCGGCCGCGCAGGT
GGATCTACGCACGCTCGGCTACAGTCAGCAGCAGCAAGAGAAGATCAAACCGAAGGTGCGTTCGACAGTGGCGCAGC
ACCACGAGGCACTGGTGGGCCATGGGTTTACACACGCGCACATCGTTGCGCTCAGCCAACACCCGGCAGCGTTAGGG
ACCGTCGCTGTCACGTATCAGCACATAATCACGGCGTTGCCAGAGGCGACACACGAAGACATCGTTGGCGTCGGCAA
ACAGTGGTCCGGCGCACGCGCCCTGGAGGCCTTGCTCACGGTGGCGGGAGAGTTGAGAGGTCCACCGTTACAGTTGG
ACACAGGCCAACTTCTCAAGATTGCAAAACGTGGCGGCGTGACCGCAGTGGAGGCAGTGCATGCATGGCGCAATGCA
CTGACGGGTGCCCCCCTGAACCTTACGCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCT
GGAGACGGTGCAGCGGCTGCTTCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCA
GCAATATTGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGATTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACC
CCGGAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACTGTCCAGCGGCTGTTGCCGGT
GCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTTG
AGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGC
AATGGCGGTGGCAAGCAGGCTCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCC
GGAGCAGGTGGTGGCCATCGCCAGCCACGACGGGGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGC
TGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGCGGCAAGCAGGCGCTGGAG
ACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCAAGCAA
TAAGGGTGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGG
AGCAGGTGGTGGCAATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTG
TGCCAGGCCCATGGCCTGACCCCGCAACAGGTGGTAGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAGAC
GGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACACCCCAGCAGGTGGTAGCGATCGCCAGCAATG
GCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGCTTCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAG
CAGGTGGTGGCCATCGCCAGCCACGATGGCGGCAAGCAGGCGCTGGAGACGGTGCAGCGATTGTTGCCGGTGCTGTG
CCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACTG
TCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATGGC
GGTGGCAAGCAGGCGCTTGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCA
GGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCTCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCC
AGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATAAGGGGGGCAAGCAGGCGCTGGAGACGGTG
CAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGG
CGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACACCCCAGCAGG
TGGTGGCCATCGCCAGCAATGGCGGCGGCAGGCCGGCGCTGGAGAGCATTGTTGCCCAGTTATCTCGCCCTGATCCG
GCGTTGGCCGCGTTGACCAACGACCACCTCGTCGCCTTGGCCTGCCTCGGCGGACGTCCTGCCATGGATGCAGTGAA
AAAGGGACTGCCGCACGCGCCGGAATTGATCAGAAGAGTCAATCGCCGTCCGGATCCTACCGGTCATCACCATCACC
ATCACTGAAAGCTT
IEGRLENLYFQSLEMDPSDASPAAQVDLRTLGYSQQQQEKIKPKVRSTVAQHHEALVGHG
FTHAHIVALSQHPAALGTVAVTYQHIITALPEATHEDIVGVGKQWSGARALEALLTVAGE
LRGPPLQ
LDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNKGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNIGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNKGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQALLPVLCQAHG
LTPQQVVAIASNGGGRPALESIVAQLSRPDPALAALTNDHLVALACLGGRPAMDAVKKGLPHAPELIRRVNRRPDPT
GHHHHHH*KL
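The repeat listing above can be decoded into a predicted target site. The sketch below assumes the widely used convention that the RVD occupies residues 12 and 13 of each repeat, and the standard RVD-to-base assignments (NG to T, HD to C, NI to A, NN to G, NK to G). The names `rvd_of` and `predicted_site` are hypothetical, and the mapping is a simplification: NN also binds A, and RVDs outside the table, such as NS, are reported as N.

```python
# Simplified "standard" RVD code; an assumption for illustration only.
RVD_TO_BASE = {"NG": "T", "HD": "C", "NI": "A", "NN": "G", "NK": "G"}

# Repeats of the II-NGp protein as listed above (last entry is the
# truncated final half repeat, first 20 residues only).
II_NGP_REPEATS = [
    "LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNIGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG",
    "LTPQQVVAIASNGGGKQALETVQALLPVLCQAHG",
    "LTPEQVVAIASNKGGKQALETVQALLPVLCQAHG",
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPQQVVAIASNIGGKQALETVQRLLPVLCQAHG",
    "LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG",
    "LTPEQVVAIASNKGGKQALETVQRLLPVLCQAHG",
    "LTPQQVVAIASNGGGKQALETVQALLPVLCQAHG",
    "LTPQQVVAIASNGGGRPALE",
]


def rvd_of(repeat):
    """Return the two residues at positions 12-13 (1-based) of a repeat."""
    return repeat[11:13]


def predicted_site(repeats):
    """Prepend the invariant 5' T, then decode one base per repeat."""
    return "T" + "".join(RVD_TO_BASE.get(rvd_of(r), "N") for r in repeats)


print([rvd_of(r) for r in II_NGP_REPEATS])
print(predicted_site(II_NGP_REPEATS))
```

For the II-NGp repeats this yields TTATTTCTGTATCTTTGTT, which matches the II-T target listed in Table S2.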
C) Construct sequences for eukaryotic expression for reporter assays
HAtag: YPYDVPDYA
NLS: PKKKRKV
XhoI: CTCGAG
StuI: AGGCCT
AatII: GACGTC
AgeI: ACCGGT
VP64: DALDDFDLDMLDALDDFDLDMLDALDDFDLDMLDALDDFDLDML
PstI: CTGCAG
Construct Schema:
HAtag-NLS-XhoI-StuI-TALE_repeats-AatII-AgeI-VP64-Stop-PstI
I-NGp_111-42 in pPGK-VP64
ATGTACCCATACGATGTCCCAGACTACGCGAATTCCCCGGGGATCCCAGGCATGGGGCCCAAAAAGAAACGCAAAGT
TGGGCGCCTCGAGATGGATCCCTCCGACGCTTCGCCGGCCGCGCAGGTGGATCTACGCACGCTCGGCTACAGTCAGC
AGCAGCAAGAGAAGATCAAACCGAAGGTGCGTTCGACAGTGGCGCAGCACCACGAGGCACTGGTGGGCCATGGGTTT
ACACACGCGCACATCGTTGCGCTCAGCCAACACCCGGCAGCGTTAGGGACCGTCGCTGTCACGTATCAGCACATAAT
CACGGCGTTGCCAGAGGCGACACACGAAGACATCGTTGGCGTCGGCAAACAGTGGTCCGGCGCACGCGCCCTGGAGG
CCTTGCTCACGGTGGCGGGAGAGTTGAGAGGTCCACCGTTACAGTTGGACACAGGCCAACTTCTCAAGATTGCAAAA
CGTGGCGGCGTGACCGCAGTGGAGGCAGTGCATGCATGGCGCAATGCACTGACGGGTGCCCCCCTGAACCTTACGCC
GCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGCTTCCGGTGC
TGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAG
ACGGTGCAGCGATTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAA
TGGCGGTGGCAAGCAGGCGCTGGAGACTGTCCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGG
AGCAGGTGGTGGCCATCGCCAGCCACGACGGTGGCAAGCAGGCGCTTGAGACGGTGCAGCGGCTGTTGCCGGTGCTG
TGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCTCTGGAGAC
GGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATA
AGGGGGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAG
CAGGTGGTGGCCATCGCCAGCAATGGCGGCGGCAAGCAGGCGCTGGAGACGGTGCAGGCGCTGTTGCCGGTGCTGTG
CCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCAAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGG
TGCAGGCGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCAATCGCCAGCAATGGC
GGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAACA
GGTGGTAGCCATCGCCAGCAATATTGGTGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCC
AGGCCCATGGCCTGACACCCCAGCAGGTGGTAGCGATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACGGTG
CAGCGGCTGCTTCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGG
TGGCAAGCAGGCGCTGGAGACGGTGCAGCGATTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGG
TGGTGGCCATCGCCAGCAATGGCGGTGGCAAGCAGGCGCTGGAGACTGTCCAGCGGCTGTTGCCGGTGCTGTGCCAG
GCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATAAGGGTGGCAAGCAGGCGCTTGAGACGGTGCA
GCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTGGTGGCCATCGCCAGCAATGGCGGTG
GCAAGCAGGCTCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACCCCGGAGCAGGTG
GTGGCCATCGCCAGCCACGACGGGGGCAAGCAGGCGCTGGAGACGGTGCAGCGGCTGTTGCCGGTGCTGTGCCAGGC
CCATGGCCTGACCCCGCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGCGGCAAGCAGGCGCTGGAGACGGTGCAGG
CGCTGTTGCCGGTGCTGTGCCAGGCCCATGGCCTGACACCCCAGCAGGTGGTGGCCATCGCCAGCAATGGCGGCGGC
AGGCCGGCGCTGGAGAGCATTGTTGCCCAGTTATCTCGCCCTGATCCGGCGTTGGCCGCGTTGACCAACGACCACCT
CGTCGCCTTGGCCTGCCTCGGCGGACGTCCTGCCATGGATGCAGTGAAAAAGGGACTGCCGCACGCGCCGGAATTGA
TCAGAAGAGTCAATCGCCGTCCGGATCCTACCGGTGCGGCCGCCGACGCTTTGGATGACTTTGATTTGGACATGCTG
GATGCTCTAGATGACTTCGACCTGGATATGCTGGACGCACTTGACGACTTTGACCTCGACATGCTAGACGCTCTGGA
CGACTTCGATCTAGACATGCTCTAAGTCGACCTGCAG
MYPYDVPDYANSPGIPGMGPKKKRKVGRLEMDPSDASPAAQVDLRTLGYSQQQQEKIKPK
VRSTVAQHHEALVGHGFTHAHIVALSQHPAALGTVAVTYQHIITALPEATHEDIVGVGKQ
WSGARALEALLTVAGELRGPPLQ
LDTGQLLKIAKRGGVTAVEAVHAWRNALTGAPLN
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNIGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNKGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQALLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNIGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNKGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
LTPEQVVAIASHDGGKQALETVQRLLPVLCQAHG
LTPQQVVAIASNGGGKQALETVQALLPVLCQAHGLTPQQVVAIASNGGGRPALESIVAQ
LSRPDPALAALTNDHLVALACLGGRPAMDAVKKGLPHAPELIRRVNRRPDPTGAAADALD
DFDLDMLDALDDFDLDMLDALDDFDLDMLDALDDFDLDML*VDLQ
Figure S2: Coomassie-stained protein gel of representative dTALEs.
A) Coomassie-stained SDS-PAGE gel showing purified AvrBs3_254-180 after treatment with various
concentrations of Factor Xa protease. B) Purified dTALE_111-42 proteins run at the expected size
of 100 kilodaltons. Proteins shown are diluted 1:5 from stock. Typical protein yields were between
10 and 30 µM.
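[Gel images for panels A and B not reproduced. Panel B lanes: I-NGp, II-NIp, II-NKp, II-NNp; band labeled MBP-TALE_111-42 fusion; molecular weight markers (kDa): 250, 150, 100, 75, 50, 35.]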
Figure S3. Representative EMSA data.
Sample data showing the EMSA dynamic range of 10^4. The TALEs used are indicated and, unless
otherwise stated, the cognate DNA targets were used. The shifted TALE-DNA band is indicated with
an arrow, while the unshifted DNA is indicated with a “*”. The bottom panel shows representative
data for a mixed-site target used in polarity experiments. The sharpness of the transition
(apparent n_H = 3-4) is consistent over 4 logs of protein concentration and is therefore not due
to lack of equilibration or protein concentration-dependent aggregation. We do not know its origin.
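To give a sense of what an apparent Hill coefficient of 3-4 implies for the sharpness of a binding transition, the sketch below uses the standard Hill equation. It is an illustration only: the caption does not state how the coefficient was estimated, and the function names `hill_fraction_bound` and `transition_width` are hypothetical.

```python
def hill_fraction_bound(protein, kd, n_h):
    """Fraction of DNA bound at a given protein concentration (Hill model)."""
    return protein ** n_h / (kd ** n_h + protein ** n_h)


def transition_width(n_h):
    """Fold change in protein concentration needed to go from 10% to 90% bound.

    From the Hill equation, 10% bound requires (P/Kd)^n = 1/9 and 90% bound
    requires (P/Kd)^n = 9, so the ratio of the two concentrations is 81^(1/n).
    """
    return 81 ** (1 / n_h)


for n_h in (1, 3, 4):
    print(n_h, round(transition_width(n_h), 2))
```

Under this model a non-cooperative transition (coefficient of 1) spans an 81-fold range of protein concentration, whereas a coefficient of 4 compresses it to about 3-fold.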
Figure S4: Correlation of affinity with reporter gene activation.
The relationship between apparent K_D and fold activation of the reporter gene was not linear.
The best-fit model of these data was a log-log relationship. The correlation of log2(Fold
Activation) versus log10(Apparent K_D (pM)) had R^2 = 0.68 and P = 0.0002. Host context is
indicated by color, guest RVD by letter: G, NG; N, NN; D, HD; I, NI; K, NK.
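The log-log relationship described in this caption corresponds to an ordinary least-squares fit of log2(fold activation) against log10(apparent K_D). The sketch below shows one way to compute such a fit. The function name `loglog_fit` is hypothetical, the P value is not computed, and the four data points are made-up placeholders chosen to lie exactly on a line; they are not values from this study.

```python
import math


def loglog_fit(kd_pm, fold_activation):
    """Least-squares fit of log2(fold) = slope * log10(KD) + intercept.

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(k) for k in kd_pm]
    y = [math.log2(f) for f in fold_activation]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Placeholder values only (NOT data from the paper).
example_kd_pm = [10, 100, 1000, 10000]
example_fold = [64, 16, 4, 1]
print(loglog_fit(example_kd_pm, example_fold))
```

A negative slope in such a fit means that tighter binding (lower apparent K_D) goes with higher reporter activation.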
Table S1: Primers used to create N- and C-terminal truncations.
EcoRI: GAATTC
5'SfiI: GGCCGCTAAGGCC
3’SfiI: GGCCAAGCTGGCC
XhoI: CTCGAG
BamHI: GGATCC
SmaI: CCCGGG
AgeI: ACCGGT
>PthXoI_5’111-f
5’-AATAGGAGGTGCACCGAATTCGTGGCCGCTAAGGCCCTCGAGATGGATCCTCCGACGCTTCGCCGGCCGCGCAGGT
>PthXoI_5’94-f
5’-AATAGGAGGTGCACCGAATTCGTGGCCGCTAAGGCCCTCGAGATGGATCCAGTCAGCAGCAGCAAGAGAAGATCAAACC
>PthXoI_5’end-r
5’-CGACGAGGTGGTCGTTGGTCAACGCCCGGGCTGTAACGGCGGACCTCTCAACTC
>PthXoI_3’end-f
5’-GAGTTGAGAGGTCCGCCGTTACAGCCCGGGCGTTGACCAACGACCACCTCGTCG
>PthXoI_3’42-r
5’-CAAGAAAGCTGGGTCGAATTCGGCCAAGCTGGCCTTACCGGTAGGATCCGGACGGCGATTGACTCTTCTGATCAATTC
Table S2: Primers used in EMSA assays.
Biotinylated forward primer /5Biosg/CCTCTTCGCTATTACGCCAGC
Reverse primer CACCCTGACTCGAGTACGATCGAACGTTC
For RVD affinity studies
I-A CCTCTTCGCTATTACGCCAGC TAAACAGATAAATAGACAA GAACGTTCGATCGTACTCGAGTCAGGGTG
I-G CCTCTTCGCTATTACGCCAGC TGAGCGGGTGAGTGGGCGG GAACGTTCGATCGTACTCGAGTCAGGGTG
I-C CCTCTTCGCTATTACGCCAGC TCACCCGCTCACTCGCCCC GAACGTTCGATCGTACTCGAGTCAGGGTG
I-T CCTCTTCGCTATTACGCCAGC TTATCTGTTTATTTGTCTT GAACGTTCGATCGTACTCGAGTCAGGGTG
II-A CCTCTTCGCTATTACGCCAGC TAAATACAGAAACATAGAA GAACGTTCGATCGTACTCGAGTCAGGGTG
II-G CCTCTTCGCTATTACGCCAGC TGAGTGCGGGAGCGTGGGG GAACGTTCGATCGTACTCGAGTCAGGGTG
II-C CCTCTTCGCTATTACGCCAGC TCACTCCCGCACCCTCGCC GAACGTTCGATCGTACTCGAGTCAGGGTG
II-T CCTCTTCGCTATTACGCCAGC TTATTTCTGTATCTTTGTT GAACGTTCGATCGTACTCGAGTCAGGGTG
III-A CCTCTTCGCTATTACGCCAGC TACAAAGACATAGAAATAA GAACGTTCGATCGTACTCGAGTCAGGGTG
III-G CCTCTTCGCTATTACGCCAGC TGCGAGGGCGTGGGAGTGG GAACGTTCGATCGTACTCGAGTCAGGGTG
III-C CCTCTTCGCTATTACGCCAGC TCCCACGCCCTCGCACTCC GAACGTTCGATCGTACTCGAGTCAGGGTG
III-T CCTCTTCGCTATTACGCCAGC TTCTATGTCTTTGTATTTT GAACGTTCGATCGTACTCGAGTCAGGGTG
Zif268 CCTCTTCGCTATTACGCCAGC GCGTGGGCGT GAACGTTCGATCGTACTCGAGTCAGGGTG
For 5’T studies
Bs3 CCTCTTCGCTATTACGCCAGC TATATAAACCTAACCATCC GAACGTTCGATCGTACTCGAGTCAGGGTG
5’A CCTCTTCGCTATTACGCCAGC AATATAAACCTAACCATCC GAACGTTCGATCGTACTCGAGTCAGGGTG
5’C CCTCTTCGCTATTACGCCAGC CATATAAACCTAACCATCC GAACGTTCGATCGTACTCGAGTCAGGGTG
5’G CCTCTTCGCTATTACGCCAGC GATATAAACCTAACCATCC GAACGTTCGATCGTACTCGAGTCAGGGTG
For polarity studies (B = C,G,T; D = A,G,T; H = A,C,T; V = A,C,G)
Bs3 variants
3m3 CCTCTTCGCTATTACGCCAGC TATATAAACCTAACCAVDD GAACGTTCGATCGTACTCGAGTCAGGGTG
3m6 CCTCTTCGCTATTACGCCAGC TATATAAACCTAADDBVDD GAACGTTCGATCGTACTCGAGTCAGGGTG
3m9 CCTCTTCGCTATTACGCCAGC TATATAAACCVBBDDBVDD GAACGTTCGATCGTACTCGAGTCAGGGTG
5m3 CCTCTTCGCTATTACGCCAGC TBVBTAAACCTAACCATCC GAACGTTCGATCGTACTCGAGTCAGGGTG
5m6 CCTCTTCGCTATTACGCCAGC TBVBVBBACCTAACCATCC GAACGTTCGATCGTACTCGAGTCAGGGTG
5m9 CCTCTTCGCTATTACGCCAGC TBVBVBBBDDTAACCATCC GAACGTTCGATCGTACTCGAGTCAGGGTG
III-C variants
3m3 CCTCTTCGCTATTACGCCAGC TCCCACGCCCTCGCACVDD GAACGTTCGATCGTACTCGAGTCAGGGTG
3m6 CCTCTTCGCTATTACGCCAGC TCCCACGCCCTCGDBDVDD GAACGTTCGATCGTACTCGAGTCAGGGTG
3m9 CCTCTTCGCTATTACGCCAGC TCCCACGCCCVDHDBDVDD GAACGTTCGATCGTACTCGAGTCAGGGTG
5m3 CCTCTTCGCTATTACGCCAGC TDDDACGCCCTCGCACTCC GAACGTTCGATCGTACTCGAGTCAGGGTG
5m6 CCTCTTCGCTATTACGCCAGC TDDDBDHCCCTCGCACTCC GAACGTTCGATCGTACTCGAGTCAGGGTG
5m9 CCTCTTCGCTATTACGCCAGC TDDDBDHDDDTCGCACTCC GAACGTTCGATCGTACTCGAGTCAGGGTG
III-T variants
3m3 CCTCTTCGCTATTACGCCAGC TTCTATGTCTTTGTATVVV GAACGTTCGATCGTACTCGAGTCAGGGTG
3m6 CCTCTTCGCTATTACGCCAGC TTCTATGTCTTTGVBVVVV GAACGTTCGATCGTACTCGAGTCAGGGTG
3m9 CCTCTTCGCTATTACGCCAGC TTCTATGTCTVVHVBVVVV GAACGTTCGATCGTACTCGAGTCAGGGTG
5m3 CCTCTTCGCTATTACGCCAGC TVDVATGTCTTTGTATTTT GAACGTTCGATCGTACTCGAGTCAGGGTG
5m6 CCTCTTCGCTATTACGCCAGC TVDVBVHTCTTTGTATTTT GAACGTTCGATCGTACTCGAGTCAGGGTG
5m9 CCTCTTCGCTATTACGCCAGC TVDVBVHVDVTTGTATTTT GAACGTTCGATCGTACTCGAGTCAGGGTG
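The degenerate codes in the polarity targets (B, D, H, V) each exclude exactly one base, so every degenerate position is a guaranteed mismatch relative to the cognate site. The sketch below makes that explicit by reporting the positions at which a variant pool excludes the wild-type base. The function name `mismatch_positions` is hypothetical; the code definitions and the example sequences are taken from the table above.

```python
# Degenerate codes as defined in the Table S2 header, plus the plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
}


def mismatch_positions(wild_type, variant):
    """Return 1-based positions where the variant excludes the wild-type base."""
    if len(wild_type) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        i
        for i, (wt_base, code) in enumerate(zip(wild_type, variant), start=1)
        if wt_base not in IUPAC[code]
    ]


bs3_wt = "TATATAAACCTAACCATCC"
print(mismatch_positions(bs3_wt, "TATATAAACCTAACCAVDD"))  # Bs3 3m3
print(mismatch_positions(bs3_wt, "TBVBTAAACCTAACCATCC"))  # Bs3 5m3
print(mismatch_positions(bs3_wt, "TBVBVBBBDDTAACCATCC"))  # Bs3 5m9
```

For the Bs3 variants, 3m3 mismatches the last three positions of the 19-nucleotide site, while 5m3 and 5m9 mismatch the three and nine positions immediately following the 5' T, which is left unchanged.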
Table S3: Primers for construction of ATF reporter plasmids.
XhoI: CTCGAG
NotI: GCGGCCGC
For RVD studies
>pGL3-control-Not-F
5’- GAGGAGGCGGCCGCAATAAAATATCTTTATTTTC
>PGL3-I-A-R
5’- CTCCTCCTCGAGTTGTCTATTTATCTGTTTACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-I-G-R
5’- CTCCTCCTCGAGCCGCCCACTCACCCGCTCACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-I-C-R
5’- CTCCTCCTCGAGGGGGCGAGTGAGCGGGTGACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-I-T-R
5’- CTCCTCCTCGAGAAGACAAATAAACAGATAACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-II-A-R
5’- CTCCTCCTCGAGTTCTATGTTTCTGTATTTACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-II-G-R
5’- CTCCTCCTCGAGCCCCACGCTCCCGCACTCACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-II-C-R
5’- CTCCTCCTCGAGGGCGAGGGTGCGGGAGTGACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-II-T-R
5’- CTCCTCCTCGAGAACAAAGATACAGAAATAACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-A-R
5’- CTCCTCCTCGAGTTATTTCTATGTCTTTGTACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-G-R
5’- CTCCTCCTCGAGCCACTCCCACGCCCTCGCACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-C-R
5’- CTCCTCCTCGAGGGAGTGCGAGGGCGTGGGACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-T-R
5’- CTCCTCCTCGAGAAAATACAAAGACATAGAACCCGGGCTAGCACGCGTAAGAGCTC -3’
For AvrBs3 variant studies
>PGL3-Bs3-wt-R
5’- CTCCTCCTCGAGGGATGGTTAGGTTTATATACCCGGGCTAGCACGCGTAAGAGCTC -3’
>pGL3-Bs3-UPA-R
5’- CTCCTCCTCGAGAGAGGGTTAGGTTTATATACCCGGGCTAGCACGCGTAAGAGCTC -3’
For AvrBs3 5’T studies
>PGL3-Bs3-5A-R
5’- CTCCTCCTCGAGGGATGGTTAGGTTTATATTCCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-Bs3-5C-R
5’- CTCCTCCTCGAGGGATGGTTAGGTTTATATGCCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-Bs3-5G-R
5’- CTCCTCCTCGAGGGATGGTTAGGTTTATATCCCCGGGCTAGCACGCGTAAGAGCTC -3’
For polarity studies
>PGL3-Bs3-3m3-R
5’- CTCCTCCTCGAGCCCTGGTTAGGTTTATATACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-Bs3-3m6-R
5’- CTCCTCCTCGAGCCCCCCTTAGGTTTATATACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-Bs3-3m9-R
5’- CTCCTCCTCGAGCCCCCCAACGGTTTATATACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-Bs3-5m3-R
5’- CTCCTCCTCGAGGGATGGTTAGGTTTAACCACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-Bs3-5m6-R
5’- CTCCTCCTCGAGGGATGGTTAGGTAACACCACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-Bs3-5m9-R
5’- CTCCTCCTCGAGGGATGGTTACCAAACACCACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-T-3m3-R
5’- CTCCTCCTCGAGCCCATACAAAGACATAGAACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-T-3m6-R
5’- CTCCTCCTCGAGCCCCACCAAAGACATAGAACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-T-3m9-R
5’- CTCCTCCTCGAGCCCCACGCCAGACATAGAACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-T-5m3-R
5’- CTCCTCCTCGAGAAAATACAAAGACATCCCACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-T-5m6-R
5’- CTCCTCCTCGAGAAAATACAAAGAGCACCCACCCGGGCTAGCACGCGTAAGAGCTC -3’
>PGL3-III-T-5m9-R
5’- CTCCTCCTCGAGAAAATACAACCCGCACCCACCCGGGCTAGCACGCGTAAGAGCTC -3’
Alignment
>PGL3-Bs3-UPA-R TATATAAACCTAACCCTCT
>PGL3-Bs3-wt-R TATATAAACCTAACCATCC
>PGL3-Bs3-5A-R AATATAAACCTAACCATCC
>PGL3-Bs3-5C-R CATATAAACCTAACCATCC
>PGL3-Bs3-5G-R GATATAAACCTAACCATCC
>PGL3-I-A-R TAAACAGATAAATAGACAA
>PGL3-I-G-R TGAGCGGGTGAGTGGGCGG
>PGL3-I-C-R TCACCCGCTCACTCGCCCC
>PGL3-I-T-R TTATCTGTTTATTTGTCTT
>PGL3-II-A-R TAAATACAGAAACATAGAA
>PGL3-II-G-R TGAGTGCGGGAGCGTGGGG
>PGL3-II-C-R TCACTCCCGCACCCTCGCC
>PGL3-II-T-R TTATTTCTGTATCTTTGTT
>PGL3-III-A-R TACAAAGACATAGAAATAA
>PGL3-III-G-R TGCGAGGGCGTGGGAGTGG
>PGL3-III-C-R TCCCACGCCCTCGCACTCC
>PGL3-III-T-R TTCTATGTCTTTGTATTTT
>PGL3-Bs3-wt-R TATATAAACCTAACCATCC
>PGL3-Bs3-3m3-R TATATAAACCTAACCAGGG
>PGL3-Bs3-3m6-R TATATAAACCTAAGGGGGG
>PGL3-Bs3-3m9-R TATATAAACCGTTGGGGGG
>PGL3-Bs3-5m3-R TGGTTAAACCTAACCATCC
>PGL3-Bs3-5m6-R TGGTGTTACCTAACCATCC
>PGL3-Bs3-5m9-R TGGTGTTTGGTAACCATCC
>PGL3-III-T-wt-R TTCTATGTCTTTGTATTTT
>PGL3-III-T-3m3-R TTCTATGTCTTTGTATGGG
>PGL3-III-T-3m6-R TTCTATGTCTTTGGTGGGG
>PGL3-III-T-3m9-R TTCTATGTCTGGCGTGGGG
>PGL3-III-T-5m3-R TGGGATGTCTTTGTATTTT
>PGL3-III-T-5m6-R TGGGTGCTCTTTGTATTTT
>PGL3-III-T-5m9-R TGGGTGCGGGTTGTATTTT
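The alignment lists each target site in the forward orientation, whereas the reverse primers carry its reverse complement between the XhoI site and the CCCGGG sequence. The sketch below checks that relationship for two primers. The function names `reverse_complement` and `target_in_primer` are hypothetical; the sequences are copied from the primer list and the alignment above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def target_in_primer(target, primer):
    """True if the reverse primer encodes the given forward-strand target."""
    return reverse_complement(target) in primer


primers = {
    "PGL3-I-A-R": "CTCCTCCTCGAGTTGTCTATTTATCTGTTTACCCGGGCTAGCACGCGTAAGAGCTC",
    "PGL3-Bs3-wt-R": "CTCCTCCTCGAGGGATGGTTAGGTTTATATACCCGGGCTAGCACGCGTAAGAGCTC",
}
targets = {
    "PGL3-I-A-R": "TAAACAGATAAATAGACAA",
    "PGL3-Bs3-wt-R": "TATATAAACCTAACCATCC",
}

for name, primer in primers.items():
    print(name, target_in_primer(targets[name], primer))
```

The same check can be run over the remaining primer and alignment pairs to catch transcription errors when the sequences are copied.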
Table S4: EMSA and ATF data for G/A specificity of the NN RVD.
Table S5: EMSA and ATF data probing the requirement of a 5’T.
* These values of AvrBs3_111-42 and dAvrBs3_111-42 binding to the Bs3 targets represent the
average of two independent EMSA experiments performed on the same days as the other individual
determinations in the table. The values differ from those in Table 1, but were included in the
computation of the Table 1 values.
** Percentage affinities of 5’ A, C and G were calculated using the values for AvrBs3_111-42 or
dAvrBs3_111-42 binding to Bs3 (5’ T) targets, determined on the same day, as a reference.
Table S6: EMSA and ATF data for polarity effects.
References
1. Mussolino, C. and Cathomen, T. (2012) TALE nucleases: tailored genome engineering
made easy. Curr Opin Biotechnol.
2. Miller, J.C., Tan, S., Qiao, G., Barlow, K.A., Wang, J., Xia, D.F., Meng, X., Paschon,
D.E., Leung, E., Hinkley, S.J. et al. (2011) A TALE nuclease architecture for efficient
genome editing. Nat Biotechnol, 29, 143-148.
3. Sun, N., Liang, J., Abil, Z. and Zhao, H. (2012) Optimized TAL effector nucleases
(TALENs) for use in treatment of sickle cell disease. Mol Biosyst, 8, 1255-1263.
4. Mak, A.N., Bradley, P., Cernadas, R.A., Bogdanove, A.J. and Stoddard, B.L. (2012) The
crystal structure of TAL effector PthXo1 bound to its DNA target. Science, 335, 716-719.
5. Zhang, F., Cong, L., Lodato, S., Kosuri, S., Church, G.M. and Arlotta, P. (2011) Efficient
construction of sequence-specific TAL effectors for modulating mammalian transcription.
Nat Biotechnol, 29, 149-153.
6. Deng, D., Yan, C., Pan, X., Mahfouz, M., Wang, J., Zhu, J.K., Shi, Y. and Yan, N. (2012)
Structural basis for sequence-specific recognition of DNA by TAL effectors. Science,
335, 720-723.

Supplementary resource (1)

... Depuis, plusieurs études ont testé la spécificité d'accroche d'autres RVD (Anderson et al., 2020;Yang et al., 2014). Cependant, la spécificité d'accroche des RVD sur l'ADN n'est pas si simple car elle peut dépendre de la position et du nombre de RVD, du contexte protéique autour des RVD, des marques épigénétiques présentes sur l'EBE ainsi que du contexte ionique (Anderson et al., 2020;Cuculis et al., 2019;Deng et al., 2012b;Liu et al., 2020;Meckler et al., 2013;Rinaldi et al., 2017;Rogers et al., 2015). Par exemple, les RVD situés en région N terminale de la région répétée contribuent de façon plus importante à l'affinité avec l'EBE que les RVD situés en région C terminale (Meckler et al., 2013). ...
... Cependant, la spécificité d'accroche des RVD sur l'ADN n'est pas si simple car elle peut dépendre de la position et du nombre de RVD, du contexte protéique autour des RVD, des marques épigénétiques présentes sur l'EBE ainsi que du contexte ionique (Anderson et al., 2020;Cuculis et al., 2019;Deng et al., 2012b;Liu et al., 2020;Meckler et al., 2013;Rinaldi et al., 2017;Rogers et al., 2015). Par exemple, les RVD situés en région N terminale de la région répétée contribuent de façon plus importante à l'affinité avec l'EBE que les RVD situés en région C terminale (Meckler et al., 2013). ...
... For example, TAL effectors can adapt to modified EBEs by mutating their repeats or by carrying repeats of aberrant length, or strains can acquire other TAL effectors able to induce the same target by binding another EBE (Hutin et al., 2015a;. To optimize the durability of resistance, it would be preferable to modify the first nucleotide bases of the EBE, because they are essential for TAL effector binding (Meckler et al., 2013;. Likewise, insertions or deletions of several base pairs could be more durable, because the TAL effectors must then modify several of their repeats to adapt to the modified EBE. ...
Thesis
Full-text available
The pathogenicity of bacteria of the genus Xanthomonas relies in part on their ability to inject TAL (transcription activator-like) effectors into the plant cell. These effectors act like eukaryotic transcription factors, inducing susceptibility genes in the host plant to promote infection. The aim of this thesis is to better understand the Xanthomonas-bean interaction, and in particular the role of TAL effectors in common bacterial blight caused by Xanthomonas phaseoli pv. phaseoli and X. citri pv. fuscans. To this end, a transcriptomic study was carried out to assess the impact of Xanthomonas on the bean transcriptome in resistant and susceptible contexts. This study revealed genes and metabolic pathways linked to plant resistance or susceptibility. Using a combination of transcriptomic approaches and pathogenicity tests, the potential targets of two TAL effectors were studied. These approaches showed that XfuTAL1 and XfuTAL2 were important for the aggressiveness of Xanthomonas and that XfuTAL1 induced the susceptibility gene PvAIL1, which encodes a transcription factor. This study also revealed that the atypical structure of XfuTAL2 increased its DNA-binding capacity and could potentially play a role in pathogenicity. In addition, a new phenotyping method was developed to assess the aggressiveness of Xanthomonas on bean. Together, these findings open new perspectives for developing resistance to Xanthomonas in bean and underline the importance of TAL effectors in this pathosystem.
... One example searched for sequences that differed by at least 3 bp from the dTALE-binding sequence and were absent from all promoter sequences (up to 2 kbp upstream of the ATG) in the human genome (Garg et al., 2012). However, three mismatches may not be sufficient to prevent binding of the dTALE because there is growing evidence that the impact of mismatches on the ability of RVDs to bind cognate bases depends on the combined effects of RVD-type, position within the EBE, overall RVD-composition, and the number of repeats (Juillerat et al., 2015; Meckler et al., 2013; Miller et al., 2015; Rinaldi et al., 2017; Rogers et al., 2015; Streubel et al., 2012). With this in mind, position-dependent base preferences for canonical RVDs (those with amino acid variants HD, NI, NG, or NN) have been evaluated and have been used to rate the impact of specific RVD-base mismatches in the context of the repeat array (Erkes et al., 2019; Miller et al., 2015). ...
... Importantly, mismatches can have different impacts on dTALE-DNA interactions and it is possible that an off-target with 3 or even more mismatches could be bound and induced by a corresponding dTALE. The impact of a mismatch on dTALE-DNA interaction depends on the kind of RVD-base mismatch and its position within the corresponding EBE (Erkes et al., 2019; Meckler et al., 2013; Miller et al., 2015; Rogers et al., 2015). Generally, T0-proximal mismatches are less tolerated than T0-distal mismatches. ...
Article
Full-text available
In biological discovery and engineering research there is a need to spatially and/or temporally regulate transgene expression. However, the limited availability of promoter sequences that are uniquely active in specific tissue‐types and/or at specific times often precludes co‐expression of multiple transgenes in precisely‐controlled developmental contexts. Here we developed a system for use in rice that comprises synthetic designer transcription activator‐like effectors (dTALEs) and cognate synthetic TALE‐activated promoters (STAPs). The system allows multiple transgenes to be expressed from different STAPs, with the spatial and temporal context determined by a single promoter that drives expression of the dTALE. We show that two different systems – dTALE1‐STAP1 and dTALE2‐STAP2 – can activate STAP‐driven reporter gene expression in stable transgenic rice lines, with transgene transcript levels dependent on both dTALE and STAP sequence identities. The relative strength of individual STAP sequences is consistent between dTALE1 and dTALE2 systems but differs between cell‐types, requiring empirical evaluation in each case. dTALE expression leads to off‐target activation of endogenous genes but the number of genes affected is substantially less than the number impacted by the somaclonal variation that occurs during the regeneration of transformed plants. With the potential to design fully orthogonal dTALEs for any genome of interest, the dTALE‐STAP system thus provides a powerful approach to fine‐tune the expression of multiple transgenes, and to simultaneously introduce different synthetic circuits into distinct developmental contexts.
... TALENs have two advantages over ZFNs for genome editing: first, assembling a functional nuclease requires less time and experience; second, compared with ZFNs, TALENs show higher affinity for the target DNA, are more specific, and have reduced toxicity (Meckler et al. 2013; Mussolino et al. 2014). TALENs are considerably larger than ZFNs and have highly repetitive structures, which complicates their efficient delivery into cells via lentivirus or adeno-associated virus (AAV) (Holkers et al. 2013). ...
... [30,31] However, depending on the position and number of mismatches, the target sequence recognition by TALEs is adversely affected. [32][33][34][35] Therefore, the association constant for DNA-TALE complex formation is expected to be regulated by the appropriate introduction of mismatches into the TALE recognition sequence. ...
Article
Full-text available
Transcriptional activator‐like effector (TALE), a DNA‐binding protein, is widely used in genome editing. However, the recognition of the target sequence by the TALE is adversely affected by the number of mismatches. Therefore, the association constant of DNA‐TALE complex formation can be controlled by appropriately introducing a mismatch into the TALE recognition sequence. This study aimed to construct a TALE that can distinguish a single nucleotide difference. Our results show that a single mismatch present in repeats 2 or 3 of TALE did not interfere with the complex formation with DNA, whereas continuous mismatches present in repeats 2 and 3 significantly reduced association with the target DNA. Based on these findings, we constructed a system that detects a single-nucleotide difference in a gene with high accuracy, and a TALE‐nuclease (TALEN) that selectively cleaves DNA with a single mismatch.
... Moreover, the construction of zinc-finger arrays is difficult, making it tedious to assemble a functional nuclease, which limits the use of ZFNs as an efficient gene-editing tool [149]. TALENs are a highly specific, flexible gene-editing tool with low cytotoxicity, owing to their high and precise affinity for the target DNA bases [172,173]. However, TALENs are large proteins with highly repetitive structures, making it difficult to efficiently deliver them to cells [174]. ...
Article
Full-text available
Prion diseases are fatal infectious neurodegenerative disorders affecting both humans and animals. They are caused by the misfolded isoform of the cellular prion protein (PrPC), PrPSc, and currently no options exist to prevent or cure prion diseases. Chronic wasting disease (CWD) in deer, elk and other cervids is considered the most contagious prion disease, with extensive shedding of infectivity into the environment. Cell culture models provide a versatile platform for convenient quantification of prions, for studying the molecular and cellular biology of prions, and for performing high-throughput screening of potential therapeutic compounds. Unfortunately, only a very limited number of cell lines are available that facilitate robust and persistent propagation of CWD prions. Gene-editing using programmable nucleases (e.g., CRISPR-Cas9 (CC9)) has proven to be a valuable tool for high precision site-specific gene modification, including gene deletion, insertion, and replacement. CC9-based gene editing was used recently for replacing the PrP gene in mouse and cell culture models, as efficient prion propagation usually requires matching sequence homology between infecting prions and prion protein in the recipient host. As expected, such gene-editing proved to be useful for developing CWD models. Several transgenic mouse models are available that propagate CWD prions effectively; however, most fail to reproduce CWD pathogenesis as found in the cervid host, including CWD prion shedding. This is different for the few currently available knock-in mouse models that seem to do so. In this review, we discuss the available in vitro and in vivo models of CWD, and the impact of gene-editing strategies.
... TALE proteins are significantly elongated in DNA-free mode in comparison to the bound state [60Å versus 35Å for 11.5 repeats (27)]. Initial DNA contact and nonspecific DNA binding is facilitated by the N-terminal domain of a TALE (61) and the first few repeats close to the N-terminal domain are more important for binding than later ones (62)(63)(64). If two TALE binding sites overlap, the TALE with a blocked N-terminal region is readily displaced (65). ...
Article
Full-text available
Transcription activator-like effectors (TALEs) are bacterial proteins with a programmable DNA-binding domain, which turned them into exceptional tools for biotechnology. TALEs contain a central array of consecutive 34 amino acid long repeats to bind DNA in a simple one-repeat-to-one-nucleotide manner. However, a few naturally occurring aberrant repeat variants break this strict binding mechanism, allowing for the recognition of an additional sequence with a −1 nucleotide frameshift. The limits and implications of this extended TALE binding mode are largely unexplored. Here, we analyse the complete diversity of natural and artificially engineered aberrant repeats for their impact on the DNA binding of TALEs. Surprisingly, TALEs with several aberrant repeats can loop out multiple repeats simultaneously without losing DNA-binding capacity. We also characterized members of the only natural TALE class harbouring two aberrant repeats and confirmed that their target is the major virulence factor OsSWEET13 from rice. In an aberrant TALE repeat, the position and nature of the amino acid sequence strongly influence its function. We explored the tolerance of TALE repeats towards alterations further and demonstrate that inserts as large as GFP can be tolerated without disrupting DNA binding. This illustrates the extraordinary DNA-binding capacity of TALEs and opens new uses in biotechnology.
Article
TALEs (transcription activator-like effectors) in plant-pathogenic Xanthomonas bacteria activate expression of plant genes and support infection or cause a resistance response. PthA4AT is a TALE with a particularly short DNA-binding domain harbouring only 7.5-repeats which triggers cell death in Nicotiana benthamiana; however, the genetic basis for this remains unknown. To identify possible target genes of PthA4AT that mediate cell death in N. benthamiana, we exploited the modularity of TALEs to stepwise enhance their specificity and reduce potential target sites. Substitutions of individual repeats suggested that PthA4AT-dependent cell death is sequence-specific. Stepwise addition of repeats to the C-terminal or N-terminal end of the repeat region narrowed the sequence requirements in promoters of target genes. Transcriptome profiling and in silico target prediction allowed the isolation of two cell death-inducer genes, which encode a patatin-like protein and a bifunctional monodehydroascorbate reductase/carbonic anhydrase protein. These two proteins are not linked to known TALE-dependent resistance genes. Our results show that the aberrant expression of different endogenous plant genes can cause a cell death reaction, which supports the hypothesis that TALE-dependent executor resistance genes can originate from various plant processes. Our strategy further demonstrates the use of TALEs to scan genomes for genes triggering cell death and other relevant phenotypes.
Article
Recently, biologists have gained access to several far-field fluorescence nanoscopy (FN) technologies that allow the observation of cellular components with ~20 nm resolution. FN is revolutionizing cell biology by enabling the visualization of previously inaccessible subcellular details. While technological advances in microscopy are critical to the field, optimal sample preparation and labeling are equally important and often overlooked in FN experiments. In this review, we provide an overview of the methodological and experimental factors that must be considered when performing FN. We present key concepts related to the selection of affinity-based labels, dyes, multiplexing, live cell imaging approaches, and quantitative microscopy. Consideration of these factors greatly enhances the effectiveness of FN, making it an exquisite tool for numerous biological applications.
Article
The cornea, the dome-shaped, transparent front part of the eye, provides about two-thirds of the eye's refraction as well as barrier functions. Globally, corneal diseases are the leading cause of vision impairment. Loss of corneal function, including opacification, involves complex crosstalk and perturbation among a variety of cytokines, chemokines and growth factors generated by corneal keratocytes, epithelial cells, lacrimal tissues, nerves, and immune cells. Conventional small-molecule drugs can treat mild-to-moderate traumatic corneal pathology but require frequent application and often fail to treat severe pathologies. Corneal transplant surgery is the standard of care to restore vision in patients. However, the declining availability of and rising demand for donor corneas are major concerns for maintaining ophthalmic care. Thus, the development of efficient and safe nonsurgical methods to cure corneal disorders and restore vision in vivo is highly desired. Gene-based therapy has huge potential to cure corneal blindness. To achieve a nonimmunogenic, safe and sustained therapeutic response, the selection of relevant genes, gene editing methods and suitable delivery vectors is vital. This article describes corneal structural and functional features, mechanistic understanding of gene therapy vectors, gene editing methods, gene delivery tools, and the status of gene therapy for treating corneal disorders, diseases, and genetic dystrophies.
Chapter
The detection of target nucleotides is important for the accurate diagnosis of tumor development, genetic diseases, and infectious diseases and for food and environmental safety monitoring. To perform rapid, sensitive and accurate identification of such nucleotides, various biosensing principles have been reported. This chapter focuses on detection principles that rely on programmable nucleic acid-binding proteins, including zinc-finger proteins, transcription activator-like effector proteins, and clustered regularly interspaced short palindromic repeats-associated proteins, summarizing their characteristics and discussing future directions.
Article
Full-text available
Cell death and differentiation is a monthly research journal focused on the exciting field of programmed cell death and apoptosis. It provides a single accessible source of information for both scientists and clinicians, keeping them up-to-date with advances in the field. It encompasses programmed cell death, cell death induced by toxic agents, differentiation and the interrelation of these with cell proliferation.
Article
Full-text available
The DNA binding domain of Transcription Activator-Like (TAL) effectors can easily be engineered to have new DNA sequence specificities. Consequently, engineered TAL effector proteins have become important reagents for manipulating genomes in vivo. DNA binding by TAL effectors is mediated by arrays of 34 amino acid repeats. In each repeat, one of two amino acids (repeat variable di-residues, RVDs) contacts a base in the DNA target. RVDs with specificity for C, T and A have been described; however, among RVDs that target G, the RVD NN also binds A, and NK is rare among naturally occurring TAL effectors. Here we show that TAL effector nucleases (TALENs) made with NK to specify G have less activity than their NN-containing counterparts: fourteen of fifteen TALEN pairs made with NN showed more activity in a yeast recombination assay than otherwise identical TALENs made with NK. Activity was assayed for three of these TALEN pairs in human cells, and the results paralleled the yeast data. The in vivo data is explained by in vitro measurements of binding affinity demonstrating that NK-containing TAL effectors have less affinity for targets with G than their NN-containing counterparts. On targets for which G was substituted with A, higher G-specificity was observed for NK-containing TALENs. TALENs with different N- and C-terminal truncations were also tested on targets that differed in the length of the spacer between the two TALEN binding sites. TALENs with C-termini of either 63 or 231 amino acids after the repeat array cleaved targets across a broad range of spacer lengths - from 14 to 33 bp. TALENs with only 18 aa after the repeat array, however, showed a clear optimum for spacers of 13 to 16 bp. The data presented here provide useful guidelines for increasing the specificity and activity of engineered TAL effector proteins.
Article
Full-text available
Site-specific recombinases are powerful tools for genome engineering. Hyperactivated variants of the resolvase/invertase family of serine recombinases function without accessory factors, and thus can be re-targeted to sequences of interest by replacing native DNA-binding domains (DBDs) with engineered zinc-finger proteins (ZFPs). However, imperfect modularity with particular domains, lack of high-affinity binding to all DNA triplets, and difficulty in construction has hindered the widespread adoption of ZFPs in unspecialized laboratories. The discovery of a novel type of DBD in transcription activator-like effector (TALE) proteins from Xanthomonas provides an alternative to ZFPs. Here we describe chimeric TALE recombinases (TALERs): engineered fusions between a hyperactivated catalytic domain from the DNA invertase Gin and an optimized TALE architecture. We use a library of incrementally truncated TALE variants to identify TALER fusions that modify DNA with efficiency and specificity comparable to zinc-finger recombinases in bacterial cells. We also show that TALERs recombine DNA in mammalian cells. The TALER architecture described herein provides a platform for insertion of customized TALE domains, thus significantly expanding the targeting capacity of engineered recombinases and their potential applications in biotechnology and medicine.
Article
Full-text available
Transcription activator-like effectors are sequence-specific DNA-binding proteins that harbour modular, repetitive DNA-binding domains. Transcription activator-like effectors have enabled the creation of customizable designer transcriptional factors and sequence-specific nucleases for genome engineering. Here we report two improvements of the transcription activator-like effector toolbox for achieving efficient activation and repression of endogenous gene expression in mammalian cells. We show that the naturally occurring repeat-variable diresidue Asn-His (NH) has high biological activity and specificity for guanine, a highly prevalent base in mammalian genomes. We also report an effective transcription activator-like effector transcriptional repressor architecture for targeted inhibition of transcription in mammalian cells. These findings will improve the precision and effectiveness of genome engineering that can be achieved using transcription activator-like effectors.
Article
Full-text available
The ability to engineer biological circuits that process and respond to complex cellular signals has the potential to impact many areas of biology and medicine. Transcriptional activator-like effectors (TALEs) have emerged as an attractive component for engineering these circuits, as TALEs can be designed de novo to target a given DNA sequence. Currently, however, the use of TALEs is limited by degeneracy in the site-specific manner by which they recognize DNA. Here, we propose an algorithm to computationally address this problem. We apply our algorithm to design 180 TALEs targeting 20 bp cognate binding sites that are at least 3 nt mismatches away from all 20 bp sequences in putative 2 kb human promoter regions. We generated eight of these synthetic TALE activators and showed that each is able to activate transcription from a targeted reporter. Importantly, we show that these proteins do not activate synthetic reporters containing mismatches similar to those present in the genome nor a set of endogenous genes predicted to be the most likely targets in vivo. Finally, we generated and characterized TALE repressors comprised of our orthogonal DNA binding domains and further combined them with shRNAs to accomplish near complete repression of target gene expression.
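The orthogonality criterion described in this abstract (binding sites at least 3 nt mismatches away from every same-length window in a set of promoter sequences) can be illustrated with a brute-force Hamming-distance filter. This is a minimal sketch under stated assumptions, not the authors' algorithm: the function names and toy sequences are hypothetical, only one strand is scanned, and RVD degeneracy is ignored. Short 6 bp sites are used instead of 20 bp sites to keep the example readable.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(site: str, promoter: str) -> int:
    """Smallest Hamming distance between `site` and any window of `promoter`."""
    k = len(site)
    if len(promoter) < k:
        return k  # no full-length window exists; treat as maximally distant
    return min(
        hamming(site, promoter[i:i + k])
        for i in range(len(promoter) - k + 1)
    )


def is_orthogonal(site: str, promoters: list[str], min_mismatches: int = 3) -> bool:
    """True if `site` differs by at least `min_mismatches` from every window."""
    return all(min_distance(site, p) >= min_mismatches for p in promoters)


# Hypothetical toy data (not from any real genome).
promoters = ["ACGTACGTAC", "TTTTGGGGCC"]
candidates = ["ACGTAC", "ACGTTT", "AAAAAA"]
orthogonal_sites = [s for s in candidates if is_orthogonal(s, promoters)]
```

In this toy run, `ACGTAC` is rejected because it occurs exactly in the first promoter, `ACGTTT` is rejected because it lies only 2 mismatches from a window, and `AAAAAA` passes. A genome-scale search would need an indexed approach rather than this quadratic scan.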
Article
New technologies have recently emerged that enable targeted editing of genomes in diverse systems. This includes precise manipulation of gene sequences in their natural chromosomal context and addition of transgenes to specific genomic loci. This progress has been facilitated by advances in engineering targeted nucleases with programmable, site-specific DNA-binding domains, including zinc finger proteins and transcription activator-like effectors (TALEs). Recent improvements have enhanced nuclease performance, accelerated nuclease assembly, and lowered the cost of genome editing. These advances are driving new approaches to many areas of biotechnology, including biopharmaceutical production, agriculture, creation of transgenic organisms and cell lines, and studies of genome structure, regulation, and function. Genome editing is also being investigated in preclinical and clinical gene therapies for many diseases.
Article
Engineered transcription activator–like effector nucleases (TALENs) have shown promise as facile and broadly applicable genome editing tools. However, no publicly available high-throughput method for constructing TALENs has been published, and large-scale assessments of the success rate and targeting range of the technology remain lacking. Here we describe the fast ligation-based automatable solid-phase high-throughput (FLASH) system, a rapid and cost-effective method for large-scale assembly of TALENs. We tested 48 FLASH-assembled TALEN pairs in a human cell–based EGFP reporter system and found that all 48 possessed efficient gene-modification activities. We also used FLASH to assemble TALENs for 96 endogenous human genes implicated in cancer and/or epigenetic regulation and found that 84 pairs were able to efficiently introduce targeted alterations. Our results establish the robustness of TALEN technology and demonstrate that FLASH facilitates high-throughput genome editing at a scale not currently possible with other genome modification technologies.
Article
Custom-made designer nucleases have evolved into an indispensable platform to precisely alter complex genomes for basic research, biotechnology, synthetic biology, or human gene therapy. In this review we describe how transcription activator-like effector nucleases (TALENs) have rapidly developed into a chief technology for targeted genome editing in different model organisms as well as human stem cells. We summarize the technological background and provide an overview of the current state-of-the-art of TALENs with regard to activity and specificity of these nucleases for targeted genome engineering.