PlantProm: A database of plant promoter sequences

Ilham A. Shahmuradov, Alex J. Gammerman, John M. Hancock, Peter M. Bramley¹ and Victor V. Solovyev²,*
Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK, ¹School of Biological Sciences, Royal Holloway, University of London, UK and ²Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY 10549, USA
Received August 15, 2002, Revised September 25, 2002, Accepted October 2, 2002
ABSTRACT
PlantProm DB, a plant promoter database, is an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s), TSS, from various plant species. The first release (2002.01) of PlantProm DB contains 305 entries including 71, 220 and 14 promoters from monocot, dicot and other plants, respectively. It provides DNA sequence of the promoter regions (−200 : +51) with the TSS at the fixed position 201 of each sequence, taxonomic/promoter type classification of promoters and Nucleotide Frequency Matrices (NFM) for promoter elements: TATA-box, CCAAT-box and TSS-motif (Inr). Analysis of TSS-motifs revealed that their composition is different in dicots and monocots, as well as for TATA and TATA-less promoters. The database serves as a learning set in developing plant promoter prediction programs. One such program (TSSP) based on discriminant analysis has been created by Softberry Inc. and the application of a support vector machine approach for promoter identification is under development. PlantProm DB is available at http://mendel.cs.rhul.ac.uk/ and http://www.softberry.com/.
INTRODUCTION
Draft nuclear genome sequences of Arabidopsis thaliana (1)
and Oryza sativa (2,3), representing dicotyledonous and
monocotyledonous higher plants, respectively, have been
published. In addition, the putative gene contents of these
genomes, predicted mostly by computer methods, are available
(2,3,4; ftp://ftp.ncbi.nih.gov/genbank/genomes/A_thaliana;
http://www.tigr.org/tdb/e2k1/ath1; http://mendel.cs.rhul.ac.uk/Arabidopsis).
However, as both computer programs and
experimental approaches for gene discovery have known
limitations, we are still far from a fine picture of genome
architecture. In particular, for all widely used gene prediction
methods, one of the difficulties is accurate detection of the first
(non-coding or partially coding) exon. The most accurate
approach to solve this problem is to use information on full-
length cDNAs. Unfortunately, no such information is available
for most plant genes. Therefore, besides its special
importance for understanding the regulation of gene expression,
identification of plant promoters may serve as an essential
element in gene annotation and in developing
computational promoter prediction approaches. Currently,
promoter identification is one of the most challenging
problems in computational biology.
The term ‘promoter’ is used to designate a region in the genome sequence upstream of a gene transcription start site (TSS), although sequences downstream of the TSS may also affect transcription initiation. Promoter elements determine the transcription initiation point, transcription specificity and rate. Depending on the distance from the TSS, the terms ‘proximal promoter’ (several hundred nucleotides around the TSS) and ‘distal promoter’ (thousands or more nucleotides upstream of the TSS) are also used. Both proximal and distal promoters include sets of various elements participating in the complex process of cell-, tissue-, organ-, developmental stage- and environmental factor-specific regulation of transcription. Most promoter elements regulating TSS selection are localized in the proximal promoter.
To date, there are a number of databases with information on cis-acting elements that control transcription initiation by binding the corresponding nuclear factors. These include TRANSFAC (5), TRRD (6), ooTFD (7), COMPEL (8), PlantCARE (9), PLACE (10) and RegSite (http://softberry.com). The last three databases are plant-oriented collections of transcription regulatory elements. The Eukaryotic Promoter Database (EPD) is the only established collection of sequences of eukaryotic Pol II promoters (11). Its latest release (#71) includes a total of 1402 entries, mainly promoters from animals, with only about 200 from plant species.
In the course of development of a new computer method for predicting Pol II promoters of plant genes, we have collected Pol II promoter sequences from various plants. These data are incorporated on a new bioinformatics web server (http://www.mendel.cs.rhul.ac.uk) developed by the Department of Computer Science at Royal Holloway, University of London, in collaboration with Softberry Inc. (USA). It is designed to present information about plant genomes, genes and new approaches to their analysis. This article describes the criteria used for the promoter data collecting procedure, specific features of plant promoter sequences and Plant Promoter Database (PlantProm DB).
*To whom correspondence should be addressed. Email: victor@softberry.com
Present address: John M. Hancock, MRC Mammalian Genetics Unit, Harwell, Oxfordshire, UK
Nucleic Acids Research, 2003, Vol. 31, No. 1, 114–117
© 2003 Oxford University Press
DOI: 10.1093/nar/gkg041
Description of PlantProm DB
Criteria for selecting promoter sequences. The following rules
were applied when collecting plant gene promoters.
(i) There is experimental evidence of the TSS position(s) of
the gene, published in the literature. For genes with
multiple TSSs the nearest to the CDS start position is
taken, if no additional information on the predominance
of one of them is available (positions of other TSSs are
given in the name line of the sequence written in the
FASTA format).
(ii) The length of known promoter sequence upstream of the
chosen TSS is 200 bp or more; all stored promoter
sequences are the same length, 251 bp, where position
201 corresponds to the TSS, i.e. collected sequences
occupy the region (−200 : +51), with the TSS at
position +1, and thus represent the proximal promoters
mentioned above.
(iii) An entry corresponds to the gene mapped on the genomic
sequences.
(iv) Various alleles of a gene are presented in the database by a
single entry.
(v) Genes with more than one non-allelic copy in the genome
as well as paralogous genes are taken as different entries.
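Rules (i) and (ii) can be illustrated with a minimal Python sketch. It is not the authors' collection procedure: the function names `choose_tss` and `promoter_window`, the 0-based indexing, the plus-strand orientation and the toy sequence are all assumptions made for illustration.

```python
def choose_tss(tss_positions, cds_start):
    """Pick the TSS nearest to the CDS start (rule i, when no TSS is known to predominate)."""
    if not tss_positions:
        raise ValueError("at least one TSS position is required")
    return min(tss_positions, key=lambda pos: abs(cds_start - pos))


def promoter_window(genomic, tss_index, upstream=200, downstream=51):
    """Return the (-upstream : +downstream) region around a TSS (rule ii).

    tss_index is the 0-based index of the TSS nucleotide (position +1) on a
    plus-strand sequence; this indexing convention is an assumption.
    """
    start = tss_index - upstream
    end = tss_index + downstream
    if start < 0 or end > len(genomic):
        raise ValueError("not enough flanking sequence around the TSS")
    return genomic[start:end]


# Toy example (hypothetical sequence, not a PlantProm entry).
genomic = "A" * 300 + "G" + "C" * 100
tss = choose_tss([120, 250, 300], cds_start=340)
window = promoter_window(genomic, tss)
```

With these conventions the returned window is 251 nt long and the TSS sits at 0-based index 200, i.e. position 201 of the stored sequence.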
Information content of the database
The annotated, non-redundant PlantProm DB (release
2002.01) has 305 entries including 71, 220 and 14 promoters
for RNA polymerase II from monocot, dicot and other plants,
respectively. It provides the following information on plant
promoters with experimentally known transcription start
site(s):
(i) DNA sequence of the promoter region (−200 : +51);
(ii) Nucleotide Frequency Matrices (NFM) for canonical
promoter elements (TATA-box, CCAAT-box and
TSS-motif or Initiator element, Inr);
(iii) Taxonomic and promoter type classification of promoters.
To compute nucleotide frequency matrices for various promoter elements, a pairwise comparison of the region (−50 : +1) of the 305 plant promoters was performed, and one promoter of each pair showing more than 90% homology was excluded from the initial collection. As a result, 4 promoters were excluded; they are denoted by ‘Excluded’ in the name line of their sequences.
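The redundancy filter described above can be sketched as follows. The text does not say how homology was measured, so this sketch assumes ungapped percent identity over the (−50 : +1) region (0-based slice 150:201 of a 251-nt entry) and a greedy rule that keeps the first promoter of each similar pair. The names `identity` and `filter_redundant` and the toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def filter_redundant(promoters, threshold=0.90, start=150, end=201):
    """Greedily drop promoters whose (-50 : +1) region is > threshold identical
    to an already kept promoter. promoters is a list of (name, sequence)."""
    kept, excluded = [], []
    for name, seq in promoters:
        region = seq[start:end]
        if any(identity(region, kept_seq[start:end]) > threshold
               for _, kept_seq in kept):
            excluded.append(name)
        else:
            kept.append((name, seq))
    return kept, excluded


# Toy data: p2 differs from p1 at a single position inside the compared region.
base = ("ACGT" * 63)[:251]
near_copy = base[:160] + "C" + base[161:]
distinct = "T" * 251
kept, excluded = filter_redundant(
    [("p1", base), ("p2", near_copy), ("p3", distinct)]
)
```

In this toy run `p2` is excluded (50 of 51 positions match `p1`), while `p3` is kept.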
In a simple implementation of the Expectation Maximization (EM) algorithm (12), we considered the motif sequence $X = (x_1, x_2, \ldots, x_l)$, where $l$ is the motif length. If $P_i(x_j)$ is the empirical frequency of the nucleotide $x_j$ in position $i$ (computed on the previous iteration), then the weight of this motif is computed as

$$W(X) = \log \prod_{i=1}^{l} \frac{P_i(x_i)}{0.25}$$
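As a minimal sketch of this weight, the product of frequency ratios can be accumulated as a sum of logarithms. The function name `motif_weight`, the dictionary-per-position matrix layout, the use of the natural logarithm and the handling of zero frequencies (returning negative infinity) are assumptions; the text does not specify them.

```python
import math


def motif_weight(motif, freq_matrix, background=0.25):
    """Log of the product over positions of P_i(x_i) / background.

    freq_matrix[i] maps each nucleotide to its frequency at motif position i.
    """
    if len(motif) != len(freq_matrix):
        raise ValueError("motif and matrix must have the same length")
    weight = 0.0
    for i, nucleotide in enumerate(motif):
        p = freq_matrix[i][nucleotide]
        if p <= 0.0:
            # Assumption: a never-observed nucleotide makes the weight -infinity.
            return float("-inf")
        weight += math.log(p / background)
    return weight


# Toy 3-position matrix (hypothetical values, not taken from the tables).
matrix = [
    {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.0, "C": 1.0, "G": 0.0, "T": 0.0},
]
w = motif_weight("AAC", matrix)
```

For this toy matrix the ratios are 2, 1 and 4, so `w` equals the natural logarithm of 8.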
Using the EM procedure for 10 iterations, the initial collection of 305 (301 unrelated) promoters was divided into two classes: 175 (171 unrelated) TATA promoters and 130 TATA-less promoters. In calculations of TATA matrices, the allowed distance between the right boundary of the TATA-core box and the TSS was 18 to 40 bp, and only the TATAWAWA-core was used for calculating the weight. As an initial TATA-box matrix, the TATA-matrix computed for 134 plant promoters from EPD (http://www.epd.isb-sib.ch/) was used. The computed TATA-matrix (Table 1) is in good agreement with the TATA-matrix from EPD.

Table 1. Nucleotide frequencies matrix for TATA box from 171 unrelated plant promoters (a)
           <2    <1    1     2     3     4     5     6     7     8     >1    >2
A          0.28  0.16  0.03  0.95  0.00  1.00  0.62  0.97  0.38  0.73  0.13  0.30
C          0.27  0.63  0.01  0.00  0.04  0.00  0.00  0.00  0.01  0.08  0.42  0.42
G          0.17  0.05  0.00  0.00  0.00  0.00  0.00  0.02  0.00  0.10  0.28  0.16
T          0.28  0.16  0.96  0.05  0.96  0.00  0.38  0.01  0.61  0.09  0.18  0.11
Consensus        c     T     A     T     A     A/T   A     T/A   A
(a) The mean distance between TATA box and TSS is 26 bp.

Table 2. Nucleotide frequencies matrix for CCAAT box from 131 unrelated plant promoters (a)
           <4    <3    <2    <1    1     2     3     4     5     >1    >2    >3    >4
A          0.31  0.34  0.27  0.30  0.31  0.00  1.00  1.00  0.00  0.28  0.32  0.29  0.40
C          0.19  0.17  0.16  0.18  0.34  1.00  0.00  0.00  0.00  0.20  0.20  0.25  0.17
G          0.20  0.20  0.27  0.21  0.15  0.00  0.00  0.00  0.00  0.20  0.18  0.15  0.15
T          0.30  0.29  0.30  0.31  0.20  0.00  0.00  0.00  1.00  0.32  0.30  0.31  0.28
Consensus                          n     C     A     A     T
(a) The mean distance between CCAAT box and TSS is 75 bp.
For computation of the CCAAT-box matrix, we considered the possible distance between the right boundary of the CCAAT-core and the TSS to be within 50 to 100 bp. The CCAAT-core was used for weight calculation and, in accordance with the available data (13), CCAAT boxes were identified on both DNA strands. The CCAAT matrix is presented in Table 2.
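The both-strand search within a distance window can be sketched as below. This is an exact-match scan only, not the EM weighting used in the paper. The function name `find_ccaat_cores`, the 0-based TSS index of 200, and the definition of distance as the TSS index minus the index of the last core nucleotide are assumptions; the minus-strand box is found as the reverse complement ATTGG on the given strand.

```python
def find_ccaat_cores(seq, tss_index=200, min_dist=50, max_dist=100):
    """Return (start, strand, distance) for CCAAT cores on either strand whose
    right boundary lies min_dist..max_dist nt upstream of the TSS."""
    hits = []
    # "ATTGG" is the reverse complement of "CCAAT" (a box on the other strand).
    for core, strand in (("CCAAT", "+"), ("ATTGG", "-")):
        pos = seq.find(core)
        while pos != -1:
            right_boundary = pos + len(core) - 1
            distance = tss_index - right_boundary
            if min_dist <= distance <= max_dist:
                hits.append((pos, strand, distance))
            pos = seq.find(core, pos + 1)
    return sorted(hits)


# Toy 251-nt sequence with three planted cores (hypothetical data).
bases = ["G"] * 251
bases[100:105] = "ATTGG"   # minus-strand box, 96 nt from the TSS
bases[120:125] = "CCAAT"   # plus-strand box, 76 nt from the TSS
bases[180:185] = "ATTGG"   # too close to the TSS (16 nt), filtered out
hits = find_ccaat_cores("".join(bases))
```

Only the two cores inside the 50 to 100 nt window are reported; the third lies too close to the TSS.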
The TSS-motif matrix of 5 bp in length has been computed, where the 3rd nucleotide was the annotated TSS (anTSS). No strong consensus was revealed. When the EM approach was used to analyze all possible pentanucleotides with an assumed TSS (asTSS) location in the range (anTSS − 2 : anTSS + 2), it was observed that the composition of asTSS-motifs is different in dicot and monocot plants (Tables 3 and 4), as well as for TATA and TATA-less promoters (Tables 5 and 6). This finding seems to be a novel feature of plant promoters.
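One scoring pass of such a scan might look like the sketch below: every pentanucleotide whose 3rd nucleotide lies within 2 nt of the annotated TSS is scored against a frequency matrix, and the best-scoring assumed TSS is returned. This illustrates a single step only, not the full iterative EM procedure. The names `pentamer_score` and `best_assumed_tss`, the log-odds scoring against a 0.25 background and the toy matrix are assumptions.

```python
import math


def pentamer_score(window, freq_matrix, background=0.25):
    """Sum of log(P_i(x_i) / background) over the window positions."""
    score = 0.0
    for column, nucleotide in zip(freq_matrix, window):
        p = column.get(nucleotide, 0.0)
        if p <= 0.0:
            return float("-inf")
        score += math.log(p / background)
    return score


def best_assumed_tss(seq, an_tss, freq_matrix, shift=2):
    """Try asTSS in (anTSS - shift : anTSS + shift); the asTSS is the middle
    nucleotide of each scored window. Returns (asTSS index, score) or None."""
    half = len(freq_matrix) // 2
    best = None
    for as_tss in range(an_tss - shift, an_tss + shift + 1):
        start = as_tss - half
        end = start + len(freq_matrix)
        if start < 0 or end > len(seq):
            continue
        score = pentamer_score(seq[start:end], freq_matrix)
        if best is None or score > best[1]:
            best = (as_tss, score)
    return best


# Toy matrix preferring C at -1 and A at the TSS (hypothetical values).
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
matrix = [
    uniform,
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    uniform,
    uniform,
]
seq = "G" * 10 + "CA" + "G" * 10
best = best_assumed_tss(seq, an_tss=12, freq_matrix=matrix)
```

In the toy run the annotated TSS (index 12) is one nucleotide off, and the scan selects index 11, where the CA dinucleotide matches the matrix.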
PlantProm DB, release 2002.01, is available at the web sites
http://mendel.cs.rhul.ac.uk and http://www.softberry.com. The
database will be regularly updated by collection and analysis of
new experimental data on plant promoters as it becomes
available in the literature. PlantProm DB serves as a learning
set in developing plant promoter prediction programs. One
such program (TSSP), based on discriminant analysis of
sequence features and plant regulatory motifs (RegSiteDB),
has been developed by Softberry Inc.
(http://www.softberry.com/berry.phtml?topic=promoter).
The application of a support vector machine approach for
promoter identification is under development.
ACKNOWLEDGEMENTS
PlantProm DB is funded by grant 111/BIO14428 ‘Pattern
Recognition Techniques for Gene Identification in Plant
Genomic Sequences’ from the UK Biotechnology and
Biological Sciences Research Council (BBSRC) and is
designed and maintained at Royal Holloway, University of
London in collaboration with Softberry Inc. (USA).
REFERENCES
1. The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.
2. Yu,J., Hu,S., Wang,J., Wong,G.K., Li,S., Liu,B., Deng,Y., Dai,L., Zhou,Y., Zhang,X., Cao,M. et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science, 296, 79–92.
3. Goff,S.A., Ricke,D., Lan,T.-H., Presting,G., Wang,R., Dunn,M., Glazebrook,J., Sessions,A., Oeller,P., Varma,H., Hadley,D. et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science, 296, 92–100.
4. Schoof,H., Zaccaria,P., Gundlach,H., Lemcke,K., Rudd,S., Kolesov,G., Arnold,R., Mewes,H.W. and Mayer,K.F. (2002) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res., 30, 91–93.
5. Wingender,E., Chen,X., Fricke,E., Geffers,R., Hehl,R., Liebich,I., Krull,M., Matys,V., Michael,H., Ohnhäuser,R., Prüß,M., Schacherer,F., Thiele,S. and Urbach,S. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res., 29, 281–283.
6. Kolchanov,N.A., Ignatieva,E.V., Ananko,E.A., Podkolodnaya,O.A., Stepanenko,I.L., Merkulova,T.I., Pozdnyakov,M.A., Podkolodny,N.L., Naumochkin,A.N. and Romashchenko,A.G. (2002) Transcription Regulatory Regions Database (TRRD): its status in 2002. Nucleic Acids Res., 30, 312–317.
7. Ghosh,D. (2000) Object-oriented Transcription Factors Database (ooTFD). Nucleic Acids Res., 28, 308–310.
8. Kel-Margoulis,O.V., Kel,A.E., Reuter,I., Deineko,I.V. and Wingender,E. (2002) TRANSCompel: a database on composite regulatory elements in eukaryotic genes. Nucleic Acids Res., 30, 332–334.
9. Lescot,M., Déhais,P., Thijs,G., Marchal,K., Moreau,Y., Van de Peer,Y., Rouzé,P. and Rombauts,S. (2002) PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res., 30, 325–327.
10. Higo,K., Ugawa,Y., Iwamoto,M. and Korenaga,T. (1999) Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Res., 27, 297–300.
11. Praz,V., Périer,R., Bonnard,C. and Bucher,P. (2002) The eukaryotic promoter database, EPD: new entry types and links to gene expression data. Nucleic Acids Res., 30, 322–324.
12. Cardon,L. and Stormo,G. (1992) Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments. J. Mol. Biol., 223, 159–170.
13. Mantovani,R. (1998) A survey of 178 NF-Y binding CCAAT boxes. Nucleic Acids Res., 26, 1135–1143.

Table 3. Nucleotide frequencies matrix for a TSS-motif from 217 unrelated dicot plants promoters (a)
           −4      −3      −2      −1      +1      +2      +3      +4
A          0.341   0.249   0.286   0.005   0.604   0.475   0.226   0.272
C          0.184   0.286   0.041   0.507   0.332   0.028   0.359   0.240
G          0.101   0.124   0.041   0.161   0.065   0.101   0.129   0.198
T          0.373   0.341   0.631   0.327   0.000   0.396   0.286   0.290
Consensus  W       n       T/a     C/t     A/c     w
(a) In 75 cases, the high scoring TSS coincided with the annotated TSS.

Table 4. Nucleotide frequencies matrix for a TSS-motif from 70 unrelated monocot plants promoters (a)
           −4      −3      −2      −1      +1      +2      +3      +4
A          0.114   0.214   0.557   0.157   0.186   0.000   0.871   0.143
C          0.443   0.286   0.114   0.386   0.314   0.786   0.114   0.371
G          0.186   0.200   0.143   0.257   0.200   0.143   0.014   0.171
T          0.257   0.300   0.186   0.200   0.300   0.071   0.000   0.314
Consensus                  a       N       n       C       A
(a) In 17 cases, the high scoring TSS coincided with the annotated TSS.

Table 5. Nucleotide frequencies matrix for a TSS-motif from 171 unrelated TATA promoters of plants (a)
           −4      −3      −2      −1      +1      +2      +3      +4
A          0.322   0.263   0.099   0.035   0.865   0.246   0.345   0.368
C          0.251   0.222   0.234   0.719   0.023   0.292   0.421   0.257
G          0.117   0.152   0.111   0.105   0.023   0.105   0.082   0.146
T          0.310   0.363   0.556   0.140   0.088   0.357   0.152   0.228
Consensus                  T/c     C       A       n       M
(a) In 64 cases, the high scoring TSS coincided with the annotated TSS.

Table 6. Nucleotide frequencies matrix for a TSS-motif from 130 unrelated TATA-less promoters of plants (a)
           −4      −3      −2      −1      +1      +2      +3      +4
A          0.385   0.215   0.262   0.023   0.554   0.438   0.331   0.231
C          0.231   0.246   0.231   0.315   0.323   0.292   0.015   0.262
G          0.146   0.200   0.000   0.269   0.123   0.054   0.208   0.215
T          0.238   0.338   0.508   0.392   0.000   0.215   0.446   0.292
Consensus                  T/a/c   Y       A/c     a/c/t   t/a/g
(a) In 46 cases, the high scoring TSS coincided with the annotated TSS.
... Nucleotide frequency matrices within TSS in dicot and monocot plants and different promoters were described by Shahmuradov et al. [95]. The analysis of promoters revealed the sequence WnT/aC/tA/cw (−4 to +2) in 217 unrelated dicot plants, and aN-nCA (−2 to +3) in 70 unrelated monocots [95]. ...
... Nucleotide frequency matrices within TSS in dicot and monocot plants and different promoters were described by Shahmuradov et al. [95]. The analysis of promoters revealed the sequence WnT/aC/tA/cw (−4 to +2) in 217 unrelated dicot plants, and aN-nCA (−2 to +3) in 70 unrelated monocots [95]. Similarly, 171 TATA-box promoters showed T/cCAnM, while 130 TATA-less promoters indicated T/a/cYA/ca/c/tt/a/g (−2 to +3), where W=A or T, N=any nucleotide of A, C, T, or G, Y=C or T, and M=A or C [95]. ...
... The analysis of promoters revealed the sequence WnT/aC/tA/cw (−4 to +2) in 217 unrelated dicot plants, and aN-nCA (−2 to +3) in 70 unrelated monocots [95]. Similarly, 171 TATA-box promoters showed T/cCAnM, while 130 TATA-less promoters indicated T/a/cYA/ca/c/tt/a/g (−2 to +3), where W=A or T, N=any nucleotide of A, C, T, or G, Y=C or T, and M=A or C [95]. A general rule of TSS in A. thaliana and rice is the localization of Y (C or T) at −1 and R (A or G) at +1 [96]. ...
Article
Full-text available
This article examines the structure and functions of the plant synthetic promoters frequently used to precisely regulate complex regulatory routes. It details the composition of native promoters and their interacting proteins to provide a better understanding of the tasks associated with synthetic promoter development. The production of synthetic promoters is performed by relatively small libraries produced generally by basic molecular or genetic engineering methods such as cis-element shuffling or domain swapping. The article also describes the preparation of large-scale libraries supported by synthetic DNA fragments, directed evolution, and machine or deep-learning methodologies. The broader application of novel, synthetic promoters reduces the prevalence of homology-based gene silencing or improves the stability of transgenes. A particularly interesting group of synthetic promoters are bidirectional forms, which can enable the expression of up to eight genes by one regulatory element. The introduction and controlled expression of several genes after one transgenic event strongly decreases the frequency of such problems as complex segregation patterns and the random integration of multiple transgenes. These complications are commonly observed during the transgenic crop development enabled by traditional, multistep transformation using genetic constructs containing a single gene. As previously tested DNA promoter fragments demonstrate low complexity and homology, their abundance can be increased by using orthogonal expression systems composed of synthetic promoters and trans-factors that do not occur in nature or arise from different species. Their structure, functions, and applications are rendered in the article. Among them are presented orthogonal systems based on transcription activator-like effectors (dTALEs), synthetic dTALE activated promoters (STAPs) and dCas9-dependent artificial trans-factors (ATFs). 
Synthetic plant promoters are valuable tools for providing precise spatiotemporal regulation and introducing logic gates into the complex genetic traits that are important for basic research studies and their application in crop plant development. Precisely regulated metabolic routes are less prone to undesirable feedback regulation and energy waste, thus improving the efficiency of transgenic crops.
... The newest version of PLAZA contains millions of genes with functional annotations, although they are not suitable for direct utilization without experimental verification (Van Bel et al., 2022). PPDB, PlantCARE, and PlantProm provide numerous regulatory elements with known functions (Lescot et al., 2002;Shahmuradov et al., 2003;Kusunoki and Yamamoto, 2017); however, they lack experimentally validated quantitative strength information. Researchers are likely to find it inconvenient to collect, evaluate, and organize scattered bioparts from individual sources for the design and construction of plant synthetic biosystems. ...
... Currently, most existing plant databases, such as PlantGDB (Dong et al., 2004), PlantProm DB (Shahmuradov et al., 2003), and PPDB (Kusunoki and Yamamoto, 2017), provide homologbased annotations of potential functional sequences. However, these sequences lack experimental validation and cannot easily be used for synthetic design. ...
... In contrast to databases that contain well-defined but monotypic bioparts, such as plant promoters in the PlantProm DB (Shahmuradov et al., 2003), UGTs in the pUGTdb , and CYPs in the PCPD , the PSBD integrates various types of bioparts to meet a wider range of needs in plant synthetic biology research (Table 1). Users can find both catalytic enzymes for building metabolic pathways and regulatory elements for refining gene expression. ...
Article
Full-text available
Plant synthetic biology research requires diverse bioparts that facilitate the redesign and construction of new-to-nature biological devices or systems in plants. Limited by few well-characterized bioparts for plant chassis, the development of plant synthetic biology lags behind that of its microbial counterpart. Here, we constructed a web-based Plant Synthetic BioDatabase (PSBD), which currently categorizes 1677 catalytic bioparts and 384 regulatory elements and provides information on 309 species and 850 chemicals. Online bioinformatics tools including local BLAST, chem similarity, phylogenetic analysis, and visual strength are provided to assist with the rational design of genetic circuits for manipulation of gene expression in planta. We demonstrated the utility of the PSBD by functionally characterizing taxadiene synthase 2 and its quantitative regulation in tobacco leaves. More powerful synthetic devices were then assembled to amplify the transcriptional signals, enabling enhanced expression of flavivirus non-structure 1 proteins in plants. The PSBD is expected to be an integrative and user-centered platform that provides a one-stop service for diverse applications in plant synthetic biology research.
... PlantProm DB is a database and can be accessed at (http://mendel.cs.rhul.ac.uk/) [36]. PlantProm DB, a database of promoter sequences for RNA polymerase II, comprised of well-characterized not-redundant group of proximal promoter sequences. ...
Article
Full-text available
Bioinformatics plays a role, in the field of plant science today. With an increase in data volume, there is a growing demand for tools and methods for managing, visualizing, implementing, evaluating, modeling, and predicting this data. However many biology researchers may lack familiarity with the bioinformatics resources, which can lead to missed opportunities and misinterpretation of the data. In this review article, we highlighted the web resources that offer analysis capabilities for plant research data including genomics, transcriptomics, comparative genomics, bio-ontologies, sequence and structural comparisons plant disease related databases well as proteomics databases. Additionally we provide insights into integrated modules found within these resources that are specifically tailored for analyzing plant associated data. Overall this review aims to assist plant researchers in accessing bioinformatics resources for their data analysis needs while promoting the use of bioinformatics tools to effectively address experimental challenges, within the field of plant sciences.
... Templates included cDNA, mRNA, and gDNA (genomic DNA of W. cibaria). The promoter and terminator analysis of PemKI were performed by BPROM program (Shahmuradov, 2003). ...
Article
Full-text available
The toxin-antitoxin (TA) system plays a key role in bacteria escaping antibiotic stress with persistence, however, the mechanisms by which persistence is controlled remain poorly understood. Weissella cibaria, a novel probiotic, can enters a persistent state upon encountering ciprofloxacin stress. Conversely, it resumes from the persistence when ciprofloxacin stress is relieved or removed. Here, it was found that PemIK TA system played a role in transitioning between these two states. And the PemIK was consisted of PemK, an endonuclease toxic to mRNA, and antitoxin PemI which neutralized its toxicity. The PemK specifically cleaved the U↓AUU in mRNA encoding enzymes involved in glycolysis, TCA cycle and respiratory chain pathways. This cleavage event subsequently disrupted the crucial cellular processes such as hydrogen transfer, electron transfer, NADH and FADH2 synthesis, ultimately leading to a decrease in ATP levels and an increase in membrane depolarization and persister frequency. Notably, Arg24 was a critical active residue for PemK, its mutation significantly reduced the mRNA cleavage activity and the adverse effects on metabolism. These insights provided a clue to comprehensively understand the mechanism by which PemIK induced the persistence of W. cibaria to escape ciprofloxacin stress, thereby highlighting another novel aspect PemIK respond for antibiotic stress.
... Nucleotide frequency matrices within TSS in dicot and monocot plants and different promoters were described by Shahmuradov et al. (2003) [95]. Analysis of promoters from 217 unrelated dicot plants revealed the sequence WnT/aC/tA/cw (-4 to +2), while in 70 unrelated monocot plant promoters the TSS sequence was aNnCA (-2 to +3) [95]. In the same way, the 171 TATA-box promoters showed T/cCAnM, while 130 TATA-less promoters indicated T/a/cYA/ca/c/ /a/g (-2 to +3), where W=A or T, N=any nucleotide of A, C, T, or G, Y=C or T, and M=A or C [95]. ...
Preprint
Full-text available
The article features the structure and functions of plant synthetic promoters frequently exercised to precisely regulate complex regulatory routes. The composition of plant native promoters together with interacting proteins is presented to provide a better understanding of tasks associated with synthetic promoter development. The production of synthetic promoters is conferred on relatively small libraries produced generally by basic molecular or genetic engineering methods such as cis-element shuffling or domain swapping. Moreover, the preparation of large-scale libraries supported by synthetic DNA fragments, directed evolution, and machine or deep learning methodologies is presented. A particularly interesting group of synthetic promoters are bidirectional forms that enable the putative expression of up to 6–8 genes by one regulatory element. The introduction and controlled expression of several genes after one transgenic event strongly decreases the frequency of such problems as complex segregation patterns and random integration of multiple transgenes. These complications are commonly observed during transgenic crop development through traditional, multistep transformation by genetic constructs containing a single gene. Another path to solving problems associated with the low complexity and homology of already tested DNA fragments is through orthogonal expression systems composed of synthetic promoters and trans-factors that do not occur in nature or arise from different species. Their structure, functions, and applications are rendered in the article.
... (accessed on 20 August 2023)) [37] databases. The gene transcription start site (TSS) and promoter region were indicated using the TSSPlant and TSS tools PlantProm DB database [38] (accessed on 10 October 2023). Nucleotide sequence analysis, primer design, and localization of regulatory elements were performed with Vector NTI Advance 9.0 (Invitrogen, Waltham, MA, USA) and visualized ( Figure S1). ...
Article
Full-text available
The coordination of activities between nuclei and organelles in plant cells involves information exchange, in which phytohormones may play essential roles. Therefore, the dissection of the mechanisms of hormone-related integration between phytohormones and mitochondria is an important and challenging task. Here, we found that inputs from multiple hormones may cause changes in the transcript accumulation of mitochondrial-encoded genes and nuclear genes encoding mitochondrial (mt) proteins. In particular, treatments with exogenous hormones induced changes in the GUS expression in the reporter line possessing a 5′-deletion fragment of the RPOTmp promoter. These changes corresponded in part to the up- or downregulation of RPOTmp in wild-type plants, which affects the transcription of mt-encoded genes, implying that the promoter fragment of the RPOTmp gene is functionally involved in the responses to IAA (indole-3-acetic acid), ACC (1-aminocyclopropane-1-carboxylic acid), and ABA (abscisic acid). Hormone-dependent modulations in the expression of mt-encoded genes can also be mediated through mitochondrial transcription termination factors 15, 17, and 18 of the mTERF family and genes for tetratricopeptide repeat proteins that are coexpressed with mTERF genes, in addition to SWIB5 encoding a mitochondrial SWI/SNF (nucleosome remodeling) complex B protein. These genes specifically respond to hormone treatment, displaying both negative and positive regulation in a context-dependent manner. According to bioinformatic resources, their promoter region possesses putative cis-acting elements involved in responses to phytohormones. Alternatively, the hormone-related transcriptional activity of these genes may be modulated indirectly, which is especially relevant for brassinosteroids (BS). 
In general, the results of this study indicate that hormones are essential mediators that are able to cause alterations in the transcript accumulation of mt-related nuclear genes, which, in turn, trigger the expression of mt genes.
... CPE-DB [130] Animal-eRNAdb [131] Promoter EPD [118] PlantProm (plant promoter) [132] TransGene Promoters, TGP [133] Osteo-Promoter Database (OPD) skeletal cells [134] Osiris [135] TiProD [136] PromoterCAD (mammalian promoter/enhancer) [137] EPDNew [138] PPD [139] Methods on EPI study identifying an enhancer or promoter, the above methods either improve the input layer of DNA feature vector representation (for example, dna2vec) or neural network architectures or change the activation functions. Tables 1 and 2 list the available deep-learning-based methods for detecting enhancers and promoters. ...
Article
Full-text available
Background As parts of the cis‐regulatory mechanism of the human genome, interactions between distal enhancers and proximal promoters play a crucial role. Enhancers, promoters, and enhancer‐promoter interactions (EPIs) can be detected using many sequencing technologies and computation models. However, a systematic review that summarizes these EPI identification methods and that can help researchers apply and optimize them is still needed. Results In this review, we first emphasize the role of EPIs in regulating gene expression and describe a generic framework for predicting enhancer‐promoter interaction. Next, we review prediction methods for enhancers, promoters, loops, and enhancer‐promoter interactions using different data features that have emerged since 2010, and we summarize the websites available for obtaining enhancers, promoters, and enhancer‐promoter interaction datasets. Finally, we review the application of the methods for identifying EPIs in diseases such as cancer. Conclusions The advance of computer technology has allowed traditional machine learning, and deep learning methods to be used to predict enhancer, promoter, and EPIs from genetic, genomic, and epigenomic features. In the past decade, models based on deep learning, especially transfer learning, have been proposed for directly predicting enhancer‐promoter interactions from DNA sequences, and these models can reduce the parameter training time required of bioinformatics researchers. We believe this review can provide detailed research frameworks for researchers who are beginning to study enhancers, promoters, and their interactions.
... [36] databases. The gene transcription start site (TSS) and promoter region were indicated using the TSSPlant and TSS tools PlantProm DB database [37]. Nucleotide sequence analysis, primer design and localization of regulatory elements were performed with Vector NTI Advance 9.0 (Invitrogen) and visualized (Figure S1). ...
Preprint
Coordination of activities between nuclei and organelles in plant cells involves information exchange, in which phytohormones may play an essential role. Therefore, dissection of the mechanisms of hormone-related integration between phytohormones and mitochondria is an important and challenging task. Here, we found that inputs from multiple hormones may cause changes in transcript accumulation of mitochondrial-encoded genes and nuclear genes encoding mitochondrial (mt) proteins. In particular, treatments with exogenous hormones induced changes in GUS expression in the reporter line possessing a 5'-deletion fragment of the RPOTmp promoter. These changes corresponded in part to up- or downregulation of RPOTmp in wild-type plants, which affected the transcription of mt-encoded genes, implying that promoter fragments of the RPOTmp gene are functionally involved in responses to IAA (indole-3-acetic acid), ACC (1-aminocyclopropane-1-carboxylic acid), and ABA (abscisic acid). Hormone-dependent modulations in the expression of mt-encoded genes can also be mediated through mitochondrial transcription termination factors 15, 17, and 18 of the mTERF family and genes for tetratricopeptide repeat proteins that are coexpressed with mTERF genes, in addition to SWIB5 encoding a mitochondrial SWI/SNF (nucleosome remodelling) complex B protein. These genes specifically responded to hormone treatment, displaying both negative and positive regulation in a context-dependent manner. According to bioinformatic resources, their promoter regions possess putative cis-acting elements involved in responses to phytohormones. Alternatively, hormone-related transcriptional activity of these genes may be modulated indirectly, which is especially relevant for brassinosteroids (BS). 
In general, the results of the study indicate that hormones are essential mediators that are able to cause alterations in the transcript accumulation of mt-related nuclear genes, which in turn trigger the expression of mt genes.
Article
Leaves are vital organs for photosynthesis, gas exchange, and light energy absorption in plants and play a crucial role in synthesizing essential nutrients. Watermelon is a widely cultivated crop worldwide and has a range of leaf shapes, including non-lobed (entire), semi-lobed, and highly lobed varieties. However, few studies have focused on lobed leaf traits in watermelons. Therefore, our study aimed to elucidate the genetic basis of leaf lobing regulation in watermelons, focusing on lines cultivated in Korea. We analyzed the F2 progeny derived from crosses between the highly lobed SIT463ST and the non-lobed PS137 watermelon lines. We discovered that lobed leaf traits in watermelons are controlled by a single incompletely dominant gene. Additionally, we used next-generation sequencing-based bulked segregant analysis to identify a candidate genomic region spanning 24.06–24.09 Mb (25 kb) on Chr. 4. Three single nucleotide polymorphism markers (LL-4-238, LL-4-956, and LL-4-252) were developed that were closely associated with the non-lobed leaf margin shape in the F2 population. Analysis of the genes in the fine mapping region revealed only synonymous substitutions in the coding regions, suggesting that mutations in potential promoter regions may regulate gene expression, thereby affecting leaf morphology. These findings provide valuable insights into the molecular factors underlying non-lobed leaf traits in watermelons and offer potential targets for genetic improvement.
Chapter
Crop improvement refers to the systematic approach of discovering and selecting plants that possess advantageous alleles for specific target genes. The foundation of crop improvement initiatives typically relies on the fundamental concepts of genetic diversity and the genetic architecture of agricultural plants. Allele mining is a contemporary and efficacious technique utilized for the identification of naturally occurring allelic variations within genes that exhibit advantageous characteristics. Consequently, the utilization of allele mining has significant potential as a feasible approach for enhancing crop-related endeavors. The gene pool of a plant exhibits a substantial degree of genetic variety, characterized by the presence of a multitude of mechanism genes. The utilization of genetic variants for the detection and separation of novel alleles of genes that display favorable traits from the current gene pool, and their subsequent incorporation into the development of improved cultivars through the application of marker-assisted selection, is of utmost importance.
Article
ooTFD is an object-oriented database for the representation of information pertaining to transcription factors, the proteins and biochemical entities which play a central role in the regulation of gene expression. Given the recent explosion of genome sequence information, and that a large percentage of proteins encoded by fully sequenced genomes fall into this category, information pertaining to this class of molecules may become an essential aspect of biology and of genomics in the 21st century. In the past year, there was a small increase in the size of this database, and a number of new tools to facilitate data access and analysis have been added at the MIRAGE (Molecular Informatics Resource for the Analysis of Gene Expression) web site. ooTFD and associated tools and resources can be accessed at http://www.ifti.org/
Article
PLACE (http://www.dna.affrc.go.jp/htdocs/PLACE/) is a database of motifs found in plant cis-acting regulatory DNA elements, all from previously published reports. It covers vascular plants only. In addition to the motifs originally reported, their variations in other genes or in other plant species reported later are also compiled. The PLACE database also contains a brief description of each motif and relevant literature with PubMed ID numbers. This report summarizes the present status of this database and available tools.
Article
PLACE (http://www.dna.affrc.go.jp/htdocs/PLACE/) is a database of nucleotide sequence motifs found in plant cis-acting regulatory DNA elements. Motifs were extracted from previously published reports on genes in vascular plants. In addition to the motifs originally reported, their variations in other genes or in other plant species in later reports are also compiled. The document for each motif in the PLACE database contains, in addition to the motif sequence, a brief definition and description of the motif, and relevant literature with PubMed ID numbers and GenBank accession numbers where available. Users can search their query sequences for cis-elements using the Signal Scan program at our web site. The results are reported in one of three forms. Clicking a PLACE accession number in the result report opens the pertinent motif document. Clicking a PubMed or GenBank accession number in the document gives users access to these databases, where they can read the abstract of the literature or the annotation in the DNA database. This report summarizes the present status of this database and available tools.
Article
The TRANSFAC database on transcription factors and their DNA-binding sites and profiles (http://www.gene-regulation.de/) has been quantitatively extended and supplemented by a number of modules. These modules give information about pathologically relevant mutations in regulatory regions and transcription factor genes (PathoDB), scaffold/matrix attached regions (S/MARt DB), signal transduction (TRANSPATH) and gene expression sources (CYTOMER). Altogether, these distinct database modules constitute the TRANSFAC system. They are accompanied by a number of program routines for identifying potential transcription factor binding sites or for localizing individual components in the regulatory network of a cell.
Article
Originating from COMPEL, the TRANSCompel® database emphasizes the key role of specific interactions between transcription factors binding to their target sites, which provide specific features of gene regulation in a particular cellular context. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and the specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH™ for searching for potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html.
Article
An Expectation Maximization algorithm for identification of DNA binding sites is presented. The approach predicts the location of binding regions while allowing variable length spacers within the sites. In addition to predicting the most likely spacer length for a set of DNA fragments, the method identifies individual sites that differ in spacer size. No alignment of DNA sequences is necessary. The method is illustrated by application to 231 Escherichia coli DNA fragments known to contain promoters with variable spacings between their consensus regions. Maximum-likelihood tests of the differences between the spacing classes indicate that the consensus regions of the spacing classes are not distinct. Further tests suggest that several positions within the spacing region may contribute to promoter specificity.
Article
The CCAAT box is one of the most common elements in eukaryotic promoters, found in the forward or reverse orientation. Among the various DNA binding proteins that interact with this sequence, only NF-Y (CBF, HAP2/3/4/5) has been shown to absolutely require all 5 nt. Analysis of a database with 178 bona fide NF-Y binding sites in 96 unrelated promoters confirms this need and points to specific additional flanking nucleotides (C, Pu, Pu on the 5′-side and C/G, A/G, G, A/C, G on the 3′-side) required for efficient binding. The frequency of CCAAT boxes appears to be relatively higher in TATA-less promoters, particularly in the reverse ATTGG orientation. In TATA-containing promoters the CCAAT box is preferentially located in the −80/−100 region (mean position −89) and is not found nearer to the Start site than −50. In TATA-less promoters it is usually closer to the +1 signal (at −66 on average) and is sometimes present in proximity to the Cap site. The consensus and location of NF-Y binding sites parallel almost perfectly a previous general statistical study on CCAAT boxes in 502 unrelated promoters. This is an indication that NF-Y is the major, if not the sole, CCAAT box recognizing protein and that it might serve different roles in TATA-containing and TATA-less promoters.
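The positional statements in the abstract above (a CCAAT box in forward or reverse ATTGG orientation, located at some distance upstream of the +1 site) can be illustrated with a minimal Python sketch. It is not the method used in that study: the function name `find_ccaat_boxes`, the 0-based `tss_index` argument and the exact-match search are assumptions made for illustration, and the flanking-nucleotide preferences described in the abstract are ignored.

```python
import re


def find_ccaat_boxes(seq, tss_index):
    """Return sorted (position, orientation) pairs for CCAAT boxes in seq.

    seq       -- promoter sequence, 5'->3' (hypothetical input)
    tss_index -- 0-based index of the TSS nucleotide (+1) within seq
    Positions are relative to the TSS and skip 0 (..., -2, -1, +1, +2, ...).
    """
    seq = seq.upper()
    hits = []
    # ATTGG is the reverse complement of CCAAT, i.e. the box on the other strand.
    for motif, orientation in (("CCAAT", "forward"), ("ATTGG", "reverse")):
        # A zero-width lookahead reports every start position, including overlaps.
        for match in re.finditer(f"(?={motif})", seq):
            offset = match.start() - tss_index
            position = offset + 1 if offset >= 0 else offset
            hits.append((position, orientation))
    return sorted(hits)
```

For a PlantProm-style entry covering −200 to +51 with the TSS at position 201, `tss_index` would be 200 under this convention. The reported position is that of the first nucleotide of the motif as read on the given strand.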
Article
ooTFD (object-oriented Transcription Factors Database) is an object-oriented successor to TFD. This database is aimed at capturing information regarding the polypeptide interactions which comprise and define the properties of transcription factors. ooTFD contains information about transcription factor binding sites, as well as composite relationships within transcription factors, which frequently occur as multisubunit proteins that form a complex interface to cellular processes outside the transcription machinery through protein–protein interactions. In the past year, a few additions and changes were made to this database and associated tools, which are accessible through the IFTI-MIRAGE web site at http://www.ifti.org/