PlantProm: A database of plant promoter sequences

February 2003
Nucleic Acids Research 31(1):114-7


DOI:10.1093/nar/gkg041


Authors:

Ilham Ayub Shahmuradov

Institute of Molecular Biology and Biotechnologies, ANAS

Alex Gammerman

Royal Holloway, University of London

John M Hancock

University of Ljubljana

Peter Bramley

Royal Holloway




PlantProm: a database of plant promoter sequences

Ilham A. Shahmuradov, Alex J. Gammerman, John M. Hancock, Peter M. Bramley and Victor V. Solovyev

Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, TW20 0EX, UK, School of Biological Sciences, Royal Holloway, University of London, UK and Softberry Inc., 116 Radio Circle, Suite 400, Mount Kisco, NY 10549, USA

Received August 15, 2002, Revised September 25, 2002, Accepted October 2, 2002

ABSTRACT

PlantProm DB, a plant promoter database, is an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s), TSS, from various plant species. The first release (2002.01) of PlantProm DB contains 305 entries including 71, 220 and 14 promoters from monocot, dicot and other plants, respectively. It provides the DNA sequence of each promoter region (−200 : +51), with the TSS at fixed position 201 of the stored sequence, a taxonomic/promoter type classification of promoters and Nucleotide Frequency Matrices (NFM) for promoter elements: TATA-box, CCAAT-box and TSS-motif (Inr). Analysis of TSS-motifs revealed that their composition is different in dicots and monocots, as well as for TATA and TATA-less promoters. The database serves as a learning set in developing plant promoter prediction programs. One such program (TSSP), based on discriminant analysis, has been created by Softberry Inc., and the application of a support vector machine approach for promoter identification is under development. PlantProm DB is available at http://mendel.cs.rhul.ac.uk/ and http://www.softberry.com/.

INTRODUCTION

Draft nuclear genome sequences of Arabidopsis thaliana (1) and Oryza sativa (2,3), representing dicotyledonous and monocotyledonous higher plants, respectively, have been published. In addition, the putative gene contents of these genomes, predicted mostly by computer methods, are available (2,3,4; ftp://ftp.ncbi.nih.gov/genbank/genomes/A_thaliana; http://www.tigr.org/tdb/e2k1/ath1; http://mendel.cs.rhul.ac.uk/Arabidopsis). However, as both computer programs and experimental approaches for gene discovery have known limitations, we are still far from a detailed picture of genome architecture. In particular, for all widely used gene prediction methods, one of the difficulties is accurate detection of the first (non-coding or partially coding) exon. The most accurate approach to this problem is to use information on full-length cDNAs. Unfortunately, no such information is available for most plant genes. Therefore, as well as being of special importance in understanding the regulation of gene expression, identification of plant promoters may serve as an essential element in gene annotation and in developing computational promoter prediction approaches. Currently, promoter identification is one of the most challenging problems in computational biology.

The term ‘promoter’ is used to designate a region in the genome sequence upstream of a gene transcription start site (TSS), although sequences downstream of the TSS may also affect transcription initiation. Promoter elements determine the transcription initiation point, transcription specificity and rate. Depending on the distance from the TSS, the terms ‘proximal promoter’ (several hundred nucleotides around the TSS) and ‘distal promoter’ (thousands or more nucleotides upstream of the TSS) are also used. Both proximal and distal promoters include sets of various elements participating in the complex process of cell-, tissue-, organ-, developmental stage- and environmental factor-specific regulation of transcription. Most promoter elements regulating TSS selection are localized in the proximal promoter.

To date, there are a number of databases with information on cis-acting elements that control transcription initiation by binding the corresponding nuclear factors. These include TRANSFAC (5), TRRD (6), ooTFD (7), COMPEL (8), PlantCARE (9), PLACE (10) and RegSite (http://softberry.com). The last three databases are plant-oriented collections of transcription regulatory elements. The Eukaryotic Promoter Database (EPD) is the only established collection of sequences of eukaryotic Pol II promoters (11). The latest release (#71) includes a total of 1402 entries, mainly promoters from animals, with only about 200 from plant species.

In the course of developing a new computer method for predicting Pol II promoters of plant genes, we have collected Pol II promoter sequences from various plants. These data are incorporated in a new bioinformatics web server (http://www.mendel.cs.rhul.ac.uk) developed by the Department of Computer Science at Royal Holloway, University of London, in collaboration with Softberry Inc. (USA). It is designed to present information about plant genomes, genes and new approaches to their analysis. This article describes the criteria used for collecting the promoter data, specific features of plant promoter sequences and the Plant Promoter Database (PlantProm DB).

*To whom correspondence should be addressed. Email: victor@softberry.com

Present address: John M. Hancock, MRC Mammalian Genetics Unit, Harwell, Oxfordshire, UK

© 2003 Oxford University Press

Description of PlantProm DB

Criteria for selecting promoter sequences. For collecting plant gene promoters the following rules were followed.

(i) There is experimental evidence of the TSS position(s) of the gene, published in the literature. For genes with multiple TSSs the nearest to the CDS start position is taken, if no additional information on the predominance of one of them is available (positions of other TSSs are given in the name line of the sequence written in the FASTA format).

(ii) The length of known promoter sequence upstream of the chosen TSS is 200 bp or more; all stored promoter sequences are the same length, 251 bp, where position 201 corresponds to the TSS, i.e. collected sequences occupy the region (−200 : +51), with the TSS in position +1, and thus represent the proximal promoters mentioned above.

(iii) An entry corresponds to the gene mapped on the genomic sequences.

(iv) Various alleles of a gene are presented in the database by a single entry.

(v) Genes with more than one non-allelic copy in the genome as well as paralogous genes are taken as different entries.
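Rule (ii) fixes the coordinate convention of every entry, and it can be illustrated with a short Python sketch. The function name `extract_promoter_window` and its parameters are hypothetical and are not part of PlantProm DB. The sketch assumes the genomic sequence is given on the strand of the gene and that the TSS is supplied as a 0-based index.

```python
def extract_promoter_window(genomic: str, tss_index: int,
                            upstream: int = 200, downstream: int = 51) -> str:
    """Return the (-200 : +51) window around a TSS.

    `tss_index` is the 0-based index of the TSS (position +1) in `genomic`.
    The result is 251 nt long and the TSS sits at 1-based position 201
    (0-based index 200), matching the convention described in rule (ii).
    """
    start = tss_index - upstream      # first nucleotide, position -200
    end = tss_index + downstream      # one past the last nucleotide, position +51
    if start < 0 or end > len(genomic):
        raise ValueError("not enough flanking sequence around the TSS")
    return genomic[start:end].upper()


if __name__ == "__main__":
    # Toy sequence: the single G marks the TSS.
    toy = "a" * 300 + "G" + "c" * 100
    window = extract_promoter_window(toy, 300)
    print(len(window), window[200])
```

Sequences with fewer than 200 bp of known upstream sequence are rejected rather than padded, which mirrors the selection rule rather than any documented behaviour of the database software.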

Information content of the database

The annotated, non-redundant PlantProm DB (release 2002.01) has 305 entries including 71, 220 and 14 promoters for RNA polymerase II from monocot, dicot and other plants, respectively. It provides the following information on plant promoters with experimentally known transcription start site(s):

(i) DNA sequence of the promoter region (−200 : +51);

(ii) Nucleotide Frequency Matrices (NFM) for canonical promoter elements (TATA-box, CCAAT-box and TSS-motif or Initiator element, Inr);

(iii) Taxonomic and promoter type classification of promoters.

To compute nucleotide frequency matrices for various promoter elements, a pairwise comparison of the region [−50 : +1) of the 305 plant promoters was performed, and one member of each pair of promoters showing more than 90% homology was excluded from the initial collection. As a result, 4 promoters were excluded; they are denoted by ‘Excluded’ in the name line of their sequences.
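The redundancy filter described above can be sketched in Python as follows. The paper does not say how homology was measured, so this sketch assumes ungapped percent identity over the 50 nt immediately upstream of the TSS (a half-open reading of the region [−50 : +1)). The names `identity` and `flag_redundant` are hypothetical.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def flag_redundant(promoters: dict[str, str], threshold: float = 0.90) -> set[str]:
    """Return names of entries to mark as 'Excluded'.

    Each value is a 251-nt entry with the TSS at index 200, so the slice
    [150:200] covers positions -50..-1 (assumed region of comparison).
    For every pair above the threshold, the later entry is flagged.
    """
    excluded: set[str] = set()
    for n1, n2 in combinations(list(promoters), 2):
        if n1 in excluded or n2 in excluded:
            continue
        if identity(promoters[n1][150:200], promoters[n2][150:200]) > threshold:
            excluded.add(n2)
    return excluded


if __name__ == "__main__":
    demo = {
        "p1": "A" * 251,
        "p2": "A" * 251,                          # identical to p1
        "p3": "A" * 150 + "C" * 50 + "A" * 51,    # differs in the compared region
    }
    print(flag_redundant(demo))
```

Which member of a redundant pair is kept is an arbitrary choice here (the first one encountered); the source does not state the rule that was used.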

In a simple implementation of the Expectation Maximization (EM) algorithm (12), we considered the sequence of a motif $X = (x_1, \ldots, x_l)$, where $l$ is the motif length. If $P_i(x_i)$ is the empirical frequency of the nucleotide $x_i$ in position $i$ (computed on the previous iteration), then the weight of this motif is computed as

$$W(X) = \sum_{i=1}^{l} \log \frac{P_i(x_i)}{0.25}$$
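As an illustration, the motif weight can be computed with a few lines of Python. This is a minimal sketch under stated assumptions: the weight is taken to be the sum of per-position log-odds against a uniform background of 0.25, the natural logarithm is used, and a small floor `pseudo` (not mentioned in the paper) avoids taking the logarithm of zero. The name `motif_weight` is hypothetical.

```python
import math

BACKGROUND = 0.25  # uniform background frequency used in the weight


def motif_weight(motif: str, freqs: list[dict[str, float]],
                 pseudo: float = 1e-3) -> float:
    """Sum over positions of log(P_i(x_i) / 0.25).

    `freqs[i]` maps each nucleotide to its empirical frequency at position i.
    Frequencies below `pseudo` are raised to `pseudo` (an assumption made
    here so that unseen nucleotides give a finite penalty).
    """
    if len(motif) != len(freqs):
        raise ValueError("motif and matrix must have the same length")
    return sum(
        math.log(max(freqs[i].get(base, 0.0), pseudo) / BACKGROUND)
        for i, base in enumerate(motif)
    )


if __name__ == "__main__":
    matrix = [{"A": 1.0}, {"A": 0.5, "T": 0.5}]
    print(motif_weight("AT", matrix))  # log(4) + log(2)
```

A motif whose nucleotides all occur at the background frequency gets weight 0, so positive weights indicate a better-than-random match to the current matrix.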

Using the EM procedure for 10 iterations, the initial collection of 305 (301 unrelated) promoters was divided into two classes: 175 (171 unrelated) TATA promoters and 130 TATA-less promoters. In calculating the TATA matrices, the distance between the right boundary of the TATA-core box and the TSS was allowed to vary within 18–40 bp, and only the TATAWAWA core was used for calculating the weight. As the initial TATA-box matrix, the TATA matrix computed for 134 plant promoters from EPD (http://www.epd.isb-sib.ch/) was used. The computed TATA matrix (Table 1) is in good agreement with the TATA matrix from EPD.

Table 1. Nucleotide frequency matrix for the TATA box from 171 unrelated plant promoters

Position      <2    <1     1     2     3     4     5     6     7     8    >1    >2
A           0.28  0.16  0.03  0.95  0.00  1.00  0.62  0.97  0.38  0.73  0.13  0.30
C           0.27  0.63  0.01  0.00  0.04  0.00  0.00  0.00  0.01  0.08  0.42  0.42
G           0.17  0.05  0.00  0.00  0.00  0.00  0.00  0.02  0.00  0.10  0.28  0.16
T           0.28  0.16  0.96  0.05  0.96  0.00  0.38  0.01  0.61  0.09  0.18  0.11
Consensus            c     T     A     T     A   A/T     A   T/A     A

The mean distance between the TATA box and the TSS is 26 bp.

Table 2. Nucleotide frequency matrix for the CCAAT box from 131 unrelated plant promoters

Position      <4    <3    <2    <1     1     2     3     4     5    >1    >2    >3    >4
A           0.31  0.34  0.27  0.30  0.31  0.00  1.00  1.00  0.00  0.28  0.32  0.29  0.40
C           0.19  0.17  0.16  0.18  0.34  1.00  0.00  0.00  0.00  0.20  0.20  0.25  0.17
G           0.20  0.20  0.27  0.21  0.15  0.00  0.00  0.00  0.00  0.20  0.18  0.15  0.15
T           0.30  0.29  0.30  0.31  0.20  0.00  0.00  0.00  1.00  0.32  0.30  0.31  0.28
Consensus                              n     C     A     A     T

The mean distance between the CCAAT box and the TSS is 75 bp.
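The iterative refinement of the TATA matrix can be illustrated with a simplified Python sketch. It is not the authors' implementation: it uses a hard-assignment variant of EM (each promoter contributes only its single best-scoring site), assumes 251-nt entries with the TSS at index 200, and separates TATA from TATA-less promoters with a hypothetical score threshold `min_score`, which the paper does not specify. All function names are illustrative.

```python
import math

BASES = "ACGT"


def score(window, matrix, pseudo=1e-3):
    """Log-odds weight of a window against a uniform 0.25 background."""
    return sum(math.log(max(col[b], pseudo) / 0.25) for b, col in zip(window, matrix))


def best_site(seq, matrix, tss_index=200, min_gap=18, max_gap=40):
    """Best (score, window) whose right boundary lies 18-40 bp upstream of the TSS."""
    width = len(matrix)
    best = None
    for gap in range(min_gap, max_gap + 1):
        end = tss_index - gap          # exclusive right boundary of the core
        start = end - width
        if start < 0:
            continue
        window = seq[start:end]
        if any(b not in BASES for b in window):
            continue                   # skip windows with ambiguous bases
        s = score(window, matrix)
        if best is None or s > best[0]:
            best = (s, window)
    return best


def frequency_matrix(sites):
    """Per-position nucleotide frequencies of equal-length aligned sites."""
    n = len(sites)
    return [{b: sum(s[i] == b for s in sites) / n for b in BASES}
            for i in range(len(sites[0]))]


def refine(seqs, matrix, iterations=10, min_score=0.0):
    """Re-estimate the matrix from the best sites for a fixed number of rounds."""
    for _ in range(iterations):
        hits = (best_site(s, matrix) for s in seqs)
        sites = [h[1] for h in hits if h is not None and h[0] > min_score]
        if not sites:
            break
        matrix = frequency_matrix(sites)
    return matrix
```

The starting matrix plays the role of the EPD-derived TATA matrix mentioned in the text; any 8-column matrix over the TATAWAWA core could be supplied in its place.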

For computation of the CCAAT-box matrix, we allowed the distance between the right boundary of the CCAAT core and the TSS to vary within 50–100 bp. The CCAAT core was used for weight calculation and, in accordance with the available data (13), CCAAT boxes were identified on both DNA strands. The CCAAT-box matrix is presented in Table 2.
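Searching both DNA strands amounts to looking for the core and for its reverse complement in the same window. The Python sketch below shows the idea with exact string matching, which is a simplification: the paper scores candidates with a weight matrix. The 251-nt layout with the TSS at index 200, the measurement of the distance on forward-strand coordinates, and the name `find_ccaat` are assumptions made for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ccaat(seq: str, tss_index: int = 200, min_gap: int = 50,
               max_gap: int = 100, core: str = "CCAAT"):
    """List (start, strand) of exact core matches 50-100 bp upstream of the TSS.

    `start` is the 0-based index of the window on the forward strand;
    strand '-' means the window matches the reverse complement of the core.
    """
    rc_core = reverse_complement(core)   # ATTGG for CCAAT
    k = len(core)
    hits = []
    for gap in range(min_gap, max_gap + 1):
        end = tss_index - gap            # exclusive right boundary
        start = end - k
        if start < 0:
            continue
        window = seq[start:end]
        if window == core:
            hits.append((start, "+"))
        elif window == rc_core:
            hits.append((start, "-"))
    return hits


if __name__ == "__main__":
    forward = "G" * 120 + "CCAAT" + "G" * 126
    print(find_ccaat(forward))
```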

A TSS-motif matrix 5 bp in length was computed, in which the 3rd nucleotide was the annotated TSS (anTSS). No strong consensus was revealed. When the EM approach was used to analyze all possible pentanucleotides with an assumed TSS (asTSS) location in the range (anTSS − 2 : anTSS + 2), it was observed that the composition of asTSS-motifs is different in dicot and monocot plants (Tables 3 and 4), as well as for TATA and TATA-less promoters (Tables 5 and 6). This finding seems to be a novel feature of plant promoters.
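The candidate set examined for each promoter can be made concrete with a short Python sketch. It enumerates the five pentanucleotides whose central (3rd) nucleotide is an assumed TSS within 2 bp of the annotated one. The function name and the 0-based index 200 for the annotated TSS are assumptions based on the 251-nt entry layout.

```python
def candidate_tss_motifs(seq: str, an_tss: int = 200, shift: int = 2, flank: int = 2):
    """List (offset, pentanucleotide) for assumed TSSs at anTSS-2 .. anTSS+2.

    `offset` is asTSS - anTSS; each window has the assumed TSS as its
    central nucleotide, so its length is 2 * flank + 1 = 5 by default.
    """
    out = []
    for as_tss in range(an_tss - shift, an_tss + shift + 1):
        start, end = as_tss - flank, as_tss + flank + 1
        if start < 0 or end > len(seq):
            continue                      # window would run off the sequence
        out.append((as_tss - an_tss, seq[start:end]))
    return out


if __name__ == "__main__":
    toy = "A" * 196 + "CGTACGTAC" + "A" * 46
    for offset, motif in candidate_tss_motifs(toy):
        print(offset, motif)
```

In an EM-style procedure each of these candidates would be scored against the current TSS-motif matrix and the best one kept, which is how an assumed TSS can differ from the annotated one.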

PlantProm DB, release 2002.01, is available at the web sites http://mendel.cs.rhul.ac.uk and http://www.softberry.com. The database will be regularly updated by collection and analysis of new experimental data on plant promoters as it becomes available in the literature. PlantProm DB serves as a learning set in developing plant promoter prediction programs. One such program (TSSP), based on discriminant analysis of sequence features and plant regulatory motifs (RegSiteDB), has been developed by Softberry Inc. (http://www.softberry.com/berry.phtml?topic=promoter). The application of a support vector machine approach for promoter identification is under development.

ACKNOWLEDGEMENTS

PlantProm DB is funded by grant 111/BIO14428 ‘Pattern Recognition Techniques for Gene Identification in Plant Genomic Sequences’, from the UK Biotechnology and Biological Sciences Research Council (BBSRC) and is designed and maintained at Royal Holloway, University of London in collaboration with Softberry Inc. (USA).

REFERENCES

1. The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature, 408, 796–815.

2. Yu,J., Hu,S., Wang,J., Wong,G.K., Li,S., Liu,B., Deng,Y., Dai,L., Zhou,Y., Zhang,X., Cao,M. et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science, 296, 79–92.

3. Goff,S.A., Ricke,D., Lan,T.-H., Presting,G., Wang,R., Dunn,M., Glazebrook,J., Sessions,A., Oeller,P., Varma,H., Hadley,D. et al. (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. japonica). Science, 296, 92–100.

4. Schoof,H., Zaccaria,P., Gundlach,H., Lemcke,K., Rudd,S., Kolesov,G., Arnold,R., Mewes,H.W. and Mayer,K.F. (2002) MIPS Arabidopsis thaliana Database (MAtDB): an integrated biological knowledge resource based on the first complete plant genome. Nucleic Acids Res., 30, 91–93.

5. Wingender,E., Chen,X., Fricke,E., Geffers,R., Hehl,R., Liebich,I., Krull,M., Matys,V., Michael,H., Ohnhäuser,R., Prüß,M., Schacherer,F., Thiele,S. and Urbach,S. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res., 29, 281–283.

6. Kolchanov,N.A., Ignatieva,E.V., Ananko,E.A., Podkolodnaya,O.A., Stepanenko,I.L., Merkulova,T.I., Pozdnyakov,M.A., Podkolodny,N.L., Naumochkin,A.N. and Romashchenko,A.G. (2002) Transcription Regulatory Regions Database (TRRD): its status in 2002. Nucleic Acids Res., 30, 312–317.

7. Ghosh,D. (2000) Object-oriented Transcription Factors Database (ooTFD). Nucleic Acids Res., 28, 308–310.

Table 3. Nucleotide frequency matrix for a TSS-motif from 217 unrelated dicot plant promoters

Position       −4     −3     −2     −1     +1     +2     +3     +4
A           0.341  0.249  0.286  0.005  0.604  0.475  0.226  0.272
C           0.184  0.286  0.041  0.507  0.332  0.028  0.359  0.240
G           0.101  0.124  0.041  0.161  0.065  0.101  0.129  0.198
T           0.373  0.341  0.631  0.327  0.000  0.396  0.286  0.290
Consensus       W      n    T/a    C/t    A/c      w

In 75 cases, the high-scoring TSS coincided with the annotated TSS.

Table 4. Nucleotide frequency matrix for a TSS-motif from 70 unrelated monocot plant promoters

Position       −4     −3     −2     −1     +1     +2     +3     +4
A           0.114  0.214  0.557  0.157  0.186  0.000  0.871  0.143
C           0.443  0.286  0.114  0.386  0.314  0.786  0.114  0.371
G           0.186  0.200  0.143  0.257  0.200  0.143  0.014  0.171
T           0.257  0.300  0.186  0.200  0.300  0.071  0.000  0.314
Consensus                     a      N      n      C      A

In 17 cases, the high-scoring TSS coincided with the annotated TSS.

Table 5. Nucleotide frequency matrix for a TSS-motif from 171 unrelated TATA promoters of plants

Position       −4     −3     −2     −1     +1     +2     +3     +4
A           0.322  0.263  0.099  0.035  0.865  0.246  0.345  0.368
C           0.251  0.222  0.234  0.719  0.023  0.292  0.421  0.257
G           0.117  0.152  0.111  0.105  0.023  0.105  0.082  0.146
T           0.310  0.363  0.556  0.140  0.088  0.357  0.152  0.228
Consensus                   T/c      C      A      n      M

In 64 cases, the high-scoring TSS coincided with the annotated TSS.

Table 6. Nucleotide frequency matrix for a TSS-motif from 130 unrelated TATA-less promoters of plants

Position       −4     −3     −2     −1     +1     +2     +3     +4
A           0.385  0.215  0.262  0.023  0.554  0.438  0.331  0.231
C           0.231  0.246  0.231  0.315  0.323  0.292  0.015  0.262
G           0.146  0.200  0.000  0.269  0.123  0.054  0.208  0.215
T           0.238  0.338  0.508  0.392  0.000  0.215  0.446  0.292
Consensus                 T/a/c      Y    A/c  a/c/t  t/a/g

In 46 cases, the high-scoring TSS coincided with the annotated TSS.


8. Kel-Margoulis,O.V., Kel,A.E., Reuter,I., Deineko,I.V. and Wingender,E. (2002) TRANSCompel: a database on composite regulatory elements in eukaryotic genes. Nucleic Acids Res., 30, 332–334.

9. Lescot,M., Déhais,P., Thijs,G., Marchal,K., Moreau,Y., Van de Peer,Y., Rouzé,P. and Rombauts,S. (2002) PlantCARE, a database of plant cis-acting regulatory elements and a portal to tools for in silico analysis of promoter sequences. Nucleic Acids Res., 30, 325–327.

10. Higo,K., Ugawa,Y., Iwamoto,M. and Korenaga,T. (1999) Plant cis-acting regulatory DNA elements (PLACE) database. Nucleic Acids Res., 27, 297–300.

11. Praz,V., Périer,R., Bonnard,C. and Bucher,P. (2002) The eukaryotic promoter database, EPD: new entry types and links to gene expression data. Nucleic Acids Res., 30, 322–324.

12. Cardon,L. and Stormo,G. (1992) Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments. J. Mol. Biol., 223, 159–170.

13. Mantovani,R. (1998) A survey of 178 NF-Y binding CCAAT boxes. Nucleic Acids Res., 26, 1135–1143.


Plant Synthetic Promoters

Article

Full-text available

Jun 2024

This article examines the structure and functions of the plant synthetic promoters frequently used to precisely regulate complex regulatory routes. It details the composition of native promoters and their interacting proteins to provide a better understanding of the tasks associated with synthetic promoter development. The production of synthetic promoters is performed by relatively small libraries produced generally by basic molecular or genetic engineering methods such as cis-element shuffling or domain swapping. The article also describes the preparation of large-scale libraries supported by synthetic DNA fragments, directed evolution, and machine or deep-learning methodologies. The broader application of novel, synthetic promoters reduces the prevalence of homology-based gene silencing or improves the stability of transgenes. A particularly interesting group of synthetic promoters are bidirectional forms, which can enable the expression of up to eight genes by one regulatory element. The introduction and controlled expression of several genes after one transgenic event strongly decreases the frequency of such problems as complex segregation patterns and the random integration of multiple transgenes. These complications are commonly observed during the transgenic crop development enabled by traditional, multistep transformation using genetic constructs containing a single gene. As previously tested DNA promoter fragments demonstrate low complexity and homology, their abundance can be increased by using orthogonal expression systems composed of synthetic promoters and trans-factors that do not occur in nature or arise from different species. Their structure, functions, and applications are rendered in the article. Among them are presented orthogonal systems based on transcription activator-like effectors (dTALEs), synthetic dTALE activated promoters (STAPs) and dCas9-dependent artificial trans-factors (ATFs). 
Synthetic plant promoters are valuable tools for providing precise spatiotemporal regulation and introducing logic gates into the complex genetic traits that are important for basic research studies and their application in crop plant development. Precisely regulated metabolic routes are less prone to undesirable feedback regulation and energy waste, thus improving the efficiency of transgenic crops.

An Integrative Database and Its Application for Plant Synthetic Biology Research

Article

Full-text available

Jan 2024

Plant synthetic biology research requires diverse bioparts that facilitate the redesign and construction of new-to-nature biological devices or systems in plants. Limited by few well-characterized bioparts for plant chassis, the development of plant synthetic biology lags behind that of its microbial counterpart. Here, we constructed a web-based Plant Synthetic BioDatabase (PSBD), which currently categorizes 1677 catalytic bioparts and 384 regulatory elements and provides information on 309 species and 850 chemicals. Online bioinformatics tools including local BLAST, chem similarity, phylogenetic analysis, and visual strength are provided to assist with the rational design of genetic circuits for manipulation of gene expression in planta. We demonstrated the utility of the PSBD by functionally characterizing taxadiene synthase 2 and its quantitative regulation in tobacco leaves. More powerful synthetic devices were then assembled to amplify the transcriptional signals, enabling enhanced expression of flavivirus non-structure 1 proteins in plants. The PSBD is expected to be an integrative and user-centered platform that provides a one-stop service for diverse applications in plant synthetic biology research.

Journal of Population Therapeutics & Clinical Pharmacology EXPLORING PLANT RESEARCH WITH THE HELP OF VITAL BIOINFORMATICS DATABASES AND ONLINE RESOURCES

Article

Full-text available

Jan 2024

Bioinformatics plays a role, in the field of plant science today. With an increase in data volume, there is a growing demand for tools and methods for managing, visualizing, implementing, evaluating, modeling, and predicting this data. However many biology researchers may lack familiarity with the bioinformatics resources, which can lead to missed opportunities and misinterpretation of the data. In this review article, we highlighted the web resources that offer analysis capabilities for plant research data including genomics, transcriptomics, comparative genomics, bio-ontologies, sequence and structural comparisons plant disease related databases well as proteomics databases. Additionally we provide insights into integrated modules found within these resources that are specifically tailored for analyzing plant associated data. Overall this review aims to assist plant researchers in accessing bioinformatics resources for their data analysis needs while promoting the use of bioinformatics tools to effectively address experimental challenges, within the field of plant sciences.

PemK’s Arg24 is a crucial residue for PemIK toxin–antitoxin system to induce the persistence of Weissella cibaria against ciprofloxacin stress

Article

Full-text available

May 2024

The toxin-antitoxin (TA) system plays a key role in bacteria escaping antibiotic stress with persistence, however, the mechanisms by which persistence is controlled remain poorly understood. Weissella cibaria, a novel probiotic, can enters a persistent state upon encountering ciprofloxacin stress. Conversely, it resumes from the persistence when ciprofloxacin stress is relieved or removed. Here, it was found that PemIK TA system played a role in transitioning between these two states. And the PemIK was consisted of PemK, an endonuclease toxic to mRNA, and antitoxin PemI which neutralized its toxicity. The PemK specifically cleaved the U↓AUU in mRNA encoding enzymes involved in glycolysis, TCA cycle and respiratory chain pathways. This cleavage event subsequently disrupted the crucial cellular processes such as hydrogen transfer, electron transfer, NADH and FADH2 synthesis, ultimately leading to a decrease in ATP levels and an increase in membrane depolarization and persister frequency. Notably, Arg24 was a critical active residue for PemK, its mutation significantly reduced the mRNA cleavage activity and the adverse effects on metabolism. These insights provided a clue to comprehensively understand the mechanism by which PemIK induced the persistence of W. cibaria to escape ciprofloxacin stress, thereby highlighting another novel aspect PemIK respond for antibiotic stress.

Plant Synthetic Promoters

Preprint

Full-text available

Apr 2024

The article features the structure and functions of plant synthetic promoters frequently exercised to precisely regulate complex regulatory routes. The composition of plant native promoters together with interacting proteins is presented to provide a better understanding of tasks associated with synthetic promoter development. The production of synthetic promoters is conferred on relatively small libraries produced generally by basic molecular or genetic engineering methods such as cis-element shuffling or domain swapping. Moreover, the preparation of large-scale libraries supported by synthetic DNA fragments, directed evolution, and machine or deep learning methodologies is presented. A particularly interesting group of synthetic promoters are bidirectional forms that enable the putative expression of up to 6–8 genes by one regulatory element. The introduction and controlled expression of several genes after one transgenic event strongly decreases the frequency of such problems as complex segregation patterns and random integration of multiple transgenes. These complications are commonly observed during transgenic crop development through traditional, multistep transformation by genetic constructs containing a single gene. Another path to solving problems associated with the low complexity and homology of already tested DNA fragments is through orthogonal expression systems composed of synthetic promoters and trans-factors that do not occur in nature or arise from different species. Their structure, functions, and applications are rendered in the article.

Phytohormones as Regulators of Mitochondrial Gene Expression in Arabidopsis thaliana

Article

Full-text available

Nov 2023
INT J MOL SCI

The coordination of activities between nuclei and organelles in plant cells involves information exchange, in which phytohormones may play essential roles. Therefore, the dissection of the mechanisms of hormone-related integration between phytohormones and mitochondria is an important and challenging task. Here, we found that inputs from multiple hormones may cause changes in the transcript accumulation of mitochondrial-encoded genes and nuclear genes encoding mitochondrial (mt) proteins. In particular, treatments with exogenous hormones induced changes in the GUS expression in the reporter line possessing a 5′-deletion fragment of the RPOTmp promoter. These changes corresponded in part to the up- or downregulation of RPOTmp in wild-type plants, which affects the transcription of mt-encoded genes, implying that the promoter fragment of the RPOTmp gene is functionally involved in the responses to IAA (indole-3-acetic acid), ACC (1-aminocyclopropane-1-carboxylic acid), and ABA (abscisic acid). Hormone-dependent modulations in the expression of mt-encoded genes can also be mediated through mitochondrial transcription termination factors 15, 17, and 18 of the mTERF family and genes for tetratricopeptide repeat proteins that are coexpressed with mTERF genes, in addition to SWIB5 encoding a mitochondrial SWI/SNF (nucleosome remodeling) complex B protein. These genes specifically respond to hormone treatment, displaying both negative and positive regulation in a context-dependent manner. According to bioinformatic resources, their promoter region possesses putative cis-acting elements involved in responses to phytohormones. Alternatively, the hormone-related transcriptional activity of these genes may be modulated indirectly, which is especially relevant for brassinosteroids (BS). 
In general, the results of this study indicate that hormones are essential mediators that are able to cause alterations in the transcript accumulation of mt-related nuclear genes, which, in turn, trigger the expression of mt genes.

Computational methods for identifying enhancer‐promoter interactions

Article

Full-text available

Jun 2023

Background As parts of the cis‐regulatory mechanism of the human genome, interactions between distal enhancers and proximal promoters play a crucial role. Enhancers, promoters, and enhancer‐promoter interactions (EPIs) can be detected using many sequencing technologies and computation models. However, a systematic review that summarizes these EPI identification methods and that can help researchers apply and optimize them is still needed. Results In this review, we first emphasize the role of EPIs in regulating gene expression and describe a generic framework for predicting enhancer‐promoter interaction. Next, we review prediction methods for enhancers, promoters, loops, and enhancer‐promoter interactions using different data features that have emerged since 2010, and we summarize the websites available for obtaining enhancers, promoters, and enhancer‐promoter interaction datasets. Finally, we review the application of the methods for identifying EPIs in diseases such as cancer. Conclusions The advance of computer technology has allowed traditional machine learning, and deep learning methods to be used to predict enhancer, promoter, and EPIs from genetic, genomic, and epigenomic features. In the past decade, models based on deep learning, especially transfer learning, have been proposed for directly predicting enhancer‐promoter interactions from DNA sequences, and these models can reduce the parameter training time required of bioinformatics researchers. We believe this review can provide detailed research frameworks for researchers who are beginning to study enhancers, promoters, and their interactions.

Phytohormones as Regulators of Mitochondrial Gene Expression in Arabidopsis thaliana

Preprint

Full-text available

Nov 2023

Coordination of activities between nuclei and organelles in plant cells involves information exchange, in which phytohormones may play an essential role. Therefore, dissection of the mechanisms of hormone-related integration between phytohormones and mitochondria is an important and challenging task. Here, we found that inputs from multiple hormones may cause changes in transcript accumulation of mitochondrial-encoded genes and nuclear genes encoding mitochondrial (mt) proteins. In particular, treatments with exogenous hormones induced changes in GUS expression in the reporter line possessing a 5'-deletion fragment of the RPOTmp promoter. These changes corresponded in part to up- or downregulation of RPOTmp in wild-type plants, which affected the transcription of mt-encoded genes, implying that promoter fragments of the RPOTmp gene are functionally involved in responses to IAA (indole-3-acetic acid), ACC (1-aminocyclopropane-1-carboxylic acid), and ABA (abscisic acid). Hormone-dependent modulations in the expression of mt-encoded genes can also be mediated through mitochondrial transcription termination factors 15, 17, and 18 of the mTERF family and genes for tetratricopeptide repeat proteins that are coexpressed with mTERF genes, in addition to SWIB5 encoding a mitochondrial SWI/SNF (nucleosome remodelling) complex B protein. These genes specifically responded to hormone treatment, displaying both negative and positive regulation in a context-dependent manner. According to bioinformatic resources, their promoter regions possess putative cis-acting elements involved in responses to phytohormones. Alternatively, hormone-related transcriptional activity of these genes may be modulated indirectly, which is especially relevant for brassinosteroids (BS). 
In general, the results of the study indicate that hormones are essential mediators that are able to cause alterations in the transcript accumulation of mt-related nuclear genes, which in turn trigger the expression of mt genes.

Genetic analysis and identification of molecular markers associated with lobed leaf shape in watermelons using next generation sequencing

Article

Apr 2024

Leaves are vital organs for photosynthesis, gas exchange, and light energy absorption in plants and play a crucial role in synthesizing essential nutrients. Watermelon is a widely cultivated crop worldwide and has a range of leaf shapes, including non-lobed (entire), semi-lobed, and highly lobed varieties. However, few studies have focused on lobed leaf traits in watermelons. Therefore, our study aimed to elucidate the genetic basis of leaf lobing regulation in watermelons, focusing on lines cultivated in Korea. We analyzed the F2 progeny derived from crosses between the highly lobed SIT463ST and the non-lobed PS137 watermelon line. We discovered that lobed leaf traits in watermelons are controlled by a single incompletely dominant gene. Additionally, we used next-generation sequencing-based bulked segregant analysis sequencing to identify a candidate genomic region spanning 24.06–24.09 Mb (25 kb) on Chr. 4. Three single nucleotide polymorphism markers (LL-4-238, LL-4-956, and LL-4-252) were developed that were closely associated with the non-lobed leaf margin shape in the F2 population. Analysis of the expression levels of these genes in the fine mapping region revealed only synonymous substitutions in the coding regions, suggesting that mutations in potential promoter regions may regulate gene expression, thereby affecting leaf morphology. These findings provide valuable insights into the molecular factors underlying non-lobed leaf traits in watermelons and offer potential targets for genetic improvement.

Allele Mining and Development of Kompetitive Allele Specific PCR (KASP) Marker in Plant Breeding

Chapter

Full-text available

Nov 2023

Crop improvement refers to the systematic approach of discovering and selecting plants that possess advantageous alleles for specific target genes. The foundation of crop improvement initiatives typically relies on the fundamental concepts of genetic diversity and the genetic architecture of agricultural plants. Allele mining is a contemporary and efficacious technique utilized for the identification of naturally occurring allelic variations within genes that exhibit advantageous characteristics. Consequently, the utilization of allele mining has significant potential as a feasible approach for enhancing crop-related endeavors. The gene pool of a plant exhibits a substantial degree of genetic variety, characterized by the presence of a multitude of mechanism genes. The utilization of genetic variants for the detection and separation of novel alleles of genes that display favorable traits from the current gene pool, and their subsequent incorporation into the development of improved cultivars through the application of marker-assisted selection, is of utmost importance.

Object oriented Transcription Factors Database (ooTFD)

Article

Full-text available

Jan 1999

David Ghosh

ooTFD is an object-oriented database for the representation of information pertaining to transcription factors, the proteins and biochemical entities which play a central role in the regulation of gene expression. Given the recent explosion of genome sequence information, and that a large percentage of proteins encoded by fully sequenced genomes fall into this category, information pertaining to this class of molecules may become an essential aspect of biology and of genomics in the 21st century. In the past year, there was a small increase in the size of this database, and a number of new tools to facilitate data access and analysis have been added at the MIRAGE (Molecular Informatics Resource for the Analysis of Gene Expression) web site. ooTFD and associated tools and resources can be accessed at http://www.ifti.org/

PLACE: a database of plant cis-Acting regulatory DNA elements

Article

Full-text available

Feb 1998

PLACE (http://www.dna.affrc.go.jp/htdocs/PLACE/) is a database of motifs found in plant cis-acting regulatory DNA elements, all from previously published reports. It covers vascular plants only. In addition to the motifs originally reported, their variations in other genes or in other plant species reported later are also compiled. The PLACE database also contains a brief description of each motif and relevant literature with PubMed ID numbers. This report summarizes the present status of this database and available tools.

Plant cis-acting regulatory DNA elements (PLACE) database: 1999

Article

Full-text available

Feb 1999

PLACE (http://www.dna.affrc.go.jp/htdocs/PLACE/) is a database of nucleotide sequence motifs found in plant cis-acting regulatory DNA elements. Motifs were extracted from previously published reports on genes in vascular plants. In addition to the motifs originally reported, their variations in other genes or in other plant species in later reports are also compiled. Documents for each motif in the PLACE database contain, in addition to a motif sequence, a brief definition and description of each motif, and relevant literature with PubMed ID numbers and GenBank accession numbers where available. Users can search their query sequences for cis-elements using the Signal Scan program at our web site. The results will be reported in one of the three forms. Clicking the PLACE accession numbers in the result report will open the pertinent motif document. Clicking the PubMed or GenBank accession number in the document will allow users to access these databases, and to read the abstract of the literature or the annotation in the DNA database. This report summarizes the present status of this database and available tools.

The TRANSFAC system on gene expression regulation

Article

Full-text available

Feb 2001
NUCLEIC ACIDS RES

The TRANSFAC database on transcription factors and their DNA-binding sites and profiles (http://www.gene-regulation.de/) has been quantitatively extended and supplemented by a number of modules. These modules give information about pathologically relevant mutations in regulatory regions and transcription factor genes (PathoDB), scaffold/matrix attached regions (S/MARt DB), signal transduction (TRANSPATH) and gene expression sources (CYTOMER). Altogether, these distinct database modules constitute the TRANSFAC system. They are accompanied by a number of program routines for identifying potential transcription factor binding sites or for localizing individual components in the regulatory network of a cell.

TRANSCompel®: A database on composite regulatory elements in eukaryotic genes

Article

Full-text available

Feb 2002
NUCLEIC ACIDS RES

Originating from COMPEL, the TRANSCompel® database emphasizes the key role of specific interactions between transcription factors binding to their target sites providing specific features of gene regulation in a particular cellular context. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor–DNA and factor–factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with the search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH™ for searching potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html.

The TRANSFAC system on gene expression regulation

Article

Jan 2001
NUCLEIC ACIDS RES

E. Wingender

Analysis of the genome sequence of the flowering plant Arabidopsis thaliana

Article

Jan 2000

Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments

Article

Feb 1992

An Expectation Maximization algorithm for identification of DNA binding sites is presented. The approach predicts the location of binding regions while allowing variable length spacers within the sites. In addition to predicting the most likely spacer length for a set of DNA fragments, the method identifies individual sites that differ in spacer size. No alignment of DNA sequences is necessary. The method is illustrated by application to 231 Escherichia coli DNA fragments known to contain promoters with variable spacings between their consensus regions. Maximum-likelihood tests of the differences between the spacing classes indicate that the consensus regions of the spacing classes are not distinct. Further tests suggest that several positions within the spacing region may contribute to promoter specificity.
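The general idea of expectation maximization for site finding can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the abstract: a fixed motif width, exactly one site per fragment, and a uniform background. It does not model the variable-length spacers that the method above handles. All function names are hypothetical.

```python
import math

BASES = "ACGT"

def windows(seq, width):
    """All substrings of the given width, in order of start position."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]

def run_em(seqs, seed, iters=30, pseudo=0.1):
    """EM from one seed word; returns (position weight matrix, log-likelihood)."""
    width = len(seed)
    # Start from a soft version of the seed word.
    pwm = [{b: (0.7 if b == s else 0.1) for b in BASES} for s in seed]
    loglik = 0.0
    for _ in range(iters):
        counts = [{b: pseudo for b in BASES} for _ in range(width)]
        loglik = 0.0
        for seq in seqs:
            wins = windows(seq, width)
            # E-step: motif-vs-background likelihood ratio for each start.
            scores = [math.prod(pwm[j][w[j]] / 0.25 for j in range(width))
                      for w in wins]
            total = sum(scores)
            loglik += math.log(total / len(wins))
            # Accumulate expected base counts, weighted by site posterior.
            for w, s in zip(wins, scores):
                for j, b in enumerate(w):
                    counts[j][b] += s / total
        # M-step: re-estimate the matrix from the expected counts.
        pwm = [{b: c[b] / sum(c.values()) for b in BASES} for c in counts]
    return pwm, loglik

def find_motif(seqs, width):
    """Try every window of the first sequence as a seed; keep the best run."""
    seeds = windows(seqs[0], width)
    pwm, _ = max((run_em(seqs, s) for s in seeds), key=lambda r: r[1])
    return pwm

def best_sites(seqs, pwm):
    """Most likely site start in each sequence under the fitted matrix."""
    width = len(pwm)
    sites = []
    for seq in seqs:
        scores = [math.prod(pwm[j][w[j]] for j in range(width))
                  for w in windows(seq, width)]
        sites.append(scores.index(max(scores)))
    return sites

def consensus(pwm):
    """Most probable base at each motif position."""
    return "".join(max(col, key=col.get) for col in pwm)

toy_fragments = ["GCGCTATAAAGC", "CTATAAACGGCG", "GGCCGTTATAAA", "CGTATAAAGGCC"]
toy_pwm = find_motif(toy_fragments, 6)
```

On these made-up toy fragments no prior alignment is supplied; the site positions are inferred jointly with the matrix, which is the property the abstract emphasizes.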

Mantovani R. A survey of 178 NF-Y binding CCAAT boxes. Nucleic Acids Res 26: 1135-1143

Article

Apr 1998

Roberto Mantovani

The CCAAT box is one of the most common elements in eukaryotic promoters, found in the forward or reverse orientation. Among the various DNA binding proteins that interact with this sequence, only NF-Y (CBF, HAP2/3/4/5) has been shown to absolutely require all 5 nt. Analysis of a database with 178 bona fide NF-Y binding sites in 96 unrelated promoters confirms this need and points to specific additional flanking nucleotides (C, Pu, Pu on the 5′-side and C/G, A/G, G, A/C, G on the 3′-side) required for efficient binding. The frequency of CCAAT boxes appears to be relatively higher in TATA-less promoters, particularly in the reverse ATTGG orientation. In TATA-containing promoters the CCAAT box is preferentially located in the −80/−100 region (mean position −89) and is not found nearer to the Start site than −50. In TATA-less promoters it is usually closer to the +1 signal (at −66 on average) and is sometimes present in proximity to the Cap site. The consensus and location of NF-Y binding sites parallel almost perfectly a previous general statistical study on CCAAT boxes in 502 unrelated promoters. This is an indication that NF-Y is the major, if not the sole, CCAAT box recognizing protein and that it might serve different roles in TATA-containing and TATA-less promoters.
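The positional statements above can be explored with a small scan for the pentamer in both orientations. The following is only a sketch: it matches the exact words CCAAT and ATTGG and ignores the flanking-nucleotide preferences described in the abstract. The function name, the `tss_index` parameter and the toy sequence are hypothetical.

```python
def find_ccaat_boxes(promoter, tss_index):
    """Return sorted (position relative to TSS, orientation) pairs.

    tss_index is the 0-based index of the +1 base in `promoter`.
    Upstream bases are numbered -1, -2, ...; there is no position 0.
    """
    hits = []
    for motif, orientation in (("CCAAT", "forward"), ("ATTGG", "reverse")):
        start = promoter.find(motif)
        while start != -1:
            offset = start - tss_index
            position = offset if offset < 0 else offset + 1
            hits.append((position, orientation))
            start = promoter.find(motif, start + 1)
    return sorted(hits)

# Made-up 55-base sequence with the +1 base at index 50.
toy_promoter = "A" * 10 + "CCAAT" + "G" * 20 + "ATTGG" + "C" * 15
toy_hits = find_ccaat_boxes(toy_promoter, 50)
```

Here the reported position is that of the first base of each match; a study that reports box centres or uses a different reference base would need the offset adjusted accordingly.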

Object-oriented Transcription Factors Database (ooTFD)

Article

Feb 2000

David Ghosh

ooTFD (object-oriented Transcription Factors Database) is an object-oriented successor to TFD. This database is aimed at capturing information regarding the polypeptide interactions which comprise and define the properties of transcription factors. ooTFD contains information about transcription factor binding sites, as well as composite relationships within transcription factors, which frequently occur as multisubunit proteins that form a complex interface to cellular processes outside the transcription machinery through protein–protein interactions. In the past year, a few additions and changes were made to this database and associated tools, which are accessible through the IFTI-MIRAGE web site at http://www.ifti.org/
