Article

Analysis of the occurrence of promoter-sites in DNA


Abstract

We show that the occurrence and homology score (1) of promoter-sites in DNA depend upon the base composition of the DNA. We used simple probability theory to calculate the mean homology score expected for all promoter-sites that had a specific match in the canonical hexamers. By using the square root of this mean score as a measure of significance, we objectively classify all promoter-sites which are reported. We tested the theoretical approach in two ways. First, we used the program (PROMSEARCH) to analyze approximately 150,000 base pairs of random sequence DNA with different base compositions and we found excellent agreement with the theoretical predictions. Our second test was the analysis of a number of sequences drawn from the GENBANK DNA sequence database. We have analyzed 20 bacterial and bacteriophage sequences, each of which contained at least one operon, for promoter-sites. We found no absolute preference for promoter-sites within noncoding regions. We show the results of analyzing the phages lambda, T7 and fd, and the E. coli lac operon. The major known promoters in these sequences were all found correctly. We discuss the question of the location of a number of minor promoter-sites and show how PROMSEARCH can be used to help identify the correct location of the promoter. This approach can be applied to the search for any DNA site and should allow greater objectivity when comparing DNA sequences for meaningful subsequences.
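The dependence on base composition described in the abstract can be illustrated with a short calculation. The sketch below is not the homology score of the paper. It assumes an independent-base model in which A and T each occur with probability AT/2 and G and C each with probability (1 - AT)/2, and it uses the widely cited -10 and -35 hexamers TATAAT and TTGACA. All function names are hypothetical.

```python
def base_probabilities(at_fraction):
    """Per-base probabilities under the assumed model: A = T and G = C."""
    half_at = at_fraction / 2.0
    half_gc = (1.0 - at_fraction) / 2.0
    return {"A": half_at, "T": half_at, "G": half_gc, "C": half_gc}


def match_count_distribution(consensus, at_fraction):
    """Return a list where entry k is P(exactly k positions match the consensus).

    Each position matches independently with the probability of its
    consensus base, so the distribution is built by repeated convolution.
    """
    probs = base_probabilities(at_fraction)
    dist = [1.0]  # before any position is examined, zero matches is certain
    for base in consensus:
        p = probs[base]
        updated = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            updated[k] += mass * (1.0 - p)   # this position does not match
            updated[k + 1] += mass * p       # this position matches
        dist = updated
    return dist


def prob_at_least(consensus, at_fraction, min_matches):
    """Probability that a random window matches at min_matches or more positions."""
    return sum(match_count_distribution(consensus, at_fraction)[min_matches:])


if __name__ == "__main__":
    for at in (0.4, 0.5, 0.6, 0.7, 0.8):
        p10 = prob_at_least("TATAAT", at, 5)
        p35 = prob_at_least("TTGACA", at, 5)
        print(f"AT={at:.1f}  P(-10 >= 5/6)={p10:.5f}  P(-35 >= 5/6)={p35:.5f}")
```

Under these assumptions the chance of a near-perfect -10 hexamer rises steeply with A-T content, because every position of TATAAT is A or T.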


... Historically there have been a number of attempts to develop a promoter search algorithm which would allow for the efficient detection of Escherichia coli promoter sites within a large DNA sequence (Scherer et al., 1978; Galas et al., 1985; Mulligan et al., 1984; Mulligan and McClure, 1986). Many of these attempts were based largely or entirely on a specific degree of agreement with the -35 and -10 consensus sequences. ...
... Table III shows the result of testing five different 4000-base random sequences for each of six distinct average base compositions, ranging from 40 to 80% A-T. As Mulligan and McClure (1986) pointed out, a promoter search protocol is likely to be highly sensitive to the average A-T ratio in the input sequence due to the A-T richness of promoters. The production, at random, of sequence satisfying the test criteria is about 1/1000 bases for sequence which is 50% A-T but jumps to 15/1000 at 80% A-T, an A-T level found in parts of the B region of λ, for instance (Collins and Coulson, 1984). ...
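A random-sequence test of this kind can be sketched as a small simulation. The code below does not reproduce the test criteria of the cited protocol. As a stand-in, it counts windows that differ from the -10 hexamer TATAAT at no more than one position, and it assumes independent bases with A = T and G = C. The function names and the fixed seed are illustrative choices.

```python
import random


def random_sequence(length, at_fraction, rng):
    """Draw an i.i.d. DNA sequence with the requested A-T fraction."""
    weights = [at_fraction / 2, at_fraction / 2,
               (1 - at_fraction) / 2, (1 - at_fraction) / 2]
    return "".join(rng.choices("ATGC", weights=weights, k=length))


def count_near_matches(sequence, consensus, max_mismatches):
    """Count windows that differ from the consensus at no more than max_mismatches positions."""
    width = len(consensus)
    hits = 0
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(1 for a, b in zip(window, consensus) if a != b)
        if mismatches <= max_mismatches:
            hits += 1
    return hits


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs agree
    for at in (0.4, 0.5, 0.6, 0.7, 0.8):
        # Five 4000-base sequences per composition, as in the quoted test design.
        counts = [count_near_matches(random_sequence(4000, at, rng), "TATAAT", 1)
                  for _ in range(5)]
        print(f"AT={at:.1f}  hits per 4000 bases: {counts}")
```

The absolute counts depend entirely on the stand-in criterion, so only the trend with A-T content is meaningful here.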
Article
A computer search protocol for finding Escherichia coli bacterial and phage promoters is presented. This protocol relies heavily on the description of promoter sites developed in the preceding paper (O'Neill, M. C. (1989) J. Biol. Chem. 264, 5522–5530), with particular emphasis on the existence of a distinct consensus sequence for each of the three major spacing classes. The input sequence is tested independently for promoters with 16, 17, or 18 bases separating the −35 and −10 regions. Within a given spacing group, a series of six tests is employed to define possible promoters. These tests were developed empirically to identify members of the known promoter database with high efficiency while producing a minimal level of false positives. The degree to which this aim is met is discussed in the context of searches of random sequence, of pBR322, and of λ.
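The idea of testing each spacing class independently can be sketched as follows. This is a simplified illustration, not the six-test protocol of the abstract. It assumes a single pair of consensus hexamers (TTGACA and TATAAT) for all three spacings, whereas the abstract describes a distinct consensus per spacing class, and it accepts a candidate when the combined mismatch count is at most a hypothetical threshold.

```python
CONSENSUS_35 = "TTGACA"  # assumed -35 hexamer
CONSENSUS_10 = "TATAAT"  # assumed -10 hexamer


def mismatches(window, consensus):
    """Number of positions at which the window differs from the consensus."""
    return sum(1 for a, b in zip(window, consensus) if a != b)


def scan_promoter_candidates(sequence, spacers=(16, 17, 18), max_total_mismatches=2):
    """Return (start, spacer, total_mismatches) for each accepted candidate.

    Each spacer length is scanned independently, so the same region can be
    reported under more than one spacing class.
    """
    hits = []
    for spacer in spacers:
        span = 6 + spacer + 6  # -35 hexamer, spacer, -10 hexamer
        for start in range(len(sequence) - span + 1):
            box_35 = sequence[start:start + 6]
            box_10 = sequence[start + 6 + spacer:start + span]
            total = mismatches(box_35, CONSENSUS_35) + mismatches(box_10, CONSENSUS_10)
            if total <= max_total_mismatches:
                hits.append((start, spacer, total))
    return hits


if __name__ == "__main__":
    example = "TTGACA" + "C" * 17 + "TATAAT"
    print(scan_promoter_candidates(example))
```

A real protocol would add further tests to suppress false positives. The threshold here only shows where such tests would plug in.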
... Different general approximate matching techniques, as well as ones specifically designed for this task, are used for binding motif search (Crochemore and Sagot, 2002, and references therein). Various approaches based on weight matrices (Staden, 1984; Harley and Reynolds, 1987; Mulligan and McClure, 1986), neural networks (Lukashin et al., 1989; Demeler and Zhou, 1991; O'Neill, 1991, 1992; Horton and Kanehisa, 1992; Mahadevan and Ghosh, 1994; Pedersen and Engelbrecht, 1995), generalized portrait (Alexandrov and Mironov, 1990), hidden Markov models (Pedersen et al., 1996), genetic algorithms (Bailey and Hart, http://citeseer.nj.nec.com/172804.html), syntactic recognition algorithms (Rosenblueth et al., 1996; Leung et al., 2001) and automatic motif discovery techniques (Hertz and Stormo, 1996; Elkan, 1994, 1995; Tompa, 1999; Liu et al., http://bioprospector.stanford.edu/; Kent, ...
... First, both training and test sets were very small. Second, because of the way the negative examples were generated, the promoter search protocol is likely to be highly sensitive to the average A/T ratio of the input due to the relative A/T richness of promoters (Mulligan and McClure, 1986; O'Neill and Chiafari, 1989). Horton and Kanehisa (1992) reported a 'perceptron type neural network for prediction of E. coli σ70 promoters'. ...
... Another type of software (e.g., Strawberries TSSP [53] and Promsearch [54]) answers the question "Does this sequence contain a promoter? If yes, where are its TF binding sites?" ...
... We reckon that a mix of the two types of software described above is necessary to confirm the presence of TF binding sites and a potential promoter. Although some very good step-by-step methods to safely characterise promoter regions are available in the literature [54], a personal combination of analyses with all these software packages, adjusted to the types of genes and organisms under scrutiny, remains the best option. ...
Article
Recent evolutionary studies clearly indicate that evolution is mainly driven by changes in the complex mechanisms of gene regulation and not solely by polymorphism in protein-encoding genes themselves. After a short description of the cis-regulatory mechanism, we intend in this review to argue that by applying newly available technologies and by merging research areas such as evolutionary and developmental biology, population genetics, ecology and molecular cell biology it is now possible to study evolution in an integrative way. We contend that, by analysing the effects of promoter sequence variation on phenotypic diversity in natural populations, we will soon be able to break the barrier between the study of extant genetic variability and the study of major developmental changes. This will lead to an integrative view of evolution at different scales. Because of their sessile nature and their continuous development, plants must permanently regulate their gene expression to react to their environment, and can, therefore, be considered as a remarkable model for these types of studies.
... The basis of this concept is that the expression of transferred genes depends most importantly on the recognition of the promoter sequence by the transcription machinery. This promoter recognition is dependent on the G+C content of the sequence which may be deduced from the overall genomic G+C content and, subsequently, phylogenetic relatedness (Mulligan and McClure 1986). Two studies from Navarre et al. (2006, 2007) found that transferred DNA that is G+C-poor compared to the recipient genome may experience silencing and therefore will not be expressed. ...
Article
Plasmids, circular DNA that exist and replicate outside of the host chromosome, have been important in the spread of non-essential genes as well as the rapid evolution of prokaryotes. Recent advances in environmental engineering have aimed to utilize the mobility of plasmids carrying degradative genes to disseminate them into the environment for cost-effective and environmentally friendly remediation of harmful contaminants. Here, we review the knowledge surrounding plasmid transfer and the conditions needed for successful transfer and expression of degradative plasmids. Both abiotic and biotic factors have a great impact on the success of degradative plasmid transfer and expression of the degradative genes of interest. Properties such as ecological growth strategies of bacteria may also contribute to plasmid transfer and may be an important consideration for bioremediation applications. Finally, the methods for detection of conjugation events have greatly improved and the application of these tools can help improve our understanding of conjugation in complex communities. However, it remains clear that more methods for in situ detection of plasmid transfer are needed to help detangle the complexities of conjugation in natural environments to better promote a framework for precision bioremediation.
... These results suggest that GC content may be only a rough proxy for rpoD consensus content (as rpoD consensus sequences are AT-rich), but GC content itself may not be an accurate predictor of library presence/abundance; indeed, in some cases, a genome may have a moderate or relatively high percent GC but also possess an unusually high rpoD consensus content, leading to an underrepresentation in the cosmid library that could not have been predicted from GC content alone (Figure 6). In our view, these results are also consistent with the previous observation that library bias was more obvious among organisms with low GC content [2] because AT-rich genomes would have an increased number of promoter-like sequences simply by chance [25]. It was reported that there are difficulties associated with cosmid-cloning of very AT-rich genomic DNA [26,27], and even when genomic libraries can be constructed, cosmid clones may be unstable [28-31], which simply means that foreign DNA fragments are not able to be maintained in the E. coli library host. ...
Preprint
Background: Clone libraries provide researchers with a powerful resource with which to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, as well as allowed the mining of novel enzymes for specific functions of interest. These libraries are often constructed by cloning large-inserts (~30 kb) into a cosmid or fosmid vector. Recently, there have been reports of GC bias in fosmid metagenomic clone libraries, and it was speculated that the bias may be a result of fragmentation and loss of AT-rich sequences during the cloning process. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role in library bias. Results: To explore the possible mechanisms responsible for sequence bias in clone libraries, and in particular whether fragmentation is involved, we constructed a cosmid clone library from a human microbiome sample, and sequenced DNA from three different steps of the library construction process: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final constructed cosmid library, and we provide strong evidence that the sequence bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after the DNA is introduced into E. coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for consensus promoters and found that rpoD/sigma-70 promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the reference genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found that the bias appears to be more closely correlated with the number of rpoD/sigma-70 consensus sequences in the genome than with simple GC content. Conclusions: The GC bias of metagenomic clone libraries does not appear to be due to DNA fragmentation. 
Rather, analysis of promoter consensus sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/sigma-70 consensus-like in E. coli may lead to plasmid instability or loss of insert DNA. Our results suggest that despite widespread use of E. coli to propagate foreign DNA, the effects of in vivo transcriptional activity may be under-appreciated. Further work is required to tease apart the effects of transcription from those of gene product toxicity.
... Change in abundance is depicted on a log scale; CE 0 values indicate zero abundance in the crude extract sample and CL 0 values indicate zero abundance in the cosmid library sample, as predicted by MetaPhlAn have been predicted from GC content alone (Fig. 6). In our view, these results are also consistent with the previous observation that library bias was more obvious among organisms with low GC content [2] because AT-rich genomes would have an increased number of rpoD promoter-like sequences simply by chance [25]. ...
Article
Background: Clone libraries provide researchers with a powerful resource to study nucleic acid from diverse sources. Metagenomic clone libraries in particular have aided in studies of microbial biodiversity and function, and allowed the mining of novel enzymes. Libraries are often constructed by cloning large inserts into cosmid or fosmid vectors. Recently, there have been reports of GC bias in fosmid metagenomic libraries, and it was speculated to be a result of fragmentation and loss of AT-rich sequences during cloning. However, evidence in the literature suggests that transcriptional activity or gene product toxicity may play a role. Results: To explore possible mechanisms responsible for sequence bias in clone libraries, we constructed a cosmid library from a human microbiome sample and sequenced DNA from different steps during library construction: crude extract DNA, size-selected DNA, and cosmid library DNA. We confirmed a GC bias in the final cosmid library, and we provide evidence that the bias is not due to fragmentation and loss of AT-rich sequences but is likely occurring after DNA is introduced into Escherichia coli. To investigate the influence of strong constitutive transcription, we searched the sequence data for promoters and found that rpoD/σ70 promoter sequences were underrepresented in the cosmid library. Furthermore, when we examined the genomes of taxa that were differentially abundant in the cosmid library relative to the original sample, we found the bias to be more correlated with the number of rpoD/σ70 consensus sequences in the genome than with simple GC content. Conclusions: The GC bias of metagenomic libraries does not appear to be due to DNA fragmentation. Rather, analysis of promoter sequences provides support for the hypothesis that strong constitutive transcription from sequences recognized as rpoD/σ70 consensus-like in E. coli may lead to instability, causing loss of the plasmid or loss of the insert DNA that gives rise to the transcription. Despite widespread use of E. coli to propagate foreign DNA in metagenomic libraries, the effects of in vivo transcriptional activity on clone stability are not well understood. Further work is required to tease apart the effects of transcription from those of gene product toxicity. Electronic supplementary material: The online version of this article (doi:10.1186/s40168-015-0086-5) contains supplementary material, which is available to authorized users.
... There were a sequence similar to the TATTTA consensus promoter sequence (-10 region) and two sequences, TTAGGT and TTAATA, similar to the consensus RNA polymerase-binding site (23) proximal to appB (Fig. 4). Upstream of the start codon of appB is a potential ribosomebinding site (Fig. 4). ...
Article
The appBD genes encoding the secretion functions for the 110-kDa RTX hemolysin of Actinobacillus pleuropneumoniae have been cloned and sequenced. Unlike analogous genes from other RTX determinants, the appBD genes do not lie immediately downstream from the hemolysin structural gene, appA. Although isolated from a diverse group of gram-negative organisms, the appBD genes and the characterized RTX BD genes from other organisms all exhibit a high degree of homology at both the DNA and predicted amino acid sequence levels. Analysis of the DNA sequences 3' to appA and 5' to appB suggests that these regions harbor remnant RTX B and A pseudogenes, respectively. Although the appA gene is most similar to the lktA gene from Pasteurella haemolytica (Y. F. Chang, R. Young, and D. K. Struck, DNA 8:635-647, 1989), the RTX A pseudogene upstream from appB most closely resembles the hlyB gene from Escherichia coli, suggesting that the appCA and appBD operons were derived from different ancestral RTX determinants.
Porcine pleuropneumonia is a highly contagious respiratory disease caused by Actinobacillus pleuropneumoniae and is a major cause of economic loss to the swine industry (25). Since sterile culture supernatants from A. pleuropneumoniae induce a localized pneumonia similar to that seen in naturally infected pigs (28), it is probable that one of the virulence factors of A. pleuropneumoniae is a secreted cytotoxin. A likely candidate for this extracellular toxin is the 110-kDa hemolysin produced by pathogenic serotypes of A. pleuropneumoniae (5). The gene for the 110-kDa hemolysin has been cloned, and DNA sequence analysis indicates that it belongs to the RTX family of cytotoxins (4), a family which includes the hemolysins produced by Escherichia coli (6, 11), Proteus vulgaris (15), and Morganella morganii (15) and the leukotoxins produced by Pasteurella haemolytica (13, 19) and Actinobacillus actinomycetemcomitans ...
... Examples of the use of this program are presented in Fig.5. The same analysis was made earlier using a statistical vector [17]. In spite of similarity between the distinguishing vectors the patterns of promoter localization differ significantly. ...
Article
An algorithm from pattern recognition theory, the 'generalized portrait', was used to find a distinguishing vector (scoring matrix) for E. coli promoters. We have attempted to solve three closely linked problems: (i) the selection of significant features of the signal; (ii) subsequent multiple alignment and (iii) calculation of the vector coordinates. Promoters with known strength have been successfully ranked in the correct order using this vector. We demonstrate the use of this method in predicting the location of promoters. A revised consensus promoter sequence is also presented.
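How a scoring matrix ranks candidate sites can be illustrated with a generic log-odds weight matrix. The sketch below is not the generalized portrait algorithm, which derives its vector differently. It simply builds column-wise log-odds weights from a small alignment and sums them over a window. The four aligned sites are invented for illustration, and the uniform background and pseudocount are assumptions.

```python
import math


def build_weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build one {base: log2 odds} dictionary per alignment column."""
    width = len(sites[0])
    matrix = []
    for col in range(width):
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[col]] += 1
        total = sum(counts.values())
        matrix.append({base: math.log2(counts[base] / total / background)
                       for base in "ACGT"})
    return matrix


def score_window(matrix, window):
    """Sum the column weights selected by the bases of the window."""
    return sum(column[base] for column, base in zip(matrix, window))


def best_window(matrix, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(matrix)
    scored = [(score_window(matrix, sequence[i:i + width]), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored)


if __name__ == "__main__":
    # Hypothetical aligned -10 regions, used only to populate the matrix.
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
    matrix = build_weight_matrix(sites)
    print(best_window(matrix, "GGGTATAATGGG"))
```

Ranking promoters by strength, as the abstract reports, would require weights fitted to measured strengths rather than to simple base counts.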
... Based on these data, a score of 75 is expected to provide the required low level of false negatives across a broad range of genomes. To evaluate the level of false positives, a sequence-randomized E. coli genome was analyzed [41]. 25.6% of genes in the native E. coli genome and 18.4% of the genes in the sequence-randomized genome were predicted to have σ54 promoters. ...
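A sequence-randomized control of this kind can be sketched with a simple shuffle. The snippet does not say how the genome was randomized, so the version below assumes a mononucleotide shuffle, which preserves base composition but destroys all ordering. The function name is hypothetical.

```python
import random
from collections import Counter


def shuffle_sequence(sequence, seed=None):
    """Return a permutation of the sequence with identical base composition."""
    rng = random.Random(seed)
    bases = list(sequence)
    rng.shuffle(bases)  # in-place Fisher-Yates shuffle
    return "".join(bases)


if __name__ == "__main__":
    native = "TTGACATATAATGCGCGCATATTTGGCACG"
    control = shuffle_sequence(native, seed=0)
    # Composition is unchanged, so any predictor hits on the control
    # reflect base composition rather than sequence order.
    print(Counter(native) == Counter(control))
```

Running the same predictor on the native and shuffled sequences then gives a rough estimate of how many predictions arise from composition alone.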
Article
Heterologous expression of bacterial biosynthetic gene clusters is currently an indispensable tool for characterizing biosynthetic pathways. Development of an effective, general heterologous expression system that can be applied to bioprospecting from metagenomic DNA will enable the discovery of a wealth of new natural products. We have developed a new Escherichia coli-based heterologous expression system for polyketide biosynthetic gene clusters. We have demonstrated that over-expression of the alternative sigma factor σ(54) directly and positively regulates heterologous expression of the oxytetracycline biosynthetic gene cluster in E. coli. Bioinformatics analysis indicates that σ(54) promoters are present in nearly 70% of polyketide and non-ribosomal peptide biosynthetic pathways. We have demonstrated a new mechanism for heterologous expression of the oxytetracycline polyketide biosynthetic pathway, where high-level pleiotropic sigma factors from the heterologous host directly and positively regulate transcription of the non-native biosynthetic gene cluster. Our bioinformatics analysis is consistent with the hypothesis that heterologous expression mediated by the alternative sigma factor σ(54) may be a viable method for the production of additional polyketide products.
... Potential promoters were identified using PCSEARCH (Mulligan and McClure, 1986). Open reading frames were defined using the GCG program (Wisconsin Package, version 10.0). ...
Article
Translational regulation of the stationary phase sigma factor RpoS is mediated by the formation of a double-stranded RNA stem–loop structure in the upstream region of the rpoS messenger RNA, occluding the translation initiation site. The interaction of the rpoS mRNA with a small RNA, DsrA, disrupts the double-strand pairing and allows high levels of translation initiation. We screened a multicopy library of Escherichia coli DNA fragments for novel activators of RpoS translation when DsrA is absent. Clones carrying rprA (RpoS regulator RNA) increased the translation of RpoS. The rprA gene encodes a 106 nucleotide regulatory RNA. As with DsrA, RprA is predicted to form three stem–loops and is highly conserved in Salmonella and Klebsiella species. Thus, at least two small RNAs, DsrA and RprA, participate in the positive regulation of RpoS translation. Unlike DsrA, RprA does not have an extensive region of complementarity to the RpoS leader, leaving its mechanism of action unclear. RprA is non-essential. Mutations in the gene interfere with the induction of RpoS after osmotic shock when DsrA is absent, demonstrating a physiological role for RprA. The existence of two very different small RNA regulators of RpoS translation suggests that such additional regulatory RNAs are likely to exist, both for regulation of RpoS and for regulation of other important cellular components.
... Many prediction approaches have been developed for recognizing E. coli σ70 promoters because of their representative features. Most of them can be classified into two main groups: content-based methods [2] and signal-based methods [3]. Content-based methods concern only the global content properties of the sequences and cannot provide exact positional information about the signals. ...
Chapter
Promoter identification is an essential task in the research of transcription regulation, but the prediction accuracy of current methods is still far from what is expected. An effective and reliable prediction method for prokaryotic promoter regions would be very helpful. We have developed a quadratic discriminant analysis (QDA) method based on feature selection to predict prokaryotic promoter regions, which are classified according to their locations in the genome. In order to utilize more characteristic information, we incorporate content features, signal features and structure features of the promoters in the candidate feature set and construct proper statistical models to calculate them. For the main conserved signal features in particular, a composite motif model is adopted, which obtains its optimal parameters through an iterative search algorithm, OPSIA. Using the squared Mahalanobis distance as a measure, the discriminating features are selected from the candidate features through a stepwise procedure and are combined as a multidimensional vector. The vector of combined features is then used by QDA to predict the potential promoter regions. The algorithm has been trained and tested on E. coli and B. subtilis promoter datasets by the jackknife method. For E. coli σ70 promoters located in the non-coding regions, the average prediction accuracy is 85.7%, and for the ones located in the coding regions and several other kinds of prokaryotic promoters, the prediction accuracies are also about 80%. The results indicate that our method is a universal algorithm that outperforms most of the existing approaches based on several performance measurements. Furthermore, the framework of the method is extendable and can accept new features to improve the prediction results efficiently. The OPSIA algorithm is also a useful tool to explore composite motifs in newly uncovered promoter sequences.
... The relations between the promoter sequences and their strengths were extensively studied in the 1980s (Mulligan et al., 1984, 1985; Mulligan and McClure, 1986; Kobayashi et al., 1990; Szoke et al., 1987; Ayers et al., 1989; O'Neill, 1989; Stefano and Gralla, 1982; Youderian et al., 1982; Gardella et al., 1989; Burr et al., 2000; Strohl, 1992; Kumar et al., 1993; Straney et al., 1994). Several studies used Escherichia coli promoters corresponding to the σ70 subunit of RNA polymerase. ...
Article
Motivation: The relations between the promoter sequences and their strengths were extensively studied in the 1980s. Although these studies uncovered strong sequence-strength correlations, the cost of their elaborate experimental methods has been too high for them to be applied to a large number of promoters. In contrast, a recent increase in microarray data allows us to compare the expression of thousands of genes with their DNA sequences. Results: We studied the relations between the promoter sequences and their strengths using the Escherichia coli microarray data. We modeled those relations using a simple weight matrix, which was optimized with a novel support vector regression method. It was observed that several non-consensus bases in the '-35' and '-10' regions of promoter sequences act positively on the promoter strength and that certain consensus bases have a minor effect on the strength. We analyzed outliers for which the observed gene expressions deviate from the promoter strength predictions, and identified several genes with enhanced expressions due to multiple promoters and genes under strong regulation by transcription factors. Our method is applicable to other procaryotes for which both the promoter sequences and the microarray data are available.
... Therefore, understanding the factors responsible for the low level of transcription and the possible mechanisms of regulation of gene expression in Mycobacteria involves the examination of the mycobacterial promoter structure and the promoter transcription machinery, including chemical information about the involved RNA molecules (Arnvig et al., 2005; Harshey and Ramakrishnan, 1977). Efforts have been made to develop statistical algorithms for sequence analysis and motif prediction by searching for homologous regions or by comparing the sequence information with a consensus sequence (O'Neill and Chiafari, 1989). Wide variations existing within individual promoter sequences are primarily responsible for the unsatisfactory results yielded by the promoter-site-searching algorithms that in essence perform statistical analysis (Mulligan and McClure, 1986; Mulligan et al., 1984). Therefore, it can be inferred that the recognition of Mps requires a powerful technique capable of unravelling those hidden patterns in the promoter regions, which are difficult to identify directly by sequence alignment. ...
Article
The importance of the promoter sequences in the function regulation of several important mycobacterial pathogens creates the necessity to design simple and fast theoretical models that can predict them. This work proposes two DNA promoter QSAR models based on pseudo-folding lattice network (LN) and star-graphs (SG) topological indices. In addition, a comparative study with the previous RNA electrostatic parameters of thermodynamically-driven secondary structure folding representations has been carried out. The best model of this work was obtained with only two LN stochastic electrostatic potentials and it is characterized by accuracy, selectivity and specificity of 90.87%, 82.96% and 92.95%, respectively. In addition, we pointed out the SG result dependence on the DNA sequence codification and we proposed a QSAR model based on codons and only three SG spectral moments.
... Although expression of traI is mostly dependent upon positive activation of the tra operon by the TraJ protein, biochemical experiments have demonstrated that there is significant traJ-independent transcription of traI that occurs separately from traD (6,16). Based on sequence similarity to the E. coli sigma-70 promoter consensus (15), a possible location for this in vivo promoter would be approximately 350 base pairs upstream of the TraI translation startpoint (-10 region at positions 1134 to 1139). Electron microscopy has been used to map a strong RNA polymerase-binding site in this region of tra DNA (13). ...
Article
A 6.9-kilobase region of the Escherichia coli F plasmid containing the 3' half of the traD gene and the entire traI gene (which encodes the TraI protein, DNA helicase I, and TraI*, a polypeptide arising from an internal in-frame translational start in traI) has been sequenced. A previously unidentified open reading frame (tentatively trbH) lies between traD and traI.
... Inspection of BENT-6 reveals a sequence (TACAAT) with homology to the procaryotic promoter consensus hexamer sequences for the -10 region (TATAAT), with only one mismatch (T instead of C) in the least-conserved (49%) position. There is also a sequence 18 bp upstream of the -10 region (ATGATC) with similarity to the consensus (TTGACA) for the -35 region of procaryotic promoters (18,21). Recently, bent DNA sequences, even those with very limited similarity to consensus transcription initiation sites, have been demonstrated to function as promoters in vivo (8). ...
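The hexamer comparisons quoted above amount to counting mismatched positions. A minimal sketch, using only the sequences given in the passage and a hypothetical helper name:

```python
def mismatch_positions(candidate, consensus):
    """Return the 1-based positions at which candidate differs from consensus."""
    if len(candidate) != len(consensus):
        raise ValueError("sequences must have equal length")
    return [index + 1
            for index, (a, b) in enumerate(zip(candidate, consensus))
            if a != b]


if __name__ == "__main__":
    # -10 region: candidate TACAAT against consensus TATAAT
    print(mismatch_positions("TACAAT", "TATAAT"))
    # -35 region: candidate ATGATC against consensus TTGACA
    print(mismatch_positions("ATGATC", "TTGACA"))
```

The -10 candidate differs from its consensus only at position 3, while the -35 candidate differs at positions 1, 5 and 6, which is consistent with the passage describing it as merely similar to the consensus.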
Article
Transposon Tn5 mutagenesis of the Escherichia coli chromosome was used to isolate 21 independent insertion mutations conferring an altered colony color phenotype on MacConkey-glycerol plates. The polymerase chain reaction was used to map 16 of these Tn5 insertions within the glpFK region at 88 min. The most polar Tn5 insertion was shown by nucleotide sequencing to be in the proposed glpF open reading frame. The data suggest that the glpF and glpK genes are in an operon with a bent DNA segment (BENT-6) involved in transcriptional regulation of this operon.
... The CFA/I gene has a typical Shine-Dalgarno sequence, AGGA, located at nucleotide positions 186 to 189. A sequence, TACAAT, located at positions 148 to 153 was tentatively assigned as the -10 sequence of the CFA/I gene promoter on the basis of the results of promoter sequence analyses using the computer program PROMSEARCH (35). No DNA sequence homologous to the consensus -35 sequence was found. ...
Article
The colonization factor antigen I (CFA/I) gene has been isolated and sequenced. The amino acid sequence of CFA/I deduced from the nucleotide sequence is composed of 170 amino acids. The first 23 amino acids are considered to be the signal peptide of the CFA/I protein since they are not present in the protein sequence. Among the remaining amino acids, only two are different from the protein sequence: amino acid position 76 is an aspartic acid instead of an asparagine, and position 97 is a serine instead of an alanine. The CFA/I gene has a typical Shine-Dalgarno sequence located 10 base pairs (bp) upstream from the initiation codon. The sequence TACAAT located 48 bp upstream from the initiation codon was tentatively designated the -10 sequence of the CFA/I gene promoter. No sequence homologous to the consensus -35 promoter sequence was found. A pair of inverted repeat sequences followed by a stretch of eight A's is located 45 bp downstream from the termination codon of the CFA/I gene; this region may be a rho-independent transcriptional terminator.
Chapter
Applications of artificial neural networks in the field of genome research will be reviewed and some more recent developments in neural network research relevant for future applications will be surveyed. The basic definitions for artificial neural networks and neural learning algorithms will be introduced. The applications range from the recognition of translation initiation sites in nucleic acid sequences, the recognition of splice junctions and exons/ introns in mRNA, the detection of uncommon sequences in cDNA, to the prediction of secondary and tertiary structures of proteins from the amino acid sequence, the detection of structural motifs in protein sequences and the classification of protein sequences into functional families. Most applications employ multilayer feedforward networks trained supervised with the backpropagation learning algorithm or self-organising Kohonen maps adapted unsupervised for feature extraction. The most promising developments in neural network research usable in all mentioned applications are new modular network architectures with more problem-tailored connection topologies such as linked receptive fields and recurrent networks with short-term memory capable of modelling any dynamical system using only inductive learning.
Chapter
The computational prediction of regulatory components in genomic DNA is an attractive and complex research field. The main interest is in finding protein coding genes in long stretches of non-mapped DNA. A particularly important segment of gene finding is the location of promoters - a specific group of regulatory components that are just at the beginning of the gene and which initiate the DNA transcription process. The computational methods for promoter recognition are not sufficiently developed yet. Current methods are prone to produce a large number of false predictions. We present a new method based on clustering the PCA transformed DNA data with further signal processing of the clustered data. The basic technical system consists of eleven neural networks (one SOM ANN and ten GRNNs). On an independent test set the system shows an increased accuracy of recognition with a reduced level of false positive reporting. A special method of data separation into the training set and test set is used. The results achieved with the extended system appear to be currently the best in the class of those that use neural networks for promoter recognition.
Article
The 1123 topological structure parameters of DNA bases were used directly as descriptors to characterize the sequences of 38 E. coli promoters. For the resulting high-dimensional feature set, correlation analysis and a binary matrix shuffling filter (BMSF) were applied in succession to remove redundant or useless features, and only 20 features, each with a definite meaning, were finally retained. Based on the retained features and support vector regression (SVR), a quantitative structure-activity relationship (QSAR) model was established for the analysis of the 38 E. coli promoters. The leave-one-out (LOO) prediction accuracy of this model was 0.838, superior to that of the reference model, partial least squares (PLS). According to the SVR interpretation system, the QSAR model established in this work shows extremely significant nonlinear regression, and the relationship between real promoter strength and 11 significant retained features is given directly. This work provides an efficient tool for the QSAR analysis of promoters and other similar molecular sequences.
Article
There have been many different approaches employed to define the “consensus” sequence of various DNA binding sites and to use the definition obtained to locate and rank members of a given sequence family. The analysis presented here enlists two of these approaches, each in modified form, to develop a highly efficient search protocol for Escherichia coli promoters and to provide a relative ranking of these sites showing good agreement with in vitro measurements of promoter strength.
Article
Predicting mycobacterial promoter sequences for protein synthesis is important in the study of the regulation of protein metabolism. This goal is, however, considered a challenging computational biology task because of low inter-sequence homology. Consequently, a previous work based only on DNA sequence had to use a large input parameter set and a multilayered feed-forward ANN architecture trained with the error-back-propagation algorithm to raise the overall accuracy to 97% [Kalate, et al. 2003. Comput. Biol. Chem. 27, 555–564]. One could therefore expect that a notably simpler model may be derived using parameters based on non-linear structural information. In the present work, a method based on molecular folding negentropies (Θk) is introduced to predict, for the first time, mycobacterial promoter sequences (mps) from the corresponding RNA secondary structure. The best QSAR equation found was the classification function mps=4.921×0ΘM−1.205, which recognised 126/135 mps (93.3%) and 100% of 245 control sequences (cs). The model showed a very high Matthews correlation coefficient, C=0.949. Both average overall accuracy and predictability were 97.6%. Additionally, several machine learning algorithms were applied in order to reaffirm the validity of the LDA model from the chemometrics point of view. This linear model with only one parameter (0ΘM) may be considered by far the simplest reported to date, without loss of accuracy (97%) with respect to Kalate et al.'s model.
Article
The consensus sequence of E.coli promoter elements was determined by the method of random selection. A large collection of hybrid molecules was produced in which random-sequence oligonucleotides were cloned in place of a wild-type promoter element, and functional −10 and −35 E.coli promoter elements were obtained by a genetic selection involving the expression of a structural gene. The DNA sequences and relative levels of function for −10 and −35 elements were determined. The consensus sequences determined by this approach are very similar to those determined by comparing DNA sequences of naturally occurring E.coli promoters. However, no strong correlation is observed between similarity to the consensus and relative level of function. The results are considered in terms of E.coli promoter function and of the general applicability of the random selection method.
Article
Translation of the IS 10 transposase gene is known to be very infrequent. We have identified mutations whose genetic properties suggest that they act directly to increase or decrease the intrinsic level of translation initiation. Also, we have analysed in detail the effects of these mutations on IS 10 mRNA using one particular IS 10 derivative. In this case, increases or decreases in translation are accompanied by increases or decreases in both the steady state level and the half-life of transposase mRNA; effects on steady state levels are much more dramatic than effects on message half-life. At wild-type levels of translation initiation, the rate-limiting step in physical decay of full length IS 10 message for a particular IS 10 derivative is shown to be rne-dependent endonucleolytic cleavage; 3′ exonucleases appear to play a secondary role, degrading primary cleavage products. Analysis of interplay between translation mutations and rne function, together with the above observations, suggests that translation stabilizes messages in a general way against rne-dependent endonucleolytic cleavage, and that significant protection may be conferred by one or a few ribosomes. However, dramatic effects of translation on steady state message levels are still observed in an rne mutant and involve the 3′ end of the transcript; we propose that these additional effects reflect translation-mediated stimulation of transcript release.
Article
In this paper, we have investigated the real-world task of recognizing biological concepts in DNA sequences. Promoters in strings that represent nucleotides (one of A, G, T, or C) were recognized using a novel approach that combines feature selection (FS) and a least square support vector machine (LSSVM). The Escherichia coli promoter gene sequences dataset has 57 attributes and 106 samples, including 53 promoters and 53 non-promoters. The proposed system consists of two parts. First, we used the FS process to reduce the dimensionality of the E. coli promoter gene sequences dataset from 57 attributes to 4 attributes. Second, the LSSVM classifier algorithm was run to classify the E. coli promoter gene sequences. To show the performance of the proposed system, we used the success rate, sensitivity and specificity analysis, 10-fold cross validation, and the confusion matrix. Whereas the LSSVM classifier alone obtained an 80% success rate using 10-fold cross validation, the proposed system obtained a 100% success rate under the same conditions. These results indicate that the proposed approach improves the success rate in recognizing promoters in strings that represent nucleotides.
Article
The field of computational molecular biology and genetics is expanding at an enormous rate. Journals such as CABIOS and Nucleic Acids Research routinely publish articles on computational and mathematical aspects of biology. The purpose of this paper is to provide a bibliographic review of the literature in this area related to DNA mapping and sequence analysis. We have focused on computer and mathematical aspects of molecular biology and genetics (interpreted in a broad sense). Authors are solicited for their additions/corrections to this bibliography. Contact us at the above address.
Article
An Expectation Maximization algorithm for identification of DNA binding sites is presented. The approach predicts the location of binding regions while allowing variable length spacers within the sites. In addition to predicting the most likely spacer length for a set of DNA fragments, the method identifies individual sites that differ in spacer size. No alignment of DNA sequences is necessary. The method is illustrated by application to 231 Escherichia coli DNA fragments known to contain promoters with variable spacings between their consensus regions. Maximum-likelihood tests of the differences between the spacing classes indicate that the consensus regions of the spacing classes are not distinct. Further tests suggest that several positions within the spacing region may contribute to promoter specificity.
Article
Conjugation of catabolic plasmids in contaminated environments is a naturally occurring horizontal gene transfer phenomenon, which could be utilized in genetic bioaugmentation. The potentially important parameters for genetic bioaugmentation include gene regulation of transferred catabolic plasmids that may be controlled by the genetic characteristics of transconjugants as well as environmental conditions that may alter the expression of the contaminant-degrading phenotype. This study showed that both genomic guanine-cytosine contents and phylogenetic characteristics of transconjugants were important in controlling the phenotype functionality of the TOL plasmid. These genetic characteristics had no apparent impact on the stability of the TOL plasmid, which was observed to be highly variable among strains. Within the environmental conditions tested, the addition of glucose resulted in the largest enhancement of the activities of enzymes encoded by the TOL plasmid in all transconjugant strains. Glucose (1 g/L) enhanced the phenotype functionality by up to 16.4 (±2.22), 30.8 (±7.03), and 90.8 (±4.56)-fold in toluene degradation rates, catechol 2,3-dioxygenase enzymatic activities, and xylE gene expression, respectively. These results suggest that genetic limitations of the expression of horizontally acquired genes may be overcome by the presence of alternate carbon substrates. Such observations may be utilized in improving the effectiveness of genetic bioaugmentation.
Article
DNA sequence classification is the activity of determining whether or not an unlabeled sequence S belongs to an existing class C. This paper proposes two new techniques for DNA sequence classification. The first technique works by comparing the unlabeled sequence S with a group of active motifs discovered from the elements of C and by distinction with elements outside of C. The second technique generates and matches gapped fingerprints of S with elements of C. Experimental results obtained by running these algorithms on long and well conserved Alu sequences demonstrate the good performance of the presented methods compared with FASTA. When applied to less conserved and relatively short functional sites such as splice-junctions, a variation of the second technique combining fingerprinting with consensus sequence analysis gives better results than the current classifiers employing text compression and machine learning algorithms.
Article
Several graph representations have been introduced for different data in theoretical biology. For instance, Complex Networks based on Graph theory are used to represent the structure and/or dynamics of different large biological systems such as protein-protein interaction networks. In addition, Randic, Liao, Nandy, Basak, and many others developed some special types of graph-based representations. This special type of graph includes geometrical constraints on node positioning (sequence pseudo-folding rules) in 2D space and adopts final geometrical shapes that resemble lattice-like patterns. Lattice networks have been used to visually depict DNA and protein sequences, but they are very flexible. In fact, we can use this technique to create string pseudo-folding lattice representations for any kind of string data. However, despite the proven efficacy of new Lattice-like graphs/networks to represent diverse systems, most works focus on only one specific type of biological data. In this work, we review both classic and generalized types of lattice graphs as well as examples that illustrate how to use them to represent and compare biological data from different sources. The examples reviewed include the following cases: Protein sequence; Mass Spectra (MS) of protein Peptide Mass Fingerprints (PMF); Molecular Dynamic Trajectory (MDTs) from structural studies; mRNA Microarray data; Single Nucleotide Polymorphisms (SNPs); 1D or 2D-Electrophoresis study of protein Polymorphisms and Protein-research patent and/or copyright information. We used data available from public sources for some examples, but for others we used experimental results reported herein for the first time. This work may break new ground for the application of graph theory in theoretical biology and other areas of biomedical sciences. In addition, we carried out the statistical analysis of 50,000+ cases to seek and validate a new QSAR-like predictor for enzyme sub-classes. The model uses as inputs spectral moments of pseudo-folding lattice graphs. Lastly, we illustrated the use of this model to study 4,000+ ESTs of protein sequences present in the parasite Trichinella spiralis.
Article
Horizontal gene transfer (HGT) of plasmids is a naturally occurring phenomenon which could be manipulated for bioremediation applications. Specifically, HGT may prove useful to enhance bioremediation through genetic bioaugmentation. However, because the transfer of a plasmid between donor and recipient cells does not always result in useful functional phenotypes, the conditions under which HGT events result in enhanced degradative capabilities must first be elucidated. The objective of this study was to determine if the addition of alternate carbon substrates could improve toluene degradation in Escherichia coli DH5alpha transconjugants. The addition of glucose (0.5-5 g/L) and Luria-Bertani (LB) broth (10-100%) resulted in enhanced toluene degradation. On average, the toluene degradation rate increased 14.1 (+/-2.1)-fold in the presence of glucose while the maximum increase was 18.4 (+/-1.7)-fold in the presence of 25% LB broth. Gene expression of xyl genes was upregulated in the presence of glucose but not LB broth, which implies different inducing mechanisms by the two types of alternate carbon source. The increased toluene degradation by the addition of glucose or LB broth was persistent over the short-term, suggesting the pulse amendment of an alternative carbon source may be helpful in bioremediation. While the effects of recipient genome GC content and other conditions must still be examined, our results suggest that changes in environmental conditions such as alternate substrate availability may significantly improve the functionality of the transferred phenotypes in HGT and therefore may be an important parameter for genetic bioaugmentation optimization.
Article
Long-range two-body correlations in a DNA sequence should in theory approach a constant value very rapidly with increasing value of the correlation length. It is shown that for most DNA sequences, the long-range correlations exhibit oscillations superimposed on the constant background. These oscillations persist for very large correlation lengths. The oscillations are shown to be three-point cycles and are related to the coding regions in the DNA. A method for discovering the coding regions in DNA sequences is presented. The limitations of the method are discussed.
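The period-three oscillation described in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the correlation used here (the fraction of positions where the base at i equals the base at i + k) is an assumed, simplified stand-in for a two-body correlation, and the "coding" sequence is synthetic, with the first base of every codon fixed to G purely to exaggerate codon-position bias. The names `match_correlation` and `synthetic_coding` are hypothetical.

```python
import random


def match_correlation(seq, k):
    """Fraction of positions i where seq[i] == seq[i + k] (assumed, simplified measure)."""
    n = len(seq) - k
    return sum(seq[i] == seq[i + k] for i in range(n)) / n


def synthetic_coding(n_codons, seed=0):
    """Toy 'coding' sequence: codon position 1 is always G, positions 2 and 3 are uniform."""
    rng = random.Random(seed)
    return "".join(
        "G" + rng.choice("ACGT") + rng.choice("ACGT") for _ in range(n_codons)
    )


seq = synthetic_coding(1000)
# Correlation at distances 1..9; values at multiples of 3 stand out from the background.
corr = {k: match_correlation(seq, k) for k in range(1, 10)}

for k, value in corr.items():
    print(k, round(value, 3))
```

In this toy setting the correlation at k = 3, 6, 9 sits near 0.5 while the other distances stay near the 0.25 background, which is the kind of three-point cycle the abstract associates with coding regions.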
Article
In Escherichia coli, most of the dUMP that is used as a substrate for thymidylate synthetase is generated from dCTP through the sequential action of dCTP deaminase and dUTPase. Some mutations of the dut (dUTPase) gene are lethal even when the cells are grown in the presence of thymidine, but their lethality can be suppressed by extragenic mutations that can be produced by transposon insertion. Six suppressor mutations were tested, and all were found to belong to the same complementation group. The affected gene was cloned, it was mapped by hybridization with a library of recombinant DNA, and its nucleotide sequence was determined. The gene is at 2,149 kb on the physical map. Its product, a 21.2-kDa polypeptide, was overproduced 1,000-fold via an expression vector and identified as dCTP deaminase, the enzyme affected in previously described dcd mutants. Null mutations in dcd probably suppress the lethality of dut mutations by reducing the accumulation of dUTP, which would otherwise lead to the excessive incorporation of uracil into DNA.
Article
Two R plasmids, pYFC1 and pYFC2, from Pasteurella haemolytica A1 encoding sulfonamide, streptomycin (pYFC1), and ampicillin (pYFC2) resistances have been characterized by restriction endonuclease digestions, subcloning or DNA sequencing. pYFC1 consists of 4225 bp and is 51.9% in AT content. Physical mapping indicated a highly conserved region of restriction sites among pYFC1, RSF1010, pGS05, pFM739, pHD148 and pGS03B. pYFC1 encoded a dihydropteroate synthase (29.8 kDa), and streptomycin kinase (29.6 kDa) which is homologous in nucleotide sequences or deduced amino acid sequence to that encoded by a broad-host range IncQ plasmid RSF1010. Based on the primary structure of pYFC1, the sulfonamide and streptomycin genes are derived from the same ancestor of RSF1010. pYFC2 is similar to the plasmid from P. haemolytica LNPB51 isolated in France by partial restriction enzyme mapping. pYFC1 and pYFC2 have the same size of 4.2 kbp.
Article
The Escherichia coli araFGH operon codes for proteins involved in the L-arabinose high-affinity transport system. Transcriptional regulation of the operon was studied by creating point mutations and deletions in the control region cloned into a GalK expression vector. The transcription start site was confirmed by RNA sequencing of transcripts. The sequences essential for polymerase function were localized by deletions and point mutations. Surprisingly, only a weak -10 consensus sequence, and no -35 sequence is required. Mutation of a guanosine at position -12 greatly reduced promoter activity, which suggests important polymerase interactions with DNA between the usual -10 and -35 positions. A double mutation toward the consensus in the -10 region was required to create a promoter capable of significant AraC-independent transcription. These results show that the araFGH promoter structure is similar to that of the galP1 promoter and is substantially different from that of the araBAD promoter. The effects of 11 mutations within the DNA region thought to bind the cyclic AMP receptor protein correlate well with the CRP consensus binding sequence and confirm that this region is responsible for cyclic AMP regulation. Deletion of the AraC binding site nearest the promoter, araFG1, eliminates arabinose regulation, whereas deletion of the upstream AraC binding site, araFG2, has only a slight effect on promoter activity.
Article
soxR governs a superoxide response regulon that contains the genes for endonuclease IV, Mn2(+)-superoxide dismutase, and glucose 6-phosphate dehydrogenase. The soxR gene encodes a 17-kDa protein; some mutations of this gene cause constitutive overexpression of the regulon. Induction by paraquat (methyl viologen) requires both soxR and a new gene, soxS. soxS is adjacent to soxR, it encodes a 13-kDa protein, and it is required for paraquat resistance. These functions were revealed by studies in which the sequence of the 1.1-kb soxR-soxS region was determined, the 5' ends of the mRNAs were mapped, and complementation tests were performed with soxRS plasmids containing deletions of known sequence. The two genes are divergently transcribed, and the transcripts overlap. The soxS promoter is within the 85-nucleotide intergenic region, whereas the soxR promoter is within soxS. soxS mRNA increases after induction. Both protein products have possible DNA-binding (helix-turn-helix) domains. SoxR contains four cysteines (CX2CXCX5C) that might be part of a sensor region. SoxS shows 17 to 31% homology to the C-terminal portions of members of the AraC family of positive regulators.
Article
Difficulties encountered in the cloning of DNA from Streptococcus pneumoniae and other AT-rich organisms into ColE1-type Escherichia coli vectors have been proposed to be due to the presence of a large number of strong promoter-acting sequences in the donor DNA. The use of transcription terminators has been advocated as a means of reducing instability resulting from disruption of plasmid replication caused by strong promoters. However, neither the existence of promoter-acting sequences of sufficient strength and number to explain the reported cloning difficulties nor their role as a source of instability has been proven. As a direct test of the "strong promoter" hypothesis, we cloned random fragments from S. pneumoniae into an E. coli vector containing transcription terminators, identified strong promoter-acting sequences, and subsequently removed the transcription terminators. We observed that terminator removal resulted in reduced copy numbers for the strongest promoter-acting sequences but not in reduced promoter strengths or altered plasmid stabilities. Our results indicate that promoters strong enough to require transcription terminators for plasmid stability are probably rare in S. pneumoniae DNA.
Article
A computer tool is described for comparison, analysis and search of genetic signals. The method is based on sequence consensus matrices. It assumes that a genetic signal (such as a promoter, enhancer or whatever) is composed of several signal blocks separated from each other by variable distances. A set of programs is presented to perform the analysis. The result of such an analysis is a description of the investigated signal including matrices for each signal block, distances between each block and distribution of the values. Programs are provided to search for a signal using results from previous analysis. The method is able to align large sets of sequences within a few minutes and to check the quality of the alignment. An analysis of E.coli promoters is provided as an example.
Article
F plasmid DNA transfer (tra) gene expression in Escherichia coli is regulated by chromosome- and F-encoded gene products. To study the relationship among these regulatory factors, we constructed low-copy plasmids containing a phi(traY'-'lacZ)hyb gene that couples beta-galactosidase and Lac permease synthesis to the F plasmid traY promoter. Wild-type transformants maintained high levels of beta-galactosidase over a broad range of culture densities. Primer extension analysis of tra mRNA from F'lac and phi(traY'-'lacZ)hyb strains indicated very similar, though not identical, transcription initiation sites. Moreover, phi(traY'-'lacZ)hyb gene expression required both TraJ and SfrA, as does tra gene expression in F+ strains. beta-Galactosidase activity was reduced approximately 30-fold in the absence of TraJ, which could be supplied in cis or in trans. In a two-plasmid system in which TraJ was supplied in trans by a lac-traJ operon fusion, phi(traY'-'lacZ)hyb expression was a linear, saturable function of traJ expression. Enzyme activity was reduced approximately tenfold in sfrA mutants. That reduction could not be attributed to an effect on the TraJ level. Several other cellular or environmental variables had only a modest effect on phi(traY'-'lacZ)hyb expression. Hyperexpression was observed at high cell density (twofold) and in anaerobic cultures (1.2- to 1.5-fold). In contrast, expression was reduced twofold in integration host factor mutants.
Article
Methods for optimizing the prediction of Escherichia coli RNA polymerase promoter sequences by neural networks are presented. A neural network was trained on a set of 80 known promoter sequences combined with different numbers of random sequences. The conserved −10 region and −35 region of the promoter sequences and a combination of these regions were used in three independent training sets. The prediction accuracy of the resulting weight matrix was tested against a separate set of 30 known promoter sequences and 1500 random sequences. The effects of the network's topology, the extent of training, the number of random sequences in the training set and the effects of different data representations were examined and optimized. Accuracies of 100% on the promoter test set and 98.4% on the random test set were achieved with the optimal parameters.
Article
An analysis of the sequence information contained in a compilation of published binding sites for E. coli integration host factor (IHF) was performed. The sequences of twenty-seven IHF sites were aligned; the base occurrences at each position, the information content, and an extended consensus sequence were obtained for the IHF site. The base occurrences at each position of the IHF site were used with a program written for Apple Macintosh computers to determine the similarity scores for published IHF sites. A linear correlation was found to exist between the logarithm of IHF binding and functional data (relative free energies) and similarity scores for two groups of IHF sites. The MacTargsearch program and its potential usefulness in searching for other sites and predicting their relative activities are discussed.
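As a rough illustration of scoring candidate sites from per-position base occurrences, the sketch below builds a count matrix from aligned sites and scores a candidate by summing the counts of its bases. The abstract does not give the exact scoring formula used by MacTargsearch, so the normalised sum here is an assumption, the function names are hypothetical, and the aligned 6-mers are made-up toy data rather than real IHF sites.

```python
from collections import Counter


def count_matrix(sites):
    """Per-position base counts for a list of equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all aligned sites must have the same length")
    return [Counter(site[i] for site in sites) for i in range(length)]


def similarity(matrix, candidate, n_sites):
    """Assumed score: summed base counts, scaled so a site matching every aligned site scores 1.0."""
    total = sum(column[base] for column, base in zip(matrix, candidate))
    return total / (n_sites * len(matrix))


def best_window(matrix, n_sites, sequence):
    """Slide over the sequence and return (best score, start index)."""
    width = len(matrix)
    return max(
        (similarity(matrix, sequence[i:i + width], n_sites), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy aligned sites (hypothetical, for illustration only).
toy_sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
matrix = count_matrix(toy_sites)

print(similarity(matrix, "TATAAT", len(toy_sites)))
print(best_window(matrix, len(toy_sites), "GGCTATAATGG"))
```

A score built this way could then be compared with measured binding data, as the abstract describes, although the published correlation was obtained with the authors' own similarity scores.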
Article
Seventeen DNA fragments from Streptococcus pneumoniae were randomly cloned in Escherichia coli with selection for promoter activity. The fragments were sequenced and the promoter locations were determined by primer extension analysis. Examination for sites similar to the E. coli major consensus promoter sequence revealed such a site in each of the seventeen fragments, located five to eight base pairs upstream of the point at which transcription was initiated in the E. coli host. Thus, the abundance of promoter activity found in pneumococcal DNA cloned in E. coli hosts arises primarily from sigma-70-type promoter structures. Combined with the observation that such sequences are usually found just upstream of, but not within, pneumococcal genes, this implies that one class (perhaps the major class) of pneumococcal promoters closely resembles the canonical E. coli promoter consensus.
Article
There exists a sequence of 8 nucleotides, highly conserved at the boundary between an exon and an intron, referred to as the 5′ splice site (5′ ss). The boundary between an intron and an exon, the 3′ splice site (3′ ss), also exhibits a highly conserved sequence of 4 nucleotides, preceded by a pyrimidine-rich region. To interpret the deluge of sequence information generated by the human genome project, it is important to be able to identify genes in uncharacterized sequences. Without this capability, the sequence information is of little value. For each sequence in GenBank, the splice sites were located using the information in the annotation preceding the sequence, a FEATURES line being followed by multiple lines indicating different features present in the gene. The percentages of lariat windows with high-scoring branch point sequences in all GenBank categories, with the exception of viruses, are higher than those of random sequences. A notable observation is that even the window that contains the second highest frequency of branch points among the five windows analyzed always had a lower frequency of high scoring branch points than expected for a window with a random sequence.
Article
Optimized weight matrices defining four major eukaryotic promoter elements, the TATA-box, cap signal, CCAAT-, and GC-box, are presented; they were derived by comparative sequence analysis of 502 unrelated RNA polymerase II promoter regions. The new TATA-box and cap signal descriptions differ in several respects from the only hitherto available base frequency tables. The CCAAT-box matrix, obtained with no prior assumption but CCAAT being the core of the motif, reflects precisely the sequence specificity of the recently discovered nuclear factor NY-I/CP1 but does not include typical recognition sequences of two other purported CCAAT-binding proteins, CTF and CBP. The GC-box description is longer than the previously proposed consensus sequences but is consistent with Sp1 protein-DNA binding data. The notion of a CACCC element distinct from the GC-box seems not to be justified any longer in view of the new weight matrix. Unlike the two fixed-distance elements, neither the CCAAT- nor the GC-box occurs at significantly high frequency in the upstream regions of non-vertebrate genes. Preliminary attempts to predict promoters with the aid of the new signal descriptions were unexpectedly successful. The new TATA-box matrix locates eukaryotic transcription initiation sites as reliably as do the best currently available methods to map Escherichia coli promoters.
Article
Semisynthetic promoters activated by Escherichia coli cyclic AMP receptor protein (CRP) were created by combining a synthetic CRP-binding site (crb) and nucleotide sequences derived from cryptic promoter regions. A 22-bp oligodeoxyribonucleotide corresponding to an idealized crb was randomly placed into DNA regions that precede a promoterless lacZ gene on a plasmid. Several plasmid clones were obtained which allowed the expression of lacZ in crp+ cya+ cells carrying a chromosomal deletion of lac genes. The beta-galactosidase and the quantitative S1-nuclease assays of crp+ and delta crp cells harboring these plasmids indicated that the transcription from newly created promoters is dependent on CRP. Sequence analysis revealed that these promoters are divided into two types based on the location of the crb relative to the transcription start point (tsp). The distance from the center of the crb to the tsp is 70 bp in the first type and 38 bp in the second type. The sequences of all these promoters exhibit poor homology with the consensus promoter sequence.
Article
The Escherichia coli hemB gene, which encodes 5-aminolevulinic acid dehydratase and was cloned into the multicopy plasmid pTZ18U, was sequenced. The hemB insert was double-digested with restriction enzymes and recloned back into pTZ18U and pTZ19U to allow for sequencing in two directions. In a second procedure, used to fill in gaps and to confirm the sequence derived from the first procedure, the whole insert was cloned into M13 phages. A nested set of deletions was constructed and recloned into M13. Both the double-digested fragments cloned into plasmids pTZ18U and pTZ19U and the overlapping fragments contained in M13 phages were sequenced using the dideoxy procedure with [35S]dATP. Computer software was used to identify coding regions and the correct reading frame. Two promoter regions, two Shine-Dalgarno sequences and two possible start sites were identified. Extensive homologies with yeast (36%), human liver (40%) and rat liver (40%) amino-acid (aa) sequences were observed, especially in the 16-aa Zn-binding region (75%) and the 4 aa surrounding the essential lysine at the active site (100% for rat and human proteins). Computer analysis of promoter strength and two independent analyses of codon usage indicated that the hemB gene is moderately expressed.
Article
Second-site mutations that restored activity to severe lacP1 down-promoter mutants were isolated. This was accomplished by using a bacteriophage f1 vector containing a fusion of the mutant E. coli lac promoters with the structural gene for chloramphenicol acetyltransferase (CAT), so that a system was provided for selecting phage revertants (or pseudorevertants) that conferred resistance of phage-infected cells to chloramphenicol. Among the second-site changes that relieved defects in mutant lac promoters, the only one that restored lacP1 activity was a T→G substitution at position −14, a weakly conserved site in E. coli promoters. Three other sequence changes, G→A at −2, A→T at +1, and C→A at +10, activated nascent promoters in the lac regulatory region. The nascent promoters conformed to the consensus rule, that activity is gained by sequence changes toward homology with consensus sequences at the −35 and −10 regions of the promoter. However, the relative activities of some promoters cannot be explained solely by consideration of their conserved sequence elements.
Article
A new method for evaluating some complex characteristics of the primary structure of E. coli promoters is proposed. The method, of nonparametric statistical significance, selects important conserved single-base positions in combination with 2-base coupling relations of identity and complementarity. The extended consensus of promoter characteristics thus obtained was used to scan unknown sequences for similarity with E. coli promoters. In terms of this method, a complete set of 244 E. coli promoters was shown to be structurally inconsistent. The set was then broken down into functionally homogeneous subsets of promoters to enhance the selectivity of the search for E. coli-specific promoter sequences, with a high significance level being attained.
Article
An Escherichia coli gene, which complements two independent hemA mutants of E. coli, has been cloned onto a multi-copy plasmid and both its strands have been sequenced. Both complemented mutants produce 5-aminolevulinic acid (ALA) and display fluorescence after 24 h. The cloned sequence appears to encode a 46-kDa protein, which when produced in the maxicell procedure is processed to a 41-kDa protein as determined by sodium dodecyl sulfate-polyacrylamide-gel electrophoresis. The amino acid sequence of the cloned gene product shows no significant homologies with any cloned ALA synthase, nor with any protein, in two E. coli databanks. A second cloned gene fragment, which has its coding region 34 bp away from the coding region of the gene that complements hemA, has been identified as part of protein release factor 1 (RF1), thus confirming the location of hemA at min 26.7 and mapping it precisely near RF1. We have shown that E. coli utilizes the intact five-carbon chain of glutamate for the synthesis of ALA [Li et al., J. Bacteriol. 171 (1989b) 2547-2552].
Article
All pairs of a large set of known vertebrate DNA sequences were searched by computer for most similar segments. Analysis of this data shows that the computed similarity scores are distributed proportionally to the logarithm of the product of the lengths of the sequences involved. This distribution is closely related to recent results of Erdos and others on the longest run of heads in coin tossing. A simple rule is derived for determination of statistical significance of the similarity scores and to assist in relating statistical and biological significance.
Article
Two internal promoters in the his operon of Salmonella typhimurium have been precisely mapped genetically. The internal promoters are found in, or very close to, gene border regions in the his operon. The his operon was examined for the presence of additional internal promoters whose transcripts were sensitive to rho-mediated transcription termination and therefore had escaped detection. No new internal promoters were found. It is argued that the internal promoters described here are not likely to be fortuitous message start sites, but may play a physiologically important role in operon expression.
Article
The entire histidine operon of Escherichia coli K-12 was cloned in the vector plasmid pBR313, and a complete restriction map of the operon was determined. By using subclones, complementation tests, and enzyme assays, we were able to make a correlation between the physical map and the genetic map of the operon. We determined the sequence of a fragment of DNA 665 base pairs long, comprising the distal portion of the hisC gene, the proximal portion of the hisB gene, and the internal transcription initiation site hisBp. The efficiency of this promoter was assessed under different physiological conditions by cloning the DNA fragment in a recombinant vector system used to study transcriptional regulatory signals. The precise point at which transcription initiates was determined by S1 nuclease mapping.
Article
A new type of search algorithm to find biological information inherited in nucleic acid sequences was developed. The algorithm is of pattern match type and is based on the fact that genetic information often is a function of a predictable statistical occurrence of the four bases within parts of the sequence. The search algorithm compares the known statistical pattern of bases in e.g. a promoter, with an unknown sequence and calculates the statistical significance of the match at all positions in the unknown sequence. The program was tested on 54 published prokaryotic promoters. 44 or 49 could be found with 1 or 4 false answers, respectively. The program was also used on plasmid pBR322. All promoters functioning in an in vitro transcription system were found (tet, anti-tet, p4, bla and ori) except the so called p5 promoter. A search for donor and acceptor sites was performed in a human HLA genomic sequence that contains six introns. Five of the possible six donor and acceptor sites were found.
Article
The DNA sequences of 168 promoter regions (−50 to +10) for Escherichia coli RNA polymerase were compiled. The complete listing was divided into two groups depending upon whether or not the promoter had been defined by genetic (promoter mutations) or biochemical (5′ end determination) criteria. A consensus promoter sequence based on homologies among 112 well-defined promoters was determined that was in substantial agreement with previous compilations. In addition, we have tabulated 98 promoter mutations. Nearly all of the altered base pairs in the mutants conform to the following general rule: down-mutations decrease homology and up-mutations increase homology to the consensus sequence.
Article
We describe a simple algorithm for computing a homology score for Escherichia coli promoters based on DNA sequence alone. The homology score was related to 31 values, measured in vitro, of RNA polymerase selectivity, which we define as the product KBk2, the apparent second order rate constant for open complex formation. We found that promoter strength could be predicted to within a factor of ±4.1 in KBk2 over a range of 10^4 in the same parameter. The quantitative evaluation was linked to an automated (Apple II) procedure for searching and evaluating possible promoters in DNA sequence files.
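A sequence-only promoter search of this general kind might be sketched as follows. This is not the weighted homology score of the abstract, whose details are not given here. It is a simpler stand-in that counts unweighted matches to the commonly cited E. coli consensus hexamers TTGACA (−35) and TATAAT (−10) with a 16 to 18 bp spacer. The consensus strings, the spacer range, the threshold and the function names are all assumptions for illustration.

```python
# Commonly cited E. coli consensus hexamers (assumed here, not taken
# from the abstract above).
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def matches(window, consensus):
    """Count positions where the window agrees with the consensus."""
    return sum(1 for a, b in zip(window, consensus) if a == b)


def candidate_sites(sequence, spacers=(16, 17, 18), min_matches=8):
    """List (start, spacer, m35, m10) for windows meeting the threshold.

    start is the 0-based index of the -35 hexamer; m35 and m10 are the
    match counts (0..6) against each consensus hexamer.
    """
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence)):
        for spacer in spacers:
            ten_start = start + 6 + spacer
            if ten_start + 6 > len(sequence):
                continue  # the -10 hexamer would run off the end
            m35 = matches(sequence[start:start + 6], MINUS_35)
            m10 = matches(sequence[ten_start:ten_start + 6], MINUS_10)
            if m35 + m10 >= min_matches:
                hits.append((start, spacer, m35, m10))
    return hits
```

For a synthetic sequence such as `"GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"`, calling `candidate_sites(seq, min_matches=12)` reports a single perfect site at index 2 with a 17 bp spacer. A weighted scheme would replace the plain match count with position-specific weights.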
Article
The statistical behavior of the similarity score for unrelated DNA sequences calculated as letter-by-letter comparison or from various forms of optimal alignment was studied. It was found that natural DNA-sequences from a data base and true random sequences show the same statistical behavior in terms of such scores. This makes it possible to adopt a simple criterion for the rejection of fortuitous similarity. It is based on the mean and standard deviation of chance scores whose expected values, depending on chain length, gap penalty and probability of letter coincidence, may be calculated from formulae given in the paper.
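For the simplest case mentioned here, an ungapped letter-by-letter comparison, the mean and standard deviation of the chance score follow from the binomial distribution. The sketch below assumes independent positions with a fixed coincidence probability and does not cover the gapped-alignment formulae of the paper. The three-standard-deviation cutoff and the function names are illustrative assumptions.

```python
import math


def chance_match_stats(length, p_match=0.25):
    """Mean and standard deviation of the number of chance matches.

    Assumes `length` independent positions, each matching with
    probability p_match (binomial model).
    """
    mean = length * p_match
    sd = math.sqrt(length * p_match * (1.0 - p_match))
    return mean, sd


def exceeds_chance(observed, length, p_match=0.25, n_sd=3.0):
    """True if the observed match count lies more than n_sd standard
    deviations above the chance mean (n_sd is an assumed cutoff)."""
    mean, sd = chance_match_stats(length, p_match)
    return observed > mean + n_sd * sd
```

With 100 compared positions and equal base frequencies, the chance mean is 25 matches with a standard deviation of about 4.33, so 40 matches would pass a three-standard-deviation cutoff while 35 would not.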
Article
E. coli promoters which are coordinately regulated in response to amino acid limitation contain conserved nucleotide sequences immediately 3′ to the −10 region. These sequences contain predominantly either GC or AT residues depending on whether the response is respectively negative or positive. Certain classes of promoters also contain conserved sequences upstream of the primary promoter. In tRNA genes these sequences could act as a secondary polymerase binding site.
Article
The complete nucleotide sequence of bacteriophage T7 DNA, 39,936 base-pairs, has been determined by the techniques of Maxam & Gilbert. All previously known T7 genes and several unsuspected genes have been identified in the sequence. T7 DNA carries genetic information very efficiently: the coding sequences of 50 genes are close-packed but essentially not overlapping, and occupy almost 92% of the nucleotide sequence. This arrangement strongly suggests that all 50 of these close-packed genes are expressed, although there is as yet evidence for expression of only 38 of them. In addition, five potential overlapping genes have been identified, and there is preliminary evidence that one of them is expressed. Where gaps between coding sequences are found, they usually are less than 100 base-pairs long, and usually contain one or more transcription signals, RNAase III cleavage sites, or origins of replication. Transcription signals in the T7 DNA include the three strong early promoters and the early termination site for Escherichia coli RNA polymerase, and 17 promoters and one termination site for T7 RNA polymerase. Ten RNAase III cleavage sites have been located, five in the early region and five in the late region. The primary transcripts are processed at these sites to provide the messenger RNAs observed in vivo. Almost all of the T7 messenger RNAs are polycistronic, but there are few polar effects at the level of transcription or translation, and most T7 proteins seem to be initiated independently, each from its own ribosome-binding and initiation site. The initiation codon for most T7 proteins is AUG, but a few proteins are predicted to begin at GUG. Certain T7 genes specify pairs of overlapping proteins. The two proteins specified by gene 4 are made in about equal amounts, beginning at two different ribosome-binding and initiation sites in the same reading frame and ending at a common termination codon.
The two proteins specified by gene 10 are made in very different amounts. They begin at the same initiation site, but the minor gene 10 protein appears to be produced by a shift in translational reading frame just ahead of the normal termination codon, thereby adding 53 amino acids to the COOH-terminal end of the major protein. Gene 10 specifies the major capsid protein of the phage particle, and both the major and minor gene 10 proteins are incorporated into the phage particle. One or two other T7 genes appear to utilize translational frameshifting to produce unequal amounts of proteins that differ at their COOH-terminal ends. The amino acid sequences and compositions predicted for all of the T7 proteins (except the proteins produced by frameshifting) are given. T7 DNA begins and ends with a perfect direct repeat of 160 base-pairs. Immediately adjacent to this terminal repetition, at both ends of the mature DNA, lie very similar, regular arrays of 12 imperfect copies of a seven-base sequence. These arrays occupy about 160 base-pairs, starting about 15 base-pairs from the terminal repetition. In the concatemeric form of T7 DNA, a single copy of the terminal repetition is flanked by these two arrays of repeated sequences, and it seems likely that this arrangement is involved somehow in formation of the ends of mature T7 DNA.
Article
The nucleotide sequence of the phage λ rex region consists of 1428 bp and codes for two genes, rexA and rexB. Hence the complete λ immunity region codes for four genes and covers 2664 bp of sequence unique to λ, as defined by the left and right boundaries of the imm434 region. Coordinate expression of both rexA and rexB, which are co-regulated with the cI repressor gene from promoters prm and pre, is responsible for the Rex phenotype, i.e. exclusion of a wide variety of superinfecting phage such as T4rII. The position of a third promoter, plit, which overlaps the carboxy-terminal end of the rexA coding region, permits expression of rexB without rexA, from the resulting 470 nucleotide lit RNA. The lit transcript, therefore, must act as messenger for rexB in the noncoordinate expression of the rex genes that occurs late in λ lytic infection. The coordinate and noncoordinate expression of rexB and rexA suggests a dual role for the very hydrophobic rexB protein. Studies of λ early and late DNA replication implicate rexB as having auxiliary functions in both lysogenisation and lytic infection.
Article
The binding has been studied of Escherichia coli RNA-polymerase to the fragments of lambda DNA obtained by digestion with restriction endonucleases BsuI, HindIII, BamHI, EcoRI and HindII + III. There are at least six sites of RNA-polymerase binding in the b2-region. In vitro transcription of those BsuI-fragments of the b2-region which contain six binding sites is rightward. Therefore, the fragments contain promoters rather than mere RNA-polymerase binding sites. One of the promoters of the b2 region named patt was calculated to be about 50 bp to the left of the att site. We postulate that this promoter might correspond to the hef-target which was described as important for the site-specific recombination.
Article
Twenty Escherichia coli RNA polymerase molecules bind specifically to linear, double-stranded coliphage λ DNA at 30:1 polymerase-to-λ DNA molar ratio, under the conditions used. The binding sites were identified by electron microscopy employing a glutaraldehyde/BAC technique which measures binding with 84% specificity. Binding sites could be assigned to all well identified λ promoters, including pI, pL, prm, pr, po and pr, although only one bound polymerase could be found in the prm−pR region, probably reflecting the low affinity of RNA polymerase for the Prm promoter. Several binding sites seem to correspond to minor in vivo-active transcriptional startpoints (e.g., of lit or mis RNA), to potential promoter sites (e.g., hip), or to the startpoints observed only during in vitro enzymatic RNA synthesis (e.g., the b2 region and the 96 to 99.5% λ region). Moreover, a few binding sites are in the regions that bear no known startpoints but contain known transcriptional terminators. Correlation between the efficiency of initiation of RNA synthesis and the frequency of RNA polymerase binding is good only for some promoters. All of the RNA polymerase binding sites lie within the A + T-rich regions, as determined by partial denaturation mapping. However, quantitative correlation between frequencies of polymerase binding and localized DNA melting is far from perfect.
Article
Electron microscopy of nascent RNA chains has been used to localize promoters on linear and negatively superhelical λ DNA. Transcription was performed in vitro using Escherichia coli RNA polymerase. RNA from four promoters was seen on linear λ DNA; these include the λ PL and PR promoters, which are the main “early” λ promoters in vivo, and two promoters within the b2 region. In order to orient the circular DNA for electron microscopy, a restriction enzyme isolated from Streptomyces albus G. (Sal I) was used to cleave the DNA at unique points. The Sal I cleavage sites on λ DNA have been determined to be at 67.3% and 68.3% on the linear map. Through visualization of nascent RNA transcribed from superhelical λ DNA it is concluded that: (a) transcription increases from PL and PR with a particularly large increase from PL; (b) there is activation of transcription from promoters that are not used on the linear DNA and that coincide with the areas of λ which are most readily denatured and which possess the highest A + T content. The activation of such regions and the increased efficiency of the promoters used on linear DNA are discussed in terms of the energetics of unwinding a negatively superhelical DNA by a ligand such as RNA polymerase. The association of A + T-rich regions with promoters and the possible role of superhelicity in transcriptional activation in vivo are discussed.
Article
The cleavage sites in the early promoter region of coliphage T7 have been mapped for four restriction enzymes. They are, from the left end in base pairs, 1100 and 740 for Hinf; 680, 320, 530, 240, 77, and 67 for Hind II; 620 and 530 for Hpa II; 790 for Alu I. The nucleotide sequence between the Hind II site at 680 base pairs from the left end and the Hinf site at 740 base pairs from the left end has been determined, from which the start point of the promoter A3 is located at 720 base pairs from the left end. The start points of the other two major promoters A1 and A2 are deduced to be at 460 and 580 base pairs from the left end, respectively, from the chain lengths of the in vitro transcripts off the 1100 base-pairs long Hinf fragment. Similar to the sequences of a pL and pR promotors of phage lambda and a sequence in Simian Virus 40 used by Escherichia coli RNA polymerase as a promotor, the sequence of the A3 promotor of T7 also has a Hind II restriction site approximately 30 base pairs upstream to the start point of RNA synthesis. No such Hind II sites exist, however, for the A1 and A2 promoters. Experiments on the protection of some of the restriction sites on the 1100 base-pairs-long Hinf fragment by RNA polymerase binding support the electron microscopic observations of others that, in addition to the three sites A1, A2 and A3, there is at least a fourth site at which E. coli RNA polymerase can bind strongly. In addition to the Hind II site at 680 base pairs from the left end and the Hinf site at 740 base pairs from the left end, which are presumably protected by the binding of a single RNA polymerase at the A3 site, the Hind II site at 240 base pairs from the left end is also protected at a level of 5 polymerase molecules/DNA fragment. The possible existence of several minor promotor sites in the early promotor region, in addition to the three major promotor sites, is discussed.
Article
Transcription of T7 bacteriophage DNA by Escherichia coli RNA polymerase is initiated primarily at a cluster of three promoter sites (A1, A2, A3) located near the left-hand end of the genome. These sites are utilized preferentially in vitro when purified E. coli RNA polymerase is added to T7 DNA in the presence of the four ribonucleoside triphosphates (free initiation conditions). In addition, when E. coli RNA polymerase is added to T7 DNA in the absence of nucleoside triphosphates (prebinding conditions) the enzyme binds preferentially at these sites to form highly stable promoter complexes. However, addition of RNA polymerase in excess of that needed to saturate the A promoter sites leads to binding of enzyme at three or four additional promoter sites (minor promoter sites B, C, D and E). RNA polymerase holoenzyme bound at these sites forms highly stable complexes which are able to initiate RNA chains rapidly when presented with ribonucleoside triphosphates. The minor promoter sites have been located on the T7 genome and correspond to a set of promoter sites identified first by Minkley & Pribnow (1973) in studies of dinucleotide-primed initiation. Comparative studies of the properties of E. coli RNA polymerase bound to major (A1) and minor (C and E) promoters show that the stability of minor promoter complexes to dissociation and to attack by inhibitors can be comparable to those formed at the major promoter site. The transition temperatures for the several promoters are also quite close; thus, RNA polymerase forms “open” promoter complexes at both major and minor promoter sites at 37°C. Therefore, the preferential binding of RNA polymerase to the A1 promoter is not due to a greater affinity for that region, nor to inability of the enzyme to form open promoter complexes at the minor promoter sites.
Studies of the kinetics of site selection confirm that RNA polymerase holoenzyme forms rapidly starting promoter complexes much more rapidly at promoter A1 than at promoters C and E. However, evidence is found for rapid formation of a complex at promoter C which cannot initiate an RNA chain rapidly. It is suggested that there is rapid formation of an initial “closed” complex at both major and minor promoters, and that discrimination is based on the rate of the melting reaction to form an open complex.
Article
The promoter for the major coat protein gene of bacteriophage fd contains a unique sequence, TATAAT, in the non-transcribed region corresponding to the Pribnow box. A R.Hha I cleavage site which destroys promoter function is located five base pairs upstream from the TATAAT sequence (fifteen base pairs upstream from the RNA initiation site). The promoter was cleaved into two fragments by R.Hha I and each promoter fragment was joined to DNA fragments derived from other regions. Ligation of the TATAAT-containing fragment to any of the DNA fragments examined resulted in recovery of promoter function. The results suggest for this type of promoter that no unique sequence is necessary upstream from the R.Hha I cleavage site although a contiguous DNA chain must be present in this area.
Article
The initiation point for transcription from the Escherichia coli RNA polymerase E promoter on bacteriophage T7 has been determined to be at 36 835 base pairs (92.22 T7 units) from the left end of T7. The location was determined by RNA fingerprinting of a runoff transcription product. Kinetics of association for the E and the T7 A3 promoters were measured by using the abortive initiation assay for approach to steady-state turnover. The kinetic association constant, ka (=KBk2), for E was found to be over 10-fold slower than ka for A3. For the E promoter, ka = 1.2 × 10^6 M^-1 s^-1. For A3, we report ka ≥ 4 × 10^7 M^-1 s^-1. This difference is due mostly to a 10-fold difference in the initial equilibrium constant, KB, for formation of the initial polymerase-promoter complex. The rate of isomerization, k2, of the initial complex to the open polymerase-promoter complex for the E promoter was only 2-fold slower than k2 for the A3 promoter. Various numerical methods for calculation of the kinetic parameters are discussed and compared. We argue that a nonlinear analysis provides the most reliable means of data analysis.
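The arithmetic behind these figures can be checked with a short sketch. It uses only the two rate constants quoted in the abstract and the stated relation ka = KB·k2. The function names are hypothetical, and the A3 value is a lower bound, so the computed ratio is a lower bound as well.

```python
def ka(k_b, k_2):
    """Apparent second-order association constant, ka = KB * k2."""
    return k_b * k_2


def fold_slower(reference, other):
    """How many times smaller `other` is than `reference`."""
    return reference / other


# Values quoted in the abstract, in M^-1 s^-1.
KA_E = 1.2e6        # E promoter
KA_A3_MIN = 4.0e7   # A3 promoter (reported as a lower bound)

# Lower bound on the fold difference between A3 and E (about 33).
A3_OVER_E = fold_slower(KA_A3_MIN, KA_E)
```

Because ka is a product, fold differences in KB and k2 multiply: a 10-fold difference in KB combined with a 2-fold difference in k2 gives a 20-fold difference in ka, of the same order as the lower bound of roughly 33-fold computed above.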
Article
A method for visualizing in vitro synthesized RNA in extended form still attached to the DNA template is described. Bacteriophage T4 gene 32 product is attached to the RNA and after fixation with glutaraldehyde the transcription complexes are prepared for electron microscopy by the Kleinschmidt technique. Secondary structure caused by partial complementarity of the RNA can be stretched out by the attachment of bacteriophage T4 gene 32 protein as is shown in the case of bacteriophage R17 RNA. The possibilities of the method are exemplified using T7 as a template for Escherichia coli RNA polymerase. Data for the position of the promoter and the extent of transcription in the region of early T7 messenger RNA are confirmed. In addition, we have demonstrated the presence of a weak promotor at the right-hand end of the DNA; as in the early region, the direction of RNA synthesis is from left to right. Using SV40 viral DNA as a template, up to six RNA chains can be synthesized in the presence of rifampicin. After synthesis for 15 minutes at 37 °C, the length of the RNA chains shows that the polymerase has transcribed at least four times around the DNA circle.
Article
The isolation and genetic characterization of a number of mutations that are located in the promoter region of the lac operon are described. These mutations have reduced levels of lac operon expression in a wild-type (crp+cya+) genetic background. Three of the mutations also have lower levels of lac operon expression than lacP+ in a crp−cya− genetic background, that is in the absence of the catabolite activator protein and 3′,5′-adenosine cyclic monophosphate. These three mutations are located nearest to the lac operator. They define a second essential site in the promoter region.
Article
The constitutive low-efficiency promoter site (P2) near the middle of the tryptophan operon of Escherichia coli has been mapped by analysis of short deletions internal to the trp operon. Comparison of deletions which remove this internal promoter with those which retain it show that P2 is located within trpD, the region coding for phosphoribosyl anthranilate transferase. P2 maps near the operator-distal end of trpD, on the operator-proximal side of two trpD point mutants. Comparisons of strains with and without the P2 site indicate that initiations at this promoter are responsible for synthesis of 80% of the trpC, trpB and trpA polypeptides present in repressed cells.
Article
A new picture of the in vitro transcription of the early region of bacteriophage T7 is presented. Dinucleotides were used to stimulate transcription by RNA polymerase from selected initiation sites on T7 DNA. Five initiation sites (three of which are very close together in the early promoter region) and five termination sites have been mapped relative to deletions in the early region. Over 20 r-strand specific RNA products arising from these sites have been characterized, most of which result from readthrough. Our results provide a strong correlation between in vivo and in vitro transcription of T7 by RNA polymerase. An additional initiation site outside the early region gives rise to a small l-strand specific RNA product. The dinucleotides have enabled us to determine whether ATP or GTP is used to initiate RNA chains at each initiation site. Since the dinucleotide primers apparently base-pair with the template at initiation sites, we have predicted short initiation site sequences, including the first few bases of the RNA chains.
Article
The patterns of (1) leftward transcription from the repressed lambda prophage and (2) post-induction changes in the initiation of RNA synthesis within the immunity-ori region (which contains several regulatory elements including the repressor gene) were studied in detail. In the noninduced prophage about 80% of the leftward transcription originates from within the immunity region (cI-rex mRNA), 2% is from the ori segment (oop RNA) and the remainder is evenly distributed between the int and b2 regions (Fig. 1). The s c startpoint for the 1880 nucleotide-long cI-rex transcript, which codes for the repressor and the rex product, is 325 nucleotides from the right imm434 endpoint (Figs. 2 and 3). Upon induction of Tof+ lysogens, the cI-rex transcription is rapidly turned off. After a brief lag, a 600 nucleotide-long transcript, denoted lit, appears in the left part of gene rex. The lit and cI-rex transcripts both terminate at the same t i site. No RNA synthesis is detected in the 400-nucleotide segment between the left imm434 endpoint and the t i terminator. This DNA segment contains the pL-oL promoter-operator region for the major leftward transcription. The increase in lit transcription parallels the increase in synthesis of oop RNA, as if both transcripts originated from a common promoter or were positively regulated by a common factor at their promoters. The oop startpoint s o is located at least 2000 nucleotides upstream from the lit startpoint s i. The synthesis of the oop and lit RNAs is coordinately stimulated up to 100-fold by host and phage DNA replication factors. The short 4S oop RNA is thought to prime leftward DNA replication initiated at the ori site.
Article
The nucleotide sequence of the DNA of bacteriophage λ has been determined using the dideoxy chain termination method in conjunction with random cloning in M13 vectors. Various methods were studied for sequencing specific regions to complete the sequence, but all were much slower than the random approach. The DNA in its circular form contains 48,502 base-pairs. Open reading frames were identified and, where possible, ascribed to genes by comparing with the previously determined genetic map. The reading frames for 46 genes were clearly identified, though in about 20 the position of the protein initiation site could not be rigorously established. Probable positions for the kil, cIII and lom genes are suggested but remain uncertain. There are about 20 other unidentified reading frames that may code for proteins. The genome is fairly compact with comparatively little non-coding DNA. In many cases the translation terminators and initiators overlap, particularly in the sequence A-T-G-A where the TGA terminates one gene and the ATG initiates the next. Such structures seem to be characterized by a purine-rich sequence, rather than by a specific “Shine and Dalgarno” sequence, before the initiator. In the whole of the left arm the codon CTA, which is normally read by a minor leucine tRNA, is absent. The distribution of other rare codons in the genes of the left arm suggests that they may have a controlling function on the relative amounts of the proteins produced.
Article
We present and review experiments that identify points of close approach of the RNA polymerase to two promoters, lac UV5 and T7 A3. We identify the contacts to the phosphates along the DNA backbone, to the N7s of guanines in the major groove and the N3s of adenines in the minor groove, and to the methyl groups of thymines. These contacts to the two promoters are strikingly homologous in space, as shown on three-dimensional models, and identify major regions of interactions lying on one side of the DNA molecule (at -35 and -16), as well as further areas extending through the Pribnow box. Both promoters are unwound similarly by the polymerase, across a region of about twelve bases extending from the middle of the Pribnow box to just beyond the RNA start site. We discuss the areas of interaction in the context of promoter homologies and promoter mutations. The disposition of the contacts in space suggests a model for the pathway along which the RNA polymerase binds to promoters.
Article
The nucleotide sequence of the lacZ gene coding for beta-galactosidase (EC 3.2.1.23) in Escherichia coli has been determined. Beta-Galactosidase is predicted to consist of 1023 residues, resulting in a protein with a mol. wt. of 116 353 per subunit. The protein sequence originally determined by Fowler and Zabin was shown to be essentially correct and in an Appendix these authors comment on the discrepancies.
Article
The promoter-cloning plasmid pBRH4 (a derivative of pBR322 with a partially deleted promoter of the tet gene) is shown to contain a sequence which is located near the EcoRI site and can operate as an effective Pribnow box, but is not the remainder of the deletion-inactivated tet promoter of pBR322. If there is a sequence homologous to the '-35' promoter region at the border of the DNA fragment inserted at the EcoRI site, then a compound promoter arises and activates the tet gene. Point mutations in the nonfunctional -35 region of pBRH4 also activate the cryptic Pribnow box. Several compound promoters were obtained through deleting small portions of DNA around the HindIII site of pBR322; the deletions moved various sequences that could operate as Pribnow boxes towards the -35 region of the tet promoter.
Article
This paper describes computer methods for locating signals in nucleic acid sequences. The signals include ribosome binding sites, promoter sequences and splice junctions. The methods are of use both to those trying to interpret the function of newly determined sequences and to those studying the molecular mechanisms involved in the recognition of these special signal sequences.
Article
RNA polymerase binds very tightly at a site called Brex in the lambda immunity region, to the left of the rex gene and about 600 nucleotides to the right of PL. The complex formed is resistant to 1 M NaCl in the absence of nucleotide triphosphate. While in vitro little or no transcription is observed from Brex, in vivo, when inserted in a plasmid vector which allows detection of its activity, it acts as an efficient promoter. We have mapped the site protected by RNA polymerase against DNase and determined its sequence, which is abnormal compared with that of an average promoter.
Article
Although little transcription initiation is observed in vivo in the central region of coliphage λ, at least 11 short or long RNAs are initiated in vitro with pppA in the b2 segment, in addition to several pppG starts. Employing a [γ-32P]ATP label, digestion with various RNases, and gel electrophoresis, we observed 7 oligonucleotides originating between the lac5 and b511 endpoints, and we studied two of them in detail, both initiated at the leftward bL promoter. One of these is a decanucleotide (pppAUAAAAUAAU) that is spontaneously terminated in the absence of the rho factor, preferentially at elevated ionic strength. The other is a long transcript, initiated at the same startpoint and yielding pppAUAAAAUAAUACCG upon RNase T1 digestion. The in vitro strength of the bL promoter is intermediate between that of the major early promoters of lambda, L and R. The sequence of the 27 to 42 bp region of bL (counted from bL = +1) contains almost 70% G + C, which is unique for λ and other known promoters.
Article
We proposed a simple formula to assess the statistical significance of homologous segments found in comparison of two nucleic acid sequences (Goad and Kanehisa, Nucleic Acids Res. 10, 247–263, 1982). This paper clarifies the basic assumptions of the formula, and its reliability is examined by Monte Carlo calculations. The results were satisfactory for random sequences. The formula is a useful measure for screening potentially interesting homologies and it can be implemented in any search algorithm. Examples are given for the screening procedure in the graphic display version of the Goad-Kanehisa algorithm.
Article
We present an algorithm--a generalization of the Needleman-Wunsch-Sellers algorithm--which finds within longer sequences all subsequences that resemble one another locally. The probability that so close a resemblance would occur by chance alone is calculated and used to classify these local homologies according to statistical significance. Repeats and inverted repeats may also be found. Results for both random and biological nucleic acid sequences are presented. Fourteen complete genomes are analyzed for dyad symmetries.
Article
Previous genetic studies localized an internal low efficiency promoter, called p2, within the distal portion of the D gene of the tryptophan (trp) operon of Escherichia coli. The nucleotide sequence of trpD reveals a promoter-like region about 150 nucleotides from the 3′ end of the gene. We report here the results of transcription studies in vitro that confirm the assignment of the trp-p2 promoter by demonstrating the synthesis of a discrete RNA transcript and identifying the precise startpoint of transcription. To characterize the promoter further, we cloned a fragment of DNA containing this region into a vector such that it would direct the expression of the galactokinase gene. Under these conditions, trp-p2 functions in vivo at a level of about 15% of the primary trp promoter. A comparison of the nucleotide sequence to a “consensus” promoter sequence of E. coli reveals that its efficiency is probably limited by the constraints imposed by the coincident amino acid sequence of trpD. We discuss circumstantial evidence that trp-p2 is not accidental, and may provide a bypass function advantageous to the cell under conditions of severe nutritional deprivation.
Article
Specific fragments of bacteriophage T7 DNA that account for about 99% of the total molecule have been cloned in the plasmid pBR322. This set of plasmids was used to map individual point mutations of T7, by measuring recombination between T7 mutants and cloned fragments of wild-type T7 DNA. Cloned fragments that complement mutants defective in one or more of genes 2, 3.5, 8, 9, 10, 11, 13, 14 and 18 were also obtained. All but one of the plasmids that provide T7 functions carry a promoter for T7 RNA polymerase, a feature that is probably needed for efficient expression from the plasmid during infection. However, the promoter need not be immediately ahead of the gene; the polymerase can apparently transcribe around the entire plasmid DNA before transcribing the T7 gene. The major protein of the T7 phage head is among those that can be provided from a plasmid, indicating that substantial amounts of plasmid-specified proteins can be made. Using a combination of nucleotide sequence and cloning information the locations of 41 known or potential genes in T7 DNA have now been identified, at least 34 of which are known to specify a protein. T7 genes appear to be closely packed but essentially non-overlapping. The only places left in T7 DNA where undiscovered T7 genes are likely to lie are between genes 6 and 8, and to one or both sides of gene 19. The physical and genetic locations of the promoters and termination site for T7 RNA polymerase have also been defined. Certain fragments of T7 DNA cannot be cloned intact, and the lethality of at least some such fragments appears to be due to weak promoters for Escherichia coli RNA polymerase (in the T7 DNA) linked to T7 genes that are lethal if expressed. Separating the promoter from the lethal gene allows the intact gene to be cloned, but only in the silent orientation, where the predominant transcription from promoters in the pBR322 DNA crosses the inserted T7 DNA in the opposite direction from transcription in wild-type T7 DNA.
Staden, R. (1984) Nucleic Acids Res. 12, 505-519.
Stahl, S.J. & Chamberlin, M.J. (1977) J. Mol. Biol. 77, 577-601.
Hayes, S. & Szybalski, W. (1973) Mol. Gen. Genet. 126, 275-290.
Kanehisa, M. (1984) Nucleic Acids Res. 12, 203-213.
Rosenvold, E.C., Calva, E., Burgess, R.R. & Szybalski, W. (1980) Virology 107, 476-487.
Dunn, J.J. & Studier, F.W. (1983) J. Mol. Biol. 166, 477-535.
Reich, J.G., Drabsch, H. & Deumler, A. (1984) Nucleic Acids Res. 12, 5529-5543.
Hsieh, T.-S. & Wang, J.C. (1976) Biochemistry 15, 5776-5783.
Delius, H., Westphal, H. & Axelrod, A. (1973) J. Mol. Biol. 74, 677-687.
Minkley, E.G. & Pribnow, D. (1973) J. Mol. Biol. 77, 255-277.
Harr, R., Haggstrom, M. & Gustaffson, P. (1983) Nucleic Acids Res. 11, 2943-2957.
Studier, F.W. & Rosenberg, A.H. (1981) J. Mol. Biol. 153, 503-525.
Grisola, V., Riccio, A. & Bruni, C.B. (1983) J. Bacteriol. 155, 1288-1296.
Hopkins, J.D. (1974) J. Mol. Biol. 87, 715-724.
Botchan, P. (1976) J. Mol. Biol. 105, 161-176.
Sanger, F., Coulson, A.R., Hong, G.F., Hill, D.F. & Petersen, G.B. (1982) J. Mol. Biol. 162, 729-773.
Landsmann, J., Kroger, M. & Hobom, G. (1982) Gene 20, 11-24.