Fig 2 - uploaded by Peteris Zikmanis
(A) The average propensities for amino acid polarity compared to the length of C-terminus of type I (Δ) and type III () secreted proteins. (B) The average propensities for β-sheets compared to the length of C-terminus of type I () and type III () secreted proteins.  


Source publication
Article
The amino acid composition of sequences and structural attributes (α-helices, β-sheets) of C- and N-terminal fragments (50 amino acids) were compared to annotated (SWISS-PROT/TrEMBL) type I (20 sequences) and type III (22 sequences) secreted proteins of Gram-negative bacteria. The discriminant analysis together with the stepwise forward and backwar...

Contexts in source publication

Context 1
... the profiles of predicted physicochemical and structural properties (polarity, hydrophobicity, α-helices, β-sheets) for both terminal fragments appeared to be different in regard to the type III and type I secreted proteins. The profiles of mean propensities for polarity and β-sheets of C-terminal fragments display the most pronounced differences (Fig. 2 A, B) when compared to similar sites of distinctions for other property vectors just as significant for particular positions of amino acid residues at both C- and N-termini of sequences (data not shown). Noticeable differences between the means of amino acid frequencies (Fig. 1 A, B) and structural characteristics (Fig. 2 A, B) of proteins ...
Context 2
... pronounced differences (Fig. 2 A, B) when compared to similar sites of distinctions for other property vectors just as significant for particular positions of amino acid residues at both C- and N-termini of sequences (data not shown). Noticeable differences between the means of amino acid frequencies (Fig. 1 A, B) and structural characteristics (Fig. 2 A, B) of proteins suggest that both indices could be employed, at least in principle, to differentiate groups of secretion substrates from Gram-negative ...
Context 3
... is noteworthy that the frequencies of occurrence for N- and C-terminal amino acid residues as the predictor variables provide considerably better cross-validated discrimination accuracy (88.1-95.2%) as compared to the structural indices, such as α-helices for both termini or N-terminal β-sheets (Table 5). However, these substantially fall behind the indices achieved by the predictor variables for full sequences (Tables 2, 3, 5). ...

Similar publications

Conference Paper
Understanding protein function is one of the keys to understanding life at the molecular level. It is also important in several scenarios including human disease and drug discovery. In this age of rapid and affordable biological sequencing, the number of sequences accumulating in databases is rising with an increasing rate. This presents many chall...
Article
Studies were carried out to identify and detect potentially toxic proteins of wheat. The gliadin fractions were subjected to chromatographic and spectroscopic analyses to develop the relevant discriminants. The spectral analysis showed that these proteins differ considerably in their tryptophan-to-tyrosine molar ratios. A standard curve was used. T...

Citations

... This study follows the line of our previous findings (Kampenusa and Zikmanis 2008; Andersone and Zikmanis 2007; Zikmanis et al. 2006) and fits within the general framework to specify the structural and functional features of the proteomes of sequenced bacterial genomes. Recently, some computational approaches have been proposed to identify T3SS-secreted proteins within the bacterial proteome (Samudrala et al. 2009; Vinatzer et al. 2005) based on estimates of amino acid usage at N-termini of sequences, although these represent a very narrow range of organisms. ...
Article
The combined set of codon usage frequencies (61 sense codons) from the 111 annotated sequences of leaderless secreted type I, type III, type IV, and type VI proteins from proteobacteria was subjected to forward and backward selection to obtain a combination of the most effective predictor variables for classification/prediction purposes. The group of 24 codon frequencies displayed a strong discriminatory power with an accuracy of 100% for originally grouped and 97.3 ± 1.6% for cross-validated (LOOCV) cases and an acceptable error rate (0.062 ± 0.012) in k-fold (k = 6) cross-validation (KCV). The summary frequencies of synonymous codons for ten amino acids as the alternative predictor variables revealed a comparable discriminatory power (92.8 ± 2.5% for LOOCV), however at somewhat lower levels of prediction accuracy (0.106 ± 0.015 of KCV). A number of significant (p < 0.001) differences were found among indices of codon usage and amino acid composition depending on the secretion type. About 60% of secretion substrates were characterized as apparently originating from horizontal gene transfer events or putative alien genes and were found to be unequally distributed among the groups. The proposed prediction approaches could be used to specify secretome proteins from genomic sequences as well as to assess the compatibility between bacterial secretion pathways and secretion substrates.
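As a rough illustration of the predictor variables described in this abstract, the following Python sketch computes relative frequencies of the 61 sense codons for one coding sequence. It assumes the standard genetic code (stop codons TAA, TAG, TGA) and normalisation by the number of sense codons in the sequence; the function name and the toy sequence are hypothetical and are not taken from the publication.

```python
from collections import Counter
from itertools import product

# Assumption: standard genetic code, so the three stop codons are excluded.
STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = sorted(
    "".join(t) for t in product("ACGT", repeat=3) if "".join(t) not in STOP_CODONS
)

def codon_frequencies(cds: str) -> dict[str, float]:
    """Return relative frequencies of the 61 sense codons in a coding sequence."""
    cds = cds.upper().replace("U", "T")
    # Split into complete in-frame triplets, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in SENSE_CODONS)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in SENSE_CODONS}

# Hypothetical toy sequence (ATG GCT GCT AAA TAA), not from the study's data set.
freqs = codon_frequencies("ATGGCTGCTAAATAA")
```

Each sequence then yields a 61-element feature vector from which a subset (24 codons in the abstract) could be selected by a stepwise procedure.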
... Recently we proposed definite amino acid (R, E, G, I, M, P, S, Y, V) frequencies [10] and the periodic patterns of aliphatic (A, L, I, V) and aromatic (H, F, W, Y) amino acids near the C-termini of sequences [11] and these variables were strong predictor variables to discriminate between annotated type I and type III proteins secreted from proteobacteria. On the other hand, several ...
... On the other hand, the definite amino acid frequencies of occurrence, i.e., the "global sequence properties" [9] were found to keep their classification/prediction capabilities [10] also over the extended original data set (Table 1) containing type IV secreted proteins (data not shown). Thus, the extended set of predictor variables (20 common amino acids except W and Y /18 frequencies) instead of that previously proposed [10] for the resolution of type I and type III proteins (R, E, G, I, M, P, S, Y, V) exhibited a strong discrimination power with the average error rate 0.059±0.01 over the k-fold cross-validation (KCV) of multiple protein groups. ...
Article
C- and N-terminal sequences (64 amino acid residues each) of 89 non-classically secreted type I, type III and type IV proteins (Swiss-Prot/TrEMBL) from proteobacteria were transformed into predicted secondary structures. Multivariate analysis of variance (MANOVA) confirmed the significance of location (C- or N-termini) and secretion type as essential factors in respect of quantitative representations of structured (α-helices, β-strands) and unstructured (coils) elements. The profiles of secondary structures were transcribed using unequal property values for helices, strands and coils, and the corresponding numerical vectors (independent variables) were subjected to multiple discriminant analysis with the types of secreted proteins as the dependent variables. The set of strong predictor variables (21 property values located at the region of 2–49 residues from the C-termini) was able to classify all three types of non-classically secreted proteins with an accuracy of 93.3% for originally grouped and 89.9% for cross-validated (leave-one-out procedure) grouped cases. The average error rate (0.137 ± 0.015) of k-fold (k = 3; 4; 6; 8; 10; 89) cross-validation affirmed an acceptable prediction accuracy of the defined discriminant functions with regard to the types of non-classically secreted proteins. The proposed prediction tool could be used to specify the secretome proteins from genomic sequences as well as to assess the compatibility between secretion pathways and secretion substrates of proteobacteria.
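The transcription of secondary-structure profiles into numerical vectors can be sketched as follows. This is only an illustration: the abstract states that unequal property values were used for helices, strands and coils but does not give them, so the values below are hypothetical, as are the three-state letters (H, E, C) and the function name.

```python
# Hypothetical property values; the abstract only says that they are unequal.
PROPERTY_VALUES = {"H": 1.0, "E": 2.0, "C": 3.0}  # helix, strand, coil

def encode_c_terminus(ss: str, length: int = 64) -> list[float]:
    """Encode the last `length` predicted states, indexed from the C-terminus."""
    fragment = ss[-length:]
    # Reverse so that index 0 corresponds to the C-terminal residue itself.
    return [PROPERTY_VALUES[state] for state in reversed(fragment)]

# Hypothetical predicted structure string, shortened for readability.
vector = encode_c_terminus("CCHHHHEEEC", length=5)
```

In such an encoding, each position counted from the C-terminus becomes one candidate predictor variable for the discriminant analysis.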
... Our recent studies [7] revealed definite amino acid (R, E, G, I, M, P, S, Y, V) frequencies of occurrence as the strong predictor variables to discriminate between the full sequences of annotated type III and type I secreted proteins of proteobacteria with an accuracy up to 100% in leave-one-out and test-retest procedures. ...
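A minimal sketch of how such frequency-of-occurrence features might be computed for a single sequence is given below. It assumes that a frequency is the count of a residue divided by the sequence length; the function name and the toy sequence are hypothetical.

```python
from collections import Counter

PREDICTOR_RESIDUES = "REGIMPSYV"  # the nine residues named in the text

def residue_frequencies(seq: str, residues: str = PREDICTOR_RESIDUES) -> dict[str, float]:
    """Fraction of the sequence made up by each selected residue."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    return {r: (counts[r] / n if n else 0.0) for r in residues}

# Hypothetical toy sequence, not from the study's data set.
features = residue_frequencies("MSGRVEEYPI")
```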
... The patterns of average intensities for corresponding periodicity profiles (Figure 3) support this view, however, only a minor part of intensities as selected variables were found to differ significantly by the Mann-Whitney U test (FT orders of n = 14 for aromatic amino acids and n = 5; 7; 18 for aliphatic ones, i.e., periodicities of 6 and 12.8; 9.14; 3.55 residues, respectively). Similar to previously reported frequencies of occurrence for amino acid residues [7], certain variables on the basis of C-terminal periodicities can therefore achieve necessary levels of statistical tolerance [16,18] and discrimination power (Table 3, Table 4), with slight differences between the groups. These differences are possibly due to intrinsic links between the definite location of aromatic and aliphatic residues and structural determinants of differently secreted proteins [7]. ...
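The relation between an FT order and a periodicity in residues can be illustrated with a small Python sketch. It assumes a binary indicator sequence for a residue group, a plain discrete Fourier transform, and a period of N/n residues for order n with N = 64; under that assumption the aliphatic orders 5, 7 and 18 give 12.8, 9.14 and about 3.56 residues, close to the values quoted above. The intensity normalisation, the function names and the toy fragment are hypothetical and may differ from the procedure actually used.

```python
import cmath

ALIPHATIC = set("ALIV")

def ft_intensities(seq: str, group: set[str], max_order: int = 19) -> dict[int, float]:
    """Squared DFT magnitude of a residue-group indicator at orders 1..max_order."""
    x = [1.0 if aa in group else 0.0 for aa in seq.upper()]
    n_res = len(x)
    out = {}
    for n in range(1, max_order + 1):
        coeff = sum(v * cmath.exp(-2j * cmath.pi * n * k / n_res) for k, v in enumerate(x))
        out[n] = abs(coeff) ** 2 / n_res  # assumed normalisation
    return out

def period_in_residues(order: int, n_res: int = 64) -> float:
    """Assumed mapping from FT order to period: fragment length / order."""
    return n_res / order

# Hypothetical 64-residue fragment with an aliphatic residue at every 4th position.
toy = "AGGG" * 16
spectrum = ft_intensities(toy, ALIPHATIC)
dominant_order = max(spectrum, key=spectrum.get)
```

For the toy fragment the dominant order is 16, which corresponds to the built-in period of 4 residues.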
Article
The Fourier transform (FT) method was applied to specify the distribution of 14 predefined groups of amino acids (64 residues) at both termini of annotated type III and type I secreted proteins from proteobacteria. Type I proteins displayed a higher occurrence of significant periodicities at both C- and N-termini, indicating potent features to discriminate between secretion types, particularly by the use of variables selected from the full periodicity profiles at 19 orders of FT. Fisher's linear discriminant analysis, together with the stepwise selection of variables throughout equal pairs of combinations for all predefined groups of residues, revealed the C-terminal harmonics of aromatic (HFWY) and aliphatic (VLIA) residues as a set of strong predictor variables to classify both types of secreted proteins with an accuracy of 100% for original grouped cases and 96.4% for cross-validated grouped cases. The prediction accuracy of the proposed discriminant function was estimated by repeated k-fold cross-validation procedures where the original data set was randomly divided into k subsets, with one of the k subsets serving as the test set and the remaining data forming the training set. The average error rate computed across all k trials and repeats did not exceed that of the leave-one-out procedure. The proposed set of predictor variables could be used to assess the compatibility between secretion pathways and secretion substrates of proteobacteria by means of discriminant analysis.
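The k-fold procedure described in this abstract can be sketched in plain Python. The sketch below is an illustration under stated assumptions: a nearest-centroid rule stands in for the discriminant function (it is not Fisher's linear discriminant analysis), the toy data are hypothetical, and k is assumed not to exceed the number of cases. Setting k equal to the number of cases gives the leave-one-out procedure.

```python
import random

def k_fold_indices(n_items: int, k: int, seed: int = 0) -> list[list[int]]:
    """Randomly divide indices 0..n_items-1 into k disjoint subsets."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_centroid_predict(train, x):
    """Stand-in classifier: assign x to the class with the closest mean vector."""
    sums, counts = {}, {}
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, v in enumerate(features):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1

    def dist(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))

    return min(sums, key=dist)

def k_fold_error_rate(data, k: int, seed: int = 0) -> float:
    """Average error rate over k trials; each subset serves once as the test set."""
    rates = []
    for test_idx in k_fold_indices(len(data), k, seed):
        held_out = set(test_idx)
        train = [d for i, d in enumerate(data) if i not in held_out]
        wrong = sum(nearest_centroid_predict(train, data[i][0]) != data[i][1] for i in test_idx)
        rates.append(wrong / len(test_idx))
    return sum(rates) / len(rates)

# Hypothetical, well-separated two-class toy data (labels mimic secretion types).
data = [([0.1 * i, 0.0], "I") for i in range(10)] + \
       [([5.0 + 0.1 * i, 1.0], "III") for i in range(10)]
error = k_fold_error_rate(data, k=5)
loo_error = k_fold_error_rate(data, k=len(data))
```

Repeating the call with different `seed` values and averaging the results would correspond to the repeated cross-validation mentioned in the abstract.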