Figure 4 - uploaded by Gajendra Pal Singh Raghava
Screenshot of the beta-turn and beta-hairpin annotation module of StarPDB. Checkboxes with options are shown in the left column. The query sequence, the consensus beta turn/beta hairpin, and the beta turn/beta hairpin of similar protein chains are shown in the right block.


Source publication
Article
Background: In the era of next-generation sequencing, in which thousands of genomes have already been sequenced, the size of protein databases is growing at an exponential rate. Structural annotation of these proteins is one of the biggest challenges for computational biologists. Although it is easy to perform a BLAST search against the Protein Data Bank (PD...

Contexts in source publication

Context 1
... module allows users to predict/annotate β-turns and nine types of β-turn in a query protein sequence. It is based on the similarity of the query protein to protein chains in the BetaTurn sub-database. Regions of the query sequence annotated by multiple PDB chains have a higher probability of β-turn formation (Fig. 4). Users are therefore advised to increase the number of similar PDB chains in the input form. The β-turn types option provides in-depth annotation based on nine types of β-turn. A careful examination of results must be performed for better understanding of β-turn and types ...
Context 2
... structure module of StarPDB, checkboxes with options are shown in left column. Query sequence, consensus secondary structure and secondary structure of similar protein chains are shown in right block irrespective of the length of the loop region. The module annotates the complete beta-hairpin region including the beta sheet and loop regions (Fig. ...
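
The consensus idea in Context 1 (positions annotated by more PDB chains are more likely to form a β-turn) can be illustrated with a per-position majority vote. This is only a minimal sketch, not StarPDB's actual implementation: the function name `consensus_annotation`, the state letters (`T` for turn, `n` for non-turn) and the `-` symbol for positions a hit does not cover are all assumptions made for illustration, and the annotation strings are assumed to be already aligned to the query.

```python
from collections import Counter


def consensus_annotation(chain_annotations, gap="-"):
    """Per-position majority vote over equal-length annotation strings.

    Hypothetical helper: each string is the annotation transferred from one
    similar PDB chain, already aligned to the query; `gap` marks positions
    that the chain does not cover.
    Returns (consensus string, number of chains supporting each position).
    """
    if not chain_annotations:
        return "", []
    length = len(chain_annotations[0])
    if any(len(a) != length for a in chain_annotations):
        raise ValueError("all annotation strings must have the same length")

    consensus, support = [], []
    for column in zip(*chain_annotations):
        # Ignore chains that do not cover this query position.
        votes = Counter(state for state in column if state != gap)
        if not votes:
            consensus.append(gap)
            support.append(0)
            continue
        # On a tie, the state seen first in the column wins.
        state, count = votes.most_common(1)[0]
        consensus.append(state)
        support.append(count)
    return "".join(consensus), support


# Made-up annotations from three hypothetical similar chains.
hits = ["nnTTTTnn--", "-nTTTTnnnn", "nnnTTTTn--"]
cons, support = consensus_annotation(hits)
print(cons)     # nnTTTTnnnn
print(support)  # [2, 3, 2, 3, 3, 3, 2, 3, 1, 1]
```

The support counts show why adding more similar chains helps: positions backed by all three hits are better supported than the last two positions, which rest on a single chain.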

Citations

... The query peptide is annotated based on its alignment score with known peptides. Our study implemented the BLAST-based technique blastp (BLAST+ 2.7.1), a peptide-peptide BLAST, to predict B-cell epitopes and non B-cell epitopes [42][43][44][45]. BLAST formatted database was constructed using the training dataset against which the query sequences (sequences in the test set) were hit at various e-values that range from 1e-6 to 1e+3. ...
... The query peptide is annotated based on its alignment score with known peptides. In our study, we implemented the BLAST-based technique blastp (BLAST+ 2.7.1), a peptide-peptide BLAST, for the prediction of B-cell epitopes and non-B-cell epitopes [42][43][44][45]. A BLAST-formatted database was constructed using the training dataset, against which the query sequences (sequences in the test set) were hit at various e-values ranging from 1e-6 to 1e+3. ...
Preprint
B-cells are an essential component of the immune system and play a vital role in the immune response against pathogenic infection by producing antibodies. Existing methods predict either linear or conformational B-cell epitopes in an antigen. In this study, a single method was developed for predicting both types (linear/conformational) of B-cell epitopes. The dataset used in this study contains 3875 B-cell epitopes and 3996 non-B-cell epitopes, where the B-cell epitopes consist of both linear and conformational B-cell epitopes. Our primary analysis indicates that certain residues (like Asp, Glu, Lys, Asn) are more prominent in B-cell epitopes. We developed machine-learning based methods using different types of sequence composition and achieved the highest AUC of 0.80 using dipeptide composition. In addition, models were developed on selected features, but no further improvement was observed. Our similarity-based method implemented using BLAST shows a high probability of correct prediction with poor sensitivity. Finally, we came up with a hybrid model that combines an alignment-free (dipeptide-based random forest) model and an alignment-based (BLAST-based similarity) model. Our hybrid model attained a maximum AUC of 0.83 with an MCC of 0.49 on the independent dataset. Our hybrid model performs better than existing methods on the independent dataset used in this study. All models were trained and tested on 80% of the data using cross-validation, and the final model was evaluated on 20% of the data, called the independent or validation dataset. A webserver and standalone package named "CLBTope" has been developed for predicting, designing, and scanning B-cell epitopes in an antigen sequence (https://webs.iiitd.edu.in/raghava/clbtope/).
... The query peptide is annotated based on its alignment score with known peptides. Basic Local Alignment Search Tool (BLAST) is a very popular method for similarity search [35][36][37][38]. Currently, we have implemented BLAST-based search for identifying similarity of peptides/epitopes with T1DM associated and non-T1DM associated peptides. ...
Article
There are a number of antigens that induce autoimmune response against β-cells, leading to type 1 diabetes mellitus (T1DM). Recently, several antigen-specific immunotherapies have been developed to treat T1DM. Thus, identification of T1DM associated peptides with antigenic regions or epitopes is important for peptide-based therapeutics (e.g. immunotherapeutics). In this study, for the first time, an attempt has been made to develop a method for predicting, designing, and scanning of T1DM associated peptides with high precision. We analysed 815 T1DM associated peptides and observed that these peptides are not associated with a specific class of HLA alleles. Thus, HLA binder prediction methods are not suitable for predicting T1DM associated peptides. First, we developed a similarity/alignment based method using Basic Local Alignment Search Tool and achieved a high probability of correct hits with poor coverage. Second, we developed an alignment-free method using machine learning techniques and got a maximum AUROC of 0.89 using dipeptide composition. Finally, we developed a hybrid method that combines the strength of both alignment-free and alignment-based methods and achieves a maximum area under the receiver operating characteristic of 0.95 with a Matthews correlation coefficient of 0.81 on an independent dataset. We developed a web server ‘DMPPred’ and stand-alone server for predicting, designing and scanning T1DM associated peptides (https://webs.iiitd.edu.in/raghava/dmppred/).
... The query peptide is annotated based on its alignment score with known peptides. One of the commonly used methods for similarity search is BLAST [33][34][35][36]. Currently, we have implemented a BLAST-based search to identify the similarity of peptides/epitopes with T1DM causing and non-T1DM causing peptides. ...
Preprint
There are a number of antigens that induce autoimmune response against β-cells, leading to Type 1 diabetes mellitus (T1DM). Recently, several antigen-specific immunotherapies have been developed to treat T1DM. Thus, identification of T1DM associated peptides with antigenic regions or epitopes is important for peptide-based therapeutics (e.g., immunotherapeutics). In this study, for the first time, an attempt has been made to develop a method for predicting, designing and scanning of T1DM associated peptides with high precision. We analyzed 815 T1DM associated peptides and observed that these peptides are not associated with a specific class of HLA alleles. Thus, HLA binder prediction methods are not suitable for predicting T1DM associated peptides. Firstly, we developed a similarity/alignment based method using BLAST and achieved a high probability of correct hits with poor coverage. Secondly, we developed an alignment-free method using machine learning techniques and got a maximum AUROC of 0.89 using dipeptide composition. Finally, we developed a hybrid method that combines the strength of both alignment-free and alignment-based methods and achieves a maximum AUROC of 0.95 with an MCC of 0.81 on an independent dataset. We developed a webserver “DMPPred” and standalone server for predicting, designing and scanning of T1DM associated peptides (https://webs.iiitd.edu.in/raghava/dmppred/).
Key Points
Prediction of peptides responsible for inducing immune system against β-cells
Compilation and analysis of Type 1 diabetes associated HLA binders
BLAST based similarity search against Type 1 diabetes associated peptides
Alignment free method using machine learning techniques and composition
A hybrid method using alignment free and alignment based approach
Author’s Biography
Nishant Kumar is currently working towards a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Sumeet Patiyal is currently working towards a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Shubham Choudhury is currently working towards a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Ritu Tomer is currently working towards a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Anjali Dhall is currently working towards a Ph.D. in Computational Biology at the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
Gajendra P. S. Raghava is currently working as Professor and Head of the Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India.
... BLAST is heavily used in literature to annotate protein sequences [24,[40][41][42]. We have used it to identify allergens based on the similarity of a protein with allergenic and non-allergenic sequences. ...
Article
AlgPred 2.0 is a web server developed for predicting allergenic proteins and allergenic regions in a protein. It is an updated version of AlgPred developed in 2006. The dataset used for training, testing and validation consists of 10 075 allergens and 10 075 non-allergens. In addition, 10 451 experimentally validated immunoglobulin E (IgE) epitopes were used to identify antigenic regions in a protein. All models were trained on 80% of data called training dataset, and the performance of models was evaluated using 5-fold cross-validation technique. The performance of the final model trained on the training dataset was evaluated on 20% of data called validation dataset; no two proteins in any two sets have more than 40% similarity. First, a Basic Local Alignment Search Tool (BLAST) search has been performed against the dataset, and allergens were predicted based on the level of similarity with known allergens. Second, IgE epitopes obtained from the IEDB database were searched in the dataset to predict allergens based on their presence in a protein. Third, motif-based approaches like multiple EM for motif elicitation/motif alignment and search tool have been used to predict allergens. Fourth, allergen prediction models have been developed using a wide range of machine learning techniques. Finally, the ensemble approach has been used for predicting allergenic protein by combining prediction scores of different approaches. Our best model achieved maximum performance in terms of area under receiver operating characteristic curve 0.98 with Matthew’s correlation coefficient 0.85 on the validation dataset. A web server AlgPred 2.0 has been developed that allows the prediction of allergens, mapping of IgE epitope, motif search and BLAST search (https://webs.iiitd.edu.in/raghava/algpred2/).
... Taxonomic annotations generated from this homology-based method were served to identify the phylogenetic affiliation of ARG carrying bacteria. BlastP [36] was used to search ORFs against SARG database with similarity cutoff of 80% and alignment length cutoff of 70% to identify ARGs from these metagenome-assemblies. Genus level carrier with two seasonal sequencing read partition results (normalized after mapping reads back to contigs) [37] were summarized in Figure 4 and supplementary file 5. ...
Article
As a symbol of the defense mechanisms that bacteria have evolved over time, the genes that make bacteria resist antibiotics are overwhelmingly present in the environment. Currently, bacterial antibiotic resistance genes (ARGs) in the air are a serious concern. Previous studies have identified bacterial communities and summarized putative routes of transmissions for some dominant hospital-associated pathogens from hospital indoor samples. However, little is known about the possible indoor air ARG transportation. In this study, we mainly surveyed air-conditioner air dust samples under different airflow conditions and analyzed these samples using a metagenomic-based method. The results show air dust samples exhibited a complex resistome, and the average concentration is 0.00042 copies/16S rRNA gene, which is comparable to some other environments. The hospital air-conditioners can form resistome over time and accumulate pathogens. In addition, our results indicate that the Outpatient hall is one of the main ARG transmission sources, which can distribute ARGs to other departments (explains >80% resistome). We believe that the management should focus on ARG carrier genera such as Staphylococcus, Micrococcus, Streptococcus, and Enterococcus in this hospital and our novel evidence-based network strategy proves that plasmid-mediated ARG transfer can occur frequently. Overall, these results provide insights into the characteristics of air dust resistome and possible route for how ARGs are spread in air.
... However, yteJ, which is in an operon with sppA, encodes a protein of unknown function and unknown structure. As a first step towards the understanding of this unknown protein, we analysed (i) its genomic context using bioinformatics tools, such as BLAST (Singh and Raghava 2016) and microbial genomic context viewer (Overmars et al. 2013), and (ii) its putative structure using structure prediction software such as Protter (Figure S1 of appendix chapter) (Omasits et al. 2014). When BLAST-P was used to align the sequence of the Bacillus subtilis sppA yteJ operon with the genomes of different bacterial species, the results showed that yteJ is often present following sppA. ...
Thesis
We have identified a membrane protein complex of Bacillus subtilis involving an unknown protein, YteJ, and SppA, a membrane protein first described as a signal peptide peptidase and later shown to be also involved in the resistance to antibacterial peptides of the lantibiotic family. Using deletion mutant strains, we showed that both proteins are involved in this resistance. In the ΔsppA strain, the ectopic overexpression of SppA not only restored the resistance, it also induced the formation of elongated cells, a phenotype suppressed by the simultaneous overexpression of YteJ. Furthermore, the expression of truncated versions of YteJ pinpointed the inhibitory role of a specific domain of YteJ. Finally, in vitro biochemical studies showed that SppA protease activity was strongly reduced by the presence of YteJ, supporting the hypothesis of an inhibition by YteJ. Our in vivo and in vitro studies showed that YteJ, via one of its domains, acts as a negative regulator of the protease activity of SppA in this complex. In conclusion, we have shown that the SppA/YteJ complex is involved in lantibiotic resistance through the protease activity of SppA, which is regulated by YteJ.
... Moreover, several types of prediction algorithms, such as neural networks, support vector machines, quadratic discriminant functions and random forests, were used for the prediction of the β hairpin motifs. The most recent predictor, STARPDB-beta hairpin [172], is based on a simple alignment into structurally annotated proteins collected from PDB. ...
... Subsequent methods use a pattern dictionary developed from known α-turn-α structures [179,180], where predictions are made directly from pattern similarity [179] or using a classifier over pattern occurrences [180]. Table 2 summarizes 17 supersecondary structure prediction methods, including 7 β hairpin predictors (in chronological order): method by de la Cruz et al. [73], BhairPred [74], and methods by Hu et al. [170], Zou et al. [169], Xia et al. [168], Jia et al. [167], and the STARPDB-beta hairpin method [172]; 7 coiled coil predictors: MultiCoil2 [181,182], MARCOIL [177], PCOILS [176], bCIPA [175], Paircoil2 [174], CCHMM_PROF [173], and SpiriCoil [70]; and 3 α-turn-α predictors: method by Dodd and Egan [178], GYM [179], and Xiong et al. [180]. Older coiled coil predictors were reviewed in [84]. ...
Chapter
Many new methods for the sequence-based prediction of the secondary and supersecondary structures have been developed over the last several years. These and older sequence-based predictors are widely applied for the characterization and prediction of protein structure and function. These efforts have produced countless accurate predictors, many of which rely on state-of-the-art machine learning models and evolutionary information generated from multiple sequence alignments. We describe and motivate both types of predictions. We introduce concepts related to the annotation and computational prediction of the three-state and eight-state secondary structure as well as several types of supersecondary structures, such as β hairpins, coiled coils, and α-turn-α motifs. We review 34 predictors focusing on recent tools and provide detailed information for a selected set of 14 secondary structure and 3 supersecondary structure predictors. We conclude with several practical notes for the end users of these predictive methods.
... TC6 genes were predicted using RAST 3 (Aziz et al., 2008;McNair et al., 2018). DNA sequences and protein sequences were scanned for homologs by using BLAST (Boratyn et al., 2013;Singh and Raghava, 2016). The software tRNAscan-SE 4 was used to predict tRNA genes (Lowe and Chan, 2016). ...
Article
Phages have attracted renewed interest as an alternative to chemical antibiotics. Although the number of phages is 10-fold higher than that of bacteria, the number of genomically characterized phages is far smaller than that of bacteria. In this study, phage TC6, a novel lytic virus of Pseudomonas aeruginosa, was isolated and characterized. TC6 consists of an icosahedral head with a diameter of approximately 54 nm and a short tail with a length of about 17 nm, which are characteristics of the family Podoviridae. TC6 can lyse 86 out of 233 clinically isolated P. aeruginosa strains, thus showing potential for application in phage therapy. The linear double-stranded genomic DNA of TC6 consists of 49796 base pairs and was predicted to contain 71 protein-coding genes. A total of 11 TC6 structural proteins were identified by mass spectrometry. Comparative analysis revealed that the P. aeruginosa phages TC6, O4, PA11, and IME180 share high similarity at the DNA sequence and proteome levels; among them, PA11 was the first phage discovered and published. These phages also contain 54 core genes and have very close phylogenetic relationships, which distinguish them from other known phage genera. We therefore propose that these four phages be classified as Pa11virus, a new phage genus of Podoviridae that infects Pseudomonas spp. The results of this work advance our understanding of phage biology, classification, and diversity.