An illustration of the concept of half-space depth for 1- and 2-dimensional datasets.
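
As a rough illustration of the concept named in the caption, the sketch below computes half-space (Tukey) depth for a 1-dimensional sample and approximates it for a 2-dimensional point set. The function names, the fixed number of scanned directions and the small numerical tolerance are assumptions made for this example; they are not taken from the source publication.

```python
import math


def halfspace_depth_1d(x, sample):
    """Half-space depth of x in a 1-D sample.

    It is the smaller of the two fractions of points lying on either
    closed side of x (points equal to x count on both sides).
    """
    n = len(sample)
    left = sum(1 for v in sample if v <= x)
    right = sum(1 for v in sample if v >= x)
    return min(left, right) / n


def halfspace_depth_2d(p, points, n_directions=360):
    """Approximate half-space depth of p in a 2-D point set.

    For each of n_directions evenly spaced unit vectors u, count the points q
    in the closed half-plane {q : (q - p) . u >= 0} and keep the smallest
    count. A finite scan can only overestimate the true depth.
    """
    n = len(points)
    best = n
    for k in range(n_directions):
        angle = 2.0 * math.pi * k / n_directions
        ux, uy = math.cos(angle), math.sin(angle)
        count = sum(
            1
            for (qx, qy) in points
            # Small tolerance so points on the boundary line are counted.
            if (qx - p[0]) * ux + (qy - p[1]) * uy >= -1e-12
        )
        best = min(best, count)
    return best / n
```

In this sketch, points near the centre of the data receive a high depth and points on or outside the boundary receive a depth close to zero, which is the behaviour the figure illustrates.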


Source publication
Article
Since proteins carry out their functions through interactions with other molecules, accurately identifying the protein-ligand binding site plays an important role in protein functional annotation and rational drug discovery. In the past two decades, many algorithms have been presented to predict the protein-ligand binding site. In this paper, we intro...

Citations

... Based on the site map points, an allosteric grid is generated to dock the molecules. (Wang et al., 2013) ...
Preprint
Interaction of low-density lipoprotein receptors with proprotein convertase subtilisin/kexin type 9 (PCSK9) plays a vital role in causing atherosclerosis. It is the hidden precursor of clinical myocardial infarction (MI), stroke and CVD, and is estimated to cause 60% of deaths worldwide. The current need is to design small molecules to prevent the interaction between PCSK9 and LDL receptors. This study aims to evaluate the PCSK9 antagonistic effect of a derivative of Stevioside (also referred to as the Methylidene tetracyclo derivative) and atorvastatin. Also, a comparative study was performed to analyze the binding interaction of the molecules inside the active and allosteric sites of PCSK9. The protein 7S5H, downloaded from the RCSB, and the above-mentioned ligands were optimized to the local minimum energy level and docked inside the active and allosteric sites. The stability of the non-bonded interactions of the complex was analyzed using Desmond MD simulation studies. The docking results showed that the Methylidene tetracyclo molecule possesses a two-fold higher affinity of -10.159 kcal/mol in the active site and -10.824 kcal/mol in the allosteric site. The Phe377 amino acid made the Methylidene tetracyclo molecule orient inside the active site. Nine H-bonds with 5 amino acids of the allosteric site increase the binding affinity compared to atorvastatin. The MD simulation studies showed that the non-bonded interaction of the Methylidene tetracyclo molecule was stable throughout 100 ns. This confirms that the Methylidene tetracyclo molecule will be the better hit as well as the lead molecule to inhibit the PCSK9 protein.
... Another approach was proposed by Wang et al. [81], who introduced the statistical depth function to identify negative samples for predicting binding sites using sequence and structural information with an SVM. In this study, statistical depth functions were used to determine the depth of the residues and analyze the protein structure. ...
Article
New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein–ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein–ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein–ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein–ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein–ligand interactions.
... Nevertheless, predicting a stable 3D structure alone is not of much use in structural genomics until one can validate their function as active proteins. It is necessary to predict potential ligand-binding and active sites as these sites play an important role in functional annotations and interactions of proteins with other molecules (Wang et al. 2013). Being so divergent in folding, all structures, irrespective of their families, have different active sites along with a different core motif. ...
Article
We report structural characterization of proteases mined in silico from solid tannery waste (STW) metagenome. The physico-chemical analysis revealed the molecular weight of selected query proteases in the range of 34–43 kDa except 52.46 kDa for Cp-6. Secondary structure analysis suggested the dominance of α-helices (26–51 %) followed by β-sheets (13–34 %). Conserved regions in the selected proteases were identified using multiple sequence alignment. Diversity analysis of the proteases was performed by aligning with their best hits and already characterized proteases of similar families. The 3D structures of 19 selected amino acid sequences were deduced using homology modeling and evaluated following Ramachandran plots (R-plots), stability of the 3D conformation, and overall quality factor (G factor). The R-plots of all structures had 96.4–98.9 % residues in the favoured region, whereas 0–0.8 % lay in the outlier region. Fifteen modeled protein structures passed the quality assessment criteria and were subsequently used for molecular docking. All models showed two domains except Cp-6, which had an extra domain. Ligand binding sites and active sites were identified using sequence and structural homologs of the respective protease molecules. Serine, lysine, and serine of different conserved motifs were specified as the active site residues in carboxypeptidases, whereas serine, aspartate, and histidine were specific to aminopeptidases Ap-1, Ap-2, and Ap-4. Further, Ap-8 and Ap-9 required Mn2+ for enzyme activity along with histidine and glutamate in their active sites. These 3D structures were used for molecular docking with their specific peptide ligands to obtain stable docked conformations. Successfully docked complexes indicated the catalytic activity of these enzymes in hydrolyzing the peptide ligand. The structural and functional insights gained from our study may help in the identification of novel industrially important proteases after validation through experimental studies.
... Fortunately, predicting protein-protein interaction sites using computational methods has become a hot topic with the development of machine learning algorithms [4][5][6][7][8]. Previous studies showed that support vector machine (SVM) and its improved methods can effectively predict protein interaction sites [9][10][11][12][13][14]. Computational algorithms such as random forests, KNN, and the Naive Bayes classifier have also been applied to the prediction of PPIs [15][16][17][18]. ...
Article
The study of protein-protein interaction is of great biological significance, and the prediction of protein-protein interaction sites can promote the understanding of cell biological activity and will be helpful for drug development. However, uneven distribution between interaction and non-interaction sites is common because only a small number of protein interactions have been confirmed by experimental techniques, which greatly affects the predictive capability of computational methods. In this work, two imbalanced data processing strategies based on the XGBoost algorithm were proposed to re-balance the original dataset using the inherent relationship between positive and negative samples for the prediction of protein-protein interaction sites. Herein, a feature extraction method was applied to represent the protein interaction sites based on the evolutionary conservation of proteins, and the influence of overlapping regions of positive and negative samples on prediction performance was considered. Our method showed good prediction performance, such as a prediction accuracy of 0.807 and an MCC of 0.614, on an original dataset with 10,455 surface residues but only 2297 interface residues. Experimental results demonstrated the effectiveness of our XGBoost-based method.
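
The abstract above reports both accuracy and the Matthews correlation coefficient (MCC) on an imbalanced dataset. The following minimal sketch shows how the two metrics are computed from a confusion matrix and why they can disagree under class imbalance. The function names are hypothetical, and returning 0 when the denominator is zero is a common convention assumed here rather than something stated in the abstract.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient of a binary confusion matrix.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

With made-up counts of 20 positives and 80 negatives, a classifier that labels everything negative (`tp=0, tn=80, fp=0, fn=20`) reaches an accuracy of 0.8 but an MCC of 0, which is one reason MCC is often reported alongside accuracy for imbalanced data.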
... In functional annotation, it is crucial to understand the protein-ligand interaction as it plays a vital role in drug discovery [46]. Binding site prediction identifies the protein-ligand interaction and is divided into two kinds of methods: geometry-based and energy-based [46]. Binding site prediction software works to detect the site that has the highest potential to induce a binding interaction with other molecules. ...
... [46]. Binding site prediction identifies the protein-ligand interaction and is divided into two kinds of methods: geometry-based and energy-based [46]. Binding site prediction software works to detect the site that has the highest potential to induce a binding interaction with other molecules. Some predictors provide data such as the binding pocket. ...
Article
The increasing spread of antibiotic-resistant bacterial pathogens is alarming the world's population and creating a wave of antibiotic apocalypse. The rise in the death rate due to these antibiotic-resistant superbugs signals the urgency of new drug discovery to combat these bacterial pathogens. The last class of antibiotics developed leaves a huge gap in the antibiotic timeline, as antibiotic development since then has failed to kill the bacteria. Current antibiotics target the central dogma of the bacteria; hence, finding a new potential drug target could eliminate the superbugs. It is, therefore, crucial to identify the root cause of resistance by understanding the underlying biological cellular processes. Hypothetical proteins are uncharacterized proteins of unknown function, which could provide a deeper understanding of the metabolic pathways of the bacterial proteome. This paper provides a general guideline for non-bioinformaticians to mine potential drug targets from hypothetical proteins of the bacterial proteome using a fast, low-cost bioinformatics approach.
... These comparative observations from accessible and buried vertices of ligand binding cleft imply that DRG2 and DRG7 remain tightly packed within the interacting groove of NF-κβ in comparison with hispolon. It has been previously reported that a sample lead compound is defined as negative if the average depth value of its binding cleft is less than 8 Å (Wang et al., 2013). But in this study, calculating HBA, hydrogen bond acceptors; HBD, hydrogen bond donors; MolLogP, molecular octanol-water partition coefficient; MolLogS, molecular octanol-water partition solubility; MolPSA, molecular polar surface area. ...
Article
Hispolon is a polyphenolic compound derived from black hoof mushroom (Phellinus linteus) or shaggy bracket mushroom (Inonotus hispidus) which induces the inhibition of the cancer-promoting nuclear factor-kappa beta (NF-κβ) complex. To develop more potent lead molecules with enhanced anticancer efficiency, the mechanism of hispolon-mediated nuclear factor-κβ inhibition has been investigated by molecular modelling and docking. Ten derivatives of hispolon (DRG1-10) have been developed by pharmacophore-based design with a view to enhancing the anticancer efficacy. Hispolon and its derivatives were further screened for different pharmacological parameters like binding free energy, drug-likeness, absorption–distribution–metabolism–excretion (ADME), permeability, mutagenicity, toxicity and inhibitory concentration 50 (IC50) to find a potent lead molecule. Based on pharmacological validation, comparative molecular dynamics (MD) simulations have been performed for three lead molecules, hispolon, DRG2 and DRG7, complexed with human NF-κβ for up to 50 ns. Analysis of different factors like root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent-accessible surface area (SASA), principal component analysis (PCA) and Gibbs free energy plots indicates that DRG2 has more binding efficiency compared to hispolon and DRG7. In the RMSD plot, hispolon-bound NF-κβ has the most deviation, within a range between 0.125 and 0.45 nm, while the DRG2-bound complex showed a range between 0.125 and 0.25 nm. The residues of NF-κβ responsible for hydrophobic interactions with the ligand, e.g. Met469, Leu522 and Cys533, have the lowest fluctuation values in the DRG2-bound complex. The average Rg fluctuation for DRG2-bound NF-κβ has been recorded under 2.025 nm for most of the simulation time, which is much less than for hispolon and DRG7. Gibbs free energy plots also indicate the highest stability for DRG2-bound NF-κβ. Communicated by Ramaswamy H. Sarma
... The most common are classified into two groups: template-based and pocket-based methods. Some less common structure-based methods that have been reported include a de-solvation based free energy model [8], solvent mapping [9], molecular docking [10,11], machine learning [12,13] and molecular dynamics (MD) [14,15] methods. ...
Article
In the field of medicinal chemistry there is increasing focus on identifying key proteins whose biochemical functions can firmly be linked to serious diseases. Such proteins become targets for drug or inhibitor molecules that could treat or halt the disease through therapeutic action or by blocking the protein function respectively. The protein must be targeted at the relevant biologically active site for drug or inhibitor binding to be effective. As insufficient experimental data is available to confirm the biologically active binding site for novel protein targets, researchers often rely on computational prediction methods to identify binding sites. Presented herein is a short review on structure-based computational methods that (i) predict putative binding sites and (ii) assess the druggability of predicted binding sites on protein targets. This review briefly covers the principles upon which these methods are based, where they can be accessed and their reliability in identifying the correct binding site on a protein target. Based on this review, we believe that these methods are useful in predicting putative binding sites, but as they do not account for the dynamic nature of protein–ligand binding interactions, they cannot definitively identify the correct site from a ranked list of putative sites. To overcome this shortcoming, we strongly recommend using molecular docking to predict the most likely protein–ligand binding site(s) and mode(s), followed by molecular dynamics simulations and binding thermodynamics calculations to validate the docking results. This protocol provides a valuable platform for experimental and computational efforts to design novel drugs and inhibitors that target disease-related proteins.
... We employed protein lengths as the only global feature in our feature vector [48]. All the above seven feature groups for a query residue along with its neighboring residues within a sliding window were examined for their usefulness for carbohydrate-binding prediction. Only some of these features will be selected for the final model as described below. ...
Article
Carbohydrate-binding proteins play significant roles in many diseases including cancer. How and where these proteins interact with carbohydrates is of fundamental importance and practical interest. Experimental studies of binding mechanisms are costly and labour intensive because of low binding affinity between proteins and carbohydrates. As a result, developing an effective computational method becomes increasingly important. Here, we established a machine-learning-based method (called Sequence-based Prediction of Residue-level INTeraction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins by using Support Vector Machines (SVM). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, predictive method (area under the receiver operating characteristic curve, AUC=0.78 and 0.77, and Matthews correlation coefficient of 0.34 and 0.28 for ten-fold cross validation and independent test, respectively, without balancing binding and non-binding residues). The quality of the method is further demonstrated by statistically significantly more binding residues predicted for carbohydrate-binding proteins than for presumptive non-binding proteins in the human proteome and by the bias of rare alleles toward more non-synonymous mutations in carbohydrate-binding sites. SPRINT-CBH is available as an online server at: http://sparks-lab.org/.
... where (plane Res_i) is the number of residues in the half-space bounded by the plane through the i-th residue, and N is the total number of residues in the protein. The use of residue HSD is motivated by Shen [47,48]. Convex Hull. ...
Article
The prediction of conformational B-cell epitopes plays an important role in immunoinformatics. Several computational methods have been proposed on the basis of discrimination determined by the solvent-accessible surface between epitopes and non-epitopes, but the performance of existing methods is far from satisfactory. In this paper, depth functions and the k-th surface convex hull are used to analyze epitopes and exposed non-epitopes. On each layer of the protein, we compute relative solvent accessibility and four different types of depth functions, i.e., Chakravarty depth, DPX, half-sphere exposure and half-space depth, to analyze the location of epitopes on different layers of the proteins. We found that conformational B-cell epitopes are rich in charged residues Asp, Glu, Lys, Arg, His; aliphatic residues Gly, Pro; non-charged residues Asn, Gln; and aromatic residue Tyr. Conformational B-cell epitopes are rich in coils. Conservation of epitopes is not significantly lower than that of exposed non-epitopes. According to the Wilcoxon rank sum test, the average depths (obtained by four methods) of epitopes are significantly lower than those of non-epitopes on the surface. Epitopes are more likely to be located in the outer layer of the convex hull of a protein. On the benchmark dataset, the cumulative 10th convex hull covers 84.6% of exposed residues on the protein surface area, and nearly 95% of epitope sites. These findings may be helpful in building a predictor for epitopes.
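
The idea of successive convex hull layers mentioned in the abstract above can be sketched by repeatedly peeling the convex hull off a point set. This is only a 2-dimensional toy version under stated assumptions: the cited work deals with 3-dimensional protein structures, and the function names and the choice of Andrew's monotone chain algorithm are assumptions made for this example, not the authors' implementation.

```python
def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order.

    Points lying strictly inside a hull edge are not reported as vertices.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_layers(points):
    """Peel hulls: layer 1 is the outer hull, layer 2 the hull of the rest, etc."""
    remaining = set(points)
    layers = []
    while remaining:
        hull = convex_hull(list(remaining))
        layers.append(hull)
        remaining -= set(hull)
    return layers


def cumulative_coverage(layers, k):
    """Fraction of all points contained in the first k layers."""
    total = sum(len(layer) for layer in layers)
    return sum(len(layer) for layer in layers[:k]) / total
```

For two nested squares with a single centre point, `hull_layers` returns three layers (outer square, inner square, centre), and `cumulative_coverage(layers, 2)` gives the share of points in the two outermost layers, analogous in spirit to the cumulative hull coverage quoted in the abstract.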