Table 1 - uploaded by Jian-Yu Shi
Comparison of the local model with the models using the Spy and Purified Super-target strategies, respectively

Source publication
Article
Background: Supervised classification models have received increasing attention in the area of predicting drug-target interactions (DTIs). However, in terms of classification, unavoidable missing DTIs in the data cause three issues which have not yet been addressed appropriately by former approaches. Directly labeled as negatives (non-...

Context in source publication

Context 1
... validate the effectiveness of the two proposed strategies and their combination, we first run the ordinary local model (RLS); then run the model incorporating the Spy strategy alone (RLSm_spy); after that, run the model extended by only the Super-target strategy (RLSm_super); last, combine the two strategies and run the model again (RLSm_comb). Performance was measured by both AUC and Coverage (Table 1). Compared with RLS, RLSm_spy is better because of the less confusing boundary generated by positives and reliable negatives at the level of targets. ...
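The excerpt above does not spell out how the Spy strategy selects reliable negatives, so the following is only a minimal sketch of the classic Spy idea from positive-unlabeled learning, not the authors' implementation. It assumes generic feature vectors, a plain linear regularized least squares scorer standing in for the local model, and hypothetical names and parameters (`rls_scores`, `spy_reliable_negatives`, `spy_frac`, `pct`). A few known positives are hidden among the unlabeled samples as "spies", and unlabeled samples that score below almost all spies are kept as reliable negatives.

```python
import numpy as np

def rls_scores(X_train, y_train, X_test, lam=1.0):
    """Linear regularized least squares (ridge) scores for X_test."""
    # Append a constant column so the model has an intercept.
    A = np.hstack([X_train, np.ones((len(X_train), 1))])
    B = np.hstack([X_test, np.ones((len(X_test), 1))])
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_train)
    return B @ w

def spy_reliable_negatives(X, labels, spy_frac=0.15, pct=5.0, seed=0):
    """Return indices of unlabeled samples that score below most spies.

    labels: 1 for known positives, 0 for unlabeled samples.
    spy_frac and pct are illustrative defaults, not values from the source.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    unl = np.flatnonzero(labels == 0)

    # Hide a fraction of the known positives among the unlabeled set.
    n_spy = max(1, int(round(spy_frac * len(pos))))
    spies = rng.choice(pos, size=n_spy, replace=False)

    # Train with spies and unlabeled samples treated as negatives.
    y = np.where(labels == 1, 1.0, -1.0)
    y[spies] = -1.0
    scores = rls_scores(X, y, X)

    # Threshold at a low percentile of spy scores, so most spies lie above it.
    threshold = np.percentile(scores[spies], pct)
    return unl[scores[unl] < threshold]

# Toy data (made up): 60 true positives, of which only 40 are labeled,
# followed by 60 true negatives.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(3.0, 1.0, (60, 2)), rng.normal(-3.0, 1.0, (60, 2))])
labels = np.zeros(120, dtype=int)
labels[:40] = 1
reliable = spy_reliable_negatives(X, labels)
```

In a setup like this, a classifier trained on the known positives against `reliable` would see fewer mislabeled positives among its negatives, which is one reading of the "less confusing boundary" mentioned in the excerpt.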

Citations

... Similar to the idea of interaction recovery, Bi-Clustering Trees with output space Reconstruction (BICTR) [18] utilizes NRLMF to restore the training interactions on which an ensemble of Bi-clustering trees [42] is built. In [53,54], traditional PU learning algorithms, such as Spy and Rocchio, are employed to extract reliable non-interacting drug-target pairs. Based on a more informative PU learning assumption that interactions are not missing at random, a probabilistic model without bias to labeled data is proposed in [55]. ...
Article
Predicting drug-target interactions (DTI) via reliable computational methods is an effective and efficient way to mitigate the enormous costs and time of the drug discovery process. Structure-based drug similarities and sequence-based target protein similarities are the commonly used information for DTI prediction. Among numerous computational methods, neighborhood-based chemogenomic approaches that leverage drug and target similarities to perform predictions directly are simple but promising ones. However, existing similarity-based methods need to be re-trained to predict interactions for any new drugs or targets and cannot directly perform predictions for new drugs, new targets, and new drug-target pairs. Furthermore, a large amount of missing (undetected) interactions in current DTI datasets hinders most DTI prediction methods. To address these issues, we propose a new method denoted as Weighted k-Nearest Neighbor with Interaction Recovery (WkNNIR). Not only can WkNNIR estimate interactions of any new drugs and/or new targets without any need for re-training, but it can also recover missing interactions (false negatives). In addition, WkNNIR exploits local imbalance to promote the influence of more reliable similarities on the interaction recovery and prediction processes. We also propose a series of ensemble methods that employ diverse sampling strategies and could be coupled with WkNNIR as well as any other DTI prediction method to improve performance. Experimental results over five benchmark datasets demonstrate the effectiveness of our approaches in predicting drug-target interactions. Lastly, we confirm the practical prediction ability of the proposed methods to discover reliable interactions that were not reported in the original benchmark datasets.
... In this class of techniques, the emphasis is on different feature selection mechanisms [28,29]. Both semi-supervised [30,31] and supervised [32,33] classification-based prediction approaches have been leveraged in drug-target interaction prediction. ...
Article
The identification of potential interactions between drugs and target proteins is crucial in pharmaceutical sciences. The experimental validation of interactions in genomic drug discovery is laborious and expensive; hence, there is a need for efficient and accurate in-silico techniques which can predict potential drug-target interactions to narrow down the search space for experimental verification. In this work, we propose a new framework, namely, Multi-Graph Regularized Nuclear Norm Minimization, which predicts the interactions between drugs and target proteins from three inputs: known drug-target interaction network, similarities over drugs and those over targets. The proposed method focuses on finding a low-rank interaction matrix that is structured by the proximities of drugs and targets encoded by graphs. Previous works on Drug Target Interaction (DTI) prediction have shown that incorporating drug and target similarities helps in learning the data manifold better by preserving the local geometries of the original data. But, there is no clear consensus on which kind and what combination of similarities would best assist the prediction task. Hence, we propose to use various multiple drug-drug similarities and target-target similarities as multiple graph Laplacian (over drugs/targets) regularization terms to capture the proximities exhaustively. Extensive cross-validation experiments on four benchmark datasets using standard evaluation metrics (AUPR and AUC) show that the proposed algorithm improves the predictive performance and outperforms recent state-of-the-art computational methods by a large margin. Software is publicly available at https://github.com/aanchalMongia/MGRNNMforDTI.
... In fact, many protein residues have various PTM sites, and there are various modification types in the same residue [9][10][11][12][13][14]. More than 20 types of PTMs have been characterized, such as lysine acetylation, glycation and methylation [15][16][17][18]. ...
Article
Lysine malonylation (Kmal) is a newly discovered type of protein post-translational modification (PTM) which plays an important role in many biological processes. Therefore, identifying and understanding Kmal sites is critical in the study of biology and diseases. The typical methods are time-consuming and expensive. Many researchers have proposed machine learning (ML) methods to deal with the PTM identification problem, and some deep learning (DL) methods are also utilized in this field. In this work, we propose K_net, which employs a Convolutional Neural Network to identify potential sites. We also propose a new verification method, Split to Equal Validation (SEV), which mitigates the impact of sample imbalance on prediction results. More specifically, Acc, Sn, Sp, MCC and AUC values were adopted to evaluate the prediction performance of predictors. Overall, CNNKmal achieved better performance than other methods.
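The abstract above lists Acc, Sn, Sp and MCC without defining them. As a small sketch using the standard textbook definitions (accuracy, sensitivity, specificity, Matthews correlation coefficient), the four values can be computed from confusion-matrix counts. The function name `binary_metrics` and the counts in the usage line are made up for illustration and are not results from the paper.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Acc, Sn, Sp and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    sn = tp / (tp + fn) if (tp + fn) else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0   # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}

# Illustrative counts only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Unlike Acc, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside Sn and Sp when classes are imbalanced.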
... Next, they applied ML-kNN [45] on both matrices separately and integrated the individual prediction scores to yield the final DTI predictions. The aforementioned method was also complemented by a strategy for finding nonreported interactions in the existing data in [46]. Furthermore, Liu et al. [25] proposed a strategy which focuses on the identification of highly negative samples prior to the application of a classifier. ...
Article
Identifying drug-target interactions is crucial for drug discovery. Despite modern technologies used in drug screening, experimental identification of drug-target interactions is an extremely demanding task. Predicting drug-target interactions in silico can thereby facilitate drug discovery as well as drug repositioning. Various machine learning models have been developed over the years to predict such interactions. Multi-output learning models in particular have drawn the attention of the scientific community due to their high predictive performance and computational efficiency. These models are based on the assumption that all the labels are correlated with each other. However, this assumption is too optimistic. Here, we address drug-target interaction prediction as a multi-label classification task that is combined with label partitioning. We show that building multi-output learning models over groups (clusters) of labels often leads to superior results. The performed experiments confirm the efficiency of the proposed framework.
... This internal database was later reused by another 15 predictors that we review [44, 45, 47-49, 51, 52, 54, 57, 61, 63, 65, 70, 71, 75]. It was also used by a different set of 18 methods which we did not include in our analysis because of the relatively low impact factor of the venues where they were published [185][186][187][188][189][190][191][192][193][194][195][196][197][198][199][200][201][202]. The frequent reuse of this database explains to some extent why this predictor enjoys high citation counts in Table 4. ...
Article
Therapeutic activity of a significant majority of drugs is determined by their interactions with proteins. Databases of drug-protein interactions (DPIs) primarily focus on the therapeutic protein targets while the knowledge of the off-targets is fragmented and partial. One way to bridge this knowledge gap is to employ computational methods to predict protein targets for a given drug molecule, or interacting drugs for given protein targets. We survey a comprehensive set of 35 methods that were published in high-impact venues and that predict DPIs based on similarity between drugs and similarity between protein targets. We analyze the internal databases of known DPIs that these methods utilize to compute similarities, and investigate how they are linked to the 12 publicly available source databases. We discuss contents, impact and relationships between these internal and source databases, as well as the timeline of their releases and publications. The 35 predictors exploit and often combine three types of similarities that consider drug structures, drug profiles, and target sequences. We review the predictive architectures of these methods, their impact, and we explain how their internal DPI databases are linked to the source databases. We also include a detailed timeline of the development of these predictors and discuss the underlying limitations of the current resources and predictive tools. Finally, we provide several recommendations concerning the future development of the related databases and methods.
... To tackle the above limitations of traditional approaches, chemogenomic methods [23] have been applied successfully to predict DTIs by combining the chemical structure of drug molecules and the sequence information of target proteins. These methods can be classified into machine learning [24] (supervised methods [25], [26] and semi-supervised methods [27], [28]), network methods [29], [30] and matrix factorization theory [31]-[33]. The machine learning based techniques have developed predictive models based on discriminative features or similarity matrices between drug molecules and target proteins where experimentally validated pairs are used as standard datasets. ...
... Moreover, the proposed 'super-target' concept manages the huge volume of missing interactions between drugs and targets in the prediction, which have a great effect on the performance of the predictive model. Some methods [27], [28] further adopted semi-supervised techniques to identify the potential non-interaction pairs from labeled and unlabeled drug-target interactions by investigating the coverage index and rank coherence. ...
Article
Identifying interactions between drugs and proteins is a crucial challenge in drug discovery, which can lead researchers to develop novel drug compounds or new target proteins for existing drugs. The determination of drug–target interactions (DTIs) with wet-lab experiments is an extremely time-consuming, costly, and tedious task. To date, multiple computational techniques have been presented to simplify the drug discovery process, but a huge number of interactions are still undiscovered. Furthermore, class imbalance is a critical challenge in this task: it can significantly degrade classification accuracy and has not yet been effectively addressed. In this paper, we propose a novel high-throughput computational model, called iDTi-CSsmoteB, for identification of DTIs based on drug chemical structures and protein sequences. More specifically, the protein sequence is encoded through position-specific scoring matrix (PSSM)-Bigram, amphiphilic pseudo amino acid composition (AM-PseAAC) and dipeptide PseAAC descriptors, which represent evolutionary and sequence information. The drug chemical structure is represented as a molecular substructure fingerprint (MSF) which describes the existence of the functional fragments or groups. Finally, we used the over-sampling SMOTE technique to overcome the imbalance issue of the datasets and applied the XGBoost algorithm as a classifier to predict DTIs. To evaluate the performance of iDTi-CSsmoteB, several experiments have been conducted on four benchmark datasets, namely, enzyme, ion channel, GPCR, and nuclear receptor, based on fivefold cross validation. The experimental analysis shows that our model outperforms similar methods in terms of area under the ROC (auROC) curve. In addition, our results indicate the effectiveness of the feature extraction techniques, balancing methods, and classifier for predicting DTIs, which can provide a basis for new drug development.
iDTi-CSsmoteB webserver is available online at http://idticssmoteb-uestc.me/ .
... For example, intestinal microbial disorders can cause intestinal inflammatory diseases, such as ulcerative colitis, CRC, atherosclerosis, diabetes and obesity. Accordingly, it is necessary to predict the microbial-disease association because this study not only improves the diagnosis and prognosis of human diseases, but also develops the new drugs (Yu et al., 2015, 2016a; Shi et al., 2016; Su et al., 2018; Fan et al., 2019). However, few studies have investigated predictive analysis of the microbial-disease association. ...
Article
Microorganisms are ubiquitous and closely related to people’s daily lives. Since they were first discovered in the 19th century, researchers have shown great interest in microorganisms. People studied microorganisms through cultivation, but this method is expensive and time consuming. Moreover, the cultivation method cannot keep pace with the development of high-throughput sequencing technology. To deal with this problem, machine learning (ML) methods have been widely applied to the field of microbiology. Literature reviews have shown that ML can be used in many aspects of microbiology research, especially classification problems, and for exploring the interaction between microorganisms and the surrounding environment. In this study, we summarize the application of ML in microbiology.
... On the other side, we will focus on improving the features of drug pairs and the predictive model. In the feature improvement, being aware of the fact that there are missing entries of drug targets, drug-drug interactions, GO terms and side effects in the original database [4], we plan to leverage both predictive approaches (e.g. for DTI [47][48][49][50], DDI [51][52][53], GO annotation [54], side effects [55][56][57]) and web-servers (e.g. for DDI [38], ATC [39] and DTI [40][41][42][43][44]) to dig out more missing entries. Moreover, we shall integrate both pathway dynamics and drug response to model the mechanism of action for the identified synergistic drug pairs. ...
Article
The full paper can be found at https://www.sciencedirect.com/science/article/pii/S0169260718306047?via%3Dihub Background and Objective: Due to the synergistic effects of drugs, drug combination is one of the effective approaches for treating complex diseases. However, the identification of drug combinations by dose-response methods is still costly. It is promising to develop supervised learning-based approaches to predict potential drug combinations on a large scale. Nevertheless, these approaches make inadequate use of heterogeneous features, which causes the loss of information useful to classification. Moreover, they have an intrinsic bias, because they assume unknown drug pairs to be non-combinations, of which some could be real drug combinations in practice. Methods: To address the above issues, this work first designs a two-layer multiple classifier system (TLMCS) to effectively integrate heterogeneous features involving anatomical therapeutic chemical codes of drugs, drug-drug interactions, drug-target interactions, gene ontology of drug targets, and side effects. To avoid the bias caused by labelling unknown samples as negative, it then utilizes one-class support vector machines (which require no negative instances and only label approved drug combinations as positive instances) as the member classifiers in TLMCS. Last, both a 10-fold cross validation (10-CV) and a novel prediction are performed to validate the performance of TLMCS. Results: The comparison with three state-of-the-art approaches under 10-CV exhibits the superiority of TLMCS, which achieves an area under the receiver operating characteristic curve of 0.824 and an area under the precision-recall curve of 0.372. Moreover, the experiment under the novel prediction demonstrates its ability, where 9 out of the top-20 predicted combinative drug pairs are validated by checking the published literature. Furthermore, for each of the newly-validated drug combinations, this work analyses the combining mode of the member drugs and investigates their relationship in terms of drug targeting pathways. Conclusions: The proposed TLMCS provides an effective framework to integrate those heterogeneous features and is trained by only positive samples such that the bias of taking unknown drug pairs as negative samples can be avoided. Furthermore, its results in the novel prediction reveal five types of drug combinations and three types of drug relationships in terms of pathways.
... In recent years, computational models have been extensively used for predicting bi-partite relationships (e.g. drug-target interactions [12][13][14][15], lncRNA-disease associations [16] and microbe-disease associations [17][18][19]). As an indispensable step to identify miRNA-target interactions, it is a common practice to develop computational prediction for refining the candidate list before further experimental validation [20,21]. ...
Article
Background: Current knowledge and data on miRNA-lncRNA interactions are still limited and little effort has been made to predict target lncRNAs of miRNAs. Accumulating evidence suggests that the interaction patterns between lncRNAs and miRNAs are closely related to relative expression level, forming a titration mechanism. It could provide an effective approach for characteristic feature extraction. In addition, using the coding non-coding co-expression network and sequence data could also help to measure the similarities among miRNAs and lncRNAs. By mathematically analyzing these types of similarities, we come up with two findings that (i) lncRNAs/miRNAs tend to collaboratively interact with miRNAs/lncRNAs of similar expression profiles, and vice versa, and (ii) those miRNAs interacting with a cluster of common target genes tend to jointly target the common lncRNAs. Methods: In this work, we developed a novel group preference Bayesian collaborative filtering model called GBCF for picking up a top-k probability ranking list for an individual miRNA or lncRNA based on the known miRNA-lncRNA interaction network. Results: To evaluate the effectiveness of GBCF, leave-one-out and k-fold cross validations as well as a series of comparison experiments were carried out. GBCF achieved values of area under the ROC curve of 0.9193, 0.8354 ± 0.0079, 0.8615 ± 0.0078, and 0.8928 ± 0.0082 based on leave-one-out, 2-fold, 5-fold, and 10-fold cross validations respectively, demonstrating its reliability and robustness. Conclusions: GBCF could be used to select potential lncRNA targets of specific miRNAs and offer great insights for further research on the ceRNA regulation network. Electronic supplementary material: The online version of this article (10.1186/s12920-018-0429-8) contains supplementary material, which is available to authorized users.
... K-fold cross-validation (K-CV) is one of the standard approaches to evaluate the performance of algorithms in machine learning. As former approaches mentioned [19, 22-25], K-CV should be carefully designed to avoid over-optimistic results in the case of predicting potential DDIs for new drugs (having no known interaction). Thus, we design two K-CV schemes, CV1 and CV2, when assessing DDI prediction tasks, which are denoted as T1 and T2 respectively (see also Fig. 1). ...
Article
Background: A significant number of adverse drug reactions are caused by unexpected drug-drug interactions (DDIs). The identification of DDIs becomes crucial before the co-prescription of multiple drugs is made. Such a task in clinics or in drug discovery usually incurs high costs and faces numerous limitations, while computational approaches are able to predict potential DDIs effectively by utilizing diverse drug attributes (e.g. side effects). Nevertheless, they are unable to predict enhancive and degressive DDIs, which respectively increase and decrease the pharmacological behavior of the interacting drugs. The pharmacological change of DDIs is one of the most important factors when making a multi-drug prescription. Results: In this work, we design a Triple Matrix Factorization-based Unified Framework (TMFUF) to address the above issue. By leveraging a group of side effect entries of drugs, TMFUF achieves an encouraging result (AUC = 0.842 and AUPR = 0.526) in the case of conventional DDI prediction under the traditional screening task. In the comparison with two state-of-the-art approaches, TMFUF demonstrates its superiority by ~7% and ~20% improvement in terms of AUC and AUPR respectively. More importantly, TMFUF shows its ability in the comprehensive DDI prediction under different screening tasks. Finally, an application of TMFUF reveals the significant pairs of side effects that contribute to forming enhancive and degressive DDIs, for further clinical validation. Conclusions: The proposed TMFUF is the first framework capable of predicting both conventional binary DDIs and comprehensive DDIs such that it captures the pharmacological changes caused by DDIs. Furthermore, it provides a unified solution of DDI prediction for two screening scenarios, which involve newly given drugs having no prior interaction. Another advantage is its ability to indicate how significantly the pairs of drug features contribute to forming DDIs.
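The citing excerpt for this article notes that K-CV needs care when predicting DDIs for new drugs with no known interactions. The source does not define its CV1 and CV2 schemes, so the sketch below shows only one common way to build such a split, under assumptions: drugs (not drug pairs) are partitioned into folds, and every interaction involving a held-out drug is hidden from training. The names `new_drug_folds` and `hide_new_drugs` are hypothetical.

```python
import numpy as np

def new_drug_folds(n_drugs, k=5, seed=0):
    """Yield (train_drugs, test_drugs) index arrays, splitting by drug."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_drugs)
    for test_drugs in np.array_split(perm, k):
        train_drugs = np.setdiff1d(perm, test_drugs)
        yield train_drugs, test_drugs

def hide_new_drugs(A, test_drugs):
    """Copy the DDI matrix A and zero out all entries of held-out drugs."""
    A_train = A.copy()
    A_train[test_drugs, :] = 0   # held-out drugs have no known partners
    A_train[:, test_drugs] = 0   # and are not partners of any other drug
    return A_train

# Toy symmetric interaction matrix for 6 drugs (made-up values).
A = np.array([
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
for train_drugs, test_drugs in new_drug_folds(len(A), k=3):
    A_train = hide_new_drugs(A, test_drugs)
    # A model would be fitted on A_train and evaluated on the hidden entries.
```

Splitting by pair instead would leave each test drug with some known interactions in training, which is the over-optimistic setting the excerpt warns about.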