Figure 1 - uploaded by David J Huggins
Chemical structures used in compound filtering. Chemical structures of functional groups commonly used to remove compounds from consideration in HTS assays. The functional group name and SMILES/SMARTS string used in the filter are reported.

Source publication
Article
Traditionally a pursuit of large pharmaceutical companies, high-throughput screening assays are becoming increasingly common within academic and government laboratories. This shift has been instrumental in enabling projects that have not been commercially viable, such as chemical probe discovery and screening against high-risk targets. Once an assa...

Context in source publication

Context 1
... well as physicochemical properties, REOS filters remove molecules containing certain functional groups, as described by SMILES or SMARTS patterns (52). Some of these are shown in Figure 1. REOS filters flag compounds containing functional groups that may lead to false positives due to reactivity or assay interference, which have long been noted as a problem in HTS efforts (53). ...

Similar publications

Article
High-throughput screening (HTS) has been postulated in several quarters to be a contributory factor to the decline in productivity in the pharmaceutical industry. Moreover, it has been blamed for stifling the creativity that drug discovery demands. In this article, we aim to dispel these myths and present the case for the use of HTS as part of a pr...

Citations

... 6 The main point of calculating similarity measurements lies in the "molecular similarity principle": similar molecules have similar properties/activities. 7 This powerful idea is at the core of virtual screening, [8][9][10][11] hit selection, 12 QSAR/QSPR modeling, 13,14 many chemical space exploration methods, 15,16 activity landscape description, 17,18 diversity selection, 19 clustering, 20,21 and many more applications. ...
Article
The quantification of molecular similarity has been present since the beginning of cheminformatics. Although several similarity indices and molecular representations have been reported, all of them ultimately reduce to calculating the similarity of only two objects at a time. Hence, to obtain the average similarity of a set of molecules, all pairwise comparisons must be computed, so the computational cost scales quadratically with the number of molecules. Here we propose an exact alternative to this problem: iSIM (instant similarity). iSIM performs comparisons of multiple molecules at the same time and yields the same value as the average of the pairwise comparisons of molecules represented by binary fingerprints and real-value descriptors. In this work, we introduce the mathematical framework and several applications of iSIM in chemical sampling, visualization, diversity selection, and clustering.
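The column-count idea behind iSIM can be illustrated with a minimal Python sketch. It assumes the Russell-Rao index (shared on-bits divided by fingerprint length) rather than the indices treated in the iSIM paper, because for this simple index the identity is easy to verify: a bit set in k of the N molecules is shared by k(k-1)/2 pairs, so the per-bit column counts alone give the pairwise average. The function names and toy fingerprints are hypothetical, and the paper's own formulas for Tanimoto-type indices are not reproduced here.

```python
from itertools import combinations
from typing import Sequence

Fingerprint = Sequence[int]  # binary fingerprint as a list of 0/1 values


def mean_pairwise_rr(fps: Sequence[Fingerprint]) -> float:
    """Brute force: average Russell-Rao similarity over all N(N-1)/2 pairs (O(N^2 M))."""
    m = len(fps[0])
    pairs = list(combinations(fps, 2))
    total = sum(sum(a & b for a, b in zip(x, y)) / m for x, y in pairs)
    return total / len(pairs)


def column_sum_rr(fps: Sequence[Fingerprint]) -> float:
    """Same average from per-bit column counts only (O(N M)).

    If k molecules have bit j switched on, that bit is shared by k(k-1)/2 pairs,
    so summing k(k-1)/2 over all bits counts every shared on-bit of every pair.
    """
    n, m = len(fps), len(fps[0])
    counts = [sum(column) for column in zip(*fps)]  # k_j for each bit position j
    shared_on_bits = sum(k * (k - 1) // 2 for k in counts)
    n_pairs = n * (n - 1) // 2
    return shared_on_bits / (m * n_pairs)


# Toy 8-bit fingerprints (made-up data for illustration only).
fps = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 1, 1, 0],
]
print(mean_pairwise_rr(fps), column_sum_rr(fps))  # 0.25 0.25
```

The brute-force function is kept only as a check; the column-sum version never forms a pair, which is where the linear scaling comes from.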
... Physicochemical property filters predominantly aim to address ADMET (absorption, distribution, metabolism, excretion, toxicity) issues that may arise in the downstream drug-development process [29]. The knowledge-based approach to developing such filters rests on the observation that certain descriptors, such as logP, molecular weight (MW), and the number of hydrogen bond acceptors/donors, correlate with oral bioavailability [14,17]. ...
Article
Efficient chemical library design for high-throughput virtual screening and drug design requires a pre-screening filter pipeline capable of labeling aggregators, pan-assay interference compounds (PAINS), and rapid elimination of swill (REOS); identifying or excluding covalent binders; flagging moieties with specific bio-evaluation data; and incorporating physicochemical and pharmacokinetic properties early in the design without compromising the diversity of chemical moieties present in the library. This adaptation of the chemical space results in greater enrichment of hit lists, identification of compounds with greater potential for further optimization, and efficient use of computational time. A number of medicinal chemistry filters were implemented in the Konstanz Information Miner (KNIME) software, and their impact on representative libraries was analyzed with chemoinformatic analysis. The analyzed filters were found to effectively tailor chemical libraries to a lead-like chemical space, identify protein–protein inhibitor-like compounds, prioritize oral bioavailability, identify drug-like compounds, and label unwanted scaffolds or functional groups. However, these filters should be applied with caution: one should carefully study the chemical space suitable for the target and the general medicinal chemistry campaign, and review passed and labeled compounds before taking further in silico steps.
... Approximately 15 million compounds (14,955,127 compounds) from the ZINC database were passed through REOS and PAINS filters (available in Canvas, Schrödinger) to obtain clean drug-like compounds. 18,19 The resulting compounds were further filtered using Lipinski's and Veber's rules. Specifically, to access compounds with lead-like properties, we kept the molecular weight within 150−450 g/mol, and log P was set to ≤5.0. ...
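A filter cascade of the kind quoted above can be sketched in Python on precomputed descriptors. This is an illustrative assumption, not the Canvas implementation: the REOS/PAINS stage is reduced to boolean flags (real substructure matching would need a cheminformatics toolkit), the Lipinski cut-offs (MW ≤ 500, logP ≤ 5, ≤ 5 H-bond donors, ≤ 10 acceptors) and Veber cut-offs (≤ 10 rotatable bonds, TPSA ≤ 140 Å²) are the commonly cited values, and the compounds and their descriptors are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Compound:
    """Precomputed descriptors for one library member (values assumed to come
    from a cheminformatics toolkit; the two flags stand in for REOS/PAINS hits)."""
    name: str
    mw: float          # molecular weight, g/mol
    logp: float        # calculated logP
    hbd: int           # hydrogen-bond donors
    hba: int           # hydrogen-bond acceptors
    rot_bonds: int     # rotatable bonds
    tpsa: float        # topological polar surface area, in square angstroms
    reos_flag: bool = False
    pains_flag: bool = False


def passes_lipinski(c: Compound) -> bool:
    # Strict variant: every criterion must hold (some workflows allow one violation).
    return c.mw <= 500 and c.logp <= 5 and c.hbd <= 5 and c.hba <= 10


def passes_veber(c: Compound) -> bool:
    return c.rot_bonds <= 10 and c.tpsa <= 140


def is_lead_like(c: Compound) -> bool:
    # Window quoted in the text: MW 150-450 g/mol and logP <= 5.0.
    return 150 <= c.mw <= 450 and c.logp <= 5.0


def filter_library(library: List[Compound]) -> List[Compound]:
    # Stage 1: drop REOS/PAINS-flagged compounds; stage 2: property rules.
    clean = [c for c in library if not (c.reos_flag or c.pains_flag)]
    return [c for c in clean if passes_lipinski(c) and passes_veber(c) and is_lead_like(c)]


# Hypothetical compounds with invented descriptor values.
library = [
    Compound("cmpd_A", 320.0, 2.1, 1, 4, 4, 65.0),
    Compound("cmpd_B", 480.0, 3.0, 2, 6, 6, 90.0),                   # MW above lead-like window
    Compound("cmpd_C", 300.0, 2.5, 1, 3, 3, 50.0, pains_flag=True),  # PAINS hit
    Compound("cmpd_D", 410.0, 4.2, 0, 5, 12, 70.0),                  # too many rotatable bonds
]
print([c.name for c in filter_library(library)])  # ['cmpd_A']
```

Keeping the flag stage separate mirrors the order described in the excerpt (REOS/PAINS first, property rules second); replacing the strict `passes_lipinski` with a one-violation variant is a local change.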
Article
It is imperative to explore the gigantic available chemical space to identify new scaffolds for drug lead discovery. Identifying potent hits from virtual screening of large chemical databases is challenging and computationally demanding. Rather than the traditional two-dimensional (2D)/three-dimensional (3D) approaches on smaller chemical libraries of a few hundred thousand compounds, we screened a ZINC library of 15 million compounds using multiple computational methods. Here, we present the successful application of a virtual screening methodology that identifies several chemotypes as starting hits against lactate dehydrogenase-A (LDHA). Of 29 compounds identified from virtual screening, 17 (58%) showed IC50 values < 63 μM, two showed single-digit micromolar inhibition, and the most potent hit compound had an IC50 as low as 117 nM. We enriched the database and employed an ensemble approach combining 2D fingerprint similarity searches, pharmacophore modeling, molecular docking, and molecular dynamics. WaterMap calculations were carried out to explore the thermodynamics of surface water molecules and gain insights into the LDHA binding pocket. The present work has led to the discovery of two new chemical classes, including compounds with a succinic acid monoamide moiety or a hydroxy pyrimidinone ring system. Selected hits block lactate production in cells and inhibit pancreatic cancer cell lines, with cytotoxicity IC50 values as low as 12.26 μM against MIAPaCa-2 cells and 14.64 μM against PANC-1 cells, which, under normoxic conditions, is already comparable to or more potent than most currently known LDHA inhibitors.
... The algorithms and computational methods used to design compound libraries for HTS campaigns have been discussed extensively in the literature and will not be repeated here (Eurtivong and Reynisson, 2019; López-Vallejo et al., 2012; Schneider et al., 2009; Follmann et al., 2019; Huggins et al., 2011). Some of the earlier screening libraries used the Lipinski rules, a much-discussed set of rules that triage compounds for inclusion in the screening set based on physicochemical properties. ...
Chapter
The recent advances in machine learning and deep learning (DL) methods are making it possible to create systems that automatically mine patterns and learn from data. Applications of these methods in chemistry, in particular QSAR and drug discovery, are already available. While DL can be applied in the conventional way, learning from chemical descriptors, its potential goes much further. In particular, the ability of DL to autonomously extract, through multiple transformations, the structural elements that correlate with the property under investigation can help reveal the link between a chemical and its biological/physical effects. After presenting the principal DL methods developed for chemical problems, the chapter focuses on a case study in mutagenicity prediction that directly uses the chemical graph, represented as SMILES, graphs, or images, and applies convolutional and recurrent networks. The knowledge extracted from the networks is analyzed and compared with the accepted structural alerts for mutagenicity. Finally, the next challenges and the future of DL for QSAR are discussed.
... The algorithms and computational methods used to design compound libraries for HTS campaigns have been discussed extensively in the literature and will not be repeated here (Eurtivong and Reynisson, 2019; López-Vallejo et al., 2012; Schneider et al., 2009; Follmann et al., 2019; Huggins et al., 2011). Some of the earlier screening libraries used the Lipinski rules, a much-discussed set of rules that triage compounds for inclusion in the screening set based on physicochemical properties. ...
Chapter
In this chapter, we have discussed a relatively advanced and successful hybrid machine learning workflow that may help to unravel causative agents of disease from high-throughput RNA-Seq datasets. The method is then applied to a breast cancer dataset taken from the Gene Expression Omnibus repository, and disease genes associated with breast cancer are identified. Finally, using a PPI network analysis approach, we observed that the detected disease genes play a significant role in the causal mechanism of the disease. The method discussed here is general and can be applied to any RNA-Seq dataset, independent of disease.
... Exploring these bioactive phytochemicals as QS and virulence inhibitors through virtual screening allows a rapid and economical selection of prospective target ligands from large libraries of molecules (Huggins et al., 2011). This further shortens the time and reduces the cost of traditional drug development processes (Naqvi et al., 2018), as well as narrowing the number of potential ligands to be tested in vitro for drug screening and druggability. ...
Article
Klebsiella pneumoniae is one of the troubling multidrug-resistant (MDR) and ESKAPE pathogens contributing to mounting morbidity, mortality, and extended hospitalization. Its virulence, often regulated by quorum sensing (QS), reinforces the need to explore alternative, prospective antivirulence agents, particularly from plant secondary metabolites. Computer-aided drug discovery using molecular modelling techniques offers advantages for investigating prospective drugs to combat MDR pathogens. Thus, this study employed virtual screening of selected terpenes and flavonoids from medicinal plants to interrupt the QS-associated SdiA protein in K. pneumoniae and attenuate its virulence. 4LFU was used as a template to model the structure of SdiA. ProCheck, Verify3D, Ramachandran plot scores, and ProSA-Web all attested to the model’s good quality. Since the SdiA protein in K. pneumoniae leads to the expression of virulence, 31 prospective bioactive compounds were docked to assess their antagonistic potential. The stability of the protein-ligand complexes, atomic motions, and inter-atomic interactions were further investigated through molecular dynamics simulations (MDS) with 100 ns production runs. The binding free energy was estimated using the molecular mechanics/Poisson-Boltzmann surface area (MM/PB-SA) method. Furthermore, the drug-likeness properties of the studied compounds were validated. Docking studies showed that phytol had the highest binding affinity (-9.205 kcal/mol), while glycitein had the highest docking score (-9.752 kcal/mol). MDS of the protein in complex with the best-docked compounds revealed that phytol had the highest binding energy (-44.2625 kcal/mol), a low root-mean-square deviation (RMSD) of 1.54 Å, and a root-mean-square fluctuation (RMSF) of 1.78 Å. Prediction of the drug-likeness properties and bioavailability of these compounds showed that they conform to Lipinski’s rules, with bioavailability scores of 0.55 F.
The studied terpene and flavonoid compounds effectively thwart the SdiA protein and therefore regulate inter- or intracellular communication and the associated virulence in Enterobacteriaceae, serving as prospective antivirulence drugs.
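The RMSD values reported in this abstract are, in their basic form, the square root of the mean squared displacement of matched atoms. A minimal Python sketch, assuming the two coordinate sets are already superimposed (MD analysis tools normally fit each frame onto a reference first, a step omitted here) and using invented coordinates; it is not the authors' analysis pipeline:

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(coords: Sequence[Coord], ref: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two matched, already superimposed atom sets."""
    if len(coords) != len(ref) or not coords:
        raise ValueError("coordinate sets must be non-empty and equally long")
    squared = sum(
        (x - xr) ** 2 + (y - yr) ** 2 + (z - zr) ** 2
        for (x, y, z), (xr, yr, zr) in zip(coords, ref)
    )
    return math.sqrt(squared / len(coords))


# Toy example: every atom of the "frame" is displaced by 1 angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(frame, reference))  # 1.0
```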
... Although the importance of both sequence diversity and molecular structure diversity for drug discovery has been long understood [43][44][45][46], only recently has the diversity of sampled solutions from quantum algorithms been studied. King et al., for example, examined solver performance with respect to diversity. ...
Preprint
Molecular docking, which aims to find the most stable interacting configuration of a set of molecules, is of critical importance to drug discovery. Although a considerable number of classical algorithms have been developed to carry out molecular docking, most focus on the limiting case of docking two molecules. Since the number of possible configurations of N molecules is exponential in N, those exceptions which permit docking of more than two molecules scale poorly, requiring exponential resources to find high-quality solutions. Here, we introduce a one-hot encoded quadratic unconstrained binary optimization formulation (QUBO) of the multibody molecular docking problem, which is suitable for solution by a quantum annealer. Our approach involves a classical pre-computation of pairwise interactions, which scales only quadratically in the number of bodies while permitting well-vetted scoring functions like the Rosetta REF2015 energy function to be used. In a second step, we use the quantum annealer to sample low-energy docked configurations efficiently, considering all possible docked configurations simultaneously through quantum superposition. We show that we are able to minimize the time needed to find diverse low-energy docked configurations by tuning the strength of the penalty used to enforce the one-hot encoding, demonstrating a 3-4 fold improvement in solution quality and diversity over performance achieved with conventional penalty strengths. By mapping the configurational search to a form compatible with current- and future-generation quantum annealers, this work provides an alternative means of solving multibody docking problems that may prove to have performance advantages for large problems, potentially circumventing the exponential scaling of classical approaches and permitting a much more efficient solution to a problem central to drug discovery and validation pipelines.
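The one-hot QUBO idea can be sketched on a toy problem. In this Python sketch, each binary variable means "body b adopts candidate pose p", precomputed pairwise energies (invented numbers here, standing in for a scoring function such as REF2015) fill the off-diagonal terms, and the one-hot constraint is added as the penalty A(1 - Σ_p x_bp)² expanded into linear and quadratic terms. Exhaustive search replaces the quantum annealer, so it is only feasible for a handful of variables; the function names are hypothetical and this is not the authors' code.

```python
import itertools
from typing import Dict, List, Tuple

Var = Tuple[int, int]  # (body index, pose index): x = 1 means "body uses this pose"
QUBO = Dict[Tuple[Var, Var], float]


def build_qubo(n_bodies: int, n_poses: int,
               pair_energy: Dict[Tuple[Var, Var], float],
               penalty: float) -> Tuple[List[Var], QUBO, float]:
    """Return (variables, Q, offset) with E(x) = offset + sum Q[u, v] * x_u * x_v."""
    variables = [(b, p) for b in range(n_bodies) for p in range(n_poses)]
    q: QUBO = {}

    def add(u: Var, v: Var, w: float) -> None:
        key = (u, v) if u <= v else (v, u)
        q[key] = q.get(key, 0.0) + w

    # Precomputed interaction energies between poses of different bodies.
    for (u, v), e in pair_energy.items():
        add(u, v, e)
    # One-hot penalty per body: A * (1 - sum_p x_bp)^2
    #   = A - A * sum_p x_bp + 2A * sum_{p<r} x_bp x_br   (since x^2 = x for binaries)
    for b in range(n_bodies):
        for p in range(n_poses):
            add((b, p), (b, p), -penalty)
        for p, r in itertools.combinations(range(n_poses), 2):
            add((b, p), (b, r), 2 * penalty)
    return variables, q, penalty * n_bodies


def energy(assign: Dict[Var, int], q: QUBO, offset: float) -> float:
    return offset + sum(w * assign[u] * assign[v] for (u, v), w in q.items())


def brute_force(variables: List[Var], q: QUBO, offset: float):
    """Exhaustive search over all 2^n bit strings (stand-in for the annealer)."""
    best_e, best_assign = float("inf"), {}
    for bits in itertools.product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, bits))
        e = energy(assign, q, offset)
        if e < best_e:
            best_e, best_assign = e, assign
    return best_e, best_assign


# Two bodies with two candidate poses each; energies are invented.
pairs = {((0, 0), (1, 0)): -1.0, ((0, 0), (1, 1)): -3.0,
         ((0, 1), (1, 0)): -2.0, ((0, 1), (1, 1)): 0.5}
variables, q, offset = build_qubo(2, 2, pairs, penalty=5.0)
e, assign = brute_force(variables, q, offset)
print(e, sorted(v for v, x in assign.items() if x))  # -3.0 [(0, 0), (1, 1)]
```

With the penalty set to 5.0, every assignment that breaks the one-hot constraint scores worse than the best valid one. A penalty that is too small lets invalid states win, while on real annealers, which rescale coefficients to a fixed hardware range, an overly large one compresses the differences between valid configurations; this trade-off is one reason the penalty strength is worth tuning.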
... There are many methods for preparing compound libraries [9,10]. Libraries can be obtained by extracting compounds from a larger set based on the required parameters. ...
... A widely adopted way of generating libraries is to search for structural analogs of known active compounds [9][10][11][12][13][14][15]. Although some fraction of the compounds with high structural similarity to the known actives do show comparable biological activity [16,17], the potential of this approach for creating diversified sets is rather modest [10,18]. In our search for methods to generate tailored compound libraries, we turned our attention to 3D shape-based descriptors [19][20][21]. ...
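The analog-search strategy described in this excerpt can be sketched as a nearest-neighbour filter on fingerprint similarity. This minimal Python sketch assumes fingerprints stored as sets of on-bit indices and uses the Tanimoto coefficient with an arbitrary 0.7 default cut-off; neither the representation nor the threshold comes from the cited work, and the compound names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| on sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def analog_search(actives: Dict[str, Fingerprint],
                  library: Dict[str, Fingerprint],
                  threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Keep library members whose nearest known active is at least `threshold` similar."""
    hits = []
    for name, fp in library.items():
        best = max(tanimoto(fp, ref) for ref in actives.values())
        if best >= threshold:
            hits.append((name, best))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented fingerprints for illustration.
actives = {"active_1": frozenset({1, 2, 3, 4, 5})}
library = {
    "lib_a": frozenset({1, 2, 3, 4, 6}),     # 4 shared / 6 total bits
    "lib_b": frozenset({1, 2, 3, 4, 5, 6}),  # 5 shared / 6 total bits
    "lib_c": frozenset({7, 8, 9}),           # nothing shared
}
for name, sim in analog_search(actives, library, threshold=0.6):
    print(name, round(sim, 3))
# lib_b 0.833
# lib_a 0.667
```

Because every hit must sit close to a known active, the output clusters around the reference chemotypes, which is the limited diversity the excerpt points out.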
Article
In the emerging field of drug discovery, rapid virtual screening methods become extremely valuable, especially when dealing with ultra-large databases of small organic bioactive molecules. In this work, we present a fast, computationally resource-efficient, and simple workflow for screening targeted compound libraries generated from ultra-large virtual chemical space. This workflow aims to find compounds whose 3D molecular shapes are similar to those of reference compounds, while expanding chemical diversity and identifying new, potentially active scaffolds. This pipeline ensures the enrichment of the generated libraries with novel chemotypes. It was also shown that careful tailoring of the physicochemical parameters of the search set ensures that all library compounds possess the desired property distributions. Visual inspection showed that the identified structures bind to the receptor in the same way as the reference compounds. Using our screening workflow, we have created a number of conventional protein-targeted libraries: the GPCRs Targeted Library (531 K compounds) and the Protein Kinases Targeted Library (113 K compounds). The described pipeline and scripts are freely accessible at: https://github.com/ChemSpace-LLC/usrcat_sim.
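The shape-similarity idea behind USRCAT can be illustrated with its parent method, Ultrafast Shape Recognition (USR), which summarizes a conformer by the distributions of atomic distances from four reference points: the centroid, the atom closest to it, the atom farthest from it, and the atom farthest from that one. This Python sketch is not the usrcat_sim code linked above. It computes plain USR only (USRCAT additionally repeats the moments over atom-type subsets), and the choice of moments (mean, standard deviation, signed cube root of the third central moment) and the coordinates are assumptions for illustration. It requires Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _moments(values: Sequence[float]) -> List[float]:
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptor(coords: Sequence[Point]) -> List[float]:
    """12 USR-style numbers: three moments of the atom-distance distribution
    measured from each of four reference points."""
    n = len(coords)
    ctd = tuple(sum(c[i] for c in coords) / n for i in range(3))  # centroid
    cst = min(coords, key=lambda c: math.dist(c, ctd))  # atom closest to centroid
    fct = max(coords, key=lambda c: math.dist(c, ctd))  # atom farthest from centroid
    ftf = max(coords, key=lambda c: math.dist(c, fct))  # atom farthest from fct
    descriptor: List[float] = []
    for ref in (ctd, cst, fct, ftf):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(d1: Sequence[float], d2: Sequence[float]) -> float:
    """1 / (1 + mean absolute difference); equals 1.0 for identical descriptors."""
    return 1.0 / (1.0 + sum(abs(a - b) for a, b in zip(d1, d2)) / len(d1))


# Invented heavy-atom coordinates (angstroms) and a translated copy.
mol = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.2, 0.0),
       (0.3, 0.4, 3.1), (-1.0, 0.7, 0.2)]
shifted = [(x + 5.0, y - 2.0, z + 3.0) for x, y, z in mol]
print(usr_similarity(usr_descriptor(mol), usr_descriptor(shifted)))  # ~1.0
```

Because the descriptor depends only on interatomic distances, it needs no alignment step and is unchanged by translation or rotation, which is what makes this kind of shape screen cheap enough for ultra-large libraries.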
... The drug-like property (DL) indicates the similarity of a compound to known drugs. The most widely used DL evaluation index is currently Lipinski's rule of five [50]. The physicochemical properties and structures of DL compounds can correlate well with their pharmacokinetics in the body and meet the expected ADME requirements [51]. ...
Preprint
Background: The increasing demand for Chinese medicine resources has piqued the scientific community’s interest in modernizing the Chinese medicine industry. However, traditional Chinese medicine (TCM) is based on the "multiple targets and multiple components" treatment modality and involves unique treatment methods, such as "same disease with different treatments" and "treating different diseases with the same method." Hence, it is difficult to elucidate the mechanisms of TCM formulations. Network pharmacology enables the analysis of the characteristics of "multiple components, multiple targets, and multiple pathways," which is consistent with the overall characteristics of TCM. Network pharmacology analysis can therefore be used to examine the active ingredients, mechanism of action, and compatibility rules of TCM, providing a scientific basis for the theories of TCM. Results: Network pharmacology is a new pharmacological research method that can quickly predict the pharmacological mechanisms of action of TCMs. Compared with traditional pharmacological research, it does not require complicated TCM extraction or long-term experimental verification and is fast and efficient, although its predictions carry some error. Combined with multi-omics methods, it can reduce this error while shortening experiment time, allowing the mechanism of action and compatibility of TCM to be analyzed quickly and efficiently. In addition, applying this method in reverse pharmacology can aid the development of new Chinese medicines for specific diseases. In this article, we briefly summarize the commonly used databases, software, and calculation methods, with particular attention to the combination of multi-omics and TCM. We also summarize the mechanisms of single prescriptions, compound prescriptions, "same disease with different treatments" and "treating different diseases with the same method" in TCM, as well as the reverse pharmacology of TCM.
Conclusions: Network pharmacology has unique advantages for exploring the active ingredients, compatibility rules, and mechanisms of action of TCMs. Using network pharmacology to clarify the mechanisms of TCM can enhance the global acceptance of TCM products and promote the modernization and international development of the TCM industry, as well as provide a scientific basis for the law of compatibility of TCM.
... Compound fingerprints were calculated using the CACTVS Chemoinformatics Toolkit (40). The Sphere Exclusion algorithm (41,42), complemented with the Tanimoto coefficient (43), was applied to select diverse subsets from the Hippo(crates) database. This procedure was repeated several times using different parameter sets in order to identify the optimal thresholds and separate the dataset into internal clusters. ...
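Sphere exclusion with a Tanimoto cut-off can be sketched in a few lines of Python. This is a minimal, greedy variant under stated assumptions: fingerprints are sets of on-bit indices (not CACTVS output), candidates become sphere centres in input order (published variants choose centres differently), and the fingerprints and thresholds are invented; the thresholds actually used for Hippo(crates) are not given in the excerpt.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # indices of the "on" bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient |A & B| / |A | B| on sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def sphere_exclusion(fps: Dict[str, Fingerprint], threshold: float) -> Dict[str, List[str]]:
    """Greedy sphere exclusion: the next unassigned compound becomes a centre and
    absorbs every unassigned compound with Tanimoto >= threshold to it."""
    remaining = list(fps)  # input order decides which compound becomes a centre
    clusters: Dict[str, List[str]] = {}
    while remaining:
        centre = remaining.pop(0)
        members = [n for n in remaining if tanimoto(fps[centre], fps[n]) >= threshold]
        clusters[centre] = [centre] + members
        remaining = [n for n in remaining if n not in members]
    return clusters


# Invented fingerprints; np_1/np_2 share 3 of 5 bits, np_3/np_4 share 2 of 4.
fps = {
    "np_1": frozenset({1, 2, 3, 4}),
    "np_2": frozenset({1, 2, 3, 5}),
    "np_3": frozenset({10, 11, 12}),
    "np_4": frozenset({10, 11, 13}),
}
for t in (0.5, 0.7):  # rerun with different thresholds, as the excerpt describes
    print(t, sphere_exclusion(fps, t))
# 0.5 {'np_1': ['np_1', 'np_2'], 'np_3': ['np_3', 'np_4']}
# 0.7 {'np_1': ['np_1'], 'np_2': ['np_2'], 'np_3': ['np_3'], 'np_4': ['np_4']}
```

The cluster centres form the diverse subset. Lower cut-offs give fewer, broader clusters and a smaller subset, which is the trade-off explored by rerunning with different parameter sets.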
Article
Modern drug discovery and pharmaceutics benefit from nature. Natural products (NPs) are used as a source of therapeutic agents with beneficial uses. Currently, there is considerable interest in exploring NPs for drug discovery and in continued investigation of the therapeutic claims and mechanisms of herbal medicines. To date, approximately one million NPs have been isolated and subjected to experimental assays to evaluate quantitative biological activities. This makes an integrated database that assembles and correlates this valuable information from the literature, experimental studies, and databases necessary. Although databases contain a large volume of information, it is frequently difficult and complex to extract the required information, even from well-organized databases. Novel databases must be accompanied by efficient algorithms and techniques in order to extract beneficial knowledge through a simple query. The Hippo(crates) database aims to fill this gap in the field of chemoinformatics and natural products by providing retrieval linked not only to the Hippo(crates) database but also to other chemical and biological databases worldwide. As part of the OPENSCREEN.GR project, the Hippo(crates) Database Graphical User Interface (HDGUI) web server was developed to provide a user-friendly access interface, integrating annotated information on NP origin (sources and species), biological activities, physicochemical properties, and linear and 3D chemical structures, as well as related terms that correlate chemical compounds with their uses. In its current version (V1.0), the Hippo database provides 45,300 NPs, NP derivatives, and synthetic compounds, which are separated into 32 major categories, including biological or medicinal properties.
In the database, 22,830 NP source organisms are correlated with >100,000 terms, including biological pathways, target organisms, target diseases, target types, target proteins and pathogens, and 6,070 three-dimensional structures of NP target proteins. For each entry, a cluster of similar compounds and a ligand-based or structure-based pharmacophore model is provided. The portal is designed as an easy-to-use web tool where users can search, extract, and correlate information and data on natural product chemical compounds through various fields, such as categories, keywords, targets, species, or two-dimensional or three-dimensional structural similarity in the Hippo(crates) atlas of the NP database.