Literature Review

Review of MARCH-INSIDE & Complex Networks Prediction of Drugs: ADMET, Anti-parasite Activity, Metabolizing Enzymes and Cardiotoxicity Proteome Biomarkers

Abstract

In this communication we carry out an in-depth review of a very versatile QSPR-like method. The method, named MARCH-INSIDE (MARkov CHains Invariants for Network Selection and DEsign), is a simple but efficient computational approach to the study of QSPR-like problems in biomedical sciences. It uses the theory of Markov Chains to generate parameters that numerically describe the structure of a system. This approach generates two principal types of parameters, including Stochastic Topological Indices (sto-TIs). The use of these parameters allows the rapid collection, annotation, retrieval, comparison and mining of the structures of molecular, macromolecular, supramolecular, and non-molecular systems within large databases. Here, we review and comment for the first time on the several applications of MARCH-INSIDE to predict drug ADMET, activity, metabolizing enzymes, and toxico-proteomics biomarker discovery. The MARCH-INSIDE models reviewed are: a) drug-tissue distribution profiles, b) assembling drug-tissue complex networks, c) multi-target models for anti-parasite/anti-microbial activity, d) assembling drug-target networks, e) drug toxicity and side effects, f) web-server for drug metabolizing enzymes, g) models in drug toxico-proteomics. We close the review with some legal remarks related to the use of this class of QSPR-like models.
... In the notation Dk(msqi)g/Dk(msri)g the letter D = Descriptor, k = type of descriptor, s = sub-type of molecule, q = molecules involved in query reaction, r = molecules involved in reference reaction, i = ID number of the molecule, g = group of atoms inside the molecule. The general formula for the calculation is shown in Eq. 1 (see MARCH-INSIDE algorithm details in literature) [37]. ...
Article
The enantioselective Brønsted acid-catalyzed α-amidoalkylation reaction is a useful procedure for the production of new drugs and natural products. In this context, Chiral Phosphoric Acid (CPA) catalysts are versatile catalysts for this type of reaction. The selection and design of new CPA catalysts for different enantioselective reactions is of dual interest because new CPA catalysts (tools) and chiral drugs or materials (products) can be obtained. However, this process is difficult and time-consuming if approached from an experimental trial-and-error perspective. In this work, a Heuristic Perturbation-Theory and Machine Learning (HPTML) algorithm was used to seek a predictive model for CPA catalyst performance in terms of enantioselectivity in α-amidoalkylation reactions, with R² = 0.96 overall for training and validation series. It involved a Monte Carlo sampling of > 100,000 pairs of query and reference reactions. In addition, the computational and experimental investigation of a new set of intermolecular α-amidoalkylation reactions using BINOL-derived N-triflylphosphoramides as CPA catalysts is reported as a case study. The model was implemented in a web server called MATEO: InterMolecular Amidoalkylation Theoretical Enantioselectivity Optimization, available online at: https://cptmltool.rnasa-imedir.com/CPTMLTools-Web/mateo. This new user-friendly online computational tool would enable sustainable optimization of reaction conditions that could lead to the design of new CPA catalysts along with new organic synthesis products.
... In the notation Dk(msqi)g/Dk(msri)g the letter D = Descriptor, k = type of descriptor, s = sub-type of molecule, q = molecules involved in query reaction, r = molecules involved in reference reaction, i = ID number of the molecule, g = group of atoms inside the molecule. The general formula for the calculation is shown in Eq. 1 (see MARCH-INSIDE algorithm details in literature) [37]. ...
Preprint
The enantioselective Brønsted acid-catalyzed α-amidoalkylation reaction is a useful procedure for the production of new drugs and natural products. In this context, Chiral Phosphoric Acid (CPA) catalysts are versatile catalysts for this type of reaction. The selection and design of new CPA catalysts for different enantioselective reactions is of dual interest because new CPA catalysts (tools) and chiral drugs or materials (products) can be obtained. However, this process is difficult and time-consuming if approached from an experimental trial-and-error perspective. In this work, a Heuristic Perturbation-Theory and Machine Learning (HPTML) algorithm was used to seek a predictive model for CPA catalyst performance in terms of enantioselectivity in α-amidoalkylation reactions, with R² = 0.91 in training and validation series. It involved a Monte Carlo sampling of > 100,000 pairs of query and reference reactions. In addition, the computational and experimental investigation of a new set of intermolecular α-amidoalkylation reactions using BINOL-derived N-triflylphosphoramides as CPA catalysts is reported as a case study. The model was implemented in a web server called MATEO: InterMolecular Amidoalkylation Theoretical Enantioselectivity Optimization, available online at: https://cptmltool.rnasa-imedir.com/CPTMLTools-Web/mateo. This new user-friendly online computational tool would enable sustainable optimization of reaction conditions that could lead to the design of new CPA catalysts along with new organic synthesis products.
... Finally, the new CPTML could be a useful tool for determining the structure of MRNs in new species in biotechnology. The main bibliographic sources used in this paper are listed below [1][2][3][4][5][6][7][8][9][10]. ...
Conference Paper
Metabolic Reaction Networks (MRNs) are complex networks produced by thousands of chemical reactions or transformations (links) of metabolites (nodes) in a living organism. An essential goal of chemical biology is to test the connectivity (structure) of the complex MRN models proposed for new microorganisms with promising features. In theory, we can undertake hands-on testing (Manual Curation). However, this is a difficult operation due to the large number of possible combinations of node pairs (possible metabolic reactions). In this study, we combined Combinatorial, Perturbation Theory, and Machine Learning approaches to find a CPTML model for the MRNs of >40 organisms compiled by Barabási's group. First, we used a novel type of node index termed Markov linear indices fk to quantify the local structure of a very large collection of nodes in each MRN. Next, we computed CPT operators for over 150 000 combinations of MRN query and reference nodes. Finally, we fed these CPT operators into several ML algorithms. The CPTML linear model obtained using the LDA algorithm is capable of distinguishing nodes (metabolites) with correct reaction assignment from nodes with incorrect reaction assignment, with accuracy, specificity, and sensitivity values ranging from 85 to 100% in both the training and external validation data series. Meanwhile, the top three non-linear models, with more than 97.5% accuracy, were found to be PTML models based on Bayesian networks, J48-Decision Tree, and Random Forest algorithms. The new work opens the door to the investigation of MRNs from various organisms using PTML models. Finally, the new CPTML could be a useful tool for determining the structure of MRNs in new species in biotechnology.
... Here, we present for the first time the generalization of atom-based linear indices of molecular graphs to the study of complex networks using Markov chain theory. We combine the concept behind Marrero's molecular descriptors, specifically the atom-based linear indices [31], and the MARCH-INSIDE (MI) software algorithm of González-Díaz et al. [32]. This generalization is done here with the aim to extend the atom-based linear indices for the study of different types of complex networks. ...
Article
Background: Checking the connectivity (structure) of complex Metabolic Reaction Network (MRN) models proposed for new microorganisms with promising properties is an important goal for chemical biology. Objective: In principle, we can perform a hands-on check (Manual Curation). However, this is a hard task due to the high number of combinations of pairs of nodes (possible metabolic reactions). Method: In this work, we used Combinatorial, Perturbation Theory, and Machine Learning techniques to seek a CPTML model for the MRNs of >40 organisms compiled by Barabási's group. First, we quantified the local structure of a very large set of nodes in each MRN using a new class of node index called Markov linear indices fk. Next, we calculated CPT operators for 150000 combinations of query and reference nodes of MRNs. Last, we used these CPT operators as inputs of different ML algorithms. Results: The CPTML linear model obtained using the LDA algorithm is able to discriminate nodes (metabolites) with correctly assigned reactions from nodes with incorrect assignments, with values of accuracy, specificity, and sensitivity in the range of 85-100% in both training and external validation data series. Conclusion: Meanwhile, PTML models based on Bayesian network, J48-Decision Tree and Random Forest algorithms were identified as the three best non-linear models, with accuracy greater than 97.5%. The present work opens a door to the study of MRNs of multiple organisms using PTML models.
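The abstract above names three steps (node indices, perturbation operators for query/reference pairs, ML input) without giving formulas. The following is only a minimal sketch of the first two steps under stated assumptions: the index is taken to be f_k = Π^k · v, where Π is the row-normalised adjacency matrix and v is a node property (node degree here), and the perturbation term is taken to be a plain difference between query and reference nodes. Neither definition is confirmed by the text, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def transition_matrix(n_nodes: int, edges: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Row-normalise the adjacency matrix of an undirected network."""
    adjacency = [[0.0] * n_nodes for _ in range(n_nodes)]
    for a, b in edges:
        adjacency[a][b] = adjacency[b][a] = 1.0
    matrix = []
    for row in adjacency:
        degree = sum(row)
        # Isolated nodes keep an all-zero row instead of dividing by zero.
        matrix.append([v / degree if degree else 0.0 for v in row])
    return matrix


def linear_indices(matrix: List[List[float]], node_property: Sequence[float],
                   k_max: int) -> List[List[float]]:
    """Return [f_0, ..., f_kmax], where f_k is matrix^k applied to node_property."""
    current = [float(x) for x in node_property]
    series = [current]
    for _ in range(k_max):
        # f_k = matrix . f_(k-1), so no explicit matrix power is needed.
        current = [sum(p * c for p, c in zip(row, current)) for row in matrix]
        series.append(current)
    return series


def perturbation(series: List[List[float]], query: int, reference: int) -> List[float]:
    """Assumed perturbation term: f_k(query) - f_k(reference) for every order k."""
    return [f_k[query] - f_k[reference] for f_k in series]


# Hypothetical three-metabolite chain 0 - 1 - 2, with node degree as the property.
EDGES = [(0, 1), (1, 2)]
PI = transition_matrix(3, EDGES)
F = linear_indices(PI, [1, 2, 1], k_max=2)
DELTA = perturbation(F, query=0, reference=1)
print(F)
print(DELTA)
```

In a workflow like the one described, vectors such as `DELTA` would be computed for many query/reference pairs and passed to a classifier such as LDA; that last step is omitted here.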
... One way to develop this class of mt-QSAR is to incorporate into the QSAR equation parameters of the structure of the target (protein, DNA, RNA, etc.) in addition to the structural parameters of the drug present in classic QSAR. Some of the better-known software we can use to reach this goal are: DRAGON, CODESSA [7], MODES-LAB [8], TOMO-COMD [9], and MARCH-INSIDE (MI) [10]. The software DRAGON is one of the most complete, calculating more than 1600 descriptors for drug structure, including zero- (0D), one- (1D), two- (2D), and three-dimensional (3D) parameters. ...
... Nowadays, it is almost impossible to have clear ideas about biochemical or biological processes and phenomena without using bioinformatics [9]. Bioinformatics is concerned with the application of statistics and computer science to the field of molecular biology, and has been a determinant of better understanding of processes related to Medicinal Chemistry [10][11][12][13][14][15][16][17][18][19], Proteomics [20][21][22][23][24][25][26], Drug Metabolism [27][28][29][30][31][32][33][34][35], and Pharmaceutical Design [36][37][38][39][40][41][42][43][44][45]. Bioinformatics has been essential in drug design for quantitative structure-activity relationship (QSAR) methodologies [46], more specifically in 3-D-QSAR methodologies, and in molecular modeling techniques [47]. ...
Article
Cissampelos sympodialis is a plant from the northeastern region of Brazil used by the population to treat respiratory diseases. Several studies have shown that the ethanol extract of the leaves has immunomodulatory and anti-inflammatory activities. Infusion is an ancient popular technique, widely used in traditional medicine, that uses only hot water as the extraction medium. This study aims to investigate the acute toxicological potential of administering a dose of 2000 mg/kg to Rattus norvegicus, combined with an in silico study of 117 alkaloids present in the genus Cissampelos. In silico, we can determine which molecules have high toxicity (21, 8, 93, 32 and 88) and low toxicity (57, 77, 28, 25 and 67), and their percentage of liver metabolism. The toxicological evaluation in vivo showed a decrease in water consumption in males only and in feed intake in both sexes. However, these changes were not statistically sufficient to alter the weight gain of the animals. As for the biochemical parameters, there was an increase in urea and a decrease in uric acid and AST in males. In females there was a decrease in albumin and globulin, which consequently leads to a decrease in total protein. Although the biochemical changes are suggestive of kidney damage, histological sections confirm that there was no change in this organ or in the liver. Therefore, the results indicate that although the genus Cissampelos contains alkaloids which may be toxic, the infusion of Cissampelos sympodialis leaves, when given orally at a dose of 2000 mg/kg, presents low toxicity.
... Fortunately, the new mt-QSAR is not only useful for different targets but also to different multiplexing assay conditions (cj) for all targets. Definitely, we have stated the first QSAR model for multiplexing assays of anti-Alzheimer, anti-parasitic, anti-fungi, and anti-bacterial activity (6)(7)(8)(9)(10)(11)(12)(13)(14)(15). ...
Conference Paper
Hypertension is a multifactorial disease in which blood vessels are persistently exposed to a higher tension than usual; this tension places more strain on the heart, leading to greater cardiac output to pump blood to the body. Hypertension is classified by the World Health Organization (WHO) as one of the main risk factors for disability and premature death in the world population. WHO has strengthened various health services around the world, listing the groups of basic medicines for high blood pressure, such as angiotensin-converting enzyme inhibitors, thiazide diuretics, beta blockers, and long-acting calcium channel blockers, among other groups for drug treatment of the population with this condition. The discovery of new drugs with better activity and less toxicity for the treatment of hypertension is a goal of major importance. In this sense, theoretical models such as QSAR can be useful to discover new drugs for hypertension treatment. For this reason, we developed a new multi-target QSAR (mt-QSAR) model to discover new drugs. The public database ChEMBL, which contains Big Data sets of multi-target assays of inhibitors of a group of receptors with special relevance in hypertension, was used. However, almost all known computational models focus on only one target or receptor. In this work, Beta-2 adrenergic receptor, Adrenergic receptor beta, Type-1 angiotensin II receptor, Angiotensin-converting enzyme, Beta-adrenergic receptor, Cytochrome P450 11B2 and Renin were used as receptor inputs in the model. An Artificial Neural Network (ANN) was used for the statistical analysis. As inputs we used Topological Indices, specifically the Wiener, Barabasi and Harary indices calculated by the Dragon software. These operators quantify the deviations of the structure of one drug from the expected values for all drugs assayed in different boundary conditions such as type of receptor, type of assay, type of target, and target mapping. Overall training performance was 90%. Overall validation predictability performance was 90%.
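Two of the topological indices named above have simple graph-theoretic definitions: the Wiener index is the sum of shortest-path distances over all atom pairs, and the Harary index is the sum of their reciprocals. The sketch below computes both for a hydrogen-suppressed molecular graph. It is an independent illustration, not the Dragon implementation, and the function names and example molecules are chosen here for demonstration only.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def shortest_path_lengths(n_atoms: int, bonds: Sequence[Tuple[int, int]],
                          source: int) -> Dict[int, int]:
    """Breadth-first search: bond-count distance from source to each reachable atom."""
    neighbours: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def wiener_and_harary(n_atoms: int, bonds: Sequence[Tuple[int, int]]) -> Tuple[int, float]:
    """Return (Wiener index, Harary index) of a connected molecular graph."""
    wiener = 0
    harary = 0.0
    for i in range(n_atoms):
        dist = shortest_path_lengths(n_atoms, bonds, i)
        for j in range(i + 1, n_atoms):  # count each atom pair once
            if j not in dist:
                raise ValueError("graph is not connected")
            wiener += dist[j]
            harary += 1.0 / dist[j]
    return wiener, harary


# Carbon skeletons of two C4 isomers.
N_BUTANE = [(0, 1), (1, 2), (2, 3)]   # linear chain
ISOBUTANE = [(0, 1), (0, 2), (0, 3)]  # branched: atom 0 is the central carbon
print(wiener_and_harary(4, N_BUTANE))   # Wiener index is 10 for the chain
print(wiener_and_harary(4, ISOBUTANE))  # Wiener index is 9 for the branched isomer
```

The two isomers have the same atoms but different index values, which is the property that lets such indices serve as structural inputs to a QSAR model.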
Presentation
From the 1940s to the present, approximately 49% of all medicines approved by the WHO are composed of active substances of natural origin or direct derivatives thereof. About 75% of the medicines authorized by the WHO to counteract the direct and indirect effects induced by bacteria, parasites and viruses (cytokine storms, toxins and others) are of natural origin; notably, 100% of the natural active principles investigated, developed and used to control and cure bacterial, parasitic and viral infections with almost 100% success correspond to metabolites present in foods of vegan origin used by animals, "intuitively", in their natural environment to control diseases caused by these same pathogens. Beyond that, the diet profiles of people in Africa, North America, South America, Asia, Australia, and Eastern, Western, Northern, and Southern Europe have been found to differ markedly. The differences in the diets and in the composition of the nutrients that the people of the different regions ingest are directly related to the incidence and progression of cancer, autoimmune diseases, diabetes, hormonal alterations and increased susceptibility to infections, and to the progression of disease to serious, critical or fatal stages. This fact motivated our team of scientists to focus, starting in 1997, on the relationship between the changes in feeding behavior adopted by wild animals once alterations caused by viral, parasitic and bacterial infections appear, and the effectiveness of those changes in diet in controlling and curing the pathologies presented. We selected coronaviruses because, for approximately a decade, we verified and received information on the recurrence of infections and deaths caused by these pathogens in animals used in preclinical research in our laboratory and others around the world.
Since that date, we have conducted dozens of investigations in the field of theoretical chemistry, high-throughput screening, preclinical and clinical trials on the ability of families of active compounds present in plants to prevent, control, treat and cure many diseases and also the pathological alterations caused by parasites, bacteria and viruses in animals and man. These studies culminated in the registration of a significant number of products that have shown great effectiveness against the pathological processes for which they were developed. When the COVID-19 pandemic arrived, the experience accumulated over decades allowed us to quickly extrapolate the knowledge acquired during all that time to the behavior of animals that fall ill with coronavirus, and to correlate these findings with what we could expect in humans infected with this type of virus. This reasoning was done avoiding any supposition, or theory, about the transformation of this virus in a laboratory because, in addition to the fact that this is not relevant to us, we know that pathogen modifications are generally carried out by scientists to anticipate mutations of viruses, bacteria and others in nature in order to find cures for infections. Simultaneously, an analysis was carried out on the evolution and application of the protocols recommended by the WHO for the control and treatment of COVID-19, from the beginning of the pandemic to the present; demonstrating that the most effective protocols for the control, treatment and cure of patients affected by the SARS-CoV-2
Article
In silico predictive models for aqueous solubility, human intestinal absorption (HIA), and Ames genotoxicity were developed principally using artificial neural net (ANN) analysis and topological descriptors. Approximately 10,000 compounds spread across three data sets were used in the construction of these quantitative structure-activity/property relationship (QSAR/QSPR) models. For aqueous solubility, 5,037 chemically diverse compounds were used to construct ANN-QSPRs for intrinsic aqueous solubility. When these robust models were applied to 938 compounds in external validation, they gave an r² = 0.78 with 84% predicted within 1 log unit for these new chemical entities (NCEs). 417 therapeutic drugs were used in the development of an ANN-QSPR to predict percent oral absorption (%OA). For validation testing on 195 new drugs, 92% of the compounds were predicted to within 25% of their reported %OA values, which ranged from 0% to 100%. Polar surface area and logP, the octanol-water partition coefficient, were found to be important descriptors in our QSPR model. Development of an ANN-QSAR as a genotoxicity predictor for S. typhimurium employed 2963 compounds including 290 therapeutic drugs. Validation results on 400 NCEs with the ANN-QSAR gave a concordance of 83%, which rose to 91% when a confidence indicator was applied. With new drugs a concordance of 92% was reached, which increased to 97% when the reliability indicator was invoked.
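The validation figures quoted above are of three kinds: a squared correlation, a fraction of predictions within a tolerance, and a concordance between binary calls. As a minimal sketch of how such figures can be computed, the functions below use common definitions. The abstract does not state which r² definition was used, so the squared Pearson correlation here is an assumption, and all names and sample values are illustrative only.

```python
from typing import Sequence


def fraction_within(observed: Sequence[float], predicted: Sequence[float],
                    tolerance: float) -> float:
    """Share of predictions whose absolute error is at most `tolerance`."""
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= tolerance)
    return hits / len(observed)


def squared_pearson(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def concordance(observed_calls: Sequence[bool], predicted_calls: Sequence[bool]) -> float:
    """Share of compounds where the predicted class matches the assay result."""
    agree = sum(1 for o, p in zip(observed_calls, predicted_calls) if o == p)
    return agree / len(observed_calls)


# Made-up log-solubility values, for illustration only.
LOG_S_OBSERVED = [-3.0, -1.0, 0.5]
LOG_S_PREDICTED = [-2.5, -2.5, 0.5]
print(fraction_within(LOG_S_OBSERVED, LOG_S_PREDICTED, tolerance=1.0))
print(squared_pearson(LOG_S_OBSERVED, LOG_S_PREDICTED))
```

The same `fraction_within` call with `tolerance=25.0` would correspond to the "within 25% of reported %OA" criterion, assuming that criterion means an absolute difference of 25 percentage points.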
Article
Development of a novel semi-empirical descriptor, MR(chi), for molecular modelling. The index is based on a molar refractivity partition using a Randić-type graph-theoretical invariant. This hybrid index describes not only the London dispersive forces in a ligand fragment related to the molar refractivity but also structural features of the molecule. It is also applicable in Quantitative Structure-Activity Relationship (QSAR) and Quantitative Structure-Property Relationship (QSPR) studies. The method is convenient and can discriminate between isomers.
Article
Chemometric prediction of the probability of chemical distribution to different systems is highly important for physicochemical, environmental, and life sciences. However, the amount of information is huge and difficult to analyze. A multi-system partition Complex Network (MSP-CN) may be very useful in this sense. We define MSP-CNs as large graphs composed of nodes (chemicals) interconnected by arcs if a pair of chemicals have similar partition in a given system. Experimental quantification of partition in many systems is expensive, so we can use a Quantitative Structure–Partition Relationship (QSPR) model. Unfortunately, with classic QSPR we need to use one model for each system. Here we construct the first MSP-CN based on a multi-target QSPR (mt-QSPR). The model is based on the spectral moments (πk) of a molecular Markov matrix weighted with atomic parameters that depend on both the nature of the atom and the partition system. The mt-QSPR predicts 90.6% of 413 compound/system pairs in training series and 90.0% in validation. The predicted MSP-CN has 413 nodes, 2060 edges, an average node degree of 9.9, and only 7.7% of drugs are unconnected. The model was used to study the biophysical phenomena of transport or distribution of G1 (a novel antimicrobial drug) to different rat tissues. Predicted probabilities (P) coincide with low experimental partition coefficients (logPC) reported herein for the first time in skin (P=0.455; logPC=−0.02b0→U), heart (0.453; −0.02→U), and brain (0.324; −0.34→U). The Kamada–Kawai algorithm evidenced the community structure of the MSP-CN and clustered G1 into three different communities of the U-type drugs. These results coincide with the low distribution of G1 to these tissues and consequently with a low expected drug side effect.
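To make the phrase "spectral moments (πk) of a molecular Markov matrix weighted with atomic parameters" concrete, here is a minimal sketch. It assumes that each matrix element p_ij is the weight of atom j divided by the summed weights of atom i and its bonded neighbours, and that πk is the trace of the k-th matrix power. The self-loop convention, the function names, and the example weights are assumptions made here and are not taken from the abstract.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def markov_matrix(n_atoms: int, bonds: Sequence[Tuple[int, int]],
                  weights: Sequence[float]) -> Matrix:
    """Row-stochastic matrix with p_ij proportional to the weight of atom j.

    Atom i is linked to itself and to its bonded neighbours (assumed convention).
    Weights must be positive so that every row sum is non-zero.
    """
    linked = [[i == j for j in range(n_atoms)] for i in range(n_atoms)]
    for a, b in bonds:
        linked[a][b] = linked[b][a] = True
    matrix = []
    for i in range(n_atoms):
        row = [weights[j] if linked[i][j] else 0.0 for j in range(n_atoms)]
        total = sum(row)
        matrix.append([v / total for v in row])
    return matrix


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(matrix: Matrix, k_max: int) -> List[float]:
    """Return [pi_0, ..., pi_kmax], where pi_k is the trace of matrix^k."""
    n = len(matrix)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    moments = []
    for _ in range(k_max + 1):
        moments.append(sum(power[i][i] for i in range(n)))
        power = mat_mul(power, matrix)
    return moments


# Hypothetical C-C-O skeleton; the weights stand in for system-dependent atomic
# parameters (Pauling electronegativities are used purely as placeholders).
BONDS = [(0, 1), (1, 2)]
WEIGHTS = [2.55, 2.55, 3.44]
PI_MATRIX = markov_matrix(3, BONDS, WEIGHTS)
MOMENTS = spectral_moments(PI_MATRIX, k_max=3)
print(MOMENTS)
```

In a multi-target setting like the one described, changing `WEIGHTS` according to the partition system would give a different set of moments for the same molecule, which is what lets a single model cover several systems.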
Article
The Quantitative Structure–Property Relationships (QSPRs) based on Graph or Network Theory are important for predicting the properties of polymeric systems. In the three previous papers of this series (Polymer 45 (2004) 3845–3853; Polymer 46 (2005) 2791–2798; and Polymer 46 (2005) 6461–6473) we focused on the uses of molecular graph parameters called topological indices (TIs) to link the structure of polymers with their biological properties. However, there has been little effort to extend these TIs to the study of complex mixtures of artificial polymers or biopolymers such as nucleic acids and proteins. In this sense, the Blood Proteome (BP) is one of the most important and complex mixtures containing protein polymers. For instance, outcomes obtained by Mass Spectrometry (MS) analysis of BP are very useful for the early detection of diseases and drug-induced toxicities. Here, we use two representations of the MS outcomes, Spiral and Star Networks, and define a new type of TIs. The new TIs introduced here are the spectral moments (pk) of the stochastic matrix associated with the Spiral graph, and they describe non-linear relationships between the different regions of the MS characteristic of BP. We used the MARCH-INSIDE approach to calculate the pk(SN) of different BP samples and S2SNet to determine several Star graph TIs. In the second step, we developed the corresponding Quantitative Proteome–Property Relationship (QPPR) models using Linear Discriminant Analysis (LDA). QPPRs are the analogues of QSPRs in the case of complex biopolymer mixtures. Specifically, the new QPPRs derived here may be used to detect drug-induced cardiac toxicities from BP samples. Different Machine Learning classification algorithms were used to fit the QPPRs based on pk(SN), with the J48 decision tree classifier showing the best performance. These results suggest that the present approach captures important features of complex biopolymer mixtures and opens new opportunities for applying the idea behind classic QSPRs in polymer sciences.
Article
A QSAR study of two sets of carbonic anhydrase inhibitors is presented using a variety of molecular descriptors including topological indices. The first set consists of 29 benzenesulphonamides, and the second set includes 35 sulphanilamide Schiff bases. Two regression methodologies have been used involving ridge regression and the CODESSA program, and their results are compared with those of previous QSAR studies. Good correlations were found for the former set, and less satisfactory results for the latter set when the number of molecular descriptors is kept below five.
Article
The paper reports an ‘in silico’ approach to gene expression analysis based on a barley gene co-expression network resulting from the study of several publicly available cDNA libraries. The work is an application of Systems Biology to plant science: at the end of the computational step we identified groups of potentially related genes. The communities of co-expressed genes constructed from the network are remarkably characterized from the functional point of view, as shown by the statistical analysis of the Gene Ontology annotations of their members. Experimental, lab-based testing has been carried out to check the relationship between network and biological properties and to identify and suggest effective strategies of information extraction from the network-derived data.
Article
We describe a modification of the compact representation of DNA sequences which transforms the sequence into a 2-D diagram in which the ‘spots’ have integer coordinates. As a result the accompanying numerical characterization of DNA is quite simple and straightforward. This is an important advantage, particularly when considering DNA sequences having thousands of nucleic bases. The approach starts with the compact representation of DNA based on zigzag spiral template used for placing ‘spots’ associated with binary codes of the nucleic acids and subsequent suppression of the underlying zigzag curve. As a result, a 2-D map is formed in which all ‘spots’ have integer coordinates. By using only distances between spots having the same x or the same y coordinate one can construct a ‘map profile’ using integer arithmetic. The approach is illustrated on DNA sequences of the first exon of human β-globin.