Figure 1 - uploaded by Rene van Schaik
Keyword profiles of DMN, AB1, MPy, PBO, DEHP and VPA. 

Source publication
Article
To reduce continuously increasing costs in drug development, adverse effects of drugs need to be detected as early as possible in the process. In recent years, compound-induced gene expression profiling methodologies have been developed to assess compound toxicity, including Gene Ontology term and pathway over-representation analyses. The objective...

Contexts in source publication

Context 1
... ease of comparison the keywords were categorized into 13 groups: pathogenesis, drug metabolism, oxidative stress, DNA damage and response, cell cycle and mitosis, apoptosis and cell death, inflammation and immune response, cell differentiation and development, steroid metabolism, lipid metabolism, energy metabolism, general cell metabolism, and miscellaneous keywords. The profiles of the keyword fingerprints are visualized in Figure 1 (the categories and their corresponding keywords can be retrieved from the online web supplement). ...
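The categorization step can be sketched as a simple count per category. This is a minimal illustration, not the authors' code: the mapping `KEYWORD_CATEGORY` is hypothetical (the real assignment of keywords to categories is in the online web supplement), unmapped keywords are assumed to fall into the miscellaneous group, and the profile is assumed to be the fraction of a compound's fingerprint keywords in each category.

```python
from collections import Counter

# The 13 high-level categories listed in the text.
CATEGORIES = [
    "pathogenesis", "drug metabolism", "oxidative stress",
    "DNA damage and response", "cell cycle and mitosis",
    "apoptosis and cell death", "inflammation and immune response",
    "cell differentiation and development", "steroid metabolism",
    "lipid metabolism", "energy metabolism", "general cell metabolism",
    "miscellaneous",
]

# Hypothetical keyword-to-category mapping, for illustration only.
KEYWORD_CATEGORY = {
    "lipid metabolism": "lipid metabolism",
    "fatty acid oxidation": "lipid metabolism",
    "inflammatory response": "inflammation and immune response",
    "immune system": "inflammation and immune response",
    "necrosis": "apoptosis and cell death",
}


def keyword_profile(fingerprint):
    """Return the fraction of a compound's fingerprint keywords in each category.

    Keywords missing from the mapping are counted as miscellaneous (an assumption).
    """
    counts = Counter(KEYWORD_CATEGORY.get(kw, "miscellaneous") for kw in fingerprint)
    total = sum(counts.values()) or 1  # avoid division by zero for an empty fingerprint
    return {cat: counts[cat] / total for cat in CATEGORIES}
```

For a fingerprint of `["lipid metabolism", "fatty acid oxidation", "necrosis"]`, two thirds of the profile falls in lipid metabolism and one third in apoptosis and cell death. Normalizing to fractions keeps compounds with long and short fingerprints comparable; the figure itself may use raw counts.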
Context 2
... differences in the keyword profiles can be appreciated in Figure 1. The keywords of the two peroxisome proliferators DEHP and VPA are mainly grouped into the steroid, lipid, and energy metabolism categories. ...
Context 3
... observed vacuolation is often caused by accumulation of lipids in the cells. The profile of the keyword fingerprint of DEHP (Figure 1) shows that DEHP largely has its effect on lipid metabolism and energy metabolism, which is in agreement with the observed pathologies and with known effects of peroxisome proliferators. Keywords in the fingerprint of DEHP (Table 1 and online web supplement) support its lipogenic effect, with low p-values for keywords such as lipid metabolism, fatty acid oxidation, fatty acid β-oxidation and hypertrophy. ...
Context 4
... profile of the keyword fingerprint of DMN (Figure 1) is in agreement with the observation that DMN causes a strong inflammatory response following necrosis: a relatively high number of keywords are grouped in the inflammation and immune response category (e.g., immune system, inflammatory response and inflammation), and necrosis was one of the keywords with the lowest p-value (Table 1 and online web supplement). Furthermore, a relatively high number of keywords are grouped in the cell differentiation and development category (e.g., cell development, ...
Context 5
... has been reported that the cytotoxicity of MPy leads to necrosis, and consequently induces regenerative cell proliferation to replace necrotic cells [18,43], and that oxidative stress is proposed to be one of the causes of the cytotoxic nature of MPy [44]. These findings are reflected in the profile of the keyword fingerprint of MPy (Figure 1), in that keywords related to oxidative stress, apoptosis/cell death and cell cycle/mitosis are prominent in this profile. Furthermore, keywords related to an inflammatory/immune response (e.g., inflammation and T-cell proliferation) can be correlated with the observed inflammation after MPy administration. ...
Context 6
... exact form of the keyword fingerprints that are shown in Figure 1 is partly dependent on the manual classification of the keywords into 13 high-level categories. Automatic literature-based compound profiling would benefit from structured vocabularies. ...

Citations

... A TExpress pattern specifying entity types including GENE and HPO was employed to ensure that synonyms of KLK-7 and all phenotype concepts from the Human Phenotype Ontology (HPO) were considered. Following extraction, a specificity score for each unique gene-phenotype co-occurrence was calculated based on mutual information between the gene and phenotype assessed by tf-idf scores, as described by Frijters et al. [41]. The specificity score reflected the relevance of the association between the phenotype and the gene. ...
Article
Finding biomarkers that provide a shared link between disease severity, drug-induced pharmacodynamic effects and response status in human trials can provide a number of benefits for patients: elucidating the mechanism of action of current therapies, and back-translating to fast-track development of next-generation therapeutics. Both opportunities are predicated on proactive generation of human molecular profiles that capture longitudinal trajectories before and after pharmacological intervention. Here, we present the largest plasma proteomic biomarker dataset available to date and the corresponding analyses from placebo-controlled Phase III clinical trials of the phosphodiesterase type 4 inhibitor apremilast in psoriasis (PSOR), psoriatic arthritis (PsA), and ankylosing spondylitis (AS), covering 526 subjects overall. Using approximately 150 plasma analytes tracked across three time points, we identified IL-17A and KLK-7 as biomarkers for disease severity and apremilast pharmacodynamic effect in psoriasis patients. The combined decline rate of KLK-7, PEDF, MDC and ANGPTL4 by Week 16 served as a biomarker for the responder subgroup, providing insight into therapeutic mechanisms. In ankylosing spondylitis patients, IL-6 and LRG-1 were identified as biomarkers concordant with disease severity. The apremilast-induced LRG-1 increase was consistent with the overall lack of efficacy in ankylosing spondylitis. Taken together, these findings expand the mechanistic knowledge base of apremilast and provide translational foundations to accelerate future efforts including compound differentiation, combination, and repurposing.
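The citing snippets describe a gene-phenotype specificity score "based on mutual information", but they do not give the formula or say exactly how the tf-idf weighting enters. As a hedged stand-in, the sketch below computes pointwise mutual information (PMI) from abstract counts, a common co-occurrence association measure. The function `pmi` and its inputs are illustrative assumptions, not the scoring method of Frijters et al.

```python
import math


def pmi(n_ab, n_a, n_b, n_total):
    """Pointwise mutual information (natural log) of two terms from document counts.

    n_ab:    abstracts mentioning both terms (e.g. a gene and a phenotype)
    n_a:     abstracts mentioning the first term
    n_b:     abstracts mentioning the second term
    n_total: abstracts in the corpus
    """
    if n_ab == 0:
        return float("-inf")  # the terms never co-occur
    p_ab = n_ab / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    # log of observed joint probability over the probability expected under independence
    return math.log(p_ab / (p_a * p_b))
```

A PMI of 0 means the two terms co-occur exactly as often as expected by chance; larger values mean they co-occur more often. For example, `pmi(50, 100, 200, 1_000_000)` equals `math.log(2500)`, so ranking gene-phenotype pairs by such a score favors pairs that are specific to each other rather than merely frequent.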
... A TExpress pattern specifying entity types including GENE and HPO was employed to ensure that synonyms of KLK-7 and all phenotype concepts from the Human Phenotype Ontology (HPO) were considered. Following the co-occurrence extraction, a specificity score for each unique gene-phenotype co-occurrence was calculated based on mutual information between the gene and phenotype assessed by tf-idf scores, as described by Frijters et al. [56]. A higher specificity score indicates a more relevant association between the phenotype and the gene. Each unique phenotype was tagged with its highest HPO parent class(es) under phenotypic abnormality, based on the HPO hierarchical relationships, in order to elucidate essential findings. ...
Preprint
Linking molecular disease biomarkers, drug-induced pharmacodynamic effects and response status in human trials can provide a number of benefits for patients: 1) advancement in understanding the mechanism of action of current therapy; 2) back-translation utilities to fast-track preclinical development of next-generation therapeutics. Both opportunities are predicated on proactive generation of molecular biomarker datasets that capture longitudinal trajectories in patients before and after pharmacological intervention. Here, we present the largest plasma proteomic biomarker datasets available to date from placebo-controlled Phase III clinical trials of the phosphodiesterase type 4 inhibitor apremilast in psoriasis (PSOR), psoriatic arthritis (PsA), and ankylosing spondylitis (AS), and the corresponding meta-analyses. Using approximately 150 plasma analytes tracked across three time points from 526 subjects, we found robust biomarkers of PSOR severity, IL-17A and KLK-7, which were also reduced in apremilast responders. Furthermore, when compared to placebo, we observed a shared apremilast-specific pharmacodynamic pattern across all three diseases, with IL-17A universally represented in this pattern. Other notable analytes in this pattern included MDC and HE4, common to the AS and PSOR cohorts, and IL-1RII, common to PSOR and PsA. Taken together, these findings provide foundational surrogate biomarkers to accelerate preclinical efforts including compound differentiation, synergy, and repurposing therapeutic strategies.
... and identifying early predictors of toxicity (section 6.3.1.4). Related to toxicological class discovery and separation and mechanistic analysis, Frijters and coworkers showed that text mining applied to expression data from toxicogenomics experiments can separate compounds that have distinct biological activities and yield detailed insight into the mode of toxicity (Frijters et al. 2007). They created keyword profiles for compounds based on co-occurrence in MEDLINE between the mentions of the differentially expressed genes in the experiments and keywords from an in-house thesaurus. ...
Chapter
This chapter concerns the application of bioinformatics methods to the analysis of toxicogenomics data. The chapter starts with an introduction covering how bioinformatics has been applied in toxicogenomics data analysis, and continues with a description of the foundations of a specific bioinformatics method called text-mining. Next, the integration of text-mining with toxicogenomics data analysis methods is described. Four different areas in toxicogenomics where conventional bioinformatics solutions can be assisted by text-mining are described: class discovery and separation, connectivity mapping, mechanistic analysis, and identification of early predictors of toxicity.
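One common way to decide whether a keyword is over-represented among the literature links of a set of differentially expressed genes is a one-sided hypergeometric test. The sketch below assumes that test; the snippet above does not specify which statistic Frijters and coworkers used, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws), with k >= 0."""
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the exact probabilities of observing k, k+1, ..., upper successes.
    return sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    ) / total


def keyword_pvalue(genes_in_list_with_kw, list_size, genes_with_kw, genome_size):
    """One-sided p-value that a keyword co-occurs with more listed genes than expected.

    genes_in_list_with_kw: differentially expressed genes that co-occur with the keyword
    list_size:             number of differentially expressed genes
    genes_with_kw:         genes in the whole thesaurus that co-occur with the keyword
    genome_size:           genes in the whole thesaurus
    """
    return hypergeom_sf(genes_in_list_with_kw, genome_size, genes_with_kw, list_size)
```

For instance, `keyword_pvalue(12, 200, 150, 20000)` asks how surprising it is that 12 of 200 listed genes co-occur with a keyword linked to 150 of 20,000 genes, when about 1.5 would be expected by chance. Exact integer arithmetic via `math.comb` avoids underflow but slows down for very large lists; in that case `scipy.stats.hypergeom.sf` is an alternative, keeping in mind that its survival function gives P(X > k), so P(X >= k) corresponds to `sf(k - 1, ...)`.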
... The retrieval of relevant gene-disease associations from the millions of abstracts in Medline is very labor intensive, and thus a text mining system is needed to do this in an automated fashion. In previous work we reported on CoPub [18-20], a publicly available text mining system, which has successfully been used for the analysis of microarray data and in toxicogenomics studies [21-26]. CoPub calculates keyword co-occurrences in titles and abstracts from the entire Medline database, using thesauri for genes, diseases, drugs and pathways. ...
Data
Background Glucocorticoids are potent anti-inflammatory agents used for the treatment of diseases such as rheumatoid arthritis, asthma, inflammatory bowel disease and psoriasis. Unfortunately, their usage is limited because of metabolic side effects, e.g. insulin resistance, glucose intolerance and diabetes. To gain more insight into the mechanisms behind glucocorticoid (GC)-induced insulin resistance (IR), it is important to understand which genes play a role in the development of insulin resistance and which genes are affected by glucocorticoids. Medline abstracts contain many studies about insulin resistance and the molecular effects of glucocorticoids and thus are a good resource to study these effects. Results We developed CoPubGene, a method to automatically identify gene-disease associations in Medline abstracts. We used this method to create a literature network of genes related to insulin resistance and to evaluate the importance of the genes in this network for glucocorticoid-induced metabolic side effects and anti-inflammatory processes. With this approach we found several genes that are already considered markers of GC-induced IR, such as phosphoenolpyruvate carboxykinase (PCK) and glucose-6-phosphatase, catalytic subunit (G6PC). In addition, we found genes involved in steroid synthesis that have not yet been recognized as mediators of GC-induced IR. Conclusions With this approach we were able to construct a robust, informative literature network of insulin resistance-related genes that gave new insights into the mechanisms behind GC-induced IR. The method has been set up in a generic way so it can be applied to a wide variety of disease networks.
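The co-occurrence counting that CoPub performs over titles and abstracts, using thesauri in which one biological item can carry several keywords, can be illustrated with a toy sketch. Everything here is an assumption for illustration: `THESAURUS` and its entries are hypothetical, matching is a case-insensitive whole-word regular-expression search, and a co-occurrence is counted at most once per abstract. CoPub's actual term matching and statistics are more elaborate.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical thesaurus: each biological item maps to several keywords
# (e.g. a full gene name, a gene symbol and aliases).
THESAURUS = {
    "GENE:PCK1": ["phosphoenolpyruvate carboxykinase", "PCK1", "PEPCK"],
    "GENE:G6PC": ["glucose-6-phosphatase", "G6PC"],
    "DISEASE:insulin resistance": ["insulin resistance"],
}


def compile_thesaurus(thesaurus):
    """Build one case-insensitive, word-bounded regex per item from all its keywords."""
    return {
        item: re.compile(r"\b(?:" + "|".join(map(re.escape, kws)) + r")\b", re.IGNORECASE)
        for item, kws in thesaurus.items()
    }


def count_cooccurrences(abstracts, thesaurus):
    """Count, per abstract, which items occur and which item pairs occur together."""
    patterns = compile_thesaurus(thesaurus)
    item_counts, pair_counts = Counter(), Counter()
    for text in abstracts:
        found = sorted(item for item, pat in patterns.items() if pat.search(text))
        item_counts.update(found)
        pair_counts.update(combinations(found, 2))  # each unordered pair once per abstract
    return item_counts, pair_counts
```

The item and pair counts returned here are exactly the inputs a co-occurrence statistic (such as a mutual-information score) needs: how often each item appears, and how often two items appear in the same abstract.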
... Rat research also has a strong physiological focus and a long history of disease-related model systems [4,5], although rat genetics is currently more rudimentary than that of the mouse and other model organisms. The recent sequencing of the rat genome has increased the amount of mechanistic work done in this model [5], and has underpinned additional pharmacogenomic and toxicogenomic studies [6,7]. An experimental system is only as valuable as our understanding of its limitations. ...
Article
Rat is a major model organism in toxicogenomics and pharmacogenomics. Hepatic mRNA profiles after treatment with xenobiotic chemicals are used to predict and understand drug toxicity and mechanisms. Surprisingly, neither the inter- and intra-strain variability of mRNA abundances in control rats nor the heritability of rat mRNA abundances has yet been established. We address these issues by studying five populations: the popular Sprague-Dawley strain, sub-strains of Long-Evans and Wistar rats, and two lines derived from crosses between the Long-Evans and Wistar sub-strains. Using three independent techniques (variance analysis, linear modelling, and unsupervised pattern recognition), we characterize extensive intra- and inter-strain variability in mRNA levels. We find that both sources of variability are non-random and are enriched for specific functional groups. Specific transcription-factor binding sites are enriched in the promoter regions of these genes, and the genes occur in "islands" scattered throughout the rat genome. Using the two lines generated by crossbreeding, we tested the heritability of hepatic mRNA levels: the majority of rat genes appear to exhibit directional genetics, with only a few interacting loci. Finally, a comparison of inter-strain heterogeneity between mouse and rat orthologs shows more heterogeneity in rats than in mice; thus rat and mouse heterogeneity are uncorrelated. Our results establish that control hepatic mRNA levels are relatively homogeneous within rat strains but highly variable between strains. This variability may be related to increased activity of specific transcription factors and has clear functional consequences. Future studies may take advantage of this phenomenon by surveying panels of rat strains.
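The "variance analysis" mentioned in the abstract is not specified further here. A minimal sketch, assuming a one-way ANOVA per gene with strains as groups, separates between-strain from within-strain variability. The function name, the strain labels and the expression values in the example are made up for illustration and do not come from the study.

```python
from statistics import mean


def variance_components(levels_by_strain):
    """One-way ANOVA for one gene: between- and within-strain mean squares and their F ratio.

    levels_by_strain maps a strain label to that gene's mRNA levels in individual animals.
    Requires at least two strains and more animals than strains.
    """
    groups = list(levels_by_strain.values())
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    k, n = len(groups), len(all_values)
    # Sum of squares of strain means around the grand mean (between strains).
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Sum of squares of animals around their own strain mean (within strains).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between, ms_within, ms_between / ms_within
```

With toy values `{"SD": [1.0, 1.2, 0.8], "LE": [2.0, 2.2, 1.8], "W": [1.0, 1.1, 0.9]}` the mean squares are about 1.0 (between) and 0.03 (within), giving an F ratio of about 33: a gene whose level is homogeneous within strains but differs between them, the pattern the abstract reports as common.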
... This system uses Medline abstracts to calculate robust statistics for keyword co-occurrences, to be used for the biological interpretation of microarray data (1,2). Since then, CoPub has been intensively used in the analysis of several microarray experiments and toxicogenomics studies (3-8). However, literature data can be applied far beyond questions related to microarray studies. ...
Article
In this article, we present CoPub 5.0, a publicly available text mining system, which uses Medline abstracts to calculate robust statistics for keyword co-occurrences. CoPub was initially developed for the analysis of microarray data, but we broadened the scope by implementing new technology and new thesauri. In CoPub 5.0, we integrated existing CoPub technology with new features, and provided a new advanced interface, which can be used to answer a variety of biological questions. CoPub 5.0 allows searching for keywords of interest and their relations to curated thesauri, and provides highlighting and sorting mechanisms, using its statistics, to retrieve the most important abstracts in which the terms co-occur. It also provides a way to search for indirect relations between genes, drugs, pathways and diseases, following an ABC principle, in which A and C have no direct connection but are connected via shared B intermediates. With CoPub 5.0, it is possible to create, annotate and analyze networks using the layout and highlight options of Cytoscape Web, allowing for literature-based systems biology. Finally, the operations of the CoPub 5.0 Web service make it possible to implement the CoPub technology in bioinformatics workflows. CoPub 5.0 can be accessed through the CoPub portal http://www.copub.org.
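The ABC principle can be sketched as a two-step graph search: starting from A, collect its direct neighbours B, then collect every C reached through some B that is not itself directly linked to A. The sketch ranks candidate C terms only by the number of shared B intermediates; CoPub's actual scoring of indirect relations is statistical and is not reproduced here. The function name and the link list in the example are hypothetical, a toy input modeled on Swanson's classic fish oil and Raynaud's disease case.

```python
from collections import defaultdict


def abc_candidates(direct_links, a):
    """Rank C terms linked to A only indirectly, through shared B intermediates.

    direct_links: iterable of (term, term) pairs that co-occur in the literature.
    Returns a list of (C, set_of_B) sorted by number of shared Bs, then by name.
    """
    neighbours = defaultdict(set)
    for x, y in direct_links:
        neighbours[x].add(y)
        neighbours[y].add(x)
    b_terms = neighbours[a]  # direct relations of A
    shared = defaultdict(set)
    for b in b_terms:
        for c in neighbours[b]:
            # Keep only C terms with no direct link to A (and not A itself).
            if c != a and c not in b_terms:
                shared[c].add(b)
    return sorted(shared.items(), key=lambda kv: (-len(kv[1]), kv[0]))


links = [
    ("fish oil", "blood viscosity"),
    ("fish oil", "platelet aggregation"),
    ("blood viscosity", "Raynaud's disease"),
    ("platelet aggregation", "Raynaud's disease"),
    ("platelet aggregation", "thrombosis"),
]
```

Here `abc_candidates(links, "fish oil")` puts Raynaud's disease first, connected via both blood viscosity and platelet aggregation, ahead of thrombosis, which is reached through a single intermediate.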
... A wealth of knowledge concerning the function of genes and their role in biological processes is present in the biomedical literature, embodied in full text articles or the Medline abstract database. Various text mining approaches have been developed to extract information on gene function from this body of literature [1,2], and these have been successfully applied to annotate genes and proteins [3-7] and to interpret experimental results [8-14]. A common method to establish relationships between biomedical concepts such as genes and pathways is co-occurrence [15]. ...
... Detailed knowledge of the mechanism of action of a drug and the biological processes that are targeted by a drug is of importance for fine tuning drugs and biomarker discovery. In an earlier study, we showed that the application of text mining on expression data from a toxicogenomics experiment yielded detailed insight into the mode of toxicity of the tested compounds [13]. With the hidden relationship algorithm presented in this paper we provide a text mining tool that is independent of gene expression data, to improve the understanding of a drug's mechanism of action and the pathways targeted by that drug. ...
... Six thesauri containing human genes, Gene Ontology biological processes, liver pathologies, diseases, pathways and drugs were used to search Medline XML files containing title, abstract and substances (1966 – August 2009, http://www.nlm.nih.gov/bsd/licensee/2009_stats/baseline_doc.html), as described previously [13,25]. The keyword thesauri are based on biological items, which represent an instance of a biological concept (e.g., a gene, a pathway), and may contain multiple keywords (e.g., a gene is assigned a full gene name, a gene symbol and gene aliases). ...
Article
The scientific literature represents a rich source for retrieval of knowledge on associations between biomedical concepts such as genes, diseases and cellular processes. A commonly used method to establish relationships between biomedical concepts from literature is co-occurrence. Apart from its use in knowledge retrieval, the co-occurrence method is also well-suited to discover new, hidden relationships between biomedical concepts following a simple ABC-principle, in which A and C have no direct relationship, but are connected via shared B-intermediates. In this paper we describe CoPub Discovery, a tool that mines the literature for new relationships between biomedical concepts. Statistical analysis using ROC curves showed that CoPub Discovery performed well over a wide range of settings and keyword thesauri. We subsequently used CoPub Discovery to search for new relationships between genes, drugs, pathways and diseases. Several of the newly found relationships were validated using independent literature sources. In addition, new predicted relationships between compounds and cell proliferation were validated and confirmed experimentally in an in vitro cell proliferation assay. The results show that CoPub Discovery is able to identify novel associations between genes, drugs, pathways and diseases that have a high probability of being biologically valid. This makes CoPub Discovery a useful tool to unravel the mechanisms behind disease, to find novel drug targets, or to find novel applications for existing drugs.
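The abstract reports that CoPub Discovery was evaluated with ROC curves but does not describe the evaluation setup. As a generic illustration, assuming each predicted relationship has a score and a binary label from some reference set, the area under the ROC curve equals the probability that a randomly chosen true relationship outscores a randomly chosen false one (the Mann-Whitney formulation), which the sketch below computes directly. The function name is hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic; ties count one half.

    scores: predicted strength of each candidate relationship
    labels: truthy if the relationship is in the reference set, falsy otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every true relationship is ranked above every false one, and 0.5 is chance level. The pairwise loop is quadratic, which is fine for small evaluation sets; a rank-based computation scales better.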
... These omics technologies promise to deliver a molecular "footprint" for risk factors that could be identified and traced through the steps of the traditional toxicological paradigm (Guyton et al. 2009; Ramos et al. 2007; Schnackenberg and Beger 2006; Wetmore and Merrick 2004). Attempts to produce this "holy grail" include relatively simple analyses of base-pair substitutions (Lasky and Silbergeld 1996) and collection of dense data sets derived from microarray gene expression data and proteomics (Frijters et al. 2007). To a large extent, these applications have used cross-sectional measurements, which may explain why surprisingly little value has been extracted in terms of explaining dynamic events between exposure and outcome. ...
Article
In this review we highlight the need to expand the scope of environmental health research, which now focuses largely on the study of toxicants, to incorporate infectious agents. We provide evidence that environmental health research would be strengthened through finding common ground with the tools and approaches of infectious disease research. We conducted a literature review for examples of interactions between toxic agents and infectious diseases, as well as the role of these interactions as risk factors in classic "environmental" diseases. We investigated existing funding sources and research mandates in the United States from the National Science Foundation and the National Institutes of Health, particularly the National Institute of Environmental Health Sciences. We adapted the toxicological paradigm to guide reintegration of infectious disease into environmental health research and to identify common ground between these two fields as well as opportunities for improving public health through interdisciplinary research. Environmental health encompasses complex disease processes, many of which involve interactions among multiple risk factors, including toxicant exposures, pathogens, and susceptibility. Funding and program mandates for environmental health studies should be expanded to include pathogens in order to capture the true scope of these overlapping risks, thus creating more effective research investments with greater relevance to the complexity of real-world exposures and multifactorial health outcomes. We propose a new model that integrates the toxicology and infectious disease paradigms to facilitate improved collaboration and communication by providing a framework for interdisciplinary research. Pathogens should be part of environmental health research planning and funding allocation, as well as applications such as surveillance and policy development.
... Biomedical text mining has been shown to be valuable for diverse applications in the domains of molecular biology, toxicogenomics, and medicine. For example, it has been used to functionally annotate gene lists from microarray experiments [1-4], create literature-based compound profiles [5], generate medical hypotheses [6,7], find new uses for old drugs [8-10], and measure protein similarity [11,12]. The identification of biomedical terms in natural language is essential for biomedical text mining. ...
Article
Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work by measuring the impact of the different rules on the number of terms identified in a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms, together with a sample of 100 randomly selected terms, were evaluated for every rule. Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms, and seven of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, which is an increase of 2.8% in the number of terms and an increase of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. In the corpus, 7,397 terms were suppressed. We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE.
A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.
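The abstract does not list the individual rules, so the sketch below only illustrates the two kinds of rule it describes, using hypothetical examples: two rewrite rules that generate variants (dropping a trailing parenthetical qualifier, and inverting "X, Y" into "Y X") and one suppression rule that discards very short terms and terms ending in ", NOS". These rules and function names are assumptions for illustration, not the published rule set or the behavior of the casper tool.

```python
import re


def strip_parenthetical(term):
    """Illustrative rewrite rule: drop a trailing parenthetical qualifier."""
    return re.sub(r"\s*\([^()]*\)$", "", term)


def invert_syntax(term):
    """Illustrative rewrite rule: turn 'X, Y' into 'Y X' when there is exactly one comma."""
    parts = [p.strip() for p in term.split(",")]
    return f"{parts[1]} {parts[0]}" if len(parts) == 2 and all(parts) else term


def suppress(term):
    """Illustrative suppression rule: drop very short terms and 'not otherwise specified' terms."""
    return len(term) < 3 or term.upper().endswith(" NOS")


def rewrite_and_filter(terms):
    """Return each kept term together with its rewritten variants, minus suppressed ones."""
    out = []
    for term in terms:
        if suppress(term):
            continue
        variants = {term, strip_parenthetical(term), invert_syntax(term)}
        out.extend(sorted(v for v in variants if not suppress(v)))
    return out
```

For example, `rewrite_and_filter(["Insulin, regular", "Myocardial infarction (disorder)", "Pain, NOS", "Ab"])` keeps the first two terms, adds "regular Insulin" and "Myocardial infarction" as variants, and suppresses the last two. Evaluating such rules as the abstract describes means checking whether each generated variant still means the same as the original term.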
... Such programs include Ingenuity, GeneGo, ToxWizz and many more. However, a recent publication by Frijters and colleagues suggested a novel use of such natural language search engines (Frijters et al. 2007). Text mining of Medline abstracts was used to produce 'keyword fingerprints' for two genotoxic (GTX) and two non-genotoxic (NGTX) chemicals. ...
Article
For the rapid development of safe, efficacious chemicals it is important that any potential liabilities are identified as early as possible in the discovery/development pipeline. Once they are identified, it is possible to make rational decisions on whether to progress a chemical and/or series further; one such liability is chemical carcinogenesis, a highly undesirable characteristic in a novel chemical entity. Chemical carcinogens may be roughly divided into two classes: those that elicit their actions through direct damage to DNA (genotoxic carcinogens) and those that cause carcinogenesis through mechanisms that do not involve direct damage to DNA by the agent (non-genotoxic carcinogens). Whereas the former group can be identified by in vitro screens to a good degree of accuracy, the latter group is far more problematic due to its diverse modes of action. This review will focus on the latter class of chemical carcinogens, examining how modern '-omic' technologies have begun to identify signatures that may represent sensitive, early markers for these processes. In addition to their use in signature generation, the role of -omic level approaches in delineating molecular mechanisms of action will also be discussed.