Figure - uploaded by Eitan Rubin
Age-range classes defined by the Age Ontology. Age-range classes were generally defined based on the MeSH age-range definitions, with minor changes introduced to improve consistency. In the Age Ontology, age-range classes that are contained within another age-range class are defined as subclasses of that class. Each age-range class is made disjoint from all other age-range classes, since each specific age can belong to only one age-range class. * This age range was modified from the MeSH definition.
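The subclass and disjointness rules in the caption can be illustrated with a minimal Python sketch. It is not the Age Ontology itself: the `AgeRange` class, the half-open `[start, end)` convention, and the boundaries used below are hypothetical, and disjointness is checked only between sibling classes, which is one possible reading of the caption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgeRange:
    """Hypothetical age-range class covering ages in [start, end) years."""
    name: str
    start: float
    end: float  # exclusive upper bound; float("inf") for an open-ended range
    subclasses: List["AgeRange"] = field(default_factory=list)

    def add(self, child: "AgeRange") -> "AgeRange":
        # A range contained in this one becomes a subclass of it.
        if not (self.start <= child.start and child.end <= self.end):
            raise ValueError(f"{child.name} is not contained in {self.name}")
        # Sibling ranges must be disjoint so an age falls in at most one.
        for sibling in self.subclasses:
            if child.start < sibling.end and sibling.start < child.end:
                raise ValueError(f"{child.name} overlaps {sibling.name}")
        self.subclasses.append(child)
        return child

    def classify(self, age: float) -> List[str]:
        """Return the class names containing `age`, most general first."""
        if not (self.start <= age < self.end):
            return []
        for child in self.subclasses:
            path = child.classify(age)
            if path:
                return [self.name] + path
        return [self.name]


# Illustrative boundaries only; they are NOT taken from the figure.
root = AgeRange("Age", 0, float("inf"))
root.add(AgeRange("Child", 6, 13))
adult = root.add(AgeRange("Adult", 19, 45))
adult.add(AgeRange("Young adult", 19, 25))

print(root.classify(21))  # ['Age', 'Adult', 'Young adult']
```

Because siblings cannot overlap, `classify` returns a single chain of classes for any age, which mirrors the statement that a specific age belongs to only one age range at each level.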

Source publication
Article
Currently, data about age-phenotype associations are not systematically organized and cannot be studied methodically. Searching for scientific articles describing phenotypic changes reported as occurring at a given age is not possible for most ages. Here we present the Age-Phenome Knowledge-base (APK), in which knowledge about age-related phenotypi...

Context in source publication

Context 1
... Age Ontology is a very simple ontology, developed specifically for this research, that allows ages to be represented by grouping them into classes similar to those defined in the Medical Subject Headings (MeSH) controlled vocabulary [10]. In the Age Ontology, age is defined as the time which has passed since birth and is an attribute of a person (Figure 1). Minor changes were introduced to the MeSH age definitions in order to improve their formal logical consistency. ...

Similar publications

Article
Recent developments within the digital cultural heritage research community show more enthusiastic conversation among scholars in digital humanities, anthropology, history and allied disciplines. This study contributes to these debates by presenting some aspects of the theoretical, methodological and technical issues explored in a digital literary p...
Conference Paper
Systematic literature reviews (SLRs) have been gaining a significant amount of attention from Software Engineering researchers since 2004. SLRs are considered to be a new research methodology in Software Engineering, which allow evidence to be gathered with regard to the usefulness or effectiveness of the technology proposed in Software Engineering...
Conference Paper
The researchers explore the intersections between Information Assurance and Risk using visual analysis of text mining operations. The methodological approach involves searching for and extracting for analysis those abstracts and keyword groupings that relate to risk within a defined subset of scientific research journals. This analysis is conducte...
Conference Paper
Text Mining and NLP techniques are a hot topic nowadays. Researchers strive to develop new and faster algorithms to cope with larger amounts of data. In particular, interest in text data analysis has been increasing due to the growth of social network media. Given this, the development of new algorithms and/or the upgrade of existing ones is now a...
Article
Due to the large amount of available patent data, it is no longer feasible for industry actors to manually create their own terminology lists and ontologies. Furthermore, domain-specific thesauruses are rarely accessible to the research community. In this paper we present extraction of hyponymy lexical relations conducted on patent text using lexi...

Citations

... Additional vocabularies included disease ontology; human phenotype ontology; ontology of adverse events; chemical entities of biological interest; comparative toxicogenomics database chemical and disease subclasses; clinical trials ontology; gender, sex, and sexual orientation ontology; chemotherapy toxicities ontology; cancer care: treatment outcomes ontology; symptoms ontology; nonpharmacological interventions ontology; and nursing care coordination ontology. [24][25][26][27][28][29][30][31][32][33][34] ReGeX and heuristics like POS tag cues were used to capture recurring class-specific patterns otherwise not captured by standardized terminologies. Vocabularies are structured, standardized data sources that do not capture writing variations from clinical literature and custom-built ReGeX are restricted by either task or entity type. ...
Article
Objective: The aim of this study was to test the feasibility of PICO (participants, interventions, comparators, outcomes) entity extraction using weak supervision and natural language processing. Methodology: We re-purpose more than 127 medical and nonmedical ontologies and expert-generated rules to obtain multiple noisy labels for PICO entities in the evidence-based medicine (EBM)-PICO corpus. These noisy labels are aggregated using simple majority voting and generative modeling to get consensus labels. The resulting probabilistic labels are used as weak signals to train a weakly supervised (WS) discriminative model and observe performance changes. We explore mistakes in the EBM-PICO that could have led to inaccurate evaluation of previous automation methods. Results: In total, 4081 randomized clinical trials were weakly labeled to train the WS models and compared against full supervision. The models were separately trained for PICO entities and evaluated on the EBM-PICO test set. A WS approach combining ontologies and expert-generated rules outperformed full supervision for the participant entity by 1.71% macro-F1. Error analysis on the EBM-PICO subset revealed 18–23% erroneous token classifications. Discussion: Automatic PICO entity extraction accelerates the writing of clinical systematic reviews that commonly use PICO information to filter health evidence. However, PICO extends to more entities: PICOS (S, study type and design), PICOC (C, context), and PICOT (T, timeframe), for which labelled datasets are unavailable. In such cases, the ability to use weak supervision overcomes the expensive annotation bottleneck. Conclusions: We show the feasibility of WS PICO entity extraction using freely available ontologies and heuristics without manually annotated data. Weak supervision has encouraging performance compared to full supervision but requires careful design to outperform it.
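The "simple majority voting" step named in this abstract can be sketched roughly as follows. This is only an illustration of the general technique, not the authors' pipeline: the `majority_vote` function, the use of `None` for an abstaining labeler, and the tie-breaking rule (abstain on ties) are assumptions made here.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

ABSTAIN = None  # assumed convention: a labeling source that does not fire


def majority_vote(votes: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Aggregate noisy labels for one token into a consensus label.

    Abstentions are ignored; a tie for first place yields ABSTAIN.
    """
    counted = Counter(v for v in votes if v is not ABSTAIN)
    if not counted:
        return ABSTAIN
    ranked = counted.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # no strict majority among the non-abstaining sources
    return ranked[0][0]


# Three hypothetical sources label one token; one of them abstains.
print(majority_vote(["participant", "participant", None, "outcome"]))
```

A generative label model, which the abstract also mentions, would weight sources by estimated accuracy instead of counting them equally; that step is not shown here.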
... Patients were assigned into three groups according to their age: young group (<45 years), middle-age group (45-65 years), and old group (>65) based on similar studies on the same subject. [23][24][25] ...
Article
Background: ST-segment elevation myocardial infarction (STEMI) and non-ST-segment elevation myocardial infarction (NSTEMI) are common types of acute coronary syndrome which are associated with the risk factors of age, obesity, hypertension, and diabetes. Objective: The present study aimed to examine the effects of age on the risk factors and clinical symptoms of acute coronary syndrome. Methods: A cross-sectional prospective study was conducted on 125 patients with acute coronary syndrome chosen by a non-probability convenience sampling method in the coronary care unit in Sulaimani, the Kurdistan region of Iraq. Acute coronary syndrome types were diagnosed through clinical presentations, electrocardiography (ECG), and troponin test. Data were collected using a researcher-based checklist through face-to-face interviews. Results: The results indicated that males were the dominant group. The age group 45-65 had the highest prevalence rate of acute coronary syndrome. The most frequent risk factors for acute coronary syndrome were hypertension (54.4%), dyslipidemia (52%), smoking (42.4%), and diabetes mellitus (38.4%). Typical chest pain was found to be the most frequent clinical presentation (88%). There was a significant difference between the age groups in terms of the effect of age on typical and atypical symptoms; however, neither age nor typical/atypical symptoms had a significant effect on type of acute coronary syndrome. Similarly, family history, hypertension, diabetes mellitus, obesity, smoking, physical inactivity, and dyslipidemia had no effect on type of acute coronary syndrome. Conclusion: Age is a predictive factor for acute coronary syndrome, but family history, hypertension, diabetes mellitus, obesity, smoking, physical inactivity, and dyslipidemia cannot predict acute coronary syndrome.
... On the other hand, other systems rely on a single data source to create a knowledge base or multiple knowledge bases that are related to a particular domain. For instance, Geifman and Rubin propose to model and store knowledge about age-related phenotypic patterns and events in an Age-Phenome Knowledge Base (APK) [16]. Another example is the terrorism knowledge base, which contains all relevant knowledge about terrorist groups, their members, leaders, affiliations, and full descriptions of specific terrorist events [11]. ...
Article
With the development of the Semantic Web (SW), the creation of ontologies to formally conceptualize our understanding of various domains has widely increased in number. However, the conceptual and terminological differences (a.k.a semantic heterogeneity problem) between ontologies form a major limiting factor towards their use/reuse and full adoption in practical settings. A key solution to addressing this problem can be through identifying semantic correspondences between the entities (including concepts, relations, and instances) of heterogeneous ontologies, and consequently achieving interoperability between them. This process is also known as ontology alignment. The output of this process can be further exploited to merge ontologies into a single coherent ontology. Indeed, this is widely regarded as a crucial, yet difficult task, specifically when dealing with heavyweight ontologies that consist of hundreds of thousands of concepts. To address this issue, various ontology merging approaches have been proposed. These approaches can be classified into three categories: single-strategy-based approaches, multiple-strategy-based approaches, and approaches based on exploiting external semantic resources. In this paper, we first discuss the strengths and limitations of each of these approaches, and then present our framework for addressing the semantic heterogeneity problem through merging domain-specific ontologies based on multiple external semantic resources. The novelty of the proposed approach is mainly based on employing knowledge represented by multiple external resources (knowledge bases in our work) to make aggregated decisions on the semantic correspondences between the entities of heterogeneous ontologies. 
Other important issues that we attempt to tackle in the proposed framework are: (i) Identifying and handling inconsistency of semantic relations between the ontology concepts and, (ii) Handling the issue of missing background knowledge (such as concepts and instances) in the exploited knowledge bases by utilizing an integrated statistical and semantic technique. Additionally, the proposed solution soundly enriches the knowledge bases with missing background knowledge, and thus enables the reuse of the newly obtained knowledge in future ontology merging tasks. To validate our proposal, we tested the framework using the OAEI 2009 benchmark and compared the produced results with state-of-the-art syntactic and semantic based systems. In addition, we utilized the proposed techniques to merge three heavyweight ontologies from the environmental domain.
... Recently, we developed the Age-Phenome Knowledge base (APK) that holds a structured representation of knowledge derived from the scientific literature and clinical data regarding clinically relevant traits that occur at different ages [11]. The database underpinning the APK contains over 35,000 entries that describe relationships between age and disease, which were mined from over 1.5 million PubMed abstracts [12]. ...
... The database was implemented in MySQL [14]. Like the APK [11], the mouse-APK database comprises three main tables: (i) an evidence table that contains evidence instances (e.g. text fragments from abstracts) and their description; (ii) an evidence-age table that contains a description of the age information found in each evidence instance; and (iii) an evidence-phenotype table that links phenotypes to each evidence instance. ...
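The three-table layout described in this excerpt might look roughly like the sketch below. It uses SQLite through Python's standard `sqlite3` module rather than MySQL so that it runs without a server, and every table and column name (`evidence`, `evidence_age`, `evidence_phenotype`, `fragment`, `age_min_days`, and so on) is a hypothetical choice; the excerpt does not give the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- (i) evidence instances, e.g. text fragments from abstracts
    CREATE TABLE evidence (
        evidence_id INTEGER PRIMARY KEY,
        source_ref  TEXT,
        fragment    TEXT NOT NULL
    );
    -- (ii) age information found in each evidence instance
    CREATE TABLE evidence_age (
        evidence_id  INTEGER NOT NULL REFERENCES evidence(evidence_id),
        age_min_days REAL,
        age_max_days REAL
    );
    -- (iii) phenotypes linked to each evidence instance
    CREATE TABLE evidence_phenotype (
        evidence_id INTEGER NOT NULL REFERENCES evidence(evidence_id),
        phenotype   TEXT NOT NULL
    );
    """
)

# Placeholder rows, invented purely to exercise the schema.
conn.execute(
    "INSERT INTO evidence VALUES (1, 'example-abstract', 'placeholder fragment')"
)
conn.execute("INSERT INTO evidence_age VALUES (1, 90, 120)")
conn.execute("INSERT INTO evidence_phenotype VALUES (1, 'example phenotype')")

# Which phenotypes have evidence covering a given age (here, 100 days)?
rows = conn.execute(
    """
    SELECT p.phenotype, a.age_min_days, a.age_max_days
    FROM evidence_phenotype AS p
    JOIN evidence_age AS a USING (evidence_id)
    WHERE a.age_min_days <= ? AND a.age_max_days >= ?
    """,
    (100, 100),
).fetchall()
print(rows)
```

Keeping age and phenotype in separate tables keyed by the evidence instance lets one fragment carry several age mentions and several phenotypes without duplicating the fragment text.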
Article
Similarities between mice and humans have led to the generation of many mouse models of human disease. However, differences between the species often result in mice being unreliable as preclinical models for human disease. One difference that might play a role in lowering the predictivity of mouse models for human diseases is age. Despite the important role age plays in medicine, it is too often considered only casually when considering mouse models. We developed the mouse-Age Phenotype Knowledgebase, which holds knowledge about age-related phenotypic patterns in mice. The knowledgebase was extensively populated with literature-derived data using text mining techniques. We then mapped between ages in humans and mice by comparing the age distribution pattern for 887 diseases in both species. The knowledgebase was populated with over 9800 instances generated by a text-mining pipeline. The quality of the data was manually evaluated, and was found to be of high accuracy (estimated precision >86%). Furthermore, grouping together diseases that share similar age patterns in mice resulted in clusters that mirror actual biomedical knowledge. Using these data, we matched age distribution patterns in mice and in humans, allowing for age differences by shifting either of the patterns. High correlation (r² > 0.5) was found for 223 diseases. The results clearly indicate a difference in the age mapping between different diseases: age 30 years in humans is mapped to 120 days in mice for Leukemia, but to 295 days for Anemia. Based on these results we generated a mouse-to-human age map which is publicly available. We present here the development of the mouse-APK, its population with literature-derived data and its use to map ages in mice and humans for 223 diseases. These results represent a further step towards bridging the gap between humans and mice in biomedical research.
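The idea of matching two age distribution patterns by shifting one against the other and scoring the overlap with r² can be sketched as below. The abstract does not describe the actual procedure, so this is only one plausible reading: the `best_shift` function, the integer bin shifts, the minimum overlap of three bins, and the toy profiles are all assumptions.

```python
from math import sqrt
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def best_shift(
    human: Sequence[float], mouse: Sequence[float], max_shift: int
) -> Tuple[int, float]:
    """Return (shift, r_squared) maximizing positive correlation.

    human[i] is paired with mouse[i + shift]; bins without a partner are dropped.
    """
    best = (0, -1.0)
    for shift in range(-max_shift, max_shift + 1):
        pairs = [
            (human[i], mouse[i + shift])
            for i in range(len(human))
            if 0 <= i + shift < len(mouse)
        ]
        if len(pairs) < 3:  # too little overlap to correlate meaningfully
            continue
        r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
        if r > 0 and r * r > best[1]:
            best = (shift, r * r)
    return best


# Toy report counts per age bin; the mouse profile is the human one moved by 2 bins.
human_profile = [0, 1, 4, 9, 4, 1, 0, 0]
mouse_profile = [0, 0, 0, 1, 4, 9, 4, 1]
shift, r2 = best_shift(human_profile, mouse_profile, max_shift=3)
print(shift, round(r2, 3))
```

A real mapping would also need to convert bin offsets into age units for each species, which depends on how the two age axes are binned and is left out here.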
... (.0165 95% CI) for MetaMap for exact matches of mentions [16]. In a separate study, HealthTermFinder has been shown to have sensitivity of 91% and specificity of 86% [20] for identifying medical concepts in PubMed abstracts. BeOK answers were analyzed using an in-house Hebrew NLP pipeline for mapping to UMLS [21]. ...
Article
Online Consumer Health websites are a major source of information for patients worldwide. We focus on another modality, online physician advice. We aim to evaluate and compare the freely available online expert physicians' advice in different countries, its scope and the type of content provided. Using automated methods for information retrieval and analysis, we compared consumer health portals from the US, Canada, the UK and Israel (WebMD,NetDoctor,AskTheDoctor and BeOK). The evaluated content was generated between 2002 and 2011. We analyzed the different sites, looking at the distribution of questions in the various health topics, answer lengths and content type. Answers could be categorized into longer broad-educational answers versus shorter patient-specific ones, with different physicians having personal preferences as to answer type. The Israeli website BeOK, providing 10 times the number of answers than in the other three health portals, supplied answers that are shorter on average than in the other websites. Response times in these sites may be rapid with 32% of the WebMD answers and 64% of the BeOK answers provided in less than 24 hours. The voluntary contribution model used by BeOK and WebMD enables generation of large numbers of physician expert answers at low cost, providing 50,000 and 3,500 answers per year, respectively. Unlike health information in online databases or advice and support in patient-forums, online physician advice provides qualified specialists' responses directly relevant to the questions asked. Our analysis showed that high numbers of expert answers could be generated in a timely fashion using a voluntary model. The length of answers varied significantly between the internet sites. Longer answers were associated with educational content while short answers were associated with patient-specific content. 
Standard site-specific guidelines for expert answers will allow for more desirable content (educational content) or better throughput (patient-specific content).
... While much data concerning disease and age exist, such information was not systematically organized and only of late became available for research. Recently, we developed the Age-Phenome Knowledge-base (APK) that holds a structured representation of knowledge derived from the scientific literature and clinical data regarding clinically-relevant traits and trends that occur at different ages, such as disease symptoms and propensity (Geifman and Rubin 2011). The database underpinning the APK contains over 35,000 entries that describe relationships between age and disease and were mined from over 1.5 million PubMed abstracts (Geifman and Rubin 2012). ...
Article
Age is an important factor when considering phenotypic changes in health and disease. Currently, the use of age information in medicine is somewhat simplistic, with ages commonly being grouped into a small number of crude ranges reflecting the major stages of development and aging, such as childhood or adolescence. Here, we investigate the possibility of redefining age groups using the recently developed Age-Phenome Knowledge-base (APK) that holds over 35,000 literature-derived entries describing relationships between age and phenotype. Clustering of APK data suggests 13 new, partially overlapping, age groups. The diseases that define these groups suggest that the proposed divisions are biologically meaningful. We further show that the number of different age ranges that should be considered depends on the type of disease being evaluated. This finding was further strengthened by similar results obtained from clinical blood measurement data. The grouping of diseases that share a similar pattern of disease-related reports directly mirrors, in some cases, medical knowledge of disease-age relationships. In other cases, our results may be used to generate new and reasonable hypotheses regarding links between diseases.
... More selective approaches for the identification of relations include the extraction of specific types of statements, such as those referring to causal relations between mutations and diseases [50], interactions between genes and proteins [51,52] and relations between environmental features and diseases [53]. To carry out this task, pattern-based approaches are used [54] as well as more sophisticated solutions, such as machine learning, statistical analyses and formal inference [4]. ...
... Building knowledge bases from literature content. As well as using the results of information extraction approaches directly for scientific analyses, they can also be automatically deposited into databases or used to support manual database curation efforts [53,73,74]. A large number of databases now use text mining to gather their data. ...
Article
In response to the unbridled growth of information in literature and biomedical databases, researchers require efficient means of handling and extracting information. As well as providing background information for research, scientific publications can be processed to transform textual information into database content or complex networks and can be integrated with existing knowledge resources to suggest novel hypotheses. Information extraction and text data analysis can be particularly relevant and helpful in genetics and biomedical research, in which up-to-date information about complex processes involving genes, proteins and phenotypes is crucial. Here we explore the latest advancements in automated literature analysis and its contribution to innovative research approaches.
... On the other hand, other systems rely on a single data source to create the knowledge base or create knowledge bases that are related to a particular domain. For instance, Geifman and Rubin proposed to model and store knowledge about age-related phenotypic patterns and events in an Age-Phenome Knowledge Base (APK) [3]. Another example is the terrorism knowledge base, which contains all relevant knowledge about terrorist groups, their members, leaders, affiliations, and full descriptions of specific terrorist events [4]. ...
Conference Paper
Manual construction and maintenance of general-purpose knowledge bases forms a major limiting factor towards their full adoption, use and reuse in practical settings. In this paper, we present KnowBase, a system for automatic knowledge base construction from heterogeneous data sources including domain-specific ontologies, general-purpose ontologies, plain texts, and image and video captions, which are automatically extracted from WebPages. In our approach, several information extraction techniques are integrated to automatically create, enrich, and keep the knowledge base up to date. Consequently, knowledge represented by the produced knowledge base can be employed in several application domains. In our experiments, we used the produced knowledge base as an external resource to align heterogeneous ontologies from the environmental and agricultural domains. The produced results demonstrate the effectiveness of the used knowledge base in finding corresponding entities between the used ontologies.
... As a result of these investigations, a significant quantity of data exists linking specific ages or age ranges with disease, as well as with other clinical phenotypes, such as 'normal' parameter values from blood tests. We have previously described the Age-Phenome Knowledge-base (APK) in which knowledge about age-related phenotypic patterns and events can be modelled and stored for retrieval (Geifman and Rubin 2011). The knowledge-base holds a structured representation of knowledge, derived from scientific literature and clinical data, about clinically-relevant traits and trends which occur at different ages, such as disease symptoms and propensity. ...
... We have previously described the Age-Phenome Wiki, an interface which was developed as a means to share the knowledge stored in the APK and to harness the knowledge of the user community for further enriching the knowledge in APK (Geifman and Rubin 2011). The knowledge shared via this wiki has been updated but due to the large amount of data now available in the APK, only a subset of the knowledge is made available using this platform. ...
Article
Data linking specific ages or age ranges with disease are abundant in biomedical literature. However, these data are organized such that searching for age-phenotype relationships is difficult. Recently, we described the Age-Phenome Knowledge-base (APK), a computational platform for storage and retrieval of information concerning age-related phenotypic patterns. Here, we report that data derived from over 1.5 million human-related PubMed abstracts have been added to APK. Using a text-mining pipeline, 35,683 entries which describe relationships between age and phenotype (such as disease) have been introduced into the database. Comparing the results to those obtained by a human reader reveals that the overall accuracy of these entries is estimated to exceed 80%. The usefulness of these data for obtaining new insight regarding age-disease relationships is demonstrated using clustering analysis, which is shown to capture obvious, as well as potentially interesting relationships between diseases. In addition, a new tool for browsing and searching the APK database is presented. We thus present a unique resource and a new framework for studying age-disease relationships and other phenotypic processes.
... Further research is required to better understand how and when age effects need to be corrected for. Toward this goal, we have recently described a database for recording age-disease interactions from biomedical publications [17]. With this resource and others, more refined models for age correction could be developed that will further enhance the power of unsupervised learning in clinical data. ...
Article
It has been proposed that clustering clinical markers, such as blood test results, can be used to stratify patients. However, the robustness of clusters formed with this approach to data pre-processing and clustering algorithm choices has not been evaluated, nor has clustering reproducibility. Here, we made use of the NHANES survey to compare clusters generated with various combinations of pre-processing and clustering algorithms, and tested their reproducibility in two separate samples. Values of 44 biomarkers and 19 health/life style traits were extracted from the National Health and Nutrition Examination Survey (NHANES). The 1999-2002 survey was used for training, while data from the 2003-2006 survey was tested as a validation set. Twelve combinations of pre-processing and clustering algorithms were applied to the training set. The quality of the resulting clusters was evaluated both by considering their properties and by comparative enrichment analysis. Cluster assignments were projected to the validation set (using an artificial neural network) and enrichment in health/life style traits in the resulting clusters was compared to the clusters generated from the original training set. The clusters obtained with different pre-processing and clustering combinations differed both in terms of cluster quality measures and in terms of reproducibility of enrichment with health/life style properties. Z-score normalization, for example, dramatically improved cluster quality and enrichments, as compared to unprocessed data, regardless of the clustering algorithm used. Clustering diabetes patients revealed a group of patients enriched with retinopathies. This could indicate that routine laboratory tests can be used to detect patients suffering from complications of diabetes, although other explanations for this observation should also be considered. Clustering according to classical clinical biomarkers is a robust process, which may help in patient stratification. 
However, optimization of the pre-processing and clustering process may still be required.