Figure - uploaded by Stephane M Meystre
Figure 2: Measurements during the study, in all patients and in the ICU patient subgroup (Tests+prop corresponds to the potential problem list).

Source publication
Article
Full-text available
The electronic problem-oriented medical record was conceived to alleviate limitations of the paper-based medical record, and to improve its organization. The list of medical problems is at the heart of this problem-oriented record, and requires completeness, accuracy and timeliness to fulfill this central role. At Intermountain Health Care (IHC), a...

Citations

... Previous studies have demonstrated the feasibility of extracting meaningful clinical data using natural language processing (NLP) tools from diagnoses [1], problem lists [2], pathology reports [3][4], and radiology reports [5][6][7][8]. These tools are not designed to handle the complexities of radiation therapy (RT) site names, which include many abbreviations specific to our field. ...
Article
Full-text available
Currently, radiation oncology-specific electronic medical records (EMRs) allow providers to input the radiation treatment site as free text. The purpose of this study was to develop a natural language processing (NLP) tool to extract encoded data from radiation treatment sites in an EMR. Treatment sites were extracted for all patients who completed treatment in our department from April 1, 2011, to April 30, 2013. A system was designed to extract Unified Medical Language System (UMLS) concept codes using a sample of 11,018 unique site names from 31,118 radiation therapy (RT) sites. Among those, 5,500 unique site-name strings, approximately half of the sample, were set aside as a test set to evaluate the final system. A dictionary and n-gram statistics calculated over UMLS concepts from related semantic types were combined with manually encoded data. There was an average of 2.2 sites per patient. Prior to extraction, the 20 most common unique treatment sites were used 4,215 times (38.3%). The most common treatment site was whole brain RT, which was entered using 27 distinct terms for a total of 1,063 times. The customized NLP solution showed large gains over other systems, with a recall of 0.99 and a precision of 0.99. A customized NLP tool extracted encoded data from radiation treatment sites in an EMR with high accuracy. This can be integrated into a repository of demographic, genomic, treatment, and outcome data to advance personalized oncologic care.
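The dictionary-based encoding step described above can be sketched as a normalization-then-lookup pipeline. This is a minimal illustration only: the abbreviation map, site strings, and concept codes below are invented for the example and are not the study's actual lexicons or UMLS identifiers.

```python
# Hypothetical sketch: normalize free-text RT site names, then look them
# up in a concept dictionary. All entries below are illustrative.
ABBREVIATIONS = {"wb": "whole brain", "rt": "", "l": "left", "r": "right"}
CONCEPT_DICT = {
    "whole brain": "C-WB-0001",   # made-up codes for illustration
    "left breast": "C-LB-0002",
}

def normalize(site: str) -> str:
    # Lowercase, split on whitespace/hyphens, expand known abbreviations,
    # and drop tokens mapped to the empty string (e.g. the "RT" suffix).
    tokens = site.lower().replace("-", " ").split()
    expanded = [ABBREVIATIONS.get(t, t) for t in tokens]
    return " ".join(t for t in expanded if t)

def encode(site: str):
    # Return a concept code, or None when the dictionary has no match.
    return CONCEPT_DICT.get(normalize(site))

print(encode("WB RT"))     # whole brain -> C-WB-0001
print(encode("L-Breast"))  # left breast -> C-LB-0002
```

In the study itself this lookup was combined with n-gram statistics and manually encoded data; a pure dictionary pass like this one only handles exact normalized matches.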
... Most clinical information resources including Electronic Medical Records (EMRs), Electronic Health Records (EHRs) and medical knowledge contain considerable amount of information. However, much of this information comes in unstructured form, also called free-text [4]. NLP is crucial for transforming relevant unstructured information hidden in free-text into structured information and is extremely useful in improving healthcare and advancing medicine [5]. ...
Article
Full-text available
Background: Natural language processing (NLP) has played an increasingly significant role in advancing medicine, and rich research achievements of NLP methods and applications for medical information processing are available. It is of great significance to conduct a deep analysis to understand the recent development of the NLP-empowered medical research field; however, few studies examining the research status of this field could be found. Therefore, this study aims to quantitatively assess the academic output of NLP in the medical research field. Methods: We conducted a bibliometric analysis of NLP-empowered medical research publications retrieved from PubMed in the period 2007-2016. The analysis focused on three aspects. First, literature distribution characteristics were obtained with a statistical analysis method. Second, a network analysis method was used to reveal scientific collaboration relations. Finally, thematic discovery and evolution were identified using an affinity propagation clustering method. Results: There were 1,405 NLP-empowered medical research publications published during the 10 years, with an average annual growth rate of 18.39%. The 10 most productive publication sources together contributed more than 50% of the total publications. The USA had the highest number of publications, and a moderately significant correlation between a country's publications and GDP per capita was revealed. Denny, Joshua C was the most productive author, and Mayo Clinic was the most productive affiliation. The annual co-affiliation and co-country rates reached 64.04% and 15.79% in 2016, respectively. Ten main thematic areas were identified, including computational biology, terminology mining, information extraction, text classification, social media as a data source, and information retrieval. Conclusions: A bibliometric analysis of NLP-empowered medical research publications uncovering the recent research status is presented.
The results can assist relevant researchers, especially newcomers, in understanding the research development systematically, seeking scientific cooperation partners, optimizing research topic choices, and monitoring new scientific or technological activities.
... The section headers are often regarded as 'containers' of the clinical information providing relevant context, where coding systems and terminologies are considered as 'contents' [27]. Although many clinical NLP applications do not recognize section headers explicitly, several NLP systems can identify predefined sections among clinical notes [27][28][29][30]. ...
Article
Knowledge acquisition of relations between biomedical entities is critical for many automated biomedical applications, including pharmacovigilance and decision support. Automated acquisition of statistical associations from biomedical and clinical documents has shown some promise. However, acquisition of clinically meaningful relations (i.e. specific associations) remains challenging because textual information is noisy and co-occurrence does not typically determine specific relations. In this work, we focus on acquisition of two types of relations from clinical reports: disease-manifestation related symptom (MRS) and drug-adverse drug event (ADE), and explore the use of filtering by sections of the reports to improve performance. Evaluation indicated that applying the filters improved recall (disease-MRS: from 0.85 to 0.90; drug-ADE: from 0.43 to 0.75) and precision (disease-MRS: from 0.82 to 0.92; drug-ADE: from 0.16 to 0.31). This preliminary study demonstrates that selecting information in narrative electronic reports based on the sections improves the detection of disease-MRS and drug-ADE types of relations. Further investigation of complementary methods, such as more sophisticated statistical methods, more complex temporal models and use of information from other knowledge sources, is needed.
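The section-filtering idea above can be illustrated with a toy co-occurrence counter that only counts entity pairs found inside designated report sections. The section names, entities, and note content below are assumptions for illustration, not the paper's actual pipeline or lexicons.

```python
# Illustrative sketch of section-based filtering for co-occurrence
# relation extraction. A note is modeled as (section, entities) pairs;
# co-occurrences are counted only inside target sections.
from collections import Counter

ADE_SECTIONS = {"allergies", "hospital course"}  # assumed target sections

note = [
    ("medications", ["aspirin"]),
    ("allergies", ["penicillin", "rash"]),
    ("family history", ["diabetes"]),
]

pairs = Counter()
for section, entities in note:
    if section not in ADE_SECTIONS:
        continue  # the filter: skip co-occurrences outside target sections
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            pairs[(a, b)] += 1

print(pairs)  # only the ("penicillin", "rash") pair survives the filter
```

Restricting counts to sections where a relation type plausibly occurs is what drives the precision gains the abstract reports, at the cost of missing relations stated elsewhere in the note.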
... Settimi shows that recognizing that there is no semantic difference between the sentences "the user shall view" and "the system shall display" cannot be achieved with only a thesaurus, since a thesaurus cannot distinguish between opposite viewpoints of the same action [19]. On the natural language understanding (NLU) side, Meystre and Haug show that focusing on small domains using highly specified ontologies leads to satisfying results [20], [21], [22]. Using non-specified ontologies on unrestricted natural language texts has not been done so far. ...
Conference Paper
Full-text available
Abstract—Automatic model creation from textual specifications is a complex task. We show how ontologies can be used to improve the quality of automatically created UML models. An evaluation of a model transformation from a textual specification of the World Chess Federation to UML is used as an example. The resulting UML models are substantially improved.
Thesis
Full-text available
France is experiencing an unprecedented aging of its population. The share of seniors is growing, and our society must rethink its organization to account for this change and better understand this population. Numerous cohorts of elderly people already exist around the world, four of them in France, and although the share of this population living in collective housing facilities (nursing homes, post-acute care clinics) is increasing, knowledge of these seniors remains incomplete. Today, private groups of retirement homes and healthcare facilities such as Korian or Orpéa are equipping themselves with large relational databases that provide real-time information on their patients/residents. Since 2010, the records of all Korian residents have been digitized and accessible by queries. They include structured medico-social data describing the residents and their treatments and pathologies, as well as textual data, entered by the care staff, describing their day-to-day care. Over time, although the computerized resident record (DRI) had mainly been designed as a database management application, it became necessary to exploit this wealth of information and build a decision-support tool intended to improve the efficiency of care. The Institut du Bien Vieillir IBV, which has since become the Fondation Korian pour le Bien Vieillir, then chose, within a public/private partnership, to fund research aimed at better understanding the informative potential of these data and at evaluating their reliability and their capacity to answer public health questions.
This research work, and this thesis in particular, was conceived in several stages. First, a content analysis of the DRI data warehouse, the objective being to build a research database with a social component and a health component; this was the subject of the first article. Next, through direct extraction of residents' socio-demographic information upon admission, of their hospitalizations and deaths, and then through an iterative process of extracting textual information from the care-notes table combined with the Delphi method, we generated twenty-four syndromes, added hospitalizations and deaths, and built a syndromic database, the Base du Bien Vieillir (BBV). This new type of information system enabled the constitution of a public health cohort from the population of BBV residents and the organization of longitudinal syndromic follow-up of that cohort. The BBV was also evaluated scientifically in a public health surveillance and research framework through an analysis of its content, periodicity, and data quality. The cohort thus built provided a surveillance tool: this population sample was followed in real time by means of the daily frequencies of occurrence of the residents' 26 syndromes. The evaluation methodology was that of the health surveillance systems proposed by the CDC in Atlanta and was applied to influenza-like illness and acute gastroenteritis; this was the subject of the second article. Finally, the construction of a new public health tool: the distribution of each syndrome over time (dates of the care notes) and space (the nursing homes where they were recorded) opened the field of research to new data exploration methods and made it possible to study several problems related to elderly persons: repeated falls, cancer, vaccinations, and end of life.
Article
We examine recent published research on the extraction of information from textual documents in the Electronic Health Record (EHR). Literature review of the research published after 1995, based on PubMed, conference proceedings, and the ACM Digital Library, as well as on relevant publications referenced in papers already included. 174 publications were selected and are discussed in this review in terms of methods used, pre-processing of textual documents, contextual features detection and analysis, extraction of information in general, extraction of codes and of information for decision-support and enrichment of the EHR, information extraction for surveillance, research, automated terminology management, and data mining, and de-identification of clinical text. Performance of information extraction systems with clinical text has improved since the last systematic review in 1995, but they are still rarely applied outside of the laboratory they have been developed in. Competitive challenges for information extraction from clinical text, along with the availability of annotated clinical text corpora, and further improvements in system performance are important factors to stimulate advances in this field and to increase the acceptance and usage of these systems in concrete clinical and biomedical research contexts.
Article
Full-text available
X-rays are widely used in medical examinations; in particular, chest X-rays are the most frequent imaging test. However, observations are usually recorded in a free-text format, so it is difficult to standardize the information provided and construct a database for the sharing of clinical data. Here, we describe a simple X-ray observation entry system that can interlock with an electronic medical record system. We investigated common diagnosis indices and, based on them, designed an entry system consisting of five parts: 1) patient lists, 2) image selection, 3) diagnosis result entry, 4) image view, and 5) main menu. The X-ray observation results can be exported in Excel format. The usefulness of the proposed system was assessed in a study using over 500 patients' chest X-ray images, and the data were readily extracted in a format that allowed convenient assessment. The proposed X-ray observation system, which can be linked with an electronic medical record system, allows easy extraction of standardized clinical information to construct a database. However, the entry system is limited to chest X-rays and cannot interpret semantic information; therefore, further research into domains using other interpretation methods is required.
Article
Full-text available
To improve the completeness of an electronic problem list, we have developed a system using Natural Language Processing to automatically extract potential medical problems from clinical free-text documents; these problems are then proposed for inclusion in an electronic problem list management application. A prospective randomized controlled evaluation of this system in an intensive care unit is reported here. A total of 105 patients were randomly assigned to a control or an intervention group. In the latter, patients had their documents analyzed by the system and medical problems discovered were proposed for inclusion into their problem list. In this population, our system significantly increased the sensitivity of the problem lists, from 8.9% to 41%, and to 77.4% if problems automatically proposed but not acknowledged by users were also considered.
Article
Temporal information is crucial in electronic medical records and biomedical information systems. Processing temporal information in medical narrative data is a very challenging area. It lies at the intersection of temporal representation and reasoning (TRR) in artificial intelligence and medical natural language processing (MLP). Some fundamental concepts and important issues in relation to TRR have previously been discussed, mainly in the context of processing structured data in biomedical informatics; however, it is important that these concepts be re-examined in the context of processing narrative data using MLP. Theoretical and methodological TRR studies in biomedical informatics can be classified into three main categories: category 1 applies theories and models from temporal reasoning in AI; category 2 defines frameworks that meet needs from clinical applications; category 3 resolves issues such as temporal granularity and uncertainty.
Article
This study evaluated a computerized method for extracting numeric clinical measurements related to diabetes care from free text in the electronic patient records (EPRs) of general practitioners. The accuracy of this number-oriented approach was compared to manual chart abstraction, and audits measured performance in clinical practice for two commonly used electronic record systems. Numeric measurements embedded within the free text of the EPRs constituted 80% of relevant measurements. For 11 of 13 clinical measurements, the study extraction method was 94%-100% sensitive with a positive predictive value (PPV) of 85%-100%; post-processing increased sensitivity by several points and improved PPV to 100%. Application in clinical practice involved processing times averaging 7.8 minutes per 100 patients to extract all relevant data. The study method converted numeric clinical information to structured data with high accuracy, and enabled research and quality-of-care assessments for practices lacking structured data entry.
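A number-oriented extraction like the one evaluated above is typically built on patterns that pair a measurement name with a nearby numeric value. The sketch below is a minimal assumption-laden illustration: the measurement names, the pattern, and the example sentence are invented for the example and are not the study's validated extraction rules.

```python
# Minimal sketch of number-oriented extraction of diabetes-related
# measurements from free text. Names and pattern are illustrative only.
import re

# Match a measurement name followed by a number; tolerate ':' or '=' and
# either '.' or ',' as the decimal separator (common in European notes).
PATTERN = re.compile(
    r"\b(hba1c|glucose|bmi)\b\s*[:=]?\s*(\d+(?:[.,]\d+)?)", re.IGNORECASE
)

def extract(text: str):
    out = []
    for name, value in PATTERN.findall(text):
        # Normalize the decimal separator before converting to float.
        out.append((name.lower(), float(value.replace(",", "."))))
    return out

print(extract("Control improving, HbA1c: 7,2 today; fasting glucose 6.1."))
# [('hba1c', 7.2), ('glucose', 6.1)]
```

The post-processing step the abstract mentions would sit after this pass, e.g. rejecting values outside plausible clinical ranges to raise the PPV.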