Article

Comparing a Rule-Based Versus Statistical System for Automatic Categorization of MEDLINE Documents According to Biomedical Specialty


Abstract

Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on the automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents with the Medical Subject Headings® (MeSH®) controlled vocabulary to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), which is based on human categorization of about 4,000 journals and on statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of human-assigned categories for one hundred MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable on five of the measures, and that JDI is superior on one. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and in maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI variant that associates JDs with MeSH indexing rather than textwords; it may be worthwhile to investigate whether this statistical JDI method and the rule-based CISMeF might be combined and evaluated to determine whether they are complementary.
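The statistical side of this comparison can be illustrated with a minimal sketch of JDI-style categorization: learn per-word association scores with journal descriptors from a labeled corpus, then rank descriptors for a new document by averaging its words' scores. This is a hypothetical simplification for illustration, not NLM's actual implementation; all function and variable names are ours.

```python
from collections import defaultdict

def train_jdi(docs):
    """docs: list of (tokens, journal_descriptor) pairs.
    Returns a word -> {JD: P(JD | word)} association table."""
    counts = defaultdict(lambda: defaultdict(int))
    for tokens, jd in docs:
        for w in set(tokens):          # document-level co-occurrence
            counts[w][jd] += 1
    table = {}
    for w, jd_counts in counts.items():
        total = sum(jd_counts.values())
        table[w] = {jd: c / total for jd, c in jd_counts.items()}
    return table

def rank_categories(tokens, table):
    """Average the per-word JD vectors over the document's known words
    and return JDs ranked by mean association score."""
    known = [w for w in tokens if w in table]
    if not known:
        return []
    scores = defaultdict(float)
    for w in known:
        for jd, p in table[w].items():
            scores[jd] += p / len(known)
    return sorted(scores.items(), key=lambda kv: -kv[1])
```

A document about "heart disease" would then rank a cardiology descriptor first if "heart" co-occurred mostly with cardiology journals in training.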


... From the volunteers' choices, a category relevance ranking was created [Humphrey et al. 2009; Sebastiani 2002], formed by the number of selections of each category for each page across all evaluators. The category with the most selections by the evaluators for a page was taken as its first category in the ranking [Rosso 2005], and the same was done for each position of the ranking, down to the fifth position. ...
... The results of the pattern classifiers used in previous work [Sousa 2011, Sousa et al. 2012] are summarized in Table 1. The classifiers used were: Naive Bayes [John and Langley 1995], with feature extraction by term occurrence (nb-to), binary occurrence (nb-bo), term frequency (nb-tf), and tf.idf (nb-tfidf) [Salton and Buckley 1988]; and Journal Descriptor Indexing [Humphrey et al. 2009], with feature extraction by word co-occurrence counts (jdi-wc) and document co-occurrence counts (jdi-dc) [Humphrey et al. 2009]. Figure 1 shows the behavior of recall for the 5 most relevant positions of the category relevance ranking, for both human and automatic classification. ...
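The tf.idf weighting [Salton and Buckley 1988] used here as a Naive Bayes feature can be sketched as follows; this is a minimal illustration assuming raw term frequency times log inverse document frequency, with names of our choosing.

```python
import math
from collections import Counter

def tfidf(docs):
    """docs: list of token lists.
    Returns one {term: tf * idf} weight dict per document."""
    n = len(docs)
    df = Counter()                     # document frequency per term
    for d in docs:
        df.update(set(d))
    weights = []
    for d in docs:
        tf = Counter(d)                # raw term frequency
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights
```

Terms occurring in every document get weight 0, so only discriminative terms contribute to the classifier's feature vector.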
Article
Lay people have difficulty when searching for health information on the Web. This study evaluated the adequacy of automatic multi-label suggestion for health web pages in Brazilian Portuguese. We collected 57 health web pages and asked 21 volunteers to evaluate them. We measured recall, the consensus among evaluators, and the consensus between evaluators and the automatic classifiers (Naive Bayes and Journal Descriptor Indexing). Recall reached 100%, with high consensus among evaluators for the 5 most relevant categories, suggesting that automatic multi-labeling of health web pages helps information retrieval by lay people.
... It uses predefined patterns to classify the question type and determine the answer [13][14]. The main drawback of this system is that the rules must be created manually. In-depth knowledge of the syntax of a language is needed to write the rules [15]. The importance of statistical approaches for QA systems has increased due to the tremendous growth of text material available online. ...
... 14 Sandra moved to the kitchen. 15 Where is Sandra? → kitchen (supporting fact: 14) ...
Article
Full-text available
A Question Answering (QA) system is a field of Natural Language Processing that allows users to ask questions in natural language sentences and returns a brief answer rather than a list of documents. This work intends to use Recurrent Neural Network (RNN) based deep learning algorithms to solve the Question Answering problem. The use of recurrent neural networks allows us to expand and apply this model to a variety of question answering tasks. In this work, a simple RNN-based Question Answering System is implemented and its performance is evaluated on simple and complex question answering tasks from the bAbI dataset. The performance of training and testing is studied with suitable metrics, and the difference in performance between the two question answering tasks is observed.
... Biomedical information extraction (BMIE) has captured the interests of researchers in recent years. The literature on BMIE has been primarily focused on the detection of biomedical topics [1][2][3] and extraction of gene and protein information from text. [4][5][6] Although the volume of publications on disease and chemical compound detection in unstructured text has been steadily growing, [7] there are very few studies published on detecting diagnostic laboratory information and related biomedical named entities. ...
... Current methods and tools have been tested on various biomedical entities. [1][2][3][4][5][6][7] Even though laboratory test information are crucial components of any clinical narrative, the authors could not find in the literature a single pathology informatics study on detecting and extracting laboratory test information from narrative text. ...
Article
Full-text available
No previous study has reported the efficacy of current natural language processing (NLP) methods for extracting laboratory test information from narrative documents. This study investigates the pathology informatics question of how accurately such information can be extracted from text with current tools and techniques, especially machine learning and symbolic NLP methods. The study data came from a text corpus maintained by the U.S. Food and Drug Administration, containing a rich set of information on laboratory tests and test devices. The authors developed a symbolic information extraction (SIE) system to extract device- and test-specific information about four types of laboratory test entities: specimens, analytes, units of measure and detection limits. They compared the performance of SIE and three prominent machine learning based NLP systems, LingPipe, GATE and BANNER, each implementing a distinct supervised machine learning method: hidden Markov models, support vector machines and conditional random fields, respectively. Machine learning systems recognized laboratory test entities with moderately high recall but low precision. Their recall rates were relatively higher when the number of distinct entity values (e.g., the spectrum of specimens) was very limited or when the lexical morphology of the entity was distinctive (as in units of measure), yet SIE outperformed them by statistically significant margins on extracting specimen, analyte and detection limit information in both precision and F-measure. Its high recall performance was statistically significant on analyte information extraction. Despite its shortcomings against machine learning methods, a well-tailored symbolic system may better discern relevancy among a pile of information of the same type and may outperform a machine learning system by tapping into lexically non-local contextual information such as the document structure.
... These rules have usually been written manually, based on the lexical and syntactic structure of the question and of the paragraph containing the answer. In order to produce such rules, a deep understanding of the language is required (Humphrey et al., 2009). With the increasing volume of documents on the Internet, statistical approaches became the dominant approach to the QA problem. ...
Article
Full-text available
Nowadays, a considerable volume of news articles is produced daily by news agencies worldwide. Since there is an extensive volume of news on the web, finding exact answers to users' questions is not a straightforward task. Developing Question Answering (QA) systems for news articles can tackle this challenge. Due to the lack of studies on Persian QA systems and the importance and wide applications of QA systems in the news domain, this research aims to design and implement a QA system for Persian news articles. To the best of our knowledge, this is the first attempt to develop a Persian QA system in the news domain. We first create FarsQuAD: a Persian QA dataset for the news domain. We analyze the type and complexity of users' questions about Persian news. The results show that What and Who questions have the most occurrences, and Why and Which questions the least, in the Persian news domain. The results also indicate that users usually raise complex questions about Persian news. We then develop FarsNewsQA: a QA system for answering questions about Persian news. We developed three models of FarsNewsQA using BERT, ParsBERT, and ALBERT. The best version of FarsNewsQA offers an F1 score of 75.61%, which is comparable with that of QA systems on the English SQuAD dataset made by Stanford University, and shows that the new BERT-based technologies work well for Persian news QA systems.
... The main disadvantage of this method is that it is time-consuming [14][15]. ...
Article
Full-text available
A Question Answering (QA) system is a field of Natural Language Processing in which users can post queries in their own language, and the system gives a precise answer instead of a list of documents. A memory network can perform reasoning with an inference component and a long-term memory component, using the two components efficiently to find answers from the story context for a given query. In our earlier work [26] we evaluated the performance of the MemN2N network on complex and easy question answering tasks and found that MemN2N fails to produce good results on some complex QA tasks of the bAbI dataset. This work intends to improve that performance with a state-of-the-art bi-model end-to-end memory network (BiMemN2N_I) for such complex QA tasks and to compare its performance with the standard MemN2N and MemNN models. In this work, a bi-model MemN2N based question answering system is implemented and its performance is evaluated on complex question answering tasks from the bAbI dataset. In addition, the performance of training and testing is studied with suitable metrics, and the difference in performance between the two question answering tasks is identified.
... The difficulty in this approach was that the rules had to be written manually. The user needs in-depth knowledge of the structure and semantics of a language to write the rules [15]. ...
Article
Full-text available
A Question Answering (QA) system is a field of Natural Language Processing that allows users to post questions in natural language sentences and returns a short and precise answer rather than a set of documents. This work aims to evaluate three deep learning models, RNN, LSTM and GRU, on question answering tasks. The use of deep learning networks allows us to expand and apply these models to a variety of question answering tasks. In this work, we implement three deep learning model based question answering systems and evaluate their performance on simple and complex question answering tasks from the bAbI dataset. We study the performance of training and testing with suitable metrics and find the difference in performance between the two question answering tasks.
... A major drawback of rule-based question answering systems is that the rules must be written manually, which consumes a lot of time [14]. To write the rules, in-depth knowledge of the structure of the language is needed [15]. ...
Article
Full-text available
A Question Answering (QA) system is a field of Natural Language Processing that allows users to ask questions in natural language sentences and returns a brief answer rather than a list of documents. Memory networks are capable of reasoning with inference components combined with a long-term memory component, and they learn how to use these two components efficiently to predict answers from the story text for a specific question. This work intends to evaluate the performance of an earlier Keras implementation of the memory network (MemNN) model and compare it with three standard deep learning models: RNN, LSTM and GRU. In this work, we implement a Keras based MemNN question answering system and evaluate its performance on simple and complex question answering tasks from the bAbI dataset. We study the performance of training and testing with suitable metrics and find the difference in performance between the two question answering tasks.
... A major drawback of rule-based question answering systems was that the heuristic rules needed to be manually crafted. To devise these rules an in-depth knowledge of the semantics of a language was a necessity [5]. With the rapid growth of text material available online the importance of statistical approaches for QA has also increased. ...
Preprint
Question Answering has recently received high attention from artificial intelligence communities due to advancements in learning technologies. Early question answering models used rule-based approaches and then moved to statistical approaches to address the vast amount of available information. However, statistical approaches have been shown to underperform in handling the dynamic nature and variation of language. Learning models, by contrast, have shown the capability of handling this dynamic nature and variation. Many deep learning methods have been introduced for question answering, and most have been shown to achieve higher results than machine learning and statistical methods. The dynamic nature of language has benefited from the nonlinear learning in deep learning, which has created prominent success and a spike in work on question answering. This paper discusses the successes and challenges in question answering systems and the techniques used to address these challenges.
... To build the SDI classifier, we used the approach of Journal Descriptor Indexing (JDI) (13), developed by the National Library of Medicine (NLM). JDI is an automatic text categorization tool that has presented significant results for the classification and indexing of scientific articles. ...
Article
Full-text available
Objective: To present the results of a sentiment classification methodology, here denominated Sentiment Descriptor Indexing (SDI), applied to Brazilian Portuguese Twitter messages related to health topics. Methods: The first step was the construction of an algorithm based on the co-occurrence of Twitter terms with the sentiment descriptor vocabulary known as ANEW-BR. In the second stage, the performance of the SDI algorithm was evaluated on messages about "cancer" over a period of three weeks. The messages were classified by volunteers and, in parallel, by SDI; the classifications were paired to generate a performance evaluation. Results: The precision and recall values were 0.68 and 0.67, respectively. A total of 25,230 messages on the topic "cancer" were collected, with a positive sentiment classification rate of 71%. Conclusion: The contributions of this work aim to fill the lack of sentiment analysis methods for the Brazilian Portuguese language.
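A co-occurrence scheme of this kind, matching message terms against a sentiment descriptor vocabulary, might be sketched as below. This is a hypothetical simplification: it assumes an ANEW-style lexicon of valence scores on a 1-9 scale and a fixed neutrality threshold, neither of which is specified by the paper.

```python
def sdi_classify(tokens, lexicon, threshold=5.0):
    """Label a message by the mean valence of its terms that occur
    in the sentiment-descriptor lexicon (assumed 1-9 valence scale)."""
    vals = [lexicon[t] for t in tokens if t in lexicon]
    if not vals:
        return "neutral"               # no descriptor co-occurrence
    mean = sum(vals) / len(vals)
    return "positive" if mean > threshold else "negative"
```

Terms absent from the lexicon simply do not contribute, so coverage of the vocabulary directly bounds how many messages get a non-neutral label.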
... Probable reasons for the sustained interest of LIS professionals in classification could be that they (and other professionals, particularly information technology professionals) have found, and continue to find, newer applications of classification. These newer applications include the use of classification in text categorisation/automatic classification [18][19][20]; in the management of web contents [21][22][23][24]; in organising resources in institutional repositories 25; and in resource discovery from the internet 26. The other new areas where classification is applied are the creation and maintenance of semantic web tools such as taxonomies 27, ontologies [28][29] and folksonomies 30. ...
Article
Full-text available
This paper analyses the literature of classification published during 2000 to 2009 and finds that there is sustainability in the growth of literature on classification in the first decade of the 21st century. It traces the pattern in scattering of literature on classification in library and information science (LIS) journals and concludes that the literature adheres to the Bradford’s law of scattering. It produces rank list of journals publishing the literature on classification and identifies authorship patterns and the prominent writers in classification. The research finds that the Indian LIS writers have shown sustained interest in classification domain.
... These techniques can be divided into two main streams: the rule-based approach (Friedman et al, 2004; Hahn, Romacker & Schulz, 2002) and the statistical approach (Taira & Soderland, 1999; Sebastiani, 2002). A comparison between the two methods tested systems using both approaches for the automatic categorization of MEDLINE abstracts (Humphrey et al, 2009) and found comparable results for most evaluated items. The results favored the statistical approach, though the authors suggested combining both approaches. ...
Article
Full-text available
The activities of organizing knowledge recorded in texts and obtaining knowledge from human experts – the knowledge acquisition process – are essential for scientific development. In this article, we propose methodological steps for knowledge acquisition, which have been applied to the construction of biomedical ontologies. The methodological steps are tested in a real case of knowledge acquisition in the domain of human blood. We hope to contribute to the improvement of knowledge acquisition for the representation of scientific knowledge in ontologies.
... Nevertheless, the semantic links were created based on the know-how of professional librarians and medical experts, with the help of the NLM network using the Medlib-L listserv. Furthermore, this validity was recently compared to NLM Journal Descriptors for categorizing scientific articles, and no significant difference was observed [7]. ...
Article
Full-text available
General practitioners and medical specialists mainly rely on one "general medical" journal to keep their medical knowledge up to date. Nevertheless, it is not known whether these journals display the same overview of medical knowledge across the different specialties. The aims of this study were to measure the relative weight of the different specialties in the major journals of general medicine, to evaluate the trends in these weights over a ten-year period and to compare the journals. The 14,091 articles published in The Lancet, the NEJM, the JAMA and the BMJ in 1997, 2002 and 2007 were analyzed. The relative weight of the medical specialties was determined by categorization of all the articles, using a categorization algorithm which inferred the medical specialties relevant to each MEDLINE article from the MeSH terms used by the indexers of the US National Library of Medicine to describe each article. The 14,091 articles included in our study were indexed by 22,155 major MeSH terms, which were categorized into 81 different medical specialties. Cardiology and Neurology were in the first 3 specialties in the 4 journals. Five and 15 specialties were systematically ranked in the first 10 and first 20 in the four journals, respectively. Among the first 30 specialties, 23 were common to the four journals. For each specialty, the trends over the 10-year period differed from one journal to another, with no consistency and no obvious explanatory factor. Overall, the representation of many specialties in the four journals in general and internal medicine included in this study may differ, probably due to different editorial policies. Reading only one of these journals may provide a reliable but only partial overview.
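The categorization algorithm described above, inferring specialties from major MeSH terms, can be sketched roughly as follows. This is a minimal illustration under our own assumptions (a flat term-to-specialty lookup and simple frequency tallying); the actual algorithm's mapping and weighting are not detailed here.

```python
from collections import Counter

def specialty_weights(articles, mesh_to_specialty):
    """articles: list of major-MeSH-term lists, one per article.
    mesh_to_specialty: assumed MeSH term -> specialty mapping.
    Returns each specialty's relative weight across the articles."""
    tally = Counter()
    for mesh_terms in articles:
        for term in mesh_terms:
            if term in mesh_to_specialty:
                tally[mesh_to_specialty[term]] += 1
    total = sum(tally.values())
    return {sp: n / total for sp, n in tally.items()}
```

Ranking the resulting weights then yields the kind of specialty ranking reported for the four journals.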
... For example, MetaMap [29], developed at the National Library of Medicine, maps natural language text to UMLS concepts. In another study [30], techniques for correlating descriptors within MeSH to biomedical scientific articles are compared, with the aim of providing an alternative to manual indexation. However, in both studies, difficulties in mapping the scientific content using MeSH were reported, which demonstrates the challenge involved in dealing with medical terminology when classifying scientific texts. ...
Article
Internet users are increasingly using the World Wide Web to search for information relating to their health. This situation makes it necessary to create specialized tools capable of supporting users in their searches. The aim was to apply and compare strategies developed to investigate the use of the Portuguese version of Medical Subject Headings (MeSH) for constructing an automated classifier for Brazilian Portuguese-language web-based content within or outside the field of healthcare, focusing on the lay public. 3658 Brazilian web pages were used to train the classifier and 606 Brazilian web pages were used to validate it. The strategies proposed were constructed using content-based vector methods for text classification, such that Naive Bayes was used for the task of classifying vector patterns with characteristics obtained through the proposed strategies. A strategy named InDeCS was developed specifically to adapt MeSH for the problem put forward. This approach achieved better accuracy for this pattern classification task (0.94 sensitivity, specificity and area under the ROC curve). Because of the significant results achieved by InDeCS, this tool has been successfully applied to the Brazilian healthcare search portal known as Busca Saúde. Furthermore, it could be shown that MeSH presents important results when used for the task of classifying web-based content focusing on the lay public. It was also possible to show from this study that MeSH was able to map out mutable non-deterministic characteristics of the web.
Article
Full-text available
Natural languages are ambiguous, and computers are not capable of understanding natural languages the way people really understand them. Natural Language Processing (NLP) is concerned with the development of computational models based on aspects of human language processing. Question Answering (QA) is a field of Natural Language Processing that provides a precise answer to a user's question given in natural language. In this work, a MemN2N model based question answering system is implemented and its performance is evaluated on complex question answering tasks using the bAbI dataset in three different language text corpuses. The scope of this work is to understand the language-independent and language-dependent aspects of a deep learning network. For this, we study the performance of the deep learning network by training and testing it with different kinds of question answering tasks in different languages, and also try to understand the difference in performance with respect to the languages.
Conference Paper
Full-text available
Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources, and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools of Swedish textual documents based on the Dewey Decimal Classification (DDC) recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, domain analysis. The gold standard is built based on input from at least two catalogue librarians, end users expert in the subject, end users inexperienced in the subject, and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
Conference Paper
Full-text available
Over the past years, the amount of biomedical data presented on the web has increased enormously due to the growing data volume in the medical and biological domains. Hence, searching for documents and information on the internet has become increasingly complicated. In the current work, a new approach for information extraction using Natural Language Processing (NLP) tools and an ontology is proposed. It describes a system that extracts relations between concepts from biomedical texts using morphological analysis and information extraction techniques. In the first step, the system segments the input text into sentences. Each sentence is then segmented into words, which are tagged with part-of-speech labels and concept classes (food, drug, and gene). A set of relation extraction rules (regular expression patterns) is applied to the annotated sentences; if a pattern matches, the concepts and relations are extracted. The system has been tested on a set of 700 MEDLINE abstracts. For performance evaluation, precision, recall and F-score were calculated. The approach starts with information retrieval from MEDLINE to gather a set of abstracts related to a given domain. These texts are then annotated using an automaton and an ontology, recognizing the concepts of interest through morphological analysis. After the annotation step, the rules are summarized in an automaton that helps discover gene-disease-food relationships. This work proposed an approach for identifying relations between medical concepts using NLP tools. An evaluation experiment reported good effectiveness results.
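A relation extraction rule of the kind described, a regular expression applied over concept-annotated sentences, might look like the sketch below. The tag scheme, relation verbs and function names are our assumptions for illustration, not the paper's actual rule set.

```python
import re

# Assumed input: sentences pre-annotated with concept-class tags
# such as <GENE>...</GENE> and <DRUG>...</DRUG>.
PATTERN = re.compile(
    r"<GENE>(?P<gene>[^<]+)</GENE>[^<]*?"
    r"(?P<rel>inhibits|activates|interacts with)[^<]*?"
    r"<DRUG>(?P<drug>[^<]+)</DRUG>"
)

def extract_relations(sentence):
    """Return (gene, relation, drug) triples matched by the pattern."""
    return [(m.group("gene"), m.group("rel"), m.group("drug"))
            for m in PATTERN.finditer(sentence)]
```

In practice, one such pattern would exist per relation type, and the matched triples would be collected across all annotated sentences.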
Article
Most physicians have received only limited training in occupational medicine (OM) during their studies. Since they rely mainly on one 'general medical' journal to keep their medical knowledge up to date, it is worthwhile questioning the importance of OM in these journals. The aim of this study was to measure the relative weight of OM in the major journals of general medicine and to compare the journals. The 14,091 articles published in the Lancet, the NEJM, the JAMA and the BMJ in 1997, 2002 and 2007 were analysed. The relative weight of OM and the other medical specialties was determined by categorisation of all the articles, using a categorisation algorithm, which inferred the medical specialties relevant to each MEDLINE article file from the major medical subject headings (MeSH) terms used by the indexers of the US National Library of Medicine to describe each article. The 14,091 articles included in this study were indexed by 22,155 major MeSH terms, which were categorised into 73 different medical specialties. Only 0.48% of the articles had OM as a main topic. OM ranked 44th among the 73 specialties, with limited differences between the four journals studied. There was no clear trend over the 10-year period. The importance of OM is very low in the four major journals of general and internal medicine, and we can consider that physicians get a very limited view of the evolution of knowledge in OM.
Article
The Catalogue and Index of French-language Medical Sites (CISMeF) is a medical portal that provides users with results as pertinent as possible according to their requirements, expectations, and context of use. Indexing and single-term searching are based on the Medical Subject Headings (MeSH) thesaurus. The integration of new medical terminologies for indexing the catalogue's resources is intended to minimize false negatives during searches and to contextualize users' needs. The creation of a drug information portal makes more targeted searching possible, with numerous entry points according to the user (physicians, pharmacists, chemists, and pharmacologists). For simplicity's sake, the catalogue's index of resources by the different nomenclatures is not entirely displayed; the choice of display is left to the user, with MeSH only as the default. These multi-nomenclature tools should be applicable as well to electronic patient records, in which case the objective is to improve patient care through better searching and identification of the information required during consultations and hospitalization.
Conference Paper
Full-text available
This paper describes the application of an ensemble of indexing and classification systems, which have been shown to be successful in information retrieval and classification of medical literature, to a new task of assigning ICD-9-CM codes to the clinical history and impression sections of radiology reports. The basic methods used are: a modification of the NLM Medical Text Indexer system, SVM, k-NN and a simple pattern-matching method. The basic methods are combined using a variant of stacking. Evaluated in the context of a Medical NLP Challenge, fusion produced an F-score of 0.85 on the Challenge test set, which is considerably above the mean Challenge F-score of 0.77 for 44 participating groups.
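The fusion step described above can be sketched as a weighted, score-level combination of the base methods' code recommendations. This is a minimal illustration, not the NLM ensemble itself: the base-method outputs, the weights, and the ICD-9-CM codes below are invented for the example.

```python
# Score-level fusion over several base classifiers (toy sketch).
# Each base method returns ICD-9-CM code -> confidence score.

def fuse(predictions, weights):
    """Combine per-method score dicts into one ranked code list.
    weights: per-method weights (in stacking these would be learned)."""
    fused = {}
    for preds, w in zip(predictions, weights):
        for code, score in preds.items():
            fused[code] = fused.get(code, 0.0) + w * score
    # Codes ranked by fused score, best first.
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical outputs of three base methods for one radiology report.
mti = {"786.2": 0.9, "486": 0.4}    # e.g. cough, pneumonia
svm = {"786.2": 0.7, "780.6": 0.5}
knn = {"486": 0.8, "786.2": 0.3}

ranking = fuse([mti, svm, knn], weights=[0.5, 0.3, 0.2])
```

With these toy numbers, "786.2" accumulates 0.72 and outranks the other codes; a real stacking variant would fit the weights on held-out data rather than fixing them.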
Article
Full-text available
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.
Article
Full-text available
We present BITOLA, an interactive literature-based biomedical discovery support system. The goal of this system is to discover new, potentially meaningful relations between a given starting concept of interest and other concepts, by mining the bibliographic database MEDLINE. To make the system more suitable for disease candidate gene discovery and to decrease the number of candidate relations, we integrate background knowledge about the chromosomal location of the starting disease as well as the chromosomal location of the candidate genes from resources such as LocusLink and Human Genome Organization (HUGO). BITOLA can also be used as an alternative way of searching the MEDLINE database. The system is available at http://www.mf.uni-lj.si/bitola/.
Article
Full-text available
In this paper, we describe the design and preliminary evaluation of a new type of tool to speed up the encoding of episodes of care using the SNOMED CT terminology. The proposed system can be used either as a search tool to browse the terminology or as a categorization tool to support automatic annotation of textual contents with SNOMED concepts. The general strategy is similar for both tools and is based on the fusion of two complementary retrieval strategies with thesaural resources. The first classification module uses a traditional vector-space retrieval engine which has been fine-tuned for the task, while the second classifier is based on regular variations of the term list. For evaluating the system, we use a sample of MEDLINE. SNOMED CT categories have been restricted to Medical Subject Headings (MeSH) using the SNOMED-MeSH mapping provided by the UMLS (version 2006). Consistent with previous investigations applied to biomedical terminologies, our results show that performances of the hybrid system are significantly improved as compared to each single module. For top returned concepts, a precision at high ranks (P0) of more than 80% is observed. In addition, a manual and qualitative evaluation on a dozen MEDLINE abstracts suggests that SNOMED CT could represent an improvement compared to existing medical terminologies such as MeSH. Although the precision of the SNOMED categorizer seems sufficient to help professional encoders, it is concluded that clinical benchmarks as well as usability studies are needed to assess the impact of our SNOMED encoding method in real settings. Availability: The system is available for research purposes at http://eagl.unige.ch/SNOCat.
Article
Full-text available
Journal Descriptor Indexing (JDI) is a vector-based text classification system developed at NLM (National Library of Medicine), originally in Lisp and now as a Java tool. Consequently, a testing suite was developed to verify training set data and results of the JDI tool. A methodology was developed and implemented to compare two sets of JD vectors, resulting in a single index (from 0 - 1) measuring their similarity. This methodology is fast, effective, and accurate.
Article
Full-text available
CISMeF is a French quality-controlled health gateway that uses the MeSH thesaurus. We introduced two new concepts: metaterms (a medical specialty with semantic links to one or more MeSH terms, subheadings, and resource types) and resource types. Our objective was to evaluate the precision and recall of metaterms. We created 16 pairs of queries. Each pair concerned the same topic, but one used metaterms and the other MeSH terms. To assess precision, each document retrieved by a query was classified as irrelevant, partly relevant, or fully relevant. The 16 queries yielded 943 documents for metaterm queries and 139 for MeSH term queries. The recall of MeSH term queries was 0.44 (compared to 1 for metaterm queries), and precision was identical for MeSH term and metaterm queries. Metaconcepts such as CISMeF metaterms allow better recall with similar precision compared to MeSH terms in a quality-controlled health gateway.
Article
Full-text available
The objective of NLM's Indexing Initiative (IND) is to investigate methods whereby automated indexing methods partially or completely substitute for current indexing practices. The project will be considered a success if methods can be designed and implemented that result in retrieval performance that is equal to or better than the retrieval performance of systems based principally on humanly assigned index terms. We describe the current state of the project and discuss our plans for the future.
Article
Full-text available
Considerable research is being directed at extracting molecular biology information from text. Particularly challenging in this regard is to identify relations between entities, such as protein-protein interactions or molecular pathways. In this paper we present a natural language processing method for extracting causal relations between genetic phenomena and diseases. After presenting the results of preliminary evaluation, we suggest the use of a graphical display application for viewing the semantic predications produced by the system.
Article
Full-text available
The Medical Text Indexer (MTI) is a program for producing MeSH indexing recommendations. It is the major product of NLM's Indexing Initiative and has been used in both semi-automated and fully automated indexing environments at the Library since mid 2002. We report here on an experiment conducted with MEDLINE indexers to evaluate MTI's performance and to generate ideas for its improvement as a tool for user-assisted indexing. We also discuss some filtering techniques developed to improve MTI's accuracy for use primarily in automatically producing the indexing for several abstracts collections.
Article
Full-text available
In the context of the BioCreative competition, where training data were very sparse, we investigated two complementary tasks: 1) given a Swiss-Prot triplet, containing a protein, a GO (Gene Ontology) term, and a relevant article, extraction of a short passage that justifies the GO category assignment; 2) given a Swiss-Prot pair, containing a protein and a relevant article, automatic assignment of a set of categories. The sentence is the basic retrieval unit. Our classifier computes a distance between each sentence and the GO category provided with the Swiss-Prot entry. The text categorizer computes a distance between each GO term and the text of the article. Evaluations are reported both on annotator judgements as established by the competition and on mean average precision measures computed using a curated sample of Swiss-Prot. Our system achieved the best recall and precision combination both for passage retrieval and for text categorization as judged by the official evaluators. However, text categorization results were far below those in other data-poor text categorization experiments: the top proposed term is relevant in less than 20% of cases, whereas with other biomedical controlled vocabularies, such as the Medical Subject Headings, we achieved more than 90% precision. We also observe that the scoring methods used in our experiments, based on the retrieval status values of our engines, exhibit effective confidence-estimation capabilities. From a comparative perspective, the combination of retrieval and natural language processing methods we designed achieved very competitive performance. Largely data-independent, our systems were no less effective than data-intensive approaches. These results suggest that the overall strategy could benefit a large class of information extraction tasks, especially when training data are missing. However, from a user perspective, the results were disappointing. Further investigations are needed to design applicable end-user text mining tools for biologists.
Article
Full-text available
We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods.
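The two-module design described above, a pattern matcher combined with a vector-space retrieval engine, can be illustrated with a toy scorer: a bag-of-words cosine similarity plus a fixed boost when the controlled-vocabulary term occurs verbatim in the text. The boost value, the example text, and the candidate terms below are invented for illustration and are not the system's actual parameters.

```python
import math
import re

def bow(text):
    # Bag-of-words token counts (lowercased alphabetic tokens).
    counts = {}
    for tok in re.findall(r"[a-z]+", text.lower()):
        counts[tok] = counts.get(tok, 0) + 1
    return counts

def cosine(a, b):
    # Cosine similarity between two token-count dicts.
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def score(term, text, boost=0.5):
    # Vector-space score, plus a boost when the pattern matcher
    # finds the term verbatim (a crude stand-in for the first module).
    s = cosine(bow(term), bow(text))
    if term.lower() in text.lower():
        s += boost
    return s

text = "Regulation of heart rate by the autonomic nervous system."
terms = ["heart rate", "kidney function"]
best = max(terms, key=lambda t: score(t, text))
```

Here "heart rate" wins both on word overlap and on the verbatim-match boost, while "kidney function" scores zero; the real categorizer additionally uses stems and linguistically motivated indexing units.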
Article
Full-text available
Categorization is designed to enhance resource description by organizing content description so as to enable the reader to grasp quickly and easily what are the main topics discussed in it. The objective of this work is to propose a categorization algorithm to classify a set of scientific articles indexed with the MeSH thesaurus, and in particular those of the MEDLINE bibliographic database. In a large bibliographic database such as MEDLINE, finding materials of particular interest to a specialty group, or relevant to a particular audience, can be difficult. The categorization refines the retrieval of indexed material. In the CISMeF terminology, metaterms can be considered as super-concepts. They were primarily conceived to improve recall in the CISMeF quality-controlled health gateway. The MEDLINE categorization algorithm (MCA) is based on semantic links existing between MeSH terms and metaterms on the one hand and between MeSH subheadings and metaterms on the other hand. These links are used to automatically infer a list of metaterms from any MeSH term/subheading indexing. Medical librarians manually select the semantic links. The MEDLINE categorization algorithm lists the medical specialties relevant to a MEDLINE file by decreasing order of their importance. The MEDLINE categorization algorithm is available on a Web site. It can run on any MEDLINE file in a batch mode. As an example, the top 3 medical specialties for the set of 60 articles published in BioMed Central Medical Informatics & Decision Making, which are currently indexed in MEDLINE are: information science, organization and administration and medical informatics. We have presented a MEDLINE categorization algorithm in order to classify the medical specialties addressed in any MEDLINE file in the form of a ranked list of relevant specialties. The categorization method introduced in this paper is based on the manual indexing of resources with MeSH (terms/subheadings) pairs by NLM indexers. 
This algorithm may be used as a new bibliometric tool.
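The inference step of such a categorization algorithm can be sketched as a lookup-and-count: map each MeSH term (or subheading) of an article to its linked metaterms and rank metaterms by how many indexing terms support them. The link table below is a tiny hypothetical excerpt; the real table is manually curated by medical librarians and far larger.

```python
# Hypothetical MeSH-term -> metaterm links (the real, librarian-curated
# table in CISMeF covers the full thesaurus).
LINKS = {
    "myocardial infarction": ["cardiology"],
    "drug therapy": ["pharmacology", "therapeutics"],
    "aspirin": ["pharmacology"],
}

def categorize(mesh_indexing):
    """Rank metaterms by the number of the article's MeSH terms
    or subheadings that link to them, most supported first."""
    counts = {}
    for term in mesh_indexing:
        for mt in LINKS.get(term, []):
            counts[mt] = counts.get(mt, 0) + 1
    return sorted(counts.items(), key=lambda kv: -kv[1])

ranking = categorize(["myocardial infarction", "drug therapy", "aspirin"])
```

For this toy indexing, "pharmacology" is supported by two MeSH terms and ranks first, mirroring the algorithm's output of medical specialties in decreasing order of importance.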
Article
Full-text available
Genomic functional information is valuable for biomedical research. However, such information frequently needs to be extracted from the scientific literature and structured in order to be exploited by automatic systems. Natural language processing is increasingly used for this purpose although it inherently involves errors. A postprocessing strategy that selects relations most likely to be correct is proposed and evaluated on the output of SemGen, a system that extracts semantic predications on the etiology of genetic diseases. Based on the number of intervening phrases between an argument and its predicate, we defined a heuristic strategy to filter the extracted semantic relations according to their likelihood of being correct. We also applied this strategy to relations identified with co-occurrence processing. Finally, we exploited postprocessed SemGen predications to investigate the genetic basis of Parkinson's disease. The filtering procedure for increased precision is based on the intuition that arguments which occur close to their predicate are easier to identify than those at a distance. For example, if gene-gene relations are filtered for arguments at a distance of 1 phrase from the predicate, precision increases from 41.95% (baseline) to 70.75%. Since this proximity filtering is based on syntactic structure, applying it to the results of co-occurrence processing is useful, but not as effective as when applied to the output of natural language processing. In an effort to exploit SemGen predications on the etiology of disease after increasing precision with postprocessing, a gene list was derived from extracted information enhanced with postprocessing filtering and was automatically annotated with GFINDer, a Web application that dynamically retrieves functional and phenotypic information from structured biomolecular resources. 
Two of the genes in this list are likely relevant to Parkinson's disease but are not associated with this disease in several important databases on genetic disorders. Information based on the proximity postprocessing method we suggest is of sufficient quality to be profitably used for subsequent applications aimed at uncovering new biomedical knowledge. Although proximity filtering is only marginally effective for enhancing the precision of relations extracted with co-occurrence processing, it is likely to benefit methods based, even partially, on syntactic structure, regardless of the relation.
Article
Full-text available
A JDI (Journal Descriptor Indexing) tool has been developed at NLM that automatically categorizes biomedical text, returning a ranked list, with scores between 0 and 1, of either JDs (Journal Descriptors, corresponding to biomedical disciplines) or STs (UMLS Semantic Types). Possible applications include WSD (Word Sense Disambiguation) and retrieval according to discipline. The Lexical Systems Group plans to distribute an open-source Java version of this tool.
Article
A new, fully automated approach for indexing documents is presented based on associating textwords in a training set of bibliographic citations with the indexing of journals. This journal-level indexing is in the form of a consistent, timely set of journal descriptors (JDs) indexing the individual journals themselves. This indexing is maintained in journal records in a serials authority database. The advantage of this novel approach is that the training set does not depend on previous manual indexing of hundreds of thousands of documents (i.e., any such indexing already in the training set is not used), but rather the relatively small intellectual effort of indexing at the journal level, usually a matter of a few thousand unique journals for which retrospective indexing to maintain consistency and currency may be feasible. If successful, JD indexing would provide topical categorization of documents outside the training set, i.e., journal articles, monographs, WEB documents, reports from the grey literature, etc., and therefore be applied in searching. Because JDs are quite general, corresponding to subject domains, their most probable use would be for improving or refining search results.
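The journal-level training idea can be sketched as word–JD co-occurrence counting: each training citation inherits the JDs assigned to its journal, and a new text is scored by summing the normalized JD profiles of its words. The citations and descriptors below are toy data, not NLM's training set.

```python
from collections import defaultdict

# Toy training set: (textwords of a citation, JDs inherited from its journal).
training = [
    (["tumor", "chemotherapy"], ["Neoplasms"]),
    (["tumor", "biopsy"], ["Neoplasms", "Pathology"]),
    (["fracture", "biopsy"], ["Orthopedics"]),
]

# Count word-JD co-occurrences.
word_jd = defaultdict(lambda: defaultdict(int))
for words, jds in training:
    for w in words:
        for jd in jds:
            word_jd[w][jd] += 1

def rank_jds(words):
    """Score JDs for a new text by summing each word's normalized JD profile."""
    scores = defaultdict(float)
    for w in words:
        total = sum(word_jd[w].values()) or 1
        for jd, n in word_jd[w].items():
            scores[jd] += n / total
    return sorted(scores, key=scores.get, reverse=True)

top = rank_jds(["tumor", "chemotherapy"])[0]
```

Nothing here depends on document-level manual indexing: the only human input is the JD assignment to the few thousand journals, which is the point of the approach.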
Article
An experiment was performed at the National Library of Medicine® (NLM®) in word sense disambiguation (WSD) using the Journal Descriptor Indexing (JDI) methodology. The motivation is the need to solve the ambiguity problem confronting NLM's MetaMap system, which maps free text to terms corresponding to concepts in NLM's Unified Medical Language System® (UMLS®) Metathesaurus®. If the text maps to more than one Metathesaurus concept at the same high confidence score, MetaMap has no way of knowing which concept is the correct mapping. We describe the JDI methodology, which is ultimately based on statistical associations between words in a training set of MEDLINE® citations and a small set of journal descriptors (assigned by humans to journals per se) assumed to be inherited by the citations. JDI is the basis for selecting the best meaning that is correlated to UMLS semantic types (STs) assigned to ambiguous concepts in the Metathesaurus. For example, the ambiguity transport has two meanings: "Biological Transport" assigned the ST Cell Function and "Patient transport" assigned the ST Health Care Activity. A JDI-based methodology can analyze text containing transport and determine which ST receives a higher score for that text, which then returns the associated meaning, presumed to apply to the ambiguity itself. We then present an experiment in which a baseline disambiguation method was compared to four versions of JDI in disambiguating 45 ambiguous strings from NLM's WSD Test Collection. Overall average precision for the highest-scoring JDI version was 0.7873 compared to 0.2492 for the baseline method, and average precision for individual ambiguities was greater than 0.90 for 23 of them (51%), greater than 0.85 for 24 (53%), and greater than 0.65 for 35 (79%). On the basis of these results, we hope to improve performance of JDI and test its use in applications.
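The final sense-selection step for the transport example can be sketched as choosing the meaning whose semantic type scored highest for the surrounding text. The ST scores below are hypothetical stand-ins for JDI output; only the two senses and their STs come from the description above.

```python
# Candidate meanings of the ambiguity "transport" and their UMLS semantic types.
SENSES = {
    "Biological Transport": "Cell Function",
    "Patient transport": "Health Care Activity",
}

def disambiguate(senses, st_scores):
    """Pick the meaning whose semantic type received the highest JDI score."""
    return max(senses, key=lambda meaning: st_scores.get(senses[meaning], 0.0))

# Hypothetical JDI scores for the text surrounding the ambiguous word.
st_scores = {"Cell Function": 0.81, "Health Care Activity": 0.34}
meaning = disambiguate(SENSES, st_scores)
```

With the Cell Function ST scoring higher, the method returns "Biological Transport" as the presumed sense for this context.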
Article
The volume of biomedical literature has experienced explosive growth in recent years. This is reflected in the corresponding increase in the size of MEDLINE, the largest bibliographic database of biomedical citations. Indexers at the US National Library of Medicine (NLM) need efficient tools to help them accommodate the ensuing workload. After reviewing issues in the automatic assignment of Medical Subject Headings (MeSH terms) to biomedical text, we focus more specifically on the new subheading attachment feature for NLM's Medical Text Indexer (MTI). Natural Language Processing, statistical, and machine learning methods of producing automatic MeSH main heading/subheading pair recommendations were assessed independently and combined. The best combination achieves 48% precision and 30% recall. After validation by NLM indexers, a suitable combination of the methods presented in this paper was integrated into MTI as a subheading attachment feature producing MeSH indexing recommendations compliant with current state-of-the-art indexing practice.
Article
The quality of indexing of periodicals in a bibliographic data base cannot be measured directly, as there is no one "correct" way to index an item. However, consistency can be used to measure the reliability of indexing. To measure consistency in MEDLINE, 760 twice-indexed articles from 42 periodical issues were identified in the data base, and their indexing compared. Consistency, expressed as a percentage, was measured using Hooper's equation. Overall, checktags had the highest consistency. Medical Subject Headings (MeSH) and subheadings were applied more consistently to central concepts than to peripheral points. When subheadings were added to a main heading, consistency was lowered. "Floating" subheadings were more consistent than were attached subheadings. Indexing consistency was not affected by journal indexing priority, language, or length of the article. Terms from MeSH Tree Structure categories A, B, and D appeared more often than expected in the high-consistency articles; whereas terms from categories E, F, H, and N appeared more often than expected in the low-consistency articles. MEDLINE, with its excellent controlled vocabulary, exemplary quality control, and highly trained indexers, probably represents the state of the art in manually indexed data bases.
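Hooper's equation, used above to measure inter-indexer consistency, is commonly stated as the number of terms both indexers agree on, divided by that number plus the terms unique to each indexer, expressed as a percentage. The MeSH terms in the example are invented for illustration.

```python
def hooper_consistency(terms_a, terms_b):
    """Hooper's measure: 100 * A / (A + M + N), where A is the number of
    terms assigned by both indexers and M, N are the terms unique to each."""
    a, b = set(terms_a), set(terms_b)
    agree = len(a & b)
    return 100.0 * agree / (agree + len(a - b) + len(b - a))

# Two hypothetical indexings of the same article.
c = hooper_consistency(
    {"Humans", "Aspirin", "Drug Therapy"},
    {"Humans", "Aspirin", "Biopsy"},
)
```

Here the indexers agree on two terms and each adds one unique term, giving 2 / (2 + 1 + 1) = 50% consistency.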
Article
CISMeF is a quality-controlled health gateway using a terminology based on the Medical Subject Headings (MeSH) thesaurus that displays medical specialties (metaterms) and the relationships existing between them and MeSH terms. Objective: The need to classify the resources within the catalogue has led us to combine this type of semantic information with domain expert knowledge for health resource categorization purposes. Material and methods: A two-step categorization process, consisting of mapping resource keywords to CISMeF metaterms and ranking the metaterms by decreasing coverage in the resource, has been developed. We evaluated this algorithm on a random set of 123 resources extracted from the CISMeF catalogue. Our gold standard for this evaluation is the manual classification provided by a domain expert, viz. a librarian of the team. Results: The CISMeF algorithm shows 81% precision and 93% recall, and 62% of the resources were assigned a "fully relevant" or "fairly relevant" categorization according to strict standards. A thorough analysis of the results enabled us to find gaps in the knowledge modeling of the CISMeF terminology. The necessary adjustments having been made, the algorithm is currently used in CISMeF for resource categorization.
Article
The amount of health information available on the Internet is considerable. In this context, several health gateways have been developed. Among them, CISMeF (Catalogue and Index of Health Resources in French) was designed to catalogue and index health resources in French. The goal of this article is to describe the various enhancements to the MeSH thesaurus developed by the CISMeF team to adapt this terminology to the broader field of health Internet resources instead of scientific articles for the medline bibliographic database. CISMeF uses two standard tools for organizing information: the MeSH thesaurus and several metadata element sets, in particular the Dublin Core metadata format. The heterogeneity of Internet health resources led the CISMeF team to enhance the MeSH thesaurus with the introduction of two new concepts, respectively, resource types and metaterms. CISMeF resource types are a generalization of the publication types of medline. A resource type describes the nature of the resource and MeSH keyword/qualifier pairs describe the subject of the resource. A metaterm is generally a medical specialty or a biological science, which has semantic links with one or more MeSH keywords, qualifiers and resource types. The CISMeF terminology is exploited for several tasks: resource indexing performed manually, resource categorization performed automatically, visualization and navigation through the concept hierarchies and information retrieval using the Doc'CISMeF search engine. The CISMeF health gateway uses several MeSH thesaurus enhancements to optimize information retrieval, hierarchy navigation and automatic indexing.
Rindflesch, T.C., Libbus, B., Hristovski, D., Aronson, A.R., & Kilicoglu, H. (2003). Semantic relations asserting the etiology of genetic diseases. In Proceedings of the American Medical Informatics Association Annual Symposium (pp. 554–558). Retrieved July 14, 2009, from http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=1480275&blobtype=pdf
Ruch, P. (2006). Automatic assignment of biomedical categories: Toward a generic approach. Bioinformatics, 22(6), 658–664.
Acknowledgments: appointment of A. Névéol to the Lister Hill Center Fellows Program; appointments of P. Ruch and S.J. Darmoni to the Lister Hill Center Visitors Program, sponsored by the National Library of Medicine and administered by the Oak Ridge Institute for Science and Education.
American Medical Association (2008). JAMA & Archives Topic Collections. Retrieved July 1, 2009, from http://pubs.ama-assn.org/collections
Aronson, A.R., Bodenreider, O., Chang, H.F., Humphrey, S.M., Mork, J.G., Nelson, S.J., et al. (2000). The NLM Indexing Initiative. In Proceedings of the American Medical Informatics Association Annual Symposium (pp. 17–21). Retrieved July 14, 2009, from http://www.pubmedcentral.nih.gov/picrender.fcgi?artid=2243970&blobtype=pdf
Regarding references with a PMCID: when a PMCID is searched in NLM's PubMed, the reference is retrieved with a link to the free full-text article in PubMed Central.
Ruch, P., Gobeill, J., Lovis, C., & Geissbühler, A. (2008). Automatic medical encoding with SNOMED categories. BMC Medical Informatics and Decision Making, 8(Suppl 1), S6. Retrieved July 14, 2009, from http://biomedcentral.com/1472-6947/8/S1/S6
Salton, G., & McGill, M.J. (1983). Introduction to modern information retrieval (pp. 63–66). New York: McGraw-Hill.
Gehanno, J.F., Thirion, B., & Darmoni, S.J. (2007). Evaluation of meta-concepts for information retrieval in a quality-controlled health gateway. In Proceedings of the American Medical Informatics Association Annual Symposium (pp. 269–273). Retrieved July 14, 2009, from http://telemedicina.unifesp.br/pub/AMIA/2007%20AMIA%20Proceedings/data/papers/papers/AMIA-0085-S2007.pdf
Hristovski, D., Peterlin, B., Mitchell, J.A., & Humphrey, S.M. (2005). Using literature-based discovery to identify disease candidate genes. International Journal of Medical Informatics, 74(2–4), 289–298.
Lu, C.J., Humphrey, S.M., & Browne, A.C. (2008). A method for verifying a vector-based text classification system. In AMIA Annual Symposium Proceedings (p. 1030). [PubMed: 18998786; PMCID forthcoming]
Manning, C.D., & Schütze, H. (1999). Foundations of statistical natural language processing (pp. 268–269, 534–538). Cambridge, MA: The MIT Press.
J Am Soc Inf Sci Technol. Author manuscript; available in PMC 2009 December 1.
Douyère, M., Soualmia, L.F., Névéol, A., Rogozan, A., Dahamna, B., Leroy, J.P., Thirion, B., & Darmoni, S.J. (2004). Enhancing the MeSH thesaurus to retrieve French online health resources in a quality-controlled gateway. Health Information and Libraries Journal, 253–261. Retrieved November 21, 2008, from http://www3.interscience.wiley.com/cgi-bin/fulltext/118813886/PDFSTART
Ehrler, F., Geissbühler, A., Jimeno, A., & Ruch, P. (2005). Data-poor categorization and passage retrieval for gene ontology annotation in Swiss-Prot. BMC Bioinformatics, 6(Suppl 1), S23. [PubMed: 15960836; PMCID: PMC1869016]
CHU Hôpitaux de Rouen. Additional subject subset for PubMed.
Aronson, A.R., Mork, J.G., Lang, F.M., & Rogers, W.J. NLM Medical Text Indexer: A tool for automatic and assisted indexing.
CHU Hôpitaux de Rouen. CISMeF: Catalog and Index of French-language health internet resources. A quality-controlled subject gateway.
Aronson, A.R., Mork, J.G., Gay, C.W., Humphrey, S.M., & Rogers, W.J. (2004). The NLM Indexing Initiative's Medical Text Indexer. Studies in Health Technology and Informatics (pp. 268–272). Retrieved November 21, 2008, from http://skr.nlm.nih.gov/papers/references/aronson-medinfo04.wheader.pdf
CHU Hôpitaux de Rouen. Catalogue et Index des Sites Médicaux Francophones.