
Update on XplorMed: A web server for exploring scientific literature

August 2003
Nucleic Acids Research 31(13):3866-8


DOI:10.1093/nar/gkg538

Source
PubMed

Authors:

Carol Perez-Iratxeta

Ottawa Hospital Research Institute

Antonio J Pérez-Pulido

Universidad Pablo de Olavide

Miguel Andrade

Johannes Gutenberg-Universität Mainz

As scientific literature databases like MEDLINE increase in size, so does the time required to search them. Scientists must frequently inspect long lists of references manually, often just reading the titles. XplorMed is a web tool that aids MEDLINE searching by summarizing the subjects contained in the results, thus allowing users to focus on subjects of interest. Here we describe new features added to XplorMed during the last 2 years (http://www.bork.embl-heidelberg.de/xplormed/).



Carolina Perez-Iratxeta 1,2,*, Antonio J. Pérez, Peer Bork 1,2 and Miguel A. Andrade 1,2

European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidelberg, Germany, Max Delbrück Center for Molecular Medicine, Robert Rösler-Str. 10, 13125 Berlin-Buch, Germany and University of Malaga, Facultad de CC. Campus de Teatinos, 29071 Malaga, Spain

Received February 14, 2003; Revised and Accepted March 21, 2003


BACKGROUND AND GOALS

A scientist searching the scientific literature (for example, MEDLINE with Entrez at the NCBI’s PubMed server, http://www.ncbi.nlm.nih.gov/entrez/) may initially retrieve an unmanageable number of references, typically hundreds. Even for very specific subjects, it is not always clear how to narrow the search to focus on the most relevant matches. For example, imagine you are a researcher interested in the possible role of the interaction between heparin and proteins in Alzheimer’s disease, who queries the PubMed server with the terms ‘Alzheimer and heparin’. This search currently returns >100 references, of which only some mention proteins. Finding these requires manual examination of the abstracts, which can be time-consuming. In such instances XplorMed can be useful (1).

XplorMed is a web tool that summarizes MEDLINE search results according to subjects and allows you to navigate through abstracts in an interactive fashion. Here we give details on the use of XplorMed. A detailed tutorial is also available online (http://www.bork.embl-heidelberg.de/xplormed/example/).

INPUT TO XplorMed

There are two ways to provide input to XplorMed. You can type a PubMed query directly into our server or you can supply a file containing a set of abstracts. XplorMed can handle several abstract formats: MEDLINE (default), EndNote, XML and XplorMed (see page 1 of the tutorial for details).

A third way to query XplorMed is to start from literature linked to a particular entry from one of the MEDLINE, OMIM (2), SMART (3) or SWISS-PROT/SpTrEMBL (4) databases. Here you simply need to provide the identifier of the entry of interest and the corresponding database name. The initial set of abstracts of each XplorMed session is kept on the server for a week, enabling you to recover your session. We recommend you start with sets of 30 references, though the current maximum is 500 abstracts.
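As an illustration of the file-based input, the following minimal sketch reads abstracts in the MEDLINE tagged format (a tag of up to four characters, a hyphen, then the value, with continuation lines indented) and enforces the 500-abstract limit mentioned above. It is not XplorMed's own parser, which is not described here; the function names `parse_medline` and `load_abstracts` are hypothetical.

```python
from collections import defaultdict

MAX_ABSTRACTS = 500  # current maximum stated in the text


def parse_medline(text):
    """Parse MEDLINE-tagged text into a list of {tag: [values]} records.

    Records are assumed to be separated by blank lines.
    """
    records, current, tag = [], None, None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record, if any.
            if current:
                records.append(dict(current))
            current, tag = None, None
            continue
        if len(line) > 5 and line[4] == "-" and line[:4].strip():
            # New field, e.g. "PMID- 123" or "AB  - Some text".
            if current is None:
                current = defaultdict(list)
            tag = line[:4].strip()
            current[tag].append(line[6:].strip())
        elif current is not None and tag is not None:
            # Continuation line: append to the last value of the open field.
            current[tag][-1] += " " + line.strip()
    if current:
        records.append(dict(current))
    return records


def load_abstracts(text, limit=MAX_ABSTRACTS):
    """Parse the input and refuse sets larger than the stated maximum."""
    records = parse_medline(text)
    if len(records) > limit:
        raise ValueError(f"{len(records)} abstracts supplied; maximum is {limit}")
    return records
```

Each record keeps every field as a list because tags such as author names can repeat within one reference.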

OVERVIEW OF AN ANALYSIS

The first step involves a coarse overlapping clustering of the abstracts. References are classified into eight classes depending on their subject. Classes correspond to MeSH main categories, such as ‘Anatomy’, ‘Organisms’, ‘Chemicals and Drugs’, ‘Biological Sciences’, etc. (see http://www.nlm.nih.gov/mesh/meshhome.html). You can impose an initial filter to restrict the search to categories of interest, and it is also possible to filter the search results by publication date (see page 2 of the tutorial).
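The initial filtering step can be pictured with the short sketch below. It assumes that each reference already carries its (possibly several) MeSH main categories and a publication year; how XplorMed assigns the categories is not shown, and the `Reference` class, the `filter_references` function and the example category assignments are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    pmid: str
    year: int
    # Overlapping clustering: one reference may belong to several classes.
    categories: set = field(default_factory=set)


def filter_references(refs, wanted_categories=None, year_from=None, year_to=None):
    """Keep references in at least one wanted category and within the year range."""
    selected = []
    for ref in refs:
        if wanted_categories and not (ref.categories & set(wanted_categories)):
            continue
        if year_from is not None and ref.year < year_from:
            continue
        if year_to is not None and ref.year > year_to:
            continue
        selected.append(ref)
    return selected


# Hypothetical data: identifiers and category assignments are invented.
refs = [
    Reference("r1", 1994, {"Chemicals and Drugs", "Diseases"}),
    Reference("r2", 2001, {"Anatomy"}),
    Reference("r3", 2002, {"Chemicals and Drugs"}),
]
recent_chemistry = filter_references(
    refs, wanted_categories={"Chemicals and Drugs"}, year_from=2000
)
```

Leaving both the category set and the year bounds unset returns the full list, which mirrors skipping the optional filter.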

The next web page displays keywords in the selected abstracts. The method for computing keywords and the relations between them is described elsewhere (5). The list of extracted keywords provides a summary of the subjects within the query results, and these are listed in order of relevance (more important concepts are listed first). Considering the above example of heparin and Alzheimer, XplorMed gives expected terms (‘protein’, ‘heparin’, ‘alzheimer’ and ‘disease’) in addition to others that may be new to you, for example, ‘tau’ and ‘app’.

At this stage, you can choose whether to go directly to the next step or to start a deeper analysis of the displayed subjects. The latter involves a context analysis of the subjects represented by the keywords and is outlined briefly below (see Context Analysis of the Subjects). Alternatively, if you choose to go further, several groups or chains of closely related keywords are then presented to you.

Figure 1. (A) XplorMed’s home page. (B) Words related to ‘app’. (C) Sentences containing the words ‘app’ and ‘outgrowth’.

*To whom correspondence should be addressed. Tel: +49 6221 387 456; Fax: +49 6221 387 517; Email: cperez@embl-heidelberg.de

Published by Oxford University Press 2003

You can modify the number of chains and their length by means of two parameters: alpha and score (see page 3 of the tutorial for details). Each chain is preceded by a number that indicates how many abstracts contain both words. By selecting one or more of these chains, you perform a sub-query of the original set. For example, suppose you are interested in protein domains that could bind heparin. Accordingly, you would inspect the pair {protein, domain}, which appears in 13 references. You can select an alternative or additional word chain if you do not find what you wanted among the proposals of the system.
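The number shown before a pair and the effect of selecting it can be sketched as follows. This is only an illustration under stated assumptions: each abstract is reduced to a set of keywords, a pair is counted once per abstract that contains both words, and selecting several chains keeps abstracts matching at least one of them. How XplorMed actually builds chains from the alpha and score parameters is described in (5) and is not reproduced here; `pair_counts` and `sub_query` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def pair_counts(keyword_sets):
    """Count, for every keyword pair, the abstracts that contain both words."""
    counts = Counter()
    for words in keyword_sets:
        # Sorting gives each pair one canonical order, e.g. ("domain", "protein").
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts


def sub_query(keyword_sets, chains):
    """Return indices of abstracts containing all words of at least one chain."""
    return [
        i
        for i, words in enumerate(keyword_sets)
        if any(set(chain) <= words for chain in chains)
    ]


# Toy data: three abstracts represented by their keywords.
abstracts = [
    {"protein", "domain", "heparin"},
    {"protein", "heparin"},
    {"protein", "domain"},
]
counts = pair_counts(abstracts)
selected = sub_query(abstracts, [("protein", "domain")])
```

In this toy set the pair {protein, domain} is counted twice and the sub-query keeps the first and third abstracts.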

The next web page provides an ordered list of abstracts; those likely to be most interesting according to your selection are highlighted at the top (in our example, the papers dealing with the heparin-binding domain). If you checked the boxes for cross-linking to the corresponding databases on the previous page, several hyperlinked symbols will label some abstracts (see Cross-Linking to Molecular Biology Databases).

The filtered subset of papers can now be used as a new XplorMed starting point at the computation-of-keywords step (see above). Alternatively, you can expand this subset with new papers among their MEDLINE neighbors (see Expanding the Query through Related Bibliography). New keywords focusing more closely on your subject of interest will appear at this stage. The procedure can be repeated, and the set of abstracts can be recovered at any stage.

CONTEXT ANALYSIS OF THE SUBJECTS

When the list of keywords is presented, you can explore both their meanings and relationships. By clicking on a word you can see all the sentences in the abstracts that contain that word, and each sentence is linked to its MEDLINE abstract. In this way you can learn why a particular word is mentioned across the abstracts. Moreover, you can also discover interesting information by examining the words strongly related to a particular word (for example, ‘app’, see Fig. 1B). By clicking on the [R] next to each word, a window displaying closely related words (such as ‘outgrowth’ or ‘zinc’) will be shown. Clicking on the [X] near any related word (like ‘outgrowth’) shows the sentences containing either of the words (‘app’ or ‘outgrowth’) in abstracts containing both words [for example, ‘The results indicate that the binding of APP to HSPG in the ECM may stimulate the effects of APP on neurite outgrowth.’ (6)]. Words and sentences are highlighted in different colors for easy identification (see page 3 of the tutorial for details). Clicking the button ‘Explore the context of any word’ allows you to do this kind of analysis in a more flexible way by typing other keywords of interest.
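The [X] view described above (sentences containing either word, taken only from abstracts containing both) can be sketched in a few lines. This is a simplified stand-in, not XplorMed's code: the sentence splitter is a naive regular expression that will break on abbreviations such as ‘Fig. 1’, matching is case-insensitive on whole words, and the function names are hypothetical.

```python
import re


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def contains(word, text):
    """Case-insensitive whole-word match."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def context_sentences(abstracts, word_a, word_b):
    """Sentences with either word, restricted to abstracts that contain both."""
    hits = []
    for idx, abstract in enumerate(abstracts):
        if contains(word_a, abstract) and contains(word_b, abstract):
            for sentence in split_sentences(abstract):
                if contains(word_a, sentence) or contains(word_b, sentence):
                    hits.append((idx, sentence))
    return hits


# Toy abstracts written for this example.
toy = [
    "APP binds heparin. The binding may stimulate neurite outgrowth. "
    "Laminin was used as a control.",
    "APP is cleaved by secretases.",
]
hits = context_sentences(toy, "app", "outgrowth")
```

Returning the abstract index with each sentence keeps the link back to the source abstract, as in the web interface.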

CROSS-LINKING TO MOLECULAR BIOLOGY DATABASES

As mentioned above, the list of selected abstracts can optionally be hyperlinked to objects in several databases, currently MEDLINE, OMIM, SMART, SWISS-PROT and SpTrEMBL. The different symbols indicate the database and, in the case of SWISS-PROT, the subject of the article, such as ‘describes protein function’, ‘reports a 3D structure’, etc. Note that the hyperlink to PubMed is always supplied, allowing you to check the content of the abstract. An additional symbol denotes review articles.

EXPANDING THE QUERY THROUGH RELATED BIBLIOGRAPHY

As mentioned above, once you have selected a subset of abstracts, it is possible to re-enter the analysis with the filtered set at the computation-of-keywords step. You can also expand this set of abstracts by retrieving neighbors from MEDLINE. Neighbors are those references that deal with the same (or a similar) subject (7). To opt for this expansion, check the box at the bottom of the list of references. You can also change the number of neighbors included.
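The expansion step can be illustrated with a small sketch. It assumes the neighbors of each reference are already available as an ordered list in a lookup table; in practice they would come from MEDLINE, and the `neighbor_index` structure, the function name and the default of five neighbors are assumptions made for the example.

```python
def expand_with_neighbors(selected, neighbor_index, max_neighbors=5):
    """Add up to max_neighbors neighbors per selected reference, without duplicates."""
    expanded = list(selected)
    seen = set(selected)
    for pmid in selected:
        for neighbor in neighbor_index.get(pmid, [])[:max_neighbors]:
            if neighbor not in seen:
                seen.add(neighbor)
                expanded.append(neighbor)
    return expanded


# Hypothetical identifiers and neighbor lists.
neighbor_index = {"A": ["B", "C", "D"], "B": ["A", "E"]}
expanded = expand_with_neighbors(["A", "B"], neighbor_index, max_neighbors=2)
```

The original selection stays at the front of the result, so the expanded set can be fed back into the keyword step unchanged.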

CONCLUSION

We have summarized how you can use the web tool XplorMed to deal more efficiently with MEDLINE literature. Because our server is continually being developed to include new features, suggestions from users are warmly welcomed and will be acknowledged.

ACKNOWLEDGEMENTS

We are grateful to the members of our group for their suggestions and to Robert B. Russell and Seán I. O’Donoghue for comments on our manuscript. XplorMed uses TreeTagger, a part-of-speech tagger, in one of its steps. We are grateful to Helmut Schmid (IMS, Stuttgart, Germany) for developing TreeTagger and making it publicly available.

REFERENCES

1. Perez-Iratxeta,C., Bork,P. and Andrade,M.A. (2001) XplorMed: a tool for exploring MEDLINE abstracts. Trends Biochem. Sci., 26, 573–575.

2. Online Mendelian Inheritance in Man, OMIM (TM). McKusick-Nathans Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD) and National Center for Biotechnology Information, National Library of Medicine (Bethesda, MD), 2000.

3. Letunic,I., Goodstadt,L., Dickens,N.J., Doerks,T., Schultz,J., Mott,R., Ciccarelli,F., Copley,R.R., Ponting,C.P. and Bork,P. (2002) Recent improvements to the SMART domain-based sequence annotation resource. Nucleic Acids Res., 30, 242–244.

4. Boeckmann,B., Bairoch,A., Apweiler,R., Blatter,M.C., Estreicher,A., Gasteiger,E., Martin,M.J., Michoud,K., O’Donovan,C., Phan,I. et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.

5. Perez-Iratxeta,C., Bork,P. and Andrade,M.A. (2002) Computing fuzzy associations for the analysis of biological literature. Biotechniques, 32, 1380–1385.

6. Small,D.H., Nurcombe,V., Reed,G., Clarris,H., Moir,R., Beyreuther,K. and Masters,C.L. (1994) A heparin-binding domain in the amyloid protein precursor of Alzheimer’s disease is involved in the regulation of neurite outgrowth. J. Neurosci., 14, 2117–2127.

7. Wilbur,W.J. and Yang,Y. (1996) An analysis of statistical term strength and its use in the indexing and retrieval of molecular biology texts. Comput. Biol. Med., 26, 209–222.

