Update on XplorMed: A web server for exploring scientific literature

Abstract

As scientific literature databases like MEDLINE increase in size, so does the time required to search them. Scientists must frequently inspect long lists of references manually, often just reading the titles. XplorMed is a web tool that aids MEDLINE searching by summarizing the subjects contained in the results, thus allowing users to focus on subjects of interest. Here we describe new features added to XplorMed during the last 2 years (http://www.bork.embl-heidelberg.de/xplormed/).
Carolina Perez-Iratxeta 1,2,*, Antonio J. Pérez 3, Peer Bork 1,2 and Miguel A. Andrade 1,2
1 European Molecular Biology Laboratory, Meyerhofstr. 1, 69117 Heidelberg, Germany
2 Max Delbrück Center for Molecular Medicine, Robert Rösler-Str. 10, 13125 Berlin-Buch, Germany
3 University of Malaga, Facultad de CC. Campus de Teatinos, 29071 Malaga, Spain
Received February 14, 2003; Revised and Accepted March 21, 2003
BACKGROUND AND GOALS
A scientist searching the scientific literature (for example,
MEDLINE with Entrez at the NCBI’s PubMed server,
http://www.ncbi.nlm.nih.gov/entrez/) may initially retrieve an
unmanageable number of references, typically hundreds.
Even for very specific subjects, it is not always clear how to
narrow the search to focus on the most relevant matches. For
example, imagine you are a researcher interested in the
possible role of the interaction between heparin and proteins in
Alzheimer’s disease, and you query the PubMed server with the
terms ‘Alzheimer and heparin’. This search currently returns
>100 references, of which only some mention proteins. Finding
these requires manual examination of the abstracts, which can
be time-consuming. In such instances XplorMed can be useful (1).
XplorMed is a web tool that summarizes MEDLINE search
results according to subjects and allows you to navigate
through abstracts interactively. Here we describe how to use
XplorMed. A detailed tutorial is also available online
(http://www.bork.embl-heidelberg.de/xplormed/example/).
INPUT TO XplorMed
There are two main ways to provide input to XplorMed. You can
type a PubMed query directly into our server or you can supply
a file containing a set of abstracts. XplorMed can handle
several abstract formats: MEDLINE (default), EndNote, XML
and XplorMed (see page 1 of the tutorial for details).
A third way to query XplorMed is to start from literature
linked to a particular entry in one of the MEDLINE, OMIM
(2), SMART (3) or SWISS-PROT/SpTrEMBL (4) databases.
Here you simply need to provide the identifier of the entry of
interest and the corresponding database name. The initial
set of abstracts of each XplorMed session is kept on the server
for a week, enabling you to recover your session. We
recommend you start with sets of 30 references, though the
current maximum is 500 abstracts.
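For readers preparing an abstract file by hand, the following is a minimal sketch of reading MEDLINE-format text before submitting it. It is not XplorMed's own parser: the function names `parse_medline` and `check_size` are hypothetical, and the only figure taken from the text is the stated maximum of 500 abstracts. It assumes the usual MEDLINE tagged layout (a tag of up to four characters, then `- `, with indented continuation lines and blank lines between records).

```python
from typing import Dict, List

MAX_ABSTRACTS = 500  # current maximum stated in the text


def parse_medline(text: str) -> List[Dict[str, str]]:
    """Split MEDLINE-format text into records keyed by field tag (PMID, TI, AB...)."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    tag = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current record.
            if current:
                records.append(current)
            current, tag = {}, None
        elif line[:4].strip() and line[4:6] == "- ":
            # Start of a field, e.g. "PMID- 1" or "TI  - Some title".
            tag = line[:4].strip()
            value = line[6:].strip()
            # Repeated tags (such as several authors) are joined with a space.
            current[tag] = f"{current[tag]} {value}" if tag in current else value
        elif tag is not None and line.startswith(" "):
            # Indented continuation of the previous field.
            current[tag] += " " + line.strip()
    if current:
        records.append(current)
    return records


def check_size(records: List[Dict[str, str]]) -> None:
    """Raise if the set exceeds the maximum number of abstracts (hypothetical helper)."""
    if len(records) > MAX_ABSTRACTS:
        raise ValueError(f"{len(records)} abstracts exceed the limit of {MAX_ABSTRACTS}")
```

A file read with `parse_medline` yields one dictionary per reference, so the recommended starting size of about 30 references can be checked with `len(records)` before uploading.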
OVERVIEW OF AN ANALYSIS
The first step involves a coarse overlapping clustering of the
abstracts. References are classified into eight classes depending
on their subject. Classes correspond to MeSH main
categories, such as ‘Anatomy’, ‘Organisms’, ‘Chemicals and
Drugs’, ‘Biological Sciences’, etc. (see
http://www.nlm.nih.gov/mesh/meshhome.html). You can apply an
initial filter to restrict the search to categories of interest,
and you can also filter the search results by publication date
(see page 2 of the tutorial).
The next web page displays keywords in the selected
abstracts. The method for computing keywords and the relations
between them is described elsewhere (5). The list of
extracted keywords summarizes the subjects within
the query results, listed in order of relevance
(more important concepts are listed first). In the
above example of heparin and Alzheimer, XplorMed gives
expected terms (‘protein’, ‘heparin’, ‘alzheimer’ and
‘disease’) in addition to others that may be new to you,
for example, ‘tau’ and ‘app’.
At this stage, you can choose whether to go directly to the
next step or to start a deeper analysis of the displayed subjects.
The latter involves a context analysis of the subjects
represented by the keywords and it is outlined briefly below
(see Context Analysis of the Subjects). Alternatively, if you
choose to go further, several groups or chains of closely related
keywords are then presented to you.
*To whom correspondence should be addressed. Tel: +49 6221 387 456; Fax: +49 6221 387 517; Email: cperez@embl-heidelberg.de
Nucleic Acids Research, 2003, Vol. 31, No. 13, 3866–3868
DOI: 10.1093/nar/gkg538
Published by Oxford University Press 2003
Figure 1. (A) XplorMed’s home page. (B) Words related to ‘app’. (C) Sentences containing the words ‘app’ and ‘outgrowth’.
You can modify the number of chains and their length by
means of two parameters: alpha and score (see page 3 of the
tutorial for details). Each chain is preceded by a number that
indicates how many abstracts contain both words. By selecting
one or more of these chains, you perform a sub-query of the
original set. For example, suppose you are interested in protein
domains that could bind heparin. Accordingly, you would
inspect the pair {protein, domain}, which appears in 13
references. You can select an alternative or additional word
chain if you do not find what you wanted among the proposals
of the system.
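The sub-query by word chain can be illustrated with a small sketch. This is an assumption-laden illustration, not the server's implementation: the names `words_in` and `chain_support` are hypothetical, and words are matched as exact lowercase tokens, whereas XplorMed preprocesses abstracts with a part-of-speech tagger. The length of the returned list plays the role of the number shown before each chain.

```python
import re
from typing import Iterable, List, Set


def words_in(abstract: str) -> Set[str]:
    """Lowercase alphanumeric tokens of an abstract (simplified tokenization)."""
    return set(re.findall(r"[a-z0-9]+", abstract.lower()))


def chain_support(abstracts: List[str], chain: Iterable[str]) -> List[int]:
    """Indices of the abstracts that contain every word of the chain."""
    wanted = {word.lower() for word in chain}
    return [i for i, text in enumerate(abstracts) if wanted <= words_in(text)]


# Usage with made-up abstracts: select the subset for the pair {protein, domain}.
abstracts = [
    "The heparin-binding domain of the protein APP.",
    "Heparin therapy in Alzheimer disease.",
    "A protein domain binds zinc.",
]
selected = chain_support(abstracts, ["protein", "domain"])
subset = [abstracts[i] for i in selected]
```

Selecting several chains would correspond to combining the index lists returned for each chain.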
The next web page provides an ordered list of abstracts;
those likely to be most interesting according to your selection
are highlighted on top (in our example, the papers dealing with
the heparin-binding domain). If, on the previous page, you
checked the boxes for cross-linking to the corresponding
databases, several hyperlinked symbols will label some
abstracts (see Cross-Linking to Molecular Biology Databases).
The filtered subset of papers can now be used as a new
XplorMed starting point at the computation-of-keywords step
(see above). Alternatively, you can expand this subset with
new papers among their MEDLINE neighbors (see Expanding
the Query through Related Bibliography). New keywords
focusing more closely on your subject of interest will appear at
this stage. The procedure can be repeated and the
set of abstracts can be recovered at any stage.
CONTEXT ANALYSIS OF THE SUBJECTS
When the list of keywords is presented, you can explore both
their meanings and relationships. By clicking on a word you
can see all the sentences in the abstracts that contain that word;
each sentence is linked to its MEDLINE abstract. In this
way you can learn why a particular word is mentioned across
the abstracts. Moreover, you can also discover interesting
information by examining the words strongly related to a
particular word (for example, ‘app’, see Fig. 1B). Clicking
on the [R] next to each word opens a window displaying closely
related words (such as ‘outgrowth’ or ‘zinc’).
Clicking on the [X] near any related word (like ‘outgrowth’)
shows the sentences containing either of the words (‘app’ or
‘outgrowth’) in abstracts containing both words [for example,
‘The results indicate that the binding of APP to HSPG in the
ECM may stimulate the effects of APP on neurite outgrowth.’
(6)]. Words and sentences are highlighted in different colors
for easy identification (see page 3 of the tutorial for details).
Clicking the button ‘Explore the context of any word’ allows
you to do this kind of analysis in a more flexible way by typing
other keywords of interest.
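The two kinds of sentence lookup described above can be sketched as follows. This is only an illustration under simplifying assumptions: sentences are split on end punctuation with a regular expression, words are matched as exact lowercase tokens, and the names `sentences_with` and `shared_context` are hypothetical rather than part of XplorMed.

```python
import re
from typing import List, Set, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(text: str) -> Set[str]:
    """Lowercase alphanumeric tokens of a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sentences_with(abstracts: List[str], word: str) -> List[Tuple[int, str]]:
    """All sentences containing the word, paired with the index of their abstract."""
    return [
        (i, s)
        for i, text in enumerate(abstracts)
        for s in split_sentences(text)
        if word.lower() in tokens(s)
    ]


def shared_context(abstracts: List[str], a: str, b: str) -> List[Tuple[int, str]]:
    """Sentences containing either word, taken only from abstracts containing both."""
    pair = {a.lower(), b.lower()}
    hits = []
    for i, text in enumerate(abstracts):
        if pair <= tokens(text):
            hits.extend((i, s) for s in split_sentences(text) if tokens(s) & pair)
    return hits
```

Keeping the abstract index with each sentence mirrors the link from each displayed sentence back to its MEDLINE abstract.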
CROSS-LINKING TO MOLECULAR BIOLOGY
DATABASES
As mentioned above, the list of selected abstracts can
optionally be hyperlinked to objects in several databases,
currently MEDLINE, OMIM, SMART, SWISS-PROT and
SpTrEMBL. The symbols indicate the database and, in
the case of SWISS-PROT, the subject of the article, such as
‘describes protein function’, ‘reports a 3D structure’, etc. Note
that the hyperlink to PubMed is always supplied, allowing you
to check the content of the abstract. An additional symbol
denotes review articles.
EXPANDING THE QUERY THROUGH
RELATED BIBLIOGRAPHY
As mentioned above, once you have selected a subset of
abstracts, it is possible to re-enter the analysis with the filtered
set at the computation-of-keywords step. You can also expand
this set of abstracts by retrieving neighbors from MEDLINE.
Neighbors are those references that deal with the same
(or similar) subject (7). To opt for this expansion you have
to check the box at the bottom of the list of references. You
can also change the number of neighbors included.
CONCLUSION
We have summarized how you can use the web tool XplorMed
to deal more efficiently with MEDLINE literature. Because
our server is continually being developed to include
new features, suggestions from users are warmly welcomed
and will be acknowledged.
ACKNOWLEDGEMENTS
We are grateful to the members of our group for their
suggestions and to Robert B. Russell and Seán I.
O’Donoghue for comments on our manuscript. In one step,
XplorMed uses TreeTagger, a part-of-speech tagger. We are
grateful to Helmut Schmid (IMS, Stuttgart, Germany) for
developing TreeTagger and making it publicly available.
REFERENCES
1. Perez-Iratxeta,C., Bork,P. and Andrade,M.A. (2001) XplorMed: a tool for
exploring MEDLINE abstracts. Trends Biochem. Sci., 26, 573–575.
2. Online Mendelian Inheritance in Man, OMIM (TM). McKusick-Nathans
Institute for Genetic Medicine, Johns Hopkins University (Baltimore, MD)
and National Center for Biotechnology Information, National Library of
Medicine (Bethesda, MD), 2000.
3. Letunic,I., Goodstadt,L., Dickens,N.J., Doerks,T., Schultz,J., Mott,R.,
Ciccarelli,F., Copley,R.R., Ponting,C.P. and Bork,P. (2002) Recent
improvements to the SMART domain-based sequence annotation
resource. Nucleic Acids Res., 30, 242–244.
4. Boeckmann,B., Bairoch,A., Apweiler,R., Blatter,M.C., Estreicher,A.,
Gasteiger,E., Martin,M.J., Michoud,K., O’Donovan,C., Phan,I. et al.
(2003) The SWISS-PROT protein knowledgebase and its supplement
TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.
5. Perez-Iratxeta,C., Bork,P. and Andrade,M.A. (2002) Computing fuzzy
associations for the analysis of biological literature. Biotechniques, 32,
1380–1385.
6. Small,D.H., Nurcombe,V., Reed,G., Clarris,H., Moir,R., Beyreuther,K.
and Masters,C.L. (1994) A heparin-binding domain in the amyloid
protein precursor of Alzheimer’s disease is involved in the regulation of
neurite outgrowth. J. Neurosci., 14, 2117–2127.
7. Wilbur,W.J. and Yang,Y. (1996) An analysis of statistical term strength
and its use in the indexing and retrieval of molecular biology texts.
Comput. Biol. Med., 26, 209–222.