A new method for prioritizing drug repositioning candidates extracted by literature-based discovery

Majid Rastegar-Mojarad1,2, Ravikumar Komandur Elayavilli1, Dingcheng Li1, Rashmi Prasad2, Hongfang Liu1
1Department of Health Sciences Research, Mayo Clinic, USA
2University of Wisconsin-Milwaukee, Milwaukee, WI, USA
Email: {Mojarad.Majid, komandurelayavilli.ravikumar, Li.Dingcheng, Liu.Hongfang}@mayo.edu
prasadr@uwm.edu
Abstract— Drug repositioning has been a topic of great
attention to researchers and pharmaceutical companies due to its
significant impact on the cost of drug discovery. There are
several approaches to identify potentially novel drug candidates
through repurposing. Literature mining has played a critical role
in mining such information from scientific articles. In this paper,
we used drug-gene and gene-disease semantic predications
extracted from Medline abstracts to generate a list of potential
drug-disease pairs. We further ranked the generated pairs, by
assigning scores based on the predicates that qualify drug-gene
and gene-disease relationships. On comparing the top-ranked
drug-disease pairs against the Comparative Toxicogenomics
Database (CTD), a curated database for drug-disease relations,
we found that a significant percentage of top ranked pairs
appeared in CTD. Co-occurrence of these high-ranked pairs in
Medline abstracts further improves the confidence in our
approach to rank the inferred drug-disease relations higher in
the list. Finally, manual evaluation of the top ten pairs ranked by our
approach revealed that nine of them have some biological
significance based on expert judgment.
Keywords: Drug repositioning; Literature-based discovery;
Semantic Predication
I. INTRODUCTION
Developing a new drug costs between 500 million and 2
billion dollars and takes 10-15 years [1], with a success rate of
less than 10% [2]. A well-known way to reduce the
risk and cost of developing new drugs is drug repositioning [3],
i.e., finding new targets for drugs that are already available in
the market. Drug repositioning (or drug repurposing) reduces
the bulk of the effort during the early stages of drug
development, resulting in significant reduction of time and
cost. Drug repositioning alone accounts for approximately 30%
of the drugs and vaccines newly approved by the US Food and
Drug Administration (FDA) in recent years [4]. To identify
new indications for available drugs, several approaches have
been studied using various types of data such as clinical data
[5], genetic data [6], [7], and biomedical literature [8], [9].
Literature mining plays a critical role in identifying the
indirect (hidden) relationships between a drug and its potential
targets, since it is nearly impossible for experts to manually
review the ever-increasing body of scientific literature to
identify hidden relationships. Literature-based discovery
(LBD) [10] is a popular approach for unfolding potential novel
findings from the biomedical literature, and involves the
application of the relation of transitivity to discovering
relations. Specifically, LBD systems relate two entities, with a
common entity to provide the link between them. For example,
in order to generate a list of potential drug-disease relations, the
LBD system may attempt to find a common entity (often
genes) that potentially links them. Once the indirect connections
between the drug-disease pairs are established, it is necessary
to eliminate the false positives and identify only the true relations
(novel discoveries).
Distinguishing novel discoveries from the others is not a
trivial task. Typically, the LBD method consists of two steps:
1) extracting and mining relations from the text, and 2)
eliminating the false positives and identifying only the true
relations. As a final step, however, it is also important to
rigorously validate the candidate relations before
proceeding to laboratory or clinical investigations, since these
are not only expensive but also time consuming. The
effectiveness of an LBD system, therefore, lies in its rigorous
validation. Most prior studies lack such rigorous validation,
including ranking of the candidates generated
through the LBD process. Though there are a few prior attempts
[12], [13] in this direction, this area remains largely
underexplored.
In this study, we intend to address the critical issue of the
validation of candidate pairs identified through LBD. We
propose and evaluate the effectiveness of two ranking methods
for prioritizing potential drug repositioning candidates
generated by LBD.
The rest of the paper proceeds as follows. First, we discuss
the background and related work in this domain. Here, we
provide an overview of drug repositioning, LBD, and the
resources often used for LBD (the Semantic Medline Database in
particular) before reviewing some of the prior approaches for
ranking the discoveries by LBD systems. Subsequently, we
discuss our approach to rank the LBD-based drug-disease
pairs. Finally, we present the evaluation, results and discussion,
and highlight the limitations of the study.
II. BACKGROUND AND RELATED WORKS
Drug repositioning [15] – also known as repurposing or
reprofiling – is the process of discovering new indications for
existing or shelved drugs. It enables researchers to speed up the
process of developing drugs, with lower cost and risk. There
have been many approaches proposed for drug repositioning,
which can be categorized in different ways. Dudley et al. [16]
reviewed computational methods for drug repositioning and
categorized the methods into two classes, drug-based and
disease-based, according to whether the drug or the disease
perspective initiates the discovery. In another review paper [17], Hurle et
al. reviewed computational techniques for systematic analysis
of transcriptomics, side effects, and genetics data. Wei et al. [8]
categorized drug repositioning methods into literature-
based and ontology-based.
Literature-based discovery (LBD) strives to find novel
connections or correlations between concepts by using
scientific literature. Many LBD studies have been conducted
[10], [18]–[20] in the biomedical domain to generate new
hypotheses that potentially could lead to new discoveries. In
1986, Don Swanson hypothesized that fish oil may have
beneficial effects in patients with Raynaud’s syndrome. He
arrived at this hypothesis after reviewing the literature and
observing that (1) patients with Raynaud’s syndrome (A) have a blood
viscosity (B) disorder, and (2) fish oil (C) can reduce blood
viscosity. Later, the hypothesis was verified by clinical trials.
Swanson designed software called ARROWSMITH [21] that
implemented this model, known as Swanson’s ABC model [10], to
identify more LBDs. LBD deploys general text mining
techniques such as named entity recognition and information
extraction. The subtasks in LBD are named entity recognition
(term recognition), term normalization, information extraction,
association mining, and ranking. One of the essential tasks for
an LBD system is to decide whether two concepts are correlated
(relation extraction). The most commonly used approaches are co-
occurrence analysis [20], association rules [22], TF-IDF, Z-
score, and the mutual information measure [19]. Some
approaches identify associations between
concepts and terms that do not co-occur with one another in
the biomedical literature [12], [23]–[25]. Another approach to
identifying correlated concepts is to use semantic predications [18].
Hristovski et al. [18] proposed an approach that augments co-
occurrence analysis with semantic predications provided by
two natural language processing systems. Ahlers et al. [13] used
Semantic Medline to propose discovery patterns for the use of
antipsychotic agents in the treatment of cancer. Cohen et al. [12]
proposed an approach named Predication-based Semantic
Indexing to generate discovery patterns.
The Semantic Medline Database (SemMedDB) [26] contains
approximately 70 million semantic predications, which were
extracted by a rule-based system, SemRep [11], from Medline
titles and abstracts. Each semantic predication is a subject-
relation (predicate)-object triple. The subject and object are
concepts from the UMLS Metathesaurus and the predicate is a
relation from the UMLS Semantic Network. There are 30
different types of predicate in SemMedDB, such
as affects, causes, associated with, and treats. Besides these
predicates, there are negated predicates, which indicate a negative
relation between the subject and the object.
SemMedDB is a relational database and has been used in
several studies to facilitate knowledge discovery [27]–[29].
Workman and Stoddart [27] proposed to use Semantic Medline
as a potential decision support system for point of care.
Previously, our group integrated semantic predications into a
system, called Ask Mayo Expert (AME), to retrieve the most
relevant literature to support the evidence-based clinical
decision making process at point of care [28]. In another study
[29], SemMedDB was used to investigate the significance of
extracting information from multiple sentences, specifically in
the context of drug-disease relation discovery.
The challenging and expensive task after generating LBDs
is validation. The discoveries can be confirmed or rejected
through human judgment, laboratory methods, or clinical
investigations. Validation can be facilitated by ranking
and prioritizing the generated LBDs, which is the last step in an
LBD system. Several studies have proposed
ranking algorithms. Wren proposed an algorithm called
average minimum weight (AMW) [30]. The algorithm
calculates a weight for each discovery (A-C) based on the
strength of the A-B and B-C relations. The strength of each
relation is calculated based on mutual information. The
algorithm considers all possible B concepts that are
related to both A and C in the calculation. Another approach to
ranking the findings was proposed by Yetisgen-Yildiz and Pratt [31],
[32]. They used the number of B concepts that link A to C
as an indication of a strong correlation. The method, which is
called Linking Term Count with Average Minimum Weight
(LTC-AMW), uses AMW to break ties. Swanson et al. [33]
introduced a measure called Literature Cohesiveness, which ranks
the discoveries based on MeSH terms in the literature. AMW and
LTC-AMW are generic and can be used in different LBD
systems, but these algorithms do not consider semantics in their
calculation.
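To make the two generic ranking ideas concrete, the following Python sketch scores an A-C candidate by the average, over its linking B concepts, of the weaker of the two link weights, and ranks candidates by linking term count with that average as a tie-breaker. The weights are toy numbers standing in for mutual-information values, the function names are illustrative, and the sketch is a simplified reading of [30]–[32] rather than the published implementations.

```python
def average_minimum_weight(weights_ab, weights_bc):
    """Average, over shared B concepts, of min(w(A,B), w(B,C)).

    weights_ab maps each B concept to w(A,B); weights_bc maps it to w(B,C).
    """
    shared = weights_ab.keys() & weights_bc.keys()
    if not shared:
        return 0.0
    return sum(min(weights_ab[b], weights_bc[b]) for b in shared) / len(shared)


def linking_term_count(weights_ab, weights_bc):
    """Number of B concepts that link A to C."""
    return len(weights_ab.keys() & weights_bc.keys())


def rank_candidates(candidates):
    """Rank C concepts by linking term count, breaking ties by AMW."""
    def key(name):
        ab, bc = candidates[name]
        return (linking_term_count(ab, bc), average_minimum_weight(ab, bc))
    return sorted(candidates, key=key, reverse=True)


# Toy weights (hypothetical): candidate -> (w(A,B) by B, w(B,C) by B)
candidates = {
    "C1": ({"b1": 0.9, "b2": 0.4}, {"b1": 0.5, "b2": 0.8}),
    "C2": ({"b1": 0.9, "b3": 0.7}, {"b1": 0.8, "b3": 0.6}),
    "C3": ({"b1": 0.9}, {"b1": 0.9}),
}
ranking = rank_candidates(candidates)
print(ranking)  # ['C2', 'C1', 'C3']
```

In this toy case C3 has the strongest single link but only one linking term, so it ranks last, while the tie between C1 and C2 is resolved by the average minimum weight.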
III. METHOD
Our approach to identify drug-repositioning candidates
from literature was inspired by Swanson’s ABC model [10].
Our premise is that an inferred correlation between concept A
and concept C depends on the strength of
the two underlying associations (A to B and B to C). In our study, the drug is
concept A, the disease is the target concept C, and the gene serves as the
intermediate concept B linking A and C. We retrieved the
drug-gene and gene-disease semantic predications from
SemMedDB [26] to infer the link between drug and disease.
Consider the following two examples of semantic predications
extracted from SemMedDB:
Example 1: Strepsils (Drug), INTERACTS WITH,
CA2 (Gene)
Example 2: CA2 (Gene), AUGMENTS, Chagas
(Disease)
From the pairs in the above examples, the LBD system
generates Strepsils - Chagas as a potential drug-disease pair.
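The inference in Examples 1 and 2 amounts to a join on the shared gene. A minimal Python sketch follows, assuming the predications are available in memory as (subject, predicate, object) tuples; the function name and the data layout are illustrative and are not part of SemMedDB.

```python
from collections import defaultdict


def infer_drug_disease_pairs(drug_gene, gene_disease):
    """Join drug-gene and gene-disease predications on the shared gene.

    Both inputs are iterables of (subject, predicate, object) triples.
    Returns a dict mapping (drug, disease) to the set of linking genes.
    """
    diseases_by_gene = defaultdict(set)
    for gene, _predicate, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    linking_genes = defaultdict(set)
    for drug, _predicate, gene in drug_gene:
        for disease in diseases_by_gene.get(gene, ()):
            linking_genes[(drug, disease)].add(gene)
    return dict(linking_genes)


drug_gene = [("Strepsils", "INTERACTS_WITH", "CA2")]
gene_disease = [("CA2", "AUGMENTS", "Chagas")]
pairs = infer_drug_disease_pairs(drug_gene, gene_disease)
print(pairs)  # {('Strepsils', 'Chagas'): {'CA2'}}
```

Keeping the set of linking genes for each pair preserves the intermediate concepts, which are needed later when the pair is scored.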
While the ABC model is common to all LBD studies,
using semantic predications allows us 1) to consider the
predicates that qualify the drug-gene and gene-disease
relationships and 2) to explore a systematic approach to eliminating
potential false positive drug-disease pairs from the candidate list
and providing a meaningful ranking of the final drug-disease
pairs. Starting from the initial list, we can use a series of methods to
eliminate erroneous extractions. However, filtering approaches
are not exhaustive and still leave a large list of drug-
disease pairs. It is therefore important to rank these pairs based on
certain parameters, which may help identify a threshold below
which the drug-disease candidates can be discarded.
As a first step, we use two approaches to filter the drug-
gene and gene-disease pairs. 1) Pairs qualified by negated
predicates, such as “did not inhibit”, were not
considered. 2) We also did not consider pairs qualified by
predicates such as “co-exists”, which do not semantically
define a relationship between the pairs. For subsequent steps
we relied on the notion that the assertions extracted by NLP
as semantic predicates also have the potential to
establish biological relatedness between the drug-disease pair.
Hence we used the predicates that qualify the binary
relationships as a feature to rank the final drug-disease
relationships. For example, to rank the above pair, Strepsils -
Chagas, we used the predicate between drug-gene,
“INTERACTS WITH”, and the predicate between gene-
disease, “AUGMENTS”. The semantic predicates of both the
drug-gene and gene-disease pairs play a determining role in
qualifying a drug-disease pair. We attempted to find a
meaningful correlation between the predicates that qualify the
drug-gene and gene-disease relationships, which we call
intermediate predicates, and the likelihood of generating
a true drug-disease pair. The predicate is even more
important for determining the relevance of drug-disease relations
given that an individual semantic
predication can occur in more than one document. In order to
assert a relationship that is inferred from two relationships drawn from
multiple documents, we propose that the predicate correlation
between the two relationships is one of the key factors.
Besides, relations and predicates may have many-to-
many relationships, meaning that more than one predicate can
qualify a relation between two entities. For example, there
is only one citation for the relationship between Strepsils and
CA2, while there are six citations that contain the relationship
between CA2 and Chagas. Out of these six, SemRep identified
three as “ASSOCIATED WITH”, two as
“AUGMENTS”, and one as “AFFECTS”. The
actual predicate type also plays a critical role in our ranking
scheme.
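The two filters and the per-pair predicate counts described above could be sketched in Python as follows. The `NEG_` prefix for negated predicates and the `COEXISTS_WITH` label are assumptions about how the predicates are encoded in the input. The first six tuples mirror the CA2-Chagas example, and the last two are hypothetical rows added only to exercise the filters.

```python
from collections import Counter

# Assumed label for predicates that do not define a semantic relationship.
UNINFORMATIVE = {"COEXISTS_WITH"}


def keep_predication(predicate):
    """Drop negated predicates (assumed 'NEG_' prefix) and uninformative ones."""
    return not predicate.startswith("NEG_") and predicate not in UNINFORMATIVE


def predicate_counts(predications):
    """Count predicate types per (subject, object) pair after filtering."""
    counts = {}
    for subject, predicate, obj in predications:
        if keep_predication(predicate):
            counts.setdefault((subject, obj), Counter())[predicate] += 1
    return counts


gene_disease = (
    [("CA2", "ASSOCIATED_WITH", "Chagas")] * 3
    + [("CA2", "AUGMENTS", "Chagas")] * 2
    + [("CA2", "AFFECTS", "Chagas")]
    # Hypothetical rows that the two filters should discard:
    + [("CA2", "NEG_AFFECTS", "Chagas"), ("CA2", "COEXISTS_WITH", "Chagas")]
)
counts = predicate_counts(gene_disease)
print(counts[("CA2", "Chagas")])
# Counter({'ASSOCIATED_WITH': 3, 'AUGMENTS': 2, 'AFFECTS': 1})
```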
As a final step, we rely on curated resources to further
refine our ranking approach. While NLP assertions do play a
role in identifying biologically related drug-disease pairs, it is
important to take advantage of existing curated
resources to further filter out irrelevant pairs. Numerous
resources, such as UMLS and the Comparative
Toxicogenomics Database (CTD) [34], catalog drug-
disease relationships. In this study we used UMLS as the gold
standard to evaluate the effect of intermediate predicates in
generating a true drug-disease pair. To identify already known
drug-disease relations in our generated list, we cross-referenced
the generated list of drug-disease pairs with UMLS drug-
disease relations. To limit our study to drug-repositioning
candidates, we only considered drug-disease relations in
UMLS whose type is “May_be_treated_by”. As
SemMedDB stores the Concept Unique Identifier (CUI) assigned
by UMLS to each biomedical entity, we used CUIs to cross-
reference our list with UMLS.
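The CUI-based cross-referencing can be illustrated with a short Python sketch. The row layout (`drug_cui`, `disease_cui`, `type`), the function name, and the placeholder identifiers are hypothetical and do not reflect the actual UMLS file format; only the relation type “May_be_treated_by” comes from the text.

```python
def split_known_and_novel(generated_pairs, umls_relations,
                          relation_type="May_be_treated_by"):
    """Split generated (drug_cui, disease_cui) pairs into those already
    present in the curated relations and those that are not."""
    known = {
        (row["drug_cui"], row["disease_cui"])
        for row in umls_relations
        if row["type"] == relation_type
    }
    generated = set(generated_pairs)
    return generated & known, generated - known


# Placeholder identifiers (hypothetical, not real CUIs).
generated_pairs = [
    ("C_DRUG_1", "C_DIS_1"),
    ("C_DRUG_1", "C_DIS_2"),
    ("C_DRUG_2", "C_DIS_1"),
]
umls_relations = [
    {"drug_cui": "C_DRUG_1", "disease_cui": "C_DIS_1", "type": "May_be_treated_by"},
    {"drug_cui": "C_DRUG_2", "disease_cui": "C_DIS_1", "type": "other_relation"},
]
known, candidates = split_known_and_novel(generated_pairs, umls_relations)
print(sorted(known))       # [('C_DRUG_1', 'C_DIS_1')]
print(sorted(candidates))  # [('C_DRUG_1', 'C_DIS_2'), ('C_DRUG_2', 'C_DIS_1')]
```

Matching on identifiers rather than on names avoids mismatches caused by synonyms and spelling variants of the same drug or disease.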
In this study we explored two different ranking approaches
based on two assumptions. In the first approach we considered
the predicates of the drug-gene and gene-disease pairs to be independent
of each other, while in the second we considered the
dependencies between the predicates of the two pairs.
A. Ranking based on predicate independence
In this approach, our fundamental assumption is that
the predicates of the drug-gene pair and the gene-disease pair are
independent of each other when estimating their relevance in
pairing a drug with a disease. Besides this assumption, we
used the following criteria for scoring the relevance of a drug-
disease pair:
1. The percentage frequency of the individual predicates in
drug-gene (PpDG)¹ and gene-disease (PpGD) relations
was one major criterion for determining the relevance
of the drug-disease pair. We further refined this notion
by considering only those predicates whose drug-
disease pair is represented in UMLS drug-disease
relations as a “May_be_treated_by” relation. We
denote the refined versions by (PpDG-U)² and (PpGD-U).
One issue with UMLS-based
validation is the possible lag in the curation of
drug-disease relations in UMLS, which may
eliminate many predicates that are potentially relevant
for identifying the right drug-disease pairs.
2. As an additional step to normalize the
percentages, we also determined the respective
percentage frequency of the drug-gene (PpDG-S) and
gene-disease (PpGD-S)³ predicates in the literature-mined
relations in SemMedDB.
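3. The raw score for a given drug-disease pair inferred
from the individual pair (drug-gene (DG) and gene-
disease (GD)) is calculated as per the equation 1.
$$\mathrm{Score}_{ind}(DG, GD) = \sum_{i=1}^{n} \frac{\log Pp_{DG\text{-}U}^{(i)}}{\log Pp_{DG\text{-}S}^{(i)}} + \sum_{j=1}^{m} \frac{\log Pp_{GD\text{-}U}^{(j)}}{\log Pp_{GD\text{-}S}^{(j)}} \qquad (1)$$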
where n is the number of semantic predications
between the drug and the gene extracted from the literature and m is the
corresponding number for the gene-disease relationship. Figure 1 shows
the steps of calculating the independence scores. For example,
consider the above drug-disease pair (Strepsils - Chagas). In
order to calculate the score for the pair, we added the ratios of
log scores of the individual predicates as outlined in equation
(1). For this example, we added the score of the only predicate,
“INTERACTS WITH”, that defines the relationship between the
drug-gene pair (Example 1) to the scores of all six predications
between the gene-disease pair (Example 2). As mentioned before,
more than one predicate may qualify a drug-gene or gene-disease
pair, in which case we take the sum of the ratios of log
scores of all of them. At this point we do not consider the
semantic relatedness of the predicates while calculating their
scores.
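The independence score described above, a sum of ratios of log scores over all drug-gene and gene-disease predications, can be sketched in Python as follows. Several details are assumptions, not statements of the method: percentages are expressed on a 0-100 scale, the natural logarithm is used, the UMLS-restricted percentage is the numerator and the SemMedDB percentage the denominator, and all percentage values below are toy numbers. The function names are illustrative.

```python
import math
from collections import Counter


def percentage_frequency(predicates):
    """Map each predicate to its share (0-100) of the given predicate list."""
    counts = Counter(predicates)
    total = sum(counts.values())
    return {p: 100.0 * c / total for p, c in counts.items()}


def independence_score(dg_predicates, gd_predicates,
                       pp_dg_u, pp_dg_s, pp_gd_u, pp_gd_s):
    """Sum of log(Pp-U) / log(Pp-S) over every drug-gene and gene-disease
    predication of one drug-disease pair (assumed form of the score).

    Note: a SemMedDB percentage of exactly 1 would make the denominator zero.
    """
    score = 0.0
    for p in dg_predicates:
        score += math.log(pp_dg_u[p]) / math.log(pp_dg_s[p])
    for p in gd_predicates:
        score += math.log(pp_gd_u[p]) / math.log(pp_gd_s[p])
    return score


# Toy percentage frequencies (hypothetical values).
pp_dg_u = {"INTERACTS_WITH": 40.0}
pp_dg_s = {"INTERACTS_WITH": 20.0}
pp_gd_u = {"ASSOCIATED_WITH": 50.0, "AUGMENTS": 10.0, "AFFECTS": 5.0}
pp_gd_s = {"ASSOCIATED_WITH": 25.0, "AUGMENTS": 20.0, "AFFECTS": 10.0}

# Strepsils - Chagas: one drug-gene predication, six gene-disease predications.
dg_predicates = ["INTERACTS_WITH"]
gd_predicates = ["ASSOCIATED_WITH"] * 3 + ["AUGMENTS"] * 2 + ["AFFECTS"]
score = independence_score(dg_predicates, gd_predicates,
                           pp_dg_u, pp_dg_s, pp_gd_u, pp_gd_s)
print(round(score, 3))
```

Because every predication contributes a term, pairs supported by more citations accumulate larger scores under this reading.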
1 The first “P” stands for percentage and the second one stands for predicate
and “DG” stands for “Drug-Gene”.
2 “U” stands for “UMLS”.
3 In this notation, “S” stands for “SemMedDB”.
B. Ranking based on predicate inter-dependence
In the second ranking approach, we assume that the predicates of the
drug-gene pair and the gene-disease pair depend on each other when estimating
their relevance in pairing a drug with a disease. The
steps are as follows:
1. Instead of the individual percentage frequencies, we
compute the percentage frequency of the combined
predicates between the drug-gene and gene-disease
pairs (PpDG-pGD). We limit this calculation to only those
drug-disease pairs that are represented in UMLS drug-
disease relations, and denote the result by
(PpDG-pGD-U).
Fig. 1. Steps of calculating independence scores
2. Our approach to normalizing the percentages was
very similar to the earlier one, except that we used the
percentage frequency of the combined predicates from
SemMedDB (PpDG-pGD-S), as outlined in the following
equation:
$$Pp_{pDG\text{-}pGD\text{-}S} = \frac{\#pDG \times \#pGD}{\sum_{i=1}^{n} \sum_{j=1}^{m} \#pDG_{i} \times \#pGD_{j}} \qquad (2)$$
where n and m are the numbers of distinct
predicates between drug-gene and gene-disease,
respectively, #pDG is the frequency of the drug-
gene predicate in SemMedDB, and #pGD is the
frequency of the gene-disease predicate.
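3. Using the percentage frequency we calculated the raw
score for a given generated drug-disease pair as given
in the following equation (3).
$$\mathrm{Score}_{dep}(DG, GD) = \sum_{i=1}^{n} \log \frac{Pp_{pDG\text{-}pGD\text{-}U}^{(i)}}{Pp_{pDG\text{-}pGD\text{-}S}^{(i)}} \qquad (3)$$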
where n is the number of combinations that
generate that drug-disease pair. For example, if there
are 2 different predicates between drug-gene and 3
different predicates between gene-disease, 6 different
combinations can generate the same drug-disease pair.
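The steps above can be sketched in Python as follows. The normalization follows the description of the combined predicate frequency in SemMedDB. The scoring function assumes that the score is the sum, over all predicate combinations, of the log of the UMLS-restricted percentage divided by the SemMedDB percentage, which is one plausible reading and not a confirmed detail. The frequencies, the uniform UMLS percentages, and the function names are illustrative.

```python
import math
from itertools import product


def combined_percentage(freq_dg, freq_gd):
    """Share (0-100) of each (drug-gene, gene-disease) predicate combination,
    computed from the individual predicate frequencies."""
    total = sum(fd * fg for fd, fg in product(freq_dg.values(), freq_gd.values()))
    return {
        (p_dg, p_gd): 100.0 * freq_dg[p_dg] * freq_gd[p_gd] / total
        for p_dg, p_gd in product(freq_dg, freq_gd)
    }


def interdependence_score(dg_predicates, gd_predicates, pp_u, pp_s):
    """Assumed form: sum of log(Pp-U / Pp-S) over the predicate combinations.

    Returns the score and the number of combinations n.
    """
    combos = list(product(sorted(set(dg_predicates)), sorted(set(gd_predicates))))
    score = sum(math.log(pp_u[c] / pp_s[c]) for c in combos)
    return score, len(combos)


# Toy predicate frequencies (hypothetical values).
freq_dg = {"INTERACTS_WITH": 30, "INHIBITS": 10}
freq_gd = {"ASSOCIATED_WITH": 3, "AUGMENTS": 2, "AFFECTS": 1}
pp_s = combined_percentage(freq_dg, freq_gd)
# Toy UMLS-restricted percentages: uniform over the six combinations.
pp_u = {combo: 100.0 / len(pp_s) for combo in pp_s}

score, n_combinations = interdependence_score(list(freq_dg), list(freq_gd), pp_u, pp_s)
print(n_combinations)  # 6
print(pp_s[("INTERACTS_WITH", "ASSOCIATED_WITH")])  # 37.5
```

With 2 drug-gene predicates and 3 gene-disease predicates, the sketch enumerates the same 6 combinations as the example in the text.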
C. Validation and evaluation
To validate our ranking methods, we used two resources,
CTD and Medline citations. CTD contains curated drug-disease
relations. We cross-referenced the ranked drug-disease pairs
with CTD and studied whether there is any correlation between our
ranking methods and a pair being a true drug-disease pair (i.e., present in
CTD). We also calculated the percentage of top-ranked drug-
disease pairs that appeared in CTD and compared it with the
same percentage for low-ranked pairs. These results are used to
validate and compare our two methods, predicate independence
and inter-dependence. In the second step, we measured the
correlation between the score assigned to each generated pair
and the co-occurrence of the drug and disease in Medline abstracts. The
logic behind this validation is that drug-disease pairs that co-occur
more often are more likely to be related, so our methods
should assign higher scores to those pairs. In order to count the
number of co-occurrences of drug-disease pairs, we indexed all
Medline abstracts via ElasticSearch and searched for the pairs in
titles and abstracts. Then the percentage of top-ranked pairs
that co-occurred in Medline citations was calculated and
compared with that of the low-ranked pairs. As the last step of
validation, we manually reviewed the top 10 ranked drug-disease pairs
and investigated the type of their relationship.
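The comparison between top-ranked and low-ranked pairs can be sketched in Python as follows. The scores, the pairs, and the reference set are toy data, and the function name is illustrative. The reference set stands in for either the pairs found in CTD or the pairs that co-occur in Medline.

```python
def hit_rate(ranked_pairs, reference_pairs, start, stop):
    """Percentage of ranked_pairs[start:stop] that appear in reference_pairs."""
    window = ranked_pairs[start:stop]
    if not window:
        return 0.0
    hits = sum(1 for pair in window if pair in reference_pairs)
    return 100.0 * hits / len(window)


# Toy scores for four generated drug-disease pairs (hypothetical values).
scores = {
    ("d1", "x"): 9.1,
    ("d2", "y"): 7.4,
    ("d3", "z"): 3.0,
    ("d4", "w"): 1.2,
}
reference_pairs = {("d1", "x"), ("d2", "y"), ("d4", "w")}

# Rank pairs from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
top = hit_rate(ranked, reference_pairs, 0, 2)
bottom = hit_rate(ranked, reference_pairs, 2, 4)
print(top, bottom)  # 100.0 50.0
```

A ranking method that behaves as intended should give a higher percentage for the top window than for the bottom one.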
IV. RESULTS
All drug-gene and gene-disease semantic predications were
retrieved from SemMedDB. There were 19,993 drug-gene
pairs (12,666 unique) and 59,945 gene-disease pairs (33,489
unique). When we applied Swanson’s model to these pairs, it
resulted in the generation of 653,108 potential drug-disease
pairs (245,102 unique). All generated drug-disease pairs were
further cross-referenced with UMLS, and 1,204 of the
generated pairs were found in this resource. Using these 1,204
pairs and following our two ranking methods, independence and
inter-dependence, we calculated the percentage frequency of
each drug-gene and gene-disease predicate. From these, using
eq. 1 and eq. 3, two scores (one per method) were calculated
for each generated potential drug-disease pair.
To validate our ranking methods, we calculated the
correlation between the scores and the number of co-
occurrences of the pairs in Medline abstracts. The results showed
that the inter-dependence score is correlated with the co-occurrence
of the drug-disease pairs in Medline abstracts (using a t-test, P-
value < 2.2e-16). We also calculated the percentage of high- and low-
ranked drug-disease pairs that co-occurred in Medline
abstracts. Figure 2 shows this comparison for both
methods. In this figure, the Y-axis shows the percentage of pairs
that co-occurred in Medline and the X-axis shows the number of top-
ranked pairs.
Fig. 2. Comparison of the percentage of high- and low-ranked drug-disease
pairs that co-occurred in Medline abstracts.
We performed a similar experiment on the percentage of
high- and low-ranked drug-disease pairs that appear in CTD.
Figure 3 illustrates the result of this experiment.
Fig. 3. Comparison of the percentage of high- and low-ranked drug-disease
pairs that appeared in CTD.
Table I presents the results of our manual investigation of
the top ten ranked drug-disease pairs.
TABLE I. TOP TEN RANKED DRUG DISEASE PAIRS
Drug          Disease                  Type       Reference
Omalizumab    Asthma                   Treatment  [35]
Nifedipine    Tetanus                  Treatment  Wikipedia
Nifedipine    Ischemia                 Treatment  [36]
Omalizumab    Dermatitis, atopic       Treatment  [37]
Nifedipine    Heart failure            Treatment  [38]
Nifedipine    Renal tubular disorder   Relation   [39]
Calan         Hypertensive disease     Treatment
Airol         Asthma                   -
Ezetimibe     Coronary heart disease   Treatment  [40]
Cyclosporine  Asthma                   Treatment  [41]
V. DISCUSSION
In our study we found that the top-ranked drug-disease pairs
produced by the inter-dependence ranking had stronger literature
evidence than those produced by the independence ranking. Figure 2 shows
that 82% of the top 100 drug-disease pairs ranked using the inter-
dependence approach co-occur within a single abstract in
Medline. However, this percentage declines noticeably
beyond the top 100 pairs. We observed that pairs
ranked using the independence approach had relatively
lower co-occurrence evidence in the biomedical literature.
We observed a similar trend when we evaluated the
top-ranked pairs identified by both
approaches against a curated knowledge base, CTD.
Figure 3 further confirms the advantage of the inter-
dependence ranking over the independence ranking. Finally,
manual evaluation of the top ten pairs ranked by the inter-dependence
approach revealed that the pairs have some biological
significance based on expert judgment. This indicates that
the inter-dependence method has a higher chance of identifying
biologically relevant drug-disease pairs. We would also like to draw
attention to the fact that nine out of the ten top-ranked drug-
disease pairs are valid relations, which belong to the DRUG-
TREATS-DISEASE relationship category.
There are two main limitations in this study. First, we did
not have a gold standard of drug-disease treatment pairs to
evaluate the performance of our approaches. Only an evaluation of our
system against such a gold standard will allow us to accurately
benchmark its actual performance. Second, there are inherent
limitations both in the choice of resource (CTD)
and in the measure (literature co-occurrence) used to evaluate
the confidence levels of the top-ranked drug-disease pairs
identified by the system. CTD, though a manually curated
resource, does not annotate the type of relationship between the
drug and disease. Hence, while evaluating our system against
CTD we ignored the semantic predications extracted by the
system, which resulted in a loss of valuable
information. Alternatively, we relied on document-level co-
occurrence in the literature as a measure to validate drug-disease
relationships. Document-level co-occurrence is not
a strong indicator of a valid drug-disease relation. There are
also limitations in our ranking methods. Rigorous
statistical validation may further refine the notion of semantic
predication as evidence for relations between biomedical entities
in LBD. In the future, we plan to create a gold standard of drug-
disease treatment relations to evaluate our methods more
accurately and to compare our methods with other approaches.
We also intend to improve our methods so that we can determine a
threshold score below which pairs are considered
false positive candidates.
VI. CONCLUSION
In this study, we proposed and evaluated two methods for
ranking and prioritizing potential drug-repositioning
discoveries extracted from literature. We used drug-gene and
gene-disease predications, extracted by SemRep, to generate
potential drug-disease pairs. The predicates between the drug-gene
and gene-disease pairs are used to rank the generated drug-
disease pairs. Our results showed that the combination of drug-
gene and gene-disease predicates can serve as a metric for ranking
likely true drug-repositioning candidates higher in the list.
ACKNOWLEDGMENT
This study was made possible by National Institutes of
Health grant R01 GM102282-01.
REFERENCES
[1] C. P. Adams and V. V. Brantner, “Estimating The Cost Of New Drug
Development: Is It Really $802 Million?,” Health Aff, vol. 25, no. 2,
pp. 420–428, Mar. 2006.
[2] J. Gilbert, P. Henske, and A. Singh, “Rebuilding Big Pharma’s
business model,” In Vivo, 2003.
[3] E. L. Tobinick, “The value of drug repositioning in the current
pharmaceutical market,” Drug News Perspect., Mar. 2009.
[4] G. Jin and S. T. C. Wong, “Toward better drug repositioning:
prioritizing and integrating existing methods into efficient pipelines,”
Drug Discovery Today, vol. 19, no. 5, pp. 637–644, May 2014.
[5] H. Xu, M. C. Aldrich, Q. Chen, H. Liu, N. B. Peterson, Q. Dai, M.
Levy, A. Shah, X. Han, X. Ruan, M. Jiang, Y. Li, J. S. Julien, J.
Warner, C. Friedman, D. M. Roden, and J. C. Denny, “Validating
drug repurposing signals using electronic health records: a case study
of metformin associated with reduced cancer mortality,” J Am Med
Inform Assoc, Jul. 2014.
[6] P. Sanseau, P. Agarwal, M. R. Barnes, T. Pastinen, J. B. Richards, L.
R. Cardon, and V. Mooser, “Use of genome-wide association studies
for drug repositioning,” Nat Biotech, vol. 30, no. 4, Apr. 2012.
[7] M. Rastegar-Mojarad, Z. Ye, J. M. Kolesar, S. J. Hebbring, and S. M.
Lin, “Opportunities for drug repositioning from phenome-wide
association studies,” Nat Biotech, vol. 33, no. 4, pp. 342–345, Apr.
2015.
[8] C.-P. Wei, K.-A. Chen, and L.-C. Chen, “Mining Biomedical
Literature and Ontologies for Drug Repositioning Discovery,” in
Advances in Knowledge Discovery and Data Mining, V. S. Tseng, T.
B. Ho, Z.-H. Zhou, A. L. P. Chen, and H.-Y. Kao, Eds. Springer
International Publishing, 2014, pp. 373–384.
[9] C. Andronis, A. Sharma, V. Virvilis, S. Deftereos, and A. Persidis,
“Literature mining, ontologies and information visualization for drug
repurposing,” Brief. Bioinformatics, vol. 12, no. 4, Jul. 2011.
[10] D. R. Swanson, “Migraine and magnesium: eleven neglected
connections,” Perspect. Biol. Med., vol. 31, no. 4, pp. 526–557, 1988.
[11] T. C. Rindflesch and M. Fiszman, “The interaction of domain
knowledge and linguistic structure in natural language processing:
interpreting hypernymic propositions in biomedical text,” J Biomed
Inform, vol. 36, no. 6, pp. 462–477, Dec. 2003.
[12] T. Cohen, D. Widdows, R. W. Schvaneveldt, P. Davies, and T. C.
Rindflesch, “Discovering discovery patterns with predication-based
Semantic Indexing,” Journal of Biomedical Informatics, vol. 45, no. 6,
pp. 1049–1065, Dec. 2012.
[13] C. B. Ahlers, D. Hristovski, H. Kilicoglu, and T. C. Rindflesch,
“Using the Literature-Based Discovery Paradigm to Investigate Drug
Mechanisms,” AMIA Annu Symp Proc, vol. 2007, pp. 6–10, 2007.
[14] O. Bodenreider, “The Unified Medical Language System (UMLS):
integrating biomedical terminology,” Nucl. Acids Res., vol. 32, no.
suppl 1, pp. D267–D270, Jan. 2004.
[15] T. T. Ashburn and K. B. Thor, “Drug repositioning: identifying and
developing new uses for existing drugs,” Nat Rev Drug Discov, vol. 3,
no. 8, pp. 673–683, Aug. 2004.
[16] J. T. Dudley, T. Deshpande, and A. J. Butte, “Exploiting drug-disease
relationships for computational drug repositioning,” Brief.
Bioinformatics, vol. 12, no. 4, pp. 303–311, Jul. 2011.
[17] M. R. Hurle, L. Yang, Q. Xie, D. K. Rajpal, P. Sanseau, and P.
Agarwal, “Computational Drug Repositioning: From Data to
Therapeutics,” Clinical Pharmacology & Therapeutics, Apr. 2013.
[18] D. Hristovski, C. Friedman, T. C. Rindflesch, and B. Peterlin,
“Exploiting Semantic Relations for Literature-Based Discovery,”
AMIA Annu Symp Proc, vol. 2006, pp. 349–353, 2006.
[19] M. Yetisgen-Yildiz and W. Pratt, “A new evaluation methodology for
literature-based discovery systems,” J Biomed Inform, vol. 42, no. 4,
pp. 633–643, Aug. 2009.
[20] M. Weeber, H. Klein, L. T. W. de Jong-van den Berg, and R. Vos, “Using
concepts in literature-based discovery: Simulating Swanson’s
Raynaud-fish oil and migraine-magnesium discoveries,” J. Am. Soc.
Inf. Sci. Tech, pp. 548–557, 2001.
[21] N. R. Smalheiser and D. R. Swanson, “Using ARROWSMITH: a
computer-assisted approach to formulating and assessing scientific
hypotheses,” Comput Methods Programs Biomed, vol. 57, no. 3, pp.
149–153, Nov. 1998.
[22] D. Hristovski, B. Peterlin, and S. Dzeroski, “Literature-based
Discovery Support System and Its Application to Disease Gene
Identification,” Proc AMIA Symp, p. 928, 2001.
[23] M. D. Gordon and S. Dumais, “Using Latent Semantic Indexing for
Literature Based Discovery,” J. Am. Soc. Inf. Sci., vol. 49, no. 8, pp.
674–685, Jun. 1998.
[24] R. J. Cole and P. D. Bruza, “A Bare Bones Approach to Literature-
Based Discovery: An Analysis of the Raynaud’s/Fish-Oil and
Migraine-Magnesium Discoveries in Semantic Space,” in Discovery
Science, A. Hoffmann, H. Motoda, and T. Scheffer, Eds. Springer
Berlin Heidelberg, 2005, pp. 84–98.
[25] T. Cohen, R. Schvaneveldt, and D. Widdows, “Reflective Random
Indexing and indirect inference: A scalable method for discovery of
implicit connections,” Journal of Biomedical Informatics, vol. 43, no.
2, pp. 240–256, Apr. 2010.
[26] H. Kilicoglu, D. Shin, M. Fiszman, G. Rosemblat, and T. C.
Rindflesch, “SemMedDB: a PubMed-scale repository of biomedical
semantic predications,” Bioinformatics, vol. 28, Dec. 2012.
[27] M. J. Cairelli, C. M. Miller, M. Fiszman, T. E. Workman, and T. C.
Rindflesch, “Semantic MEDLINE for discovery browsing: using
semantic predications and the literature-based discovery paradigm to
elucidate a mechanism for the obesity paradox,” AMIA Annu Symp
Proc, vol. 2013, pp. 164–173, 2013.
[28] M. Rastegar-Mojarad, D. Li, and H. Liu, “Operationalizing Semantic
Medline for meeting the information needs at point of care,” presented
at the AMIA Clinical Research Informatics Summit, 2015.
[29] M. Rastegar-Mojarad, R. Komandur Elayavilli, D. Li, and H. Liu,
“Assessing the Need of Discourse-Level Analysis in Identifying
Evidences for Drug-Disease Relations in Scientific Literature,”
presented at the Medinfo, 2015.
[30] J. D. Wren, “Extending the mutual information measure to rank
inferred literature relationships,” BMC Bioinformatics, vol. 5, no. 1, p.
145, Oct. 2004.
[31] W. Pratt and M. Yetisgen-Yildiz, “LitLinker: Capturing Connections
Across the Biomedical Literature,” in Proceedings of the 2nd
International Conference on Knowledge Capture, New York, NY,
USA, 2003, pp. 105–112.
[32] M. Yetisgen-Yildiz and W. Pratt, “Using statistical and knowledge-
based approaches for literature-based discovery,” J Biomed Inform,
vol. 39, no. 6, pp. 600–611, Dec. 2006.
[33] D. R. Swanson, N. R. Smalheiser, and V. I. Torvik, “Ranking indirect
connections in literature-based discovery: The role of Medical Subject
Headings (MeSH),” J. Am. Soc. Inf. Sci. Technol., vol. 57, pp.
1427–1439, 2006.
[34] A. P. Davis, C. J. Grondin, K. Lennon-Hopkins, C. Saraceni-Richards,
D. Sciaky, B. L. King, T. C. Wiegers, and C. J. Mattingly, “The
Comparative Toxicogenomics Database’s 10th year anniversary:
update 2015,” Nucleic Acids Res., Oct. 2014.
[35] R. C. Strunk and G. R. Bloomberg, “Omalizumab for Asthma,” New
England Journal of Medicine, vol. 354, no. 25, Jun. 2006.
[36] R. A. Kloner, “Nifedipine in Ischemic Heart Disease,” Circulation,
vol. 92, no. 5, pp. 1074–1078, Sep. 1995.
[37] M. C. Fernández-Antón Martínez, V. Leis-Dosil, F. Alfageme-Roldán,
A. Paravisini, S. Sánchez-Ramón, and R. Suárez Fernández,
“Omalizumab for the treatment of atopic dermatitis,” Actas
Dermosifiliogr, vol. 103, no. 7, pp. 624–628, Sep. 2012.
[38] C. V. Leier, T. J. Patrick, J. Hermiller, K. D. Pacht, P. Huss, R. D.
Magorien, and D. V. Unverferth, “Nifedipine in congestive heart
failure: effects on resting and exercise hemodynamics and regional
blood flow,” Am. Heart J., vol. 108, no. 6, pp. 1461–1468, Dec. 1984.
[39] J. R. Diamond, J. Y. Cheung, and L. S. Fang, “Nifedipine-induced
renal dysfunction. Alterations in renal hemodynamics,” Am. J. Med.,
vol. 77, no. 5, pp. 905–909, Nov. 1984.
[40] C. M. Rotella, A. Zaninelli, C. Le Grazie, M. E. Hanson, and G. F.
Gensini, “Ezetimibe/simvastatin vs simvastatin in coronary heart
disease patients with or without diabetes,” Lipids Health Dis, vol. 9, p.
80, Jul. 2010.
[41] E. Nizankowska, J. Soja, G. Pinis, G. Bochenek, K. Sładek, B.
Domagała, A. Pajak, and A. Szczeklik, “Treatment of steroid-
dependent bronchial asthma with cyclosporin,” Eur. Respir. J., vol. 8,
no. 7, pp. 1091–1099, Jul. 1995.