REVIEW

DOI: 10.1111/cbdd.14262

Explainability and white box in drug discovery

Kevser Kübra Kırboğa 1,2 | Sumra Abbasi 3 | Ecir Uğur Küçüksille 4

1 Bioengineering Department, Bilecik Seyh Edebali University, Bilecik, Turkey
2 Informatics Institute, Istanbul Technical University, Maslak, Turkey
3 Department of Biological Sciences, National University of Medical Sciences, Rawalpindi, Pakistan
4 Department of Computer Engineering, Süleyman Demirel University, Isparta, Turkey

Correspondence
Kevser Kübra Kırboğa, Bioengineering Department, Bilecik Seyh Edebali University, Bilecik, Turkey.
Email: kubra.kirboga@yahoo.com

Received: 12 November 2022 | Revised: 24 March 2023 | Accepted: 12 April 2023

Abstract
Recently, artificial intelligence (AI) techniques have been used increasingly to overcome the challenges in drug discovery. Although traditional AI techniques generally achieve high accuracy, their decision processes and the patterns they learn can be difficult to explain, which makes the outputs of the algorithms used in drug discovery hard to understand and interpret. Explainable artificial intelligence (XAI) emerged to address this issue: a set of processes and methods that makes the results and outputs of machine learning (ML) and deep learning (DL) algorithms understandable, so that the causes and consequences of a model's decisions can be examined and better decisions can be made throughout the drug discovery process. Techniques such as SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations) have made the drug-targeting phase clearer and more understandable, and XAI methods are expected to reduce the time and cost of future computational drug discovery studies. This review provides a comprehensive overview of XAI-based drug discovery and development prediction, of XAI mechanisms that increase confidence in AI and modeling methods, and of the limitations and future directions of XAI in drug discovery.

KEYWORDS
artificial intelligence, computational drug discovery, drug development, explainable artificial intelligence

Chem Biol Drug Des. 2023;00:1–17. wileyonlinelibrary.com/journal/cbdd. © 2023 John Wiley & Sons Ltd.
1 | INTRODUCTION
The drug discovery process is very complex and challenging, with a low success rate, so effective commercial drug design and development requires an interdisciplinary effort. The process involves identifying a drug molecule that is therapeutically effective and useful in treating and managing the disease state, and it comprises target molecule identification, synthesis, characterization, screening, and therapeutic measurements. In this process, roughly one molecule is selected from every 2.3 million compounds screened in a research project. Preclinical, clinical, and post-clinical drug discovery and development research requires high budgets and advanced technologies: the average cost of effective drug research and development ranges from $900 million to $2 billion (Zeng et al., 2022), and the time from the discovery of a drug to its release on the market averages 12–15 years (Deore et al., 2019). Moreover, the success rate of launching a drug from a Phase I clinical trial is daunting, less than 10% (Deng et al., 2022). In the last decade, drug discovery has been undergoing radical transformations driven by the rapid development of artificial intelligence (AI) (Chen et al., 2018; Mater & Coote, 2019; Schneider, 2018; Vamathevan et al., 2019).
Popular applications of AI in drug discovery include virtual screening (Stumpfe & Bajorath, 2020), reaction prediction (Boström et al., 2018), de novo drug design (Schneider et al., 2020), and retrosynthesis (Deng et al., 2022; Table 1). These applications are powered by various AI techniques, with model designs spanning from common ML models to deep neural networks (DNN), including convolutional neural networks, recurrent neural networks, graph neural networks, and transformers. In this review, we aim to provide a comprehensive overview of recent XAI (eXplainable Artificial Intelligence) research, highlighting its benefits, limitations, and future opportunities for drug discovery. We first give an overview of critical applications in drug discovery and highlight a collection of previously published perspectives, reviews, and surveys. Relevant techniques, including model architectures and learning paradigms, are then detailed together with information on data and representations. Finally, we discuss current challenges and highlight some future directions.
2 | AI AND XAI IN DRUG DISCOVERY
AI significantly impacts the creation of small-molecule medications thanks to access to new biology, improved or original chemistry, increased success rates, and speedier and less expensive discovery processes (Mak et al., 2022). AI-native drug discovery businesses that offer software or other services to pharmaceutical corporations have been primarily responsible for historical advancement. At various points throughout the value chain, these businesses employ data and analytics to enhance one or more specific use cases; examples include small-molecule design using generative neural networks and target finding and validation using knowledge graphs. Large pharmaceutical firms might use partnerships or software licensing agreements to gain access to these capabilities and integrate them into their pipelines. Since the early 2000s, machine learning models such as random forests (RF) have been used for virtual screening (VS) and quantitative structure–activity relationship (QSAR) modeling (Lavecchia, 2015; Ma et al., 2015). Wójcikowski et al. discussed the use of the R and Python programming languages, features, and regression models, and the creation of RF-Score based on the RF technique (Wójcikowski et al., 2019).
Potent inhibitors of discoidin domain receptor 1 (DDR1) were discovered in a remarkably short time by Insilico Medicine researchers (Zhavoronkov et al., 2019). AI is also used at different stages of drug discovery, from target identification and validation to drug response determination; for instance, MIT scientists discovered a novel drug candidate against antibiotic-resistant bacteria in 2020 (Stokes et al., 2020). Many AI-native drug development businesses have expanded their end-to-end capabilities in the past several years. For instance, Atomwise and Schrödinger established a joint company with a shared portfolio to unite their platform technologies, while Roivant Sciences bought Silicon Therapeutics (Savage, 2021). The internal resources and investments of the large IT companies, which are actively increasing their AI efforts in biology and pharmaceutical research, are also significant. Alphabet, for instance, has established Isomorphic Labs, building on AI innovations from the DeepMind organization. Related work outside DeepMind produced structure predictions with accuracies approaching those of DeepMind in the 14th Critical Assessment of Structure Prediction (CASP14) using a three-track network, enabling rapid solution of challenging structure modeling problems and the identification of proteins with currently unknown structures (Baek et al., 2021). Baidu's AI drug discovery division, Sanofi, and Nvidia (Clara) have likewise invested significantly in several AI technologies and applications (Savage, 2021).
AI's impact on traditional drug discovery is still in its early stages, but we have already seen that, when layered into a conventional process, AI-powered capabilities can dramatically speed up or improve individual steps and reduce the costs of running expensive experiments. AI algorithms can potentially transform most exploratory tasks (such as molecule design and testing) so that physical experiments only need to be done when necessary to validate results. Companies controlling the entire AI-powered discovery process emphasize that the intellectual property underpins their assets. They benefit from a network of partners, including CROs and contract development and manufacturing companies, but retain ownership of the molecule. Their investments have potentially significant commercial value through out-licensing, joint ventures (typically after clinical proof of concept), and therapeutic marketing. In addition, the maturing AI-first model has accelerated the transition of AI-native players from software or service providers to asset-owning biotechnology companies.
In addition to this relationship with the private sector, many studies continue to examine the use of AI in drug discovery. When computational biologists focus on the identification and discovery of new drugs, network-based biology analysis algorithms can suggest cancer therapeutics from molecular networks such as protein–protein interaction networks (Li et al., 2017), gene regulatory networks (Karlebach & Shamir, 2008), metabolic networks (Stelling et al., 2002), and drug–drug interaction networks (Hu & Hayton, 2011; Table 1). Targets can likewise be identified from these networks.
TABLE 1  Use of explainable artificial intelligence in categorized drug studies.

Application | Algorithms | Technique | References
Adverse drug reactions | CART decision trees and JRip | Boosting-based feature selection | Bresso et al. (2021)
Adverse drug reactions | Random forests (RF), extra random trees (ET), and eXtreme Gradient Boosting machines (XGB) | LIME and Shapley values | Ward et al. (2021)
Adverse drug reactions | Support vector machine (SVM) | Shapley values | Joshi et al. (2021)
Adverse drug reactions | Recurrent neural network (RNN) | Shapley values | Rebane et al. (2021)
Adverse drug reactions | XGB | Shapley values | Zhu et al. (2022)
Adverse drug reactions | XGB, CatBoost, AdaBoost, LightGBM, RF, gradient boosting decision tree (GBDT), TPOT | Shapley values | Yu et al. (2021)
Adverse drug reactions | GBDT | Shapley values | Imran et al. (2022)
Drug repurposing | Knowledge graph | - | Drancé (2022)
Drug repurposing | Knowledge graph | Knowledge-based embeddings | He et al. (2022)
Drug repurposing | Graph neural network (GNN) | Drug explorer (meta matrix) | Wang et al. (2022)
Drug repurposing | Knowledge graph | Integrated gradients (IG) | Atsuko Takagi et al. (2022)
Drug–disease and drug–target interaction | Knowledge graph | - | Wang et al. (2021)
Drug–disease and drug–target interaction | GNN | GNN explainer | Pfeifer et al. (2022)
Drug–disease and drug–target interaction | Knowledge graphs | Knowledge graph embedding model | Zeng et al. (2020)
Drug–drug interactions | Knowledge graph neural network (KGNN) | - | Lin et al. (2020)
Drug–drug interactions | Naive Bayes (NB), decision tree (DT), RF, logistic regression (LR), and XGB | Shapley values | Dang et al. (2021)
Drug–drug interactions | RF and XGBoost | Shapley values | Hung et al. (2022)
Drug design | DT, RF, SVM, XGB, KNN, ANN, RIPPER, RLF | Permutation importance, LIME, Shapley values, integrated gradients, Diverse Counterfactual Explanations (DiCE), Partial Dependence Plot (PDP) + Individual Conditional Expectation (ICE), Accumulated Local Effects (ALE) | Banegas-Luna and Pérez-Sánchez (2022)
Drug design | Extreme gradient boosting (GB) | Shapley values | Vangala et al. (2022)
Drug design | Naïve Bayes, SVM, tree | Shapley values | Wojtuch et al. (2021)
Drug design | XGBoost (XGB), k-nearest neighbor (KNN), extra trees classifier (ETC), support vector machine (SVM), and AdaBoost (ADA) | Shapley values | Akbar et al. (2022)
Drug design | Distributed random forest (DRF), extremely randomized trees (XRT), generalized linear model (GLM), XGB, gradient boosting machine (GBM), multilayer artificial neural network, and stacked ensemble models | Shapley values | Czub et al. (2021)
Drug design | Deep neural networks (DNN) | Shapley values | Fan et al. (2022)
Drug design | SVM, XGBoost, RF, DNN, GCN, GAT, MPNN | Shapley values | Jiang et al. (2021)
Drug design | Convolutional neural network (CNN) | Shapley values | Hosen et al. (2022)
Drug monitoring | Support vector regression (SVR), GBRT, RF | Shapley values | Ma et al. (2022)
Drug monitoring | RF | Shapley values | Bittremieux et al. (2022)
Drug monitoring | LR, least absolute shrinkage and selection operator regression, classification and regression trees, RF, and gradient boosting modeling (GBM) | Shapley values | Lin et al. (2022)
XAI development frequently aims to make AI intelligible to humans. All technological ways of achieving understanding, such as direct interpretability, the generation of an explanation or justification, and the provision of transparent information, fall within the expansive definition of XAI (Páez, 2019). Scientists occasionally reach impasses when defining explainability and related concepts such as transparency, interpretability, and intelligibility. We feel that a diverse group of experts, including biologists, computer scientists, doctors, and nurses, should be involved as we focus on XAI applications in the drug development process, since users frequently want a comprehensive understanding of the system and its behavior. The phrase "XAI" was first used in connection with expert systems roughly 50 years ago; the adoption of ML technology has propelled us into its second phase. High-level XAI approaches may be divided into two categories (Guidotti et al., 2019; Lipton, 2016):
1. Selecting simpler and easier-to-understand models, such as a decision tree, a rule-based model, or a linear regression.
2. Choosing a complicated, opaque model (sometimes known as a "black box model"), such as a DNN or a sizable ensemble of trees, and then employing a post-hoc approach to produce explanations (see the surrogate sketch after this list).
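As a sketch of the second, post-hoc route, one can train an opaque model and then fit a shallow decision tree as a global surrogate, judging the explanation by its agreement with the opaque model (its fidelity). The models and synthetic data below are assumptions for illustration, not a specific published workflow:

```python
# Global surrogate sketch: approximate an opaque model with a shallow decision tree.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))
y = ((X[:, 0] + X[:, 1] ** 2 - X[:, 2]) > 0).astype(int)

black_box = GradientBoostingClassifier(random_state=0).fit(X, y)
y_bb = black_box.predict(X)                        # surrogate learns the opaque model's outputs

surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y_bb)
fidelity = (surrogate.predict(X) == y_bb).mean()   # agreement with the opaque model, not with y
print(f"surrogate fidelity: {fidelity:.2f}")
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(6)]))
```

The surrogate is deliberately scored against the opaque model's outputs rather than the true labels, because its job is to mimic the model, not the data.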
The term "performance-interpretability tradeoff" is occasionally used to refer to the decision between the two, since opaque models frequently outperform transparent ones at various tasks. However, this decision is not always clear-cut: research has demonstrated that explicitly interpretable models may perform on par with opaque models, especially when given well-structured datasets and meaningful features (Rudin, 2019). Additionally, a current line of XAI research is creating new algorithms that are both interpretable and performant. Where such models are not available, post-hoc XAI techniques must be used to make the models explainable. Guidotti et al. (2019) categorize post-hoc XAI techniques into global model explanations of the general logic of the model, outcome explanations that focus on a particular model output, and counterfactual analysis that supports understanding how the model would behave with an alternative input. Within these categories, XAI techniques usually produce feature-based explanations to elucidate the inside of the model or example-based explanations to support case-based reasoning.
It should be highlighted that each category also applies to models that may be directly interpreted: a shallow decision tree can be summarized as a global statement, used to highlight a specific explanation for a predicted outcome, or subjected to counterfactual analysis in several ways. As the examples below show, this is far less straightforward for opaque-box models, which call for other post-hoc methods. A global model explanation gives only a rough picture of how the model functions, because it is hard to comprehend the intricate underlying structure of an opaque model; it is often produced by using the same training data to train an immediately interpretable basic model, such as a decision tree, rule set, or regression, and then optimizing it to behave more like the original model. Outcome explanations, in contrast, explain the result of a prediction made on a single sample: a set of algorithms can be used to estimate the significance of each sample feature that contributes to the prediction. For example, LIME (Local Interpretable Model-agnostic Explanations; Ribeiro et al., 2016a) starts by adding a small amount of noise to the sample to create a set of neighboring samples; it then fits a simple linear model on these neighbors that mimics the behavior of the original model in the local area. The linear model weights can then be used as feature importances to explain the estimate.
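The local-surrogate procedure just described can be sketched in a few lines (a simplified illustration under assumed choices of Gaussian perturbation, an exponential proximity kernel, and a ridge surrogate; it is not the reference LIME implementation):

```python
# Simplified LIME-style local surrogate: perturb one sample, fit a weighted linear model.
import numpy as np
from sklearn.linear_model import Ridge

def local_explanation(predict_fn, x, n_samples=500, noise=0.3, seed=0):
    rng = np.random.default_rng(seed)
    Z = x + rng.normal(scale=noise, size=(n_samples, x.size))   # neighbors around x
    preds = predict_fn(Z)                                        # opaque model's outputs
    dists = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dists ** 2) / (2 * noise ** 2))           # closer neighbors count more
    lin = Ridge(alpha=1.0).fit(Z, preds, sample_weight=weights)  # local linear surrogate
    return lin.coef_                                             # local feature importances

# Example with a hypothetical probability function of three features:
f = lambda Z: 1 / (1 + np.exp(-(2 * Z[:, 0] - Z[:, 1])))
print(local_explanation(f, np.array([0.5, -1.0, 2.0])))
```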
Another popular algorithm, SHAP (SHapley Additive exPlanations; Lundberg & Lee, 2017b), identifies feature importance based on Shapley values, inspired by cooperative game theory, to assign credit to each feature. Feature-importance explanations can be displayed to users by visualizing the importances or simply describing the features essential for the prediction. Often, rather than asking a detailed "why," people are more interested in questions like "why not a different estimate?" or "how can the input be tweaked to achieve a different estimate?" Such explanations are especially desired when seeking a remedy or recommendation for an already occurring, frequently undesired event, such as strategies to lessen a patient's anticipated high illness risk. Finally, a variety of algorithms can be employed to generate such tangible counterfactual explanations; these algorithms frequently identify the modifications needed for a sample to obtain a different estimate, often using the idea of minimal change (Dhurandhar et al., 2018; Lundberg & Lee, 2017b).
Instance-based approaches frequently use examples drawn from the data. The possible risk of adopting approximate post-hoc procedures for explanations, rather than an interpretable model, has long been discussed: such approaches invariably exclude certain edge cases or even fail to reproduce results exactly as the original model intended (Dhurandhar et al., 2018). Furthermore, beyond the practical reasons for using opaque-box models cited above, there is a pragmatic debate concerning the many communication tools individuals employ to gain "sufficient comprehension" for a given objective. Outlining a causal chain, for example, may be required to develop a firm diagnosis of a condition, whereas rough principles or case-based reasoning may be adequate and less mentally taxing if one only wants to make predictions. It may also be claimed that such methods are a necessary sort of translation to connect model and person when they have differing epistemic access. A new area of XAI study focuses on creating human-consumable explanations by drawing on human-generated explanations (Codella et al., 2018; Ehsan et al., 2019; Kim et al., 2018). This translates model reasoning into meaningful human explanations that apply to the exact prediction. Such an explanation is, admittedly, a guess about the model's reasoning, but it may be helpful to lay users who have trouble understanding how machine learning (ML) models work yet want an idea of the validity of their predictions. However, AI developers are responsible for understanding, mitigating, and transparently communicating the limitations of approximate explanations to stakeholders. For example, an explainability metric known as fidelity can detect erroneous post-hoc explanations (Alvarez-Melis & Jaakkola, 2018). We must nevertheless acknowledge that this is an actively researched topic and that principled approaches to identifying and communicating the limitations of post-hoc explanations are still lacking.
Drug discovery is a field that includes medicinal chemistry, and in medicinal chemistry XAI provides increased reliability and interpretability of predicted drug effects. Deep learning (DL) models have increased the reliability of the resulting models; biological effects can be associated with physicochemical properties, and accurate and appropriate models can be derived from this relationship. Ultimately, XAI aims to reveal what is done, how it is done, and the related information in drug discovery (Holzinger et al., 2022; Polzer et al., 2022). Given the importance of explainability, XAI is emerging as a collection of AI methods focused on generating outputs and recommendations that human experts can understand and interpret. Currently, the AI community focuses on developing XAI methods that balance transparency, explainability, power, performance, and accuracy (Gunning, 2017).
2.1 | SHAP
The SHAP method finds and prioritizes the characteristics that affect how an ML model classifies compounds and estimates their activities. By deriving a variant for the exact computation of Shapley values for decision tree techniques and rigorously contrasting this variant with the model-agnostic SHAP method in estimations of compound activity and potency, the analysis of the SHAP approach has been advanced for drug development. Some studies present a theoretically grounded, model-agnostic interpretation technique for ML models of any complexity used for activity prediction. The SHAP approach (Lee et al., 2009) is an extension of LIME (Ribeiro et al., 2016b), and accordingly, feature weights are represented as Shapley values from game theory (Kuhn & Tucker, 1953). SHAP can interpret activity estimates from complex ML models: features that increase or decrease the probability of predicted activity are mapped onto molecular graphs to identify and visualize the structural patterns that determine the predictions (Feldmann et al., 2021). In theory, it has been shown that the payouts correspond to the model estimate and that these values represent the mean of all associated contributions (Joseph, 2019; Lundberg & Lee, 2017a). Shapley values, which have an essential role in improving the explainability of ML models, enable a model to be evaluated without specifying the functional forms of the models. The functions and property variables in an ML model are related as follows:
$$\phi\bigl(f(X_i)\bigr) = \phi_0 + \sum_{k=1}^{K} \phi_k(X_i), \qquad i = 1, \ldots, n, \tag{2.1}$$

where $k$ denotes a single property variable, $K$ denotes the total number of explanatory variables available, $n$ is the total number of units to be explained, $\phi \in \mathbb{R}^K$, $\phi_k \in \mathbb{R}$, and $\phi_k(X_i)$ are the Shapley values of the local functions.
As a general definition, Shapley values (SVs) respond by adding credibility to the model's complex decision-making process; in this way, they reveal more clearly how the model uses its features. In feature analysis and ML model interpretation, SVs quantify the contributions of specific features of a given representation to a predicted result. For large feature sets, the explicit calculation of SVs over all possible feature coalitions becomes computationally expensive. This limitation may be overcome using the SHAP model, which generates a local interpretation model for each prediction that closely resembles the original ML model in the relevant regions of feature space (Lundberg & Lee, 2017c).
Regardless of the complexity of an ML model, SHAP calculations make it feasible to quantify the contributions of particular chemical features to a successful or unsuccessful prediction for a variety of compound activity prediction tasks (Rodríguez-Pérez & Bajorath, 2019, 2020b). SHAP can therefore be applied to any ML algorithm, including DL methods. Importantly, for decision tree methods, an algorithm for the exact calculation of local SVs has recently been presented (Lundberg et al., 2020). SHAP and exactly calculated local SVs were shown to be strongly correlated (>80%) when predicting compound activity for both tree-based classification and regression models (Feldmann et al., 2021; Rodríguez-Pérez & Bajorath, 2020b).
2.2 | LIME
LIME is a technique that fits a local, interpretable model to each prediction of any black-box machine learning model. Models must be understandable by their consumers for them to trust AI systems; AI interpretability sheds light on how these systems operate and aids in detecting potential problems, including information leakage, model bias, lack of robustness, and spurious causality. LIME offers a generic framework for deciphering black boxes and explains the "why" behind predictions or suggestions made by AI. LIME attempts to fit a straightforward model around a single observation that replicates the behavior of the global model in that neighborhood; the predictions of the more sophisticated model may then be explained locally using the simple model (Ribeiro et al., 2016a).
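A brief usage sketch with the `lime` package (assuming scikit-learn and synthetic tabular descriptors; the descriptor names and class labels are hypothetical):

```python
# LIME sketch: explain one prediction of a black-box classifier on tabular descriptors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] - X[:, 3] > 0).astype(int)                 # toy "active"/"inactive" labels
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

feature_names = [f"descriptor_{i}" for i in range(6)]   # hypothetical descriptor names
explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["inactive", "active"], mode="classification")
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(exp.as_list())                                    # (feature condition, local weight) pairs
```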
2.3 | Deficiency of AI and why do we need XAI?
AI addresses various issues by supplying input data samples and the expected outputs to neural networks (Gunning, 2017). Definite mathematical rules are used systematically to alter network weights, which can be thought of as numerical knobs adjusted to achieve the desired outcome. These values, which translate an input query into an output response, are what the network truly learns. Image classification is a common application of neural networks (West, 2018), and their performance on this task is well documented for many datasets, whose collections contain photos ranging from digits to flora and fauna, people, vehicles, motorcycles, aeroplanes, and other items (Goodman & Flaxman, 2017). The image classification process involves taking an input picture and passing it through a trained network, which applies a series of transformations to produce a single output class. Natural language is another practical use of neural networks: translation, language modeling, text categorization, question answering, named entity recognition, and dependency parsing are all areas where neural networks thrive. Google replaced traditional NLP approaches with an LSTM-based algorithm for its translation service in late 2016; this free service is offered worldwide and is used daily by over 200 million individuals (Castelvecchi, 2016). Although neural networks have been employed to tackle various issues, it is often unclear what the network truly learns (Lipton, 2018). Attempts have been made to depict the learned weights to give us an understanding of the obtained information, but interpreting the significance of these adjusted knobs remains an open research question. Employing these networks on real-world issues is dangerous without comprehending what information is genuinely learned (Arrieta et al., 2020; Preece et al., 2018). This issue is conceptually related to the XAI approach, which is often considered necessary for applying AI models: for users to adequately comprehend, believe in, and control powerful AI technologies, XAI is crucial (Gunning et al., 2019).
2.3.1 | Scale, growth, diversity
Although it has been studied for decades, XAI has recently had an upsurge in popularity comparable to that of AI. The ability of XAI to connect applications of AI with the people who create or use them has garnered much attention in recent years. Several XAI support strategies have been put forth, and the importance of XAI in human-machine contexts has been emphasized (Adadi & Berrada, 2018; Guidotti et al., 2018; Rosenfeld et al., 2019). XAI has been a recurring topic in other AI settings, such as expert systems (Swartout et al., 1991), answer set programming (Fandinno et al., 2019), and planning (Chakraborti et al., 2020). Data diversity, which relates to an algorithmic model's capacity to ensure that all types of objects are represented in its output, has lately received much attention (Drosou et al., 2017). Diversity may therefore be viewed as a measure of the quality of a set of items that, when they appear as a model's output, characterizes the model's tendency to produce a spectrum of outputs rather than narrow predictions. Diversity is crucial in applications focusing on people, where ethical constraints also apply to AI modeling (Lerman, 2013).
Similarly, several AI problems seek to generate diverse recommendations rather than high-scoring but similar outcomes (Agrawal et al., 2009). In such instances, XAI techniques may be useful in characterizing the model's capabilities without compromising the diversity of the input data at its output. Learning techniques that give a model diversity-keeping skills might be complemented by XAI approaches that shed light on the model internals and assess the effectiveness of such tactics with respect to the diversity of the training data. On the other hand, XAI could make it simpler to spot the model components affecting its ability to preserve diversity.
2.3.2 | Transparency
Black box versus white box, inherent explanations, understandability, and comprehensibility are other phrases that refer to transparency. They allude to a detailed explanation of the mathematical, statistical, and computational elements of the AI model's internal operations. Mathematicians, computer scientists, or statisticians might be able to comprehend such an explanation, but it frequently serves no purpose for the doctor or the patient. The "transparency" of an AI in banking, for instance, in determining whether a consumer qualifies for a loan, might rest on a logistic model of the form $\ln\bigl(p(x)/(1 - p(x))\bigr) = \beta_0 + \beta_1 x_1$, whose few coefficients can be read directly. More powerful AI is predicated on exchanging the model's interpretability for performance in terms of the precision of the AI's predictions on its tasks (Došilović et al., 2018). These models, for instance artificial neural networks (ANNs) or RFs, employ a significant number of nonlinear functions as neurons or decision trees, coupled by thousands of coupling factors such as synapses and weights, respectively, which in DL can be organized in many layers. The many thousands or millions of internal parameters of these models can be optimized thanks to the power of modern computers, up to and including current PCs, to approximate the functions that map high-dimensional multivariate input data to multivalued outputs such as various diagnostic classes. In this approach it is possible to describe the individual components (neurons, trees) precisely, but it is impossible to comprehend the system's aggregate behavior, comparable to how it is impossible to describe concepts or ideas through raw brain activity. This is regarded as one of the essential characteristics of emergent systems in systems theory (Zhang et al., 2018).
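To make the contrast concrete, the following sketch fits a "white box" logistic model of the kind in the loan example, whose few coefficients can be read directly as log-odds, next to a small neural network whose thousands of coupled weights cannot (the synthetic data and feature meanings are assumptions for illustration):

```python
# Transparent vs. opaque: a logistic model's coefficients are directly readable.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                    # e.g. income, debt, credit history (hypothetical)
y = (1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.5, 500) > 0).astype(int)

white_box = LogisticRegression().fit(X, y)
print("log-odds coefficients (beta_1..beta_3):", white_box.coef_[0])
print("odds ratios:", np.exp(white_box.coef_[0]))     # per-unit multiplicative effect on the odds

black_box = MLPClassifier(hidden_layer_sizes=(50, 50), max_iter=1000, random_state=0).fit(X, y)
n_weights = sum(w.size for w in black_box.coefs_) + sum(b.size for b in black_box.intercepts_)
print("MLP parameters to interpret:", n_weights)      # thousands of coupled weights, no direct reading
```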
2.3.3 | Justification
The XAI approach aims to satisfy particular demands, objectives, expectations, and interests related to artificial systems (Tjoa & Guan, 2020). The concept of XAI was initiated by the computer science community, which built the XAI approach to provide a technical solution to current issues (Arrieta et al., 2020). The rising adaptability of AI adds one more complex layer to human-computer interaction (HCI), and XAI plays a significant role in the cognitive and behavioral dimensions of AI-associated decision-making; XAI is a system of novel inquiry (Tjoa & Guan, 2020; Zhu et al., 2018). The current era needs XAI: although neural networks are used to fix many issues, the core understanding of how a network reaches its decisions is still lacking, and AI can inherit biases from the dataset provided. Thus neural networks require transparency, and justification must be added to the generated predictions (Zhu et al., 2018). The broad spectrum of Explainable AI applications can be viewed through various lenses. The approach depends on convincing humans; researchers feel more confident if multiple models support the same prediction. Moreover, the network should be capable of generating reasoning that supports the predicted features.
2.3.4 | Informativeness
The process of developing novel medications is characterized by whether pharmacological activity can be derived from molecular structure and which components of that structure are relevant. However, the added difficulties and occasionally ill-posed problems of multi-objective design lead to molecular architectures that frequently represent compromises. A practical method reduces the syntheses and assays required to discover and optimize novel hits and potent leads, particularly when the experiments involved are laborious and costly.
By enabling decision-making that concurrently considers medicinal chemistry expertise, model logic, and awareness of the system's limits, XAI-assisted drug design is intended to help address some of these problems (Liao et al., 2020). Clinical decision support systems that help doctors with diagnostic or therapeutic duties likewise require the informativeness of AI. These AI-based systems are employed ever more often in clinical settings, with a current emphasis on medical imaging; for example, DNNs may be trained on magnetic resonance imaging to recognize aberrant brain areas and detect Alzheimer's disease. DNNs have also been implemented in clinical imaging to make decisions based on these data more accessible (Zhang et al., 2018). However, the precise process by which a diagnosis is determined for a specific patient remains unknown, even though the findings of these analyses seem plausible to a medical professional, are congruent with medical knowledge, and, as such, might be communicated to the patient. DNNs and RFs are subsymbolic classifiers in the sense described above; a doctor cannot fully understand, and effectively explain to a patient, the hundreds or thousands of decision trees that make up such systems. Informativeness aims to provide more straightforward representations of what an AI does internally so that the user may learn more from this abstraction (Carpenter & Huang, 2018; Rudin, 2019).
2.3.5 | Uncertainty estimation
Uncertainty estimation is another method of interpreting models; it quantifies the epistemic error in a prediction. DNNs are less effective at estimating uncertainty than other ML techniques, such as Gaussian processes (Nguyen et al., 2015; Rasmussen, 2003), and for this reason numerous initiatives have been made to quantify uncertainty in predictions made with neural networks. Predictions of opaque black-box systems are commonly used in high-stakes applications, including banking, healthcare, and criminal justice (Adadi & Berrada, 2018). Post-hoc counterfactual explanations can give users valuable and actionable information about the inner workings of black-box models (Byrne, 2019). While the XAI community has offered numerous strategies for generating counterfactual explanations, much less attention has been dedicated to investigating the uncertainty of these explanations (Karimi et al., 2021; Keane et al., 2021). By giving users uncertainty estimates on counterfactual explanations, we may avoid giving them overconfident and perhaps dangerous options, which can help them make better decisions and increase their trust in intelligent systems (Bhatt et al., 2021; Jesson et al., 2020). Moreover, recent user tests have shown that consumers are more likely to agree with a model's forecast when given the appropriate predictive uncertainty (McGrath et al., 2020), further driving the need for solutions. These instances make it abundantly evident that understanding the uncertainty of proposed explanations is a crucial first step in developing a valuable and reliable resource, especially in high-stakes real-world prediction tasks (Upadhyay et al., 2021).
The aim of drug discovery is to find novel medications that can cure or prevent a specific condition. Drugs come in various forms, but most are small compounds that can bind precisely to a target molecule, typically a protein implicated in a disease. In the past, scientists have combed through vast libraries of compounds to find candidates that may one day become drugs. Although rational structure-based drug design has gained popularity over time, it still requires several strategy, synthesis, and testing phases. Because it is often challenging to foresee which chemical construct will have the desired biological effects and the qualities required of a successful medicine, the drug development process remains costly and time-consuming. Even if a novel medicine candidate performs well in tests, it may not succeed in human trials; fewer than 10% of medication candidates that enter Phase I studies reach release. Given this, it is understandable why researchers are looking to artificial intelligence's superior data-processing capabilities to expedite and lower the cost of drug discovery. In addition, AI technologies can speed up medication development, stimulate innovation, improve the effectiveness of clinical trials, and help regulate drug dosage.
AI may be able to explore and adjust chemical characteristics in de novo molecular design more thoroughly and swiftly than teams of scientists using conventional techniques. One of the difficulties in AI-driven de novo drug creation, in addition to the invention of new chemical compounds, is synthetic feasibility, that is, the capacity to actually synthesize the substance. New XAI models, called neuro-symbolic models, have been developed. These AI models are inherently transparent, meaning users can find and understand the mechanisms that lead to a prediction without having to modify, simulate, or process any information about the model's functioning. Neuro-symbolic models combine symbolic and statistical learning (Das et al., 2017), and this combination enables a neural network to make reliable predictions reinforced by the transparency offered by logical principles understandable to humans. The potential for interaction between the model and users throughout the learning process is one of the numerous benefits of employing these neuro-symbolic models: if some bias is found, models that explain their function can be corrected more quickly. To estimate the likelihood of prostate cancer (PCa) and clinically significant PCa (csPCa), Suh et al. built and validated XGBoost-based XAI models in 2020 and incorporated them into a web-based framework, giving doctors simple access to decision guidance before a prostate biopsy (Suh et al., 2020). In 2022, Kırboğa et al. discussed the impacts on potential medications that may be created for Friedreich Ataxia (FA/FRDA) using XAI explained with SHAP (SHapley Additive exPlanations) values (Kırboğa et al., 2022).
Such studies substantially increase our understanding of intricate biological processes, and long-term intervention studies that might disclose gene-editing methods to aid drug discovery are an exciting prospect. Using gene expression data (GED) together with XAI, information extraction and functional validation investigations can uncover physiologically meaningful sequential patterns (Anguita-Ruiz et al., 2020). Because they make it easier to understand which components of the inputs used by the underlying supervised learning approach are essential to a given prediction, feature attribution methods are popular options in the explainable AI toolkit. In molecular design, these strategies often involve coloring molecular graphs, and when presented to medicinal chemists they can help in choosing which compounds to synthesize or prioritize. Consistency between the highlighted substructures and prior specialist knowledge is expected to aid the understanding of ML models in drug design; however, the quantitative analysis of such coloring techniques has so far focused exclusively on substructure identification tasks (Jiménez-Luna et al., 2022). Using their proposed benchmark, Jiménez-Luna et al. found that molecular coloring techniques associated with traditional ML models frequently outperformed recent alternatives using graph neural networks, and they anticipate that the open-source benchmark data will make it easier to evaluate newly developed molecular feature attribution tools (Jiménez-Luna et al., 2022). That report emphasized the mathematical methods behind these estimations, while Vo et al. (2022) prepared a review of XAI for drug–drug interactions that explains the mathematical background of the algorithms and focuses on their use in fields such as drug–drug and drug–target interactions. The primary objective of machine learning (ML) in medicinal chemistry is the prediction of compound characteristics from chemical structures. In applications like chemical screening, virtual library enumeration, or generative chemistry, ML is frequently applied to enormous datasets; while desirable, a thorough comprehension of ML model judgments is typically not required in these circumstances. By comparison, compound optimization efforts rely on small datasets to spot the structural adjustments that result in desirable property profiles. If ML is used in this setting, practitioners are frequently hesitant to make choices based on predictions that cannot be explained. Only a select number of ML techniques are directly interpretable, but explanatory methods may be used to learn more about the decisions of sophisticated ML models (Figure 1; Rodríguez-Pérez & Bajorath, 2021).
Graph neural networks can perform specialized drug discovery tasks, including predicting chemical properties and creating brand-new molecules. These models, however, are regarded as "black boxes" that are hard to debug. One study used an integrated-gradients XAI technique for graph neural network models to increase modeling transparency for rational molecular design. Models were trained to forecast cytochrome P450 inhibition, passive permeability, human ether-a-go-go related gene (hERG) channel inhibition, and plasma protein binding. The technique highlighted structural and molecular characteristics aligning with well-known pharmacophore patterns and revealed information on well-defined property gaps and general ligand-target interactions. Practitioners may use the resulting XAI technique, which is entirely open source, to train new models on other clinically relevant endpoints (Figure 1; Jiménez-Luna et al., 2021).
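The integrated-gradients attribution used in that study can be sketched in a framework-agnostic way; here a simple differentiable logistic score stands in for the graph neural network, and the path integral is approximated by averaging gradients along a straight line from a baseline:

```python
# Integrated gradients sketch on a simple differentiable score (stand-in for a GNN).
import numpy as np

w = np.array([1.0, -2.0, 0.5])                    # weights of a toy logistic scoring function
f = lambda x: 1 / (1 + np.exp(-x @ w))            # model output
grad_f = lambda x: f(x) * (1 - f(x)) * w          # closed-form gradient of the logistic score

def integrated_gradients(x, baseline=None, steps=100):
    baseline = np.zeros_like(x) if baseline is None else baseline
    alphas = np.linspace(0.0, 1.0, steps)
    grads = np.stack([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)    # IG_i = (x_i - x'_i) * average gradient along path

x = np.array([1.0, 0.5, -1.0])
ig = integrated_gradients(x)
print("attributions:", ig)
print("completeness check:", ig.sum(), "~", f(x) - f(np.zeros_like(x)))  # should roughly match
```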
ML models that discriminate between active and inactive substances are trained to detect structural patterns in qualitative or quantitative structure–activity relationship (SAR) investigations. Model decisions can be challenging to grasp but are essential for guiding compound design, and interpreting machine learning outcomes provides additional model validation based on expert knowledge. Many sophisticated ML methods, especially DL architectures, have a recognizable "black box" character. SHAP, a locally interpretable explanatory technique, has been presented to rationalize activity predictions of any ML algorithm independent of its complexity. To comprehend the models produced by DNNs, nonlinear support vector machines (SVM), and RF learning, structural patterns that are used to predict the likelihood of activity are identified and mapped onto test compounds. The findings demonstrate that SHAP can significantly rationalize the predictions of sophisticated ML models (Figure 1; Rodríguez-Pérez & Bajorath, 2020a).
Finding substances with advantageous pharmacological, toxicological, and pharmacokinetic characteristics continues to be difficult in drug development. DL provides robust tools to create prediction models appropriate for expanding data sets, but there is a widening divide between what these neural networks learn and what people can understand; this gap may create vulnerabilities and limit the practical implementation of DL applications. Xiong et al. introduced Attentive FP, a new graph neural network architecture for molecular representation that leverages a graph attention mechanism to learn from relevant drug discovery datasets. They demonstrate that Attentive FP produces state-of-the-art prediction results on various datasets and that the information it picks up can be interpreted. By automatically learning non-local intramolecular interactions for certain tasks, Attentive FP's feature visualization shows that it can extract chemical insights directly from data, beyond the scope of human intuition (Xiong et al., 2020).
One of the interesting recent studies is that of Gimeno et al., who developed multidimensional module optimization (MOM). They applied MOM to an acute myeloid leukaemia (AML) cohort of 122 screened drugs and 319 ex vivo tumor samples with whole-exome sequencing (WES) and successfully validated their results in three large-scale screening experiments, demonstrating that XAI can help healthcare providers and drug regulators better understand AI-based medical decisions (Gimeno et al., 2022). It has also been shown that machine learning models and scoring functions that simplify the screened Coulomb and Lennard-Jones interactions between ligands and residues of the target receptor can significantly improve classification performance in virtual screening and the identification of active ligands (Shimazaki & Tachikawa, 2022); with this simplified scoring method, identifying active ligands has become easier.
Explainable models have been proposed to obtain more transparent and understandable predictions in drug studies. Table 1 summarizes the literature we reviewed on adverse drug reactions, drug repurposing, drug-disease interactions, drug design, drug–drug interactions, and more.
3 | OPPORTUNITIES AND CHALLENGES OF XAI
XAI can educate the general population about how AI functions. Although there has been much research on AI in the public sector, XAI has received less attention. The concept behind XAI is that humans would be more likely to accept expert system recommendations if they could understand them (Swartout et al., 1991; Swartout & Moore, 1993). XAI frequently contrasts with opaque, black-box techniques that leave unclear how or why a decision was made. The accuracy of AI models will increase as they become more sophisticated, but this may compromise the capacity of the work to be understood (Xu et al., 2019). Explainability is an intuitively appealing concept but is difficult to realize. Belle and Papantonis (2021) offer four suggestions for creating explainability: explanation by simplification, describing the contribution of each feature to decisions, explaining a single example rather than the model in general, and using graphical visualization methods for explanations. They also discuss the complexity of implementing such proposals: simplifications may not be accurate, features may be correlated, local explanations may fail to provide the whole picture, and graphical visualization requires assumptions about the data that may not necessarily hold. Explainability is assumed to create transparency and trust in AI, yet trust can be affected differently than expected, and situational factors also affect trust (Bannister & Connolly, 2011b). Transparency can both increase and decrease trust (Bannister & Connolly, 2011a), and similarly, XAI can increase or decrease confidence. Therefore, explainability must be better understood, and strategies are required to build confidence in XAI.

FIGURE 1  Studies on the use of XAI in drug discovery. (a) The proposed methodology highlighted molecular features and structural elements compatible with known pharmacophore motifs, providing information on accurately defined property gaps and non-specific ligand-target interactions. Reprinted (adapted) with permission from Jiménez-Luna et al. (2021). (b) A study demonstrating the high potential of SHAP to rationalize predictions of complex ML models. Reprinted (adapted) with permission from Rodríguez-Pérez and Bajorath (2020a). (c) A study showing that descriptive approaches can be applied to gain insights into complex ML model decisions. Reprinted with permission from Rodríguez-Pérez and Bajorath (2021). (d) Multiple XAI methods were used and compared on projects with well-established SARs, existing X-ray crystal structures, and lead optimization datasets. Reprinted (adapted) with permission from Harren et al. (2022).
For a domain such as drug discovery that is affected by false predictions, monitoring results reduces the impact of false results, and identifying their root cause improves the underlying model. An explainable system can reduce the effects of such biased estimates by explaining the decision-making criteria. AI models always have some degree of error in their predictions, and having someone who is, and can be held, responsible for those errors makes the entire system more efficient. For example, applying XAI to the molecular fingerprints used to describe a drug can increase the effectiveness of predictions. Most molecular fingerprints are designed, validated, and used in the context of small-molecule drugs within the classical Lipinski boundaries (Lipinski et al., 2001) and are not well suited to describing larger molecules. The most popular molecular fingerprint is the Morgan fingerprint (Morgan, 1965), also known as the extended-connectivity fingerprint ECFP4 (Rogers & Hahn, 2010). ECFP4, along with the corresponding MinHashed fingerprint MHFP6 (Probst & Reymond, 2018), belongs to the best-performing fingerprints in small-molecule virtual screening (Riniker & Landrum, 2013) and target prediction benchmarks (Awale & Reymond, 2018, 2019). Both fingerprints detect the presence of specific circular substructures around each atom in a molecule and thereby predict the biological activities of small organic molecules; however, both poorly capture overall properties of molecules, such as size and shape. With such representations, the substructures that have a more significant impact on a drug's efficacy can be identified, and the chemical implications, explanation, and justification of a prediction boost trust in the system. For practical usage, several user-critical systems, such as medical diagnostics and the prediction of drug events (chemical absorption, distribution, metabolism, excretion, and toxicity [ADMET]), require a high level of trust from the user (Kırboğa et al., 2022).
XAI is an intelligent, influential, and attractive aspect of AI: it is a powerful descriptive tool that provides deeper insights than traditional linear models. However, XAI has its own challenges, its benefits notwithstanding. AI algorithms can identify complex relationships in the large datasets used in drug discovery (Nazar et al., 2021; Samek et al., 2019; Thomas Altmann et al., 2020), but what the decisions of these algorithms are based on may not be fully understood, so researchers may have a hard time understanding why and how a drug is recommended. In addition, biases learned from the training data can produce false results (de Bruijn et al., 2022; Ghassemi et al., 2021), which can be particularly problematic in treating diseases that show genetic variation between sexes, races, and geographic regions. The large datasets used for drug discovery contain patients' private health information; these data need to be protected against privacy and security risks, and AI algorithms processing them may raise concerns about data privacy (Islam et al., 2022). By explaining the decisions in the drug discovery process, XAI can help researchers understand why a drug is recommended, but the responsibility for decisions such as the approval and use of drugs still lies with people (Holzinger et al., 2019); there may therefore be uncertainty about who is responsible for the decisions of AI algorithms. A further challenge is that, in the drug discovery process, researchers need to understand how the drugs suggested by AI algorithms work and to which diseases they can be applied; understanding how the decisions of AI algorithms are made and why these drugs are recommended is essential for researchers to make the right decisions (Saeed & Omlin, 2023).
4 | CONCLUSION AND FUTURE OUTLOOK
XAI, which plays an essential role in making the preliminary stages of drug discovery shorter and less costly, has become more deeply involved in drug discovery in recent years. However, current XAI still faces technical challenges, including the multitude of possible explanations and methods applicable to a given task (Lipton, 2017). Targeting small molecules in drug discovery, accurately predicting disease-based mechanisms, and assessing the capacity of molecules to become drugs are essential biological information needs. XAI comprises techniques used to increase the explainability of decisions in critical fields such as medicine and drug discovery. Methods used in drug discovery include correlation analysis, interpretation of artificial neural networks (ANNs), t-SNE (t-distributed Stochastic Neighbor Embedding), and PCA (Principal Component Analysis). Correlation analysis examines the relationships between molecules used
for drug discovery, specifically to identify dependencies
between their properties in the dataset. Interpretation of ANNs is a way of understanding how a network's decisions are produced; it measures the contribution of each input to the outputs of the ANN.
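One common way to estimate such input contributions, alongside a simple correlation analysis of the descriptors, is sketched below. This is our own illustration under stated assumptions: NumPy and scikit-learn are available, permutation importance stands in for the unspecified attribution method, and the descriptor matrix and activities are synthetic placeholders rather than real data.

```python
# Minimal sketch; assumes NumPy and scikit-learn are installed.
# X and y are synthetic stand-ins for a descriptor matrix and activity vector.
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # five hypothetical molecular descriptors
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Correlation analysis: pairwise dependencies between the descriptors.
print(np.corrcoef(X, rowvar=False).round(2))

# A small ANN for activity prediction.
ann = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0).fit(X, y)

# Permutation importance: how much the model's score drops when one input is
# shuffled, i.e. an estimate of that input's contribution to the ANN output.
result = permutation_importance(ann, X, y, n_repeats=10, random_state=0)
for i, importance in enumerate(result.importances_mean):
    print(f"descriptor {i}: {importance:.3f}")
```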
Methods such as t-SNE and PCA compress the complex structures in data sets, making them easier to understand, as in the sketch below.
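The following minimal sketch is our own illustration, assuming scikit-learn is installed; the high-dimensional descriptor matrix is a synthetic placeholder for real molecular data.

```python
# Minimal sketch; assumes NumPy and scikit-learn are installed.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))        # hypothetical high-dimensional descriptors

# PCA: linear projection onto the two directions of largest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# t-SNE: non-linear embedding that preserves local neighbourhood structure.
X_tsne = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(X_pca.shape, X_tsne.shape)      # both (300, 2), ready for plotting
```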
XAI techniques, in turn, are used to make the decisions of machine learning models understandable and the decision-making process more transparent. These techniques include LIME, SHAP, model tracking, and consistency analysis. LIME is used to interpret a model's prediction for a given sample, SHAP helps to understand how a model works by measuring the effect of each feature or variable on the model output, and model tracking allows the characteristics and decisions of a model to be traced. In summary, the methods used in drug discovery serve to identify relationships in the data, make complex structures more understandable, and interpret the decisions of machine learning models, while XAI techniques make those decisions intelligible and increase the reliability of the models. In addition, the tools, programs, and code we use should be suitable for the work we aim to do.
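As an illustration of how SHAP and LIME are typically applied to a simple activity model, the following sketch is our own and not taken from the cited works; it assumes shap, lime, scikit-learn, and RDKit are installed, and the SMILES strings, activity values, and descriptor choice are hypothetical placeholders.

```python
# Minimal sketch; the molecules and activities below are illustrative only.
import numpy as np
import shap
from lime.lime_tabular import LimeTabularExplainer
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor

smiles = ["CCO", "c1ccccc1O", "CC(=O)Oc1ccccc1C(=O)O",
          "CCN(CC)CC", "CC(C)Cc1ccc(cc1)C(C)C(=O)O", "CC(=O)Nc1ccc(O)cc1"]
y = np.array([0.1, 0.6, 1.2, 0.3, 1.0, 0.8])     # hypothetical activities

feature_names = ["MolWt", "MolLogP", "TPSA"]
def featurize(smi):
    mol = Chem.MolFromSmiles(smi)
    return [Descriptors.MolWt(mol), Descriptors.MolLogP(mol), Descriptors.TPSA(mol)]

X = np.array([featurize(s) for s in smiles])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# SHAP: additive per-feature contributions to each prediction of the tree model.
shap_values = shap.TreeExplainer(model).shap_values(X)
print("mean |SHAP| per descriptor:", np.abs(shap_values).mean(axis=0))

# LIME: a locally faithful linear explanation of a single prediction.
explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 mode="regression", discretize_continuous=False)
print(explainer.explain_instance(X[0], model.predict, num_features=3).as_list())
```

SHAP returns additive contributions that can be aggregated into global feature importances, whereas LIME fits a simple local surrogate around one prediction; the two therefore complement each other.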
In current studies, most approaches do not come as ready-to-use solutions but must be tailored to each application, so considerable time has to be spent on each of them. To translate results to in vivo and in vitro conditions, the accuracy of computational studies should therefore be ensured as far as possible. Since the models used in computational drug discovery differ in their need for explanation, it should be known which models require more explanation and which are intrinsically explainable. The user must therefore understand which types of explanation are needed and whether they are meaningful (Goodman & Flaxman, 2017). In the pre-drug-discovery phase, we emphasize the importance of finding solutions to existing problems and of working in an interdisciplinary manner while overcoming them; such a concerted effort may shine a new light on drug discovery. For example, data scientists and chemists must work together when working computationally on SMILES strings, and this cooperation is also necessary for detecting molecules with drug-like potential. Looking at the recent studies described above, structural features and molecular descriptors that can be easily interpreted by chemists remain scarce, yet they shed a multifaceted light on the biochemical processes involved (Awale & Reymond, 2014;
Jiménez- Luna et al., 2020; Katritzky & Gordeeva, 1993;
Rogers & Hahn, 2010; Sheridan, 2019; Todeschini &
Consonni, 2010). Given XAI's existing potential and
constraints in drug discovery, it is fair to expect that
the continuous development of more understandable
and computationally economical hybrid techniques and
alternative models will not lose their value. Although
there are several software data codes for XAI in drug
development, there are currently no open community
platforms for tackling difficulties brought on by the
uniqueness of drug research. Therefore, only synergistic
investigations can produce this circumstance.
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no known competing
financial interests or personal relationships that could
have appeared to influence the work reported in this
paper.
ACKNOWLEDGMENTS
This study was supported by the Read&Publish agree-
ment between TÜBİTAK ULAKBİM and Wiley, Republic
of Turkey (2023).
DATA AVAILABILITY STATEMENT
Data sharing not applicable to this article as no datasets
were generated or analysed during the current study.
ORCID
Kevser Kübra Kırboğa https://orcid.org/0000-0002-2917-8860
REFERENCES
Adadi, A., & Berrada, M. J. I. A. (2018). Peeking inside the black-
box: A survey on explainable artificial intelligence (XAI). IEEE
Access, 6, 52138– 52160.
Agrawal, R., Gollapudi, S., Halverson, A., & Ieong, S. (2009).
Diversifying search results. Paper presented at the proceedings
of the second ACM international conference on web search and
data mining.
Akbar, S., Ali, F., Hayat, M., Ahmad, A., Khan, S., & Gul, S. (2022).
Prediction of antiviral peptides using transform evolution-
ary & SHAP analysis based descriptors by incorporation with
ensemble learning strategy. Chemometrics and Intelligent
Laboratory Systems, 230, 104682. https://doi.org/10.1016/j.
chemo lab.2022.104682
Alvarez- Melis, D., & Jaakkola, T. (2018). Towards robust inter-
pretability with self- explaining neural networks. ArXiv, 1,
7786– 7795.
Anguita- Ruiz, A., Segura- Delgado, A., Alcalá, R., Aguilera, C. M., &
Alcalá- Fdez, J. (2020). eXplainable artificial intelligence (XAI)
for the identification of biologically relevant gene expression
patterns in longitudinal human studies, insights from obesity
research. PLoS Computational Biology, 16(4), e1007792. https://
doi.org/10.1371/journ al.pcbi.1007792
Arrieta, A. B., Díaz- Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S.,
Barbado, A., … Herrera, F. (2020). Explainable artificial intel-
ligence (XAI): Concepts, taxonomies, opportunities and chal-
lenges toward responsible AI. Information Fusion, 58, 82– 115.
https://doi.org/10.1016/j.inffus.2019.12.012
Atsuko Takagi, M. K., Hamatani, E., Kojima, R., & Okuno, Y. (2022).
GraphIX: Graph- based In silico XAI (explainable artificial
intelligence) for drug repositioning from biopharmaceutical
network. ArXiv. https://doi.org/10.48550/ arxiv.2212.10788
Awale, M., & Reymond, J.- L. (2014). Atom pair 2D- fingerprints per-
ceive 3D- molecular shape and pharmacophores for very fast
virtual screening of ZINC and GDB- 17. Journal of Chemical
Information and Modeling, 54(7), 1892– 1907.
Awale, M., & Reymond, J.- L. (2018). Polypharmacology browser
PPB2: Target prediction combining nearest neighbors with ma-
chine learning. Journal of Chemical Information and Modeling,
59(1), 10– 17.
Awale, M., & Reymond, J.- L. (2019). Web- based tools for polyphar-
macology prediction. In Systems chemical biology (pp. 255– 272).
Springer.
Baek, M., DiMaio, F., Anishchenko, I., Dauparas, J., Ovchinnikov,
S., Lee, G. R., … Baker, D. (2021). Accurate prediction of pro-
tein structures and interactions using a three- track neural net-
work. Science, 373(6557), 871– 876. https://doi.org/10.1126/scien
ce.abj8754
Banegas- Luna, A. J., & Pérez- Sánchez, H. (2022). SIBILA: High-
performance computing and interpretable machine learning join
efforts toward personalised medicine in a novel decision- making
tool.
Bannister, F., & Connolly, R. (2011a). The trouble with transpar-
ency: A critical review of openness in e- government. Policy &
Internet, 3(1), 1– 30.
Bannister, F., & Connolly, R. (2011b). Trust and transformational
government: A proposed framework for research. Government
Information Quarterly, 28(2), 137– 147.
Belle, V., & Papantonis, I. (2021). Principles and Practice of
Explainable Machine Learning. Frontiers in Big Data, 4. https://
doi.org/10.3389/fdata.2021.688969
Bhatt, U., Zhang, Y., Antorán, J., Liao, Q. V., Sattigeri, P., Fogliato,
R., Melançon, G., Krishnan, R., Stanley, J., Tickoo, O.,
Nachman, L., Chunara, R., Srikumar, M., Weller, A., & Xiang,
A. (2021). Uncertainty as a form of transparency: Measuring,
communicating, and using uncertainty. arXiv preprint arXiv:
2011.07586.
Bittremieux, W., Advani, R. S., Jarmusch, A. K., Aguirre, S., Lu, A.,
Dorrestein, P. C., & Tsunoda, S. M. (2022). Physicochemical
properties determining drug detection in skin. Clinical and
Translational Science, 15(3), 761– 770. https://doi.org/10.1111/
cts.13198
Boström, J., Brown, D. G., Young, R. J., & Keserü, G. M. (2018).
Expanding the medicinal chemistry synthetic toolbox.
Nature Reviews. Drug Discovery, 17(10), 709– 727. https://doi.
org/10.1038/nrd.2018.116
Bresso, E., Monnin, P., Bousquet, C., Calvier, F.- E., Ndiaye, N.- C.,
Petitpain, N., … Coulet, A. (2021). Investigating ADR mech-
anisms with explainable AI: A feasibility study with knowl-
edge graph mining. BMC Medical Informatics and Decision
Making, 21(1), 171. https://doi.org/10.1186/s1291 1- 021-
01518 - 6
Byrne, R. M. (2019). Counterfactuals in explainable artificial intelli-
gence (XAI): Evidence from human reasoning. Paper presented
at the IJCAI.
Carpenter, K. A., & Huang, X. (2018). Machine learning- based virtual
screening and its applications to Alzheimer's drug discovery: A
review. Current Pharmaceutical Design, 24(28), 3347– 3358.
Castelvecchi, D. J. N. N. (2016). Can we open the black box of AI?
Nature, 538(7623), 20.
Chakraborti, T., Sreedharan, S., & Kambhampati, S. J. (2020). The
emerging landscape of explainable ai planning and decision
making. arXiv Preprint arXiv: 2002.
Chen, H., Engkvist, O., Wang, Y., Olivecrona, M., & Blaschke, T.
(2018). The rise of deep learning in drug discovery. Drug
Discovery Today, 23(6), 1241– 1250. https://doi.org/10.1016/j.
drudis.2018.01.039
Codella, N., Hind, M., Natesan Ramamurthy, K., Campbell, M.,
Dhurandhar, A., Kush, R., … Mojsilovic, A. (2018). TED:
Teaching AI to explain its decisions.
Czub, N., Pacławski, A., Szlęk, J., & Mendyk, A. (2021). Curated
database and preliminary AutoML QSAR model for 5- HT1A
receptor. Pharmaceutics, 13(10), 1711. https://doi.org/10.3390/
pharm aceut ics13 101711
Dang, L. H., Dung, N. T., Quang, L. X., Hung, L. Q., Le, N. H., Le, N.
T. N., … Le, N. Q. K. (2021). Machine learning- based prediction
of drug- drug interactions for histamine antagonist using hybrid
chemical features. Cell, 10(11), 3092. https://doi.org/10.3390/
cells 10113092
Das, R., Dhuliawala, S., Zaheer, M., Vilnis, L., Durugkar, I.,
Krishnamurthy, A., … McCallum, A. (2017). Go for a walk and
arrive at the answer: Reasoning over paths in knowledge bases
using reinforcement learning.
de Bruijn, H., Warnier, M., & Janssen, M. (2022). The perils and pit-
falls of explainable AI: Strategies for explaining algorithmic
decision- making. Government Information Quarterly, 39(2),
101666. https://doi.org/10.1016/j.giq.2021.101666
Deng, J., Yang, Z., Ojima, I., Samaras, D., & Wang, F. (2022). Artificial
intelligence in drug discovery: Applications and techniques.
Briefings in Bioinformatics, 23(1). https://doi.org/10.1093/bib/
bbab430
Deore, A., Dhumane, J., Wagh, R., & Sonawane, R. (2019). The stages
of drug discovery and development process. Asian Journal of
Pharmaceutical Research and Development, 7, 62– 67. https://
doi.org/10.22270/ ajprd.v7i6.616
Dhurandhar, A., Chen, P.- Y., Luss, R., Tu, C.- C., Ting, P., Shanmugam,
K., & Das, P. (2018). Explanations based on the missing:
Towards contrastive explanations with pertinent negatives.
Došilović, F. K., Brčić, M., & Hlupić, N. (2018). Explainable artificial
intelligence: A survey. Paper presented at the 2018 41st interna-
tional convention on information and communication technol-
ogy, electronics and microelectronics (MIPRO).
Drancé, M. (2022). Neuro- symbolic XAI: Application to drug repur-
posing for rare diseases. Paper presented at the database Systems
for Advanced Applications: 27th International Conference,
DASFAA 2022, Virtual Event, April 11– 14, 2022, Proceedings,
Part III https://doi.org/10.1007/978- 3- 031- 00129 - 1_51
Drosou, M., Jagadish, H., Pitoura, E., & Stoyanovich, J. J. B. D. (2017).
Diversity in big data: A review. Big Data, 5(2), 73– 84.
Ehsan, U., Tambwekar, P., Chan, L., Harrison, B., & Riedl, M. O.
(2019). Automated rationale generation: A technique for explain-
able AI and its effects on human perceptions. Paper presented
at the Proceedings of the 24th International Conference on
Intelligent User interfaces, Marina del Ray, California https://
doi.org/10.1145/33012 75.3302316
Fan, Y.- W., Liu, W.- H., Chen, Y.- T., Hsu, Y.- C., Pathak, N., Huang, Y.-
W., & Yang, J.- M. (2022). Exploring kinase family inhibitors and
their moiety preferences using deep SHapley additive exPlana-
tions. BMC Bioinformatics, 23(S4), 242. https://doi.org/10.1186/
s1285 9- 022- 04760 - 5
Fandinno, J., Schulz, C. J. T., & Programming, P. O. L. (2019).
Answering the “why” in answer set programming— A sur-
vey of explanation approaches. Theory and Practice of Logic
Programming, 19(2), 114– 203.
Feldmann, C., Philipps, M., & Bajorath, J. (2021). Explainable ma-
chine learning predictions of dual- target compounds reveal
characteristic structural features. Scientific Reports, 11(1),
21594. https://doi.org/10.1038/s4159 8- 021- 01099 - 4
Ghassemi, M., Oakden- Rayner, L., & Beam, A. L. (2021). The false
hope of current approaches to explainable artificial intelligence
in health care. The Lancet Digital Health, 3(11), e745– e750.
https://doi.org/10.1016/S2589 - 7500(21)00208 - 9
Gimeno, M., San Jose- Eneriz, E., Villar, S., Agirre, X., Prosper, F.,
Rubio, A., & Carazo, F. (2022). Explainable artificial intel-
ligence for precision medicine in acute myeloid leukemia.
Frontiers in Immunology, 13, 977358. https://doi.org/10.3389/
fimmu.2022.977358
Goodman, B., & Flaxman, S. (2017). European Union regulations on
algorithmic decision- making and a “right to explanation”. AI
Magazine, 38(3), 50– 57.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F.,
& Pedreschi, D. (2018). A survey of methods for explaining
black box models. ACM Computing Surveys (CSUR), 51(5),
1– 42.
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., &
Pedreschi, D. (2019). A survey of methods for explaining black
box models. ACM Computing Surveys, 51(5), 1– 42. https://doi.
org/10.1145/3236009
Gunning, D. (2017). Explainable artificial intelligence (xai). Defense
Advanced Research Projects Agency (DARPA), 2(2), 1.
Gunning, D., Stefik, M., Choi, J., Miller, T., Stumpf, S., & Yang, G.- Z.
J. S. R. (2019). XAI— Explainable artificial intelligence. Science
Robotics, 4(37), eaay7120.
Harren, T., Matter, H., Hessler, G., Rarey, M., & Grebner, C. (2022).
Interpretation of structure– activity relationships in real- world
drug design data sets using explainable artificial intelligence.
Journal of Chemical Information and Modeling, 62(3), 447– 462.
https://doi.org/10.1021/acs.jcim.1c01263
He, C., Duan, L., Zheng, H., Song, L., & Huang, M. (2022). An ex-
plainable framework for drug repositioning from disease infor-
mation network. Neurocomputing, 511, 247– 258. https://doi.
org/10.1016/j.neucom.2022.09.063
Holzinger, A., Langs, G., Denk, H., Zatloukal, K., & Müller, H. (2019).
Causability and explainability of artificial intelligence in medi-
cine. WIREs Data Mining and Knowledge Discovery, 9(4), e1312.
https://doi.org/10.1002/widm.1312
Holzinger, A., Saranti, A., Molnar, C., Biecek, P., & Samek, W. (2022).
Explainable AI methods— A brief overview (pp. 13– 38). Springer
International Publishing.
Hosen, M. F., Mahmud, S. M. H., Ahmed, K., Chen, W. Y., Moni,
M. A., Deng, H. W., … Hasan, M. M. (2022). DeepDNAbP: A
deep learning- based hybrid approach to improve the identifi-
cation of deoxyribonucleic acid- binding proteins. Computers
in Biology and Medicine, 145, 105433. https://doi.org/10.1016/j.
compb iomed.2022.105433
Hu, T. M., & Hayton, W. (2011). Architecture of the drug– drug inter-
action network. Journal of Clinical Pharmacy and Therapeutics,
36(2), 135– 143.
Hung, T. N. K., Le, N. Q. K., Le, N. H., Van Tuan, L., Nguyen, T. P.,
Thi, C., & Kang, J. H. (2022). An AI- based prediction model
for drug- drug interactions in osteoporosis and Paget's diseases
from SMILES. Molecular Informatics, 41(6), e2100264. https://
doi.org/10.1002/minf.20210 0264
Imran, M., Bhatti, A., King, D. M., Lerch, M., Dietrich, J., Doron,
G., & Manlik, K. (2022). Supervised machine learning- based de-
cision support for signal validation classification. Drug Safety,
45(5), 583– 596. https://doi.org/10.1007/s4026 4- 022- 01159 - 2
Islam, M. R., Ahmed, M. U., Barua, S., & Begum, S. (2022). A
systematic review of explainable artificial intelligence in
terms of different application domains and tasks. Applied
Sciences, 12(3), 1353. Retrieved from https://www.mdpi.
com/2076- 3417/12/3/1353
Jesson, A., Mindermann, S., Shalit, U., & Gal, Y. (2020). Identifying
causal- effect inference failure with uncertainty- aware models,
33, 11637– 11649.
Jiang, D. J., Wu, Z. X., Hsieh, C. Y., Chen, G. Y., Liao, B., Wang, Z.,
… Hou, T. J. (2021). Could graph neural networks learn better
molecular representation for drug discovery? A comparison
study of descriptor- based and graph- based models. Journal of
Cheminformatics, 13(1), 12. https://doi.org/10.1186/s1332 1-
020- 00479 - 8
Jiménez- Luna, J., Grisoni, F., & Schneider, G. (2020). Drug discov-
ery with explainable artificial intelligence. Nature Machine
Intelligence, 2(10), 573– 584. https://doi.org/10.1038/s4225 6-
020- 00236 - 4
Jiménez- Luna, J., Skalic, M., & Weskamp, N. (2022). Benchmarking
molecular feature attribution methods with activity cliffs.
Journal of Chemical Information and Modeling, 62(2), 274– 283.
https://doi.org/10.1021/acs.jcim.1c01163
Jiménez- Luna, J., Skalic, M., Weskamp, N., & Schneider, G. (2021).
Coloring molecules with explainable artificial intelligence
for preclinical relevance assessment. Journal of Chemical
Information and Modeling, 61(3), 1083– 1094. https://doi.
org/10.1021/acs.jcim.0c01344
Joseph, A. (2019). Shapley regressions: A framework for statistical in-
ference on machine learning models.
Joshi, P., Masilamani, V., & Ramesh, R. (2021). An ensembled SVM
based approach for predicting adverse drug reactions. Current
Bioinformatics, 16(3), 422– 432. https://doi.org/10.2174/15748
93615 99920 07071 41420
Karimi, A.- H., Schölkopf, B., & Valera, I. (2021). Algorithmic re-
course: From counterfactual explanations to interventions. Paper
presented at the Proceedings of the 2021 ACM Conference on
Fairness, Accountability, and Transparency.
Karlebach, G., & Shamir, R. (2008). Modelling and analysis of gene
regulatory networks. Nature Reviews Molecular Cell Biology,
9(10), 770– 780.
Katritzky, A. R., & Gordeeva, E. V. (1993). Traditional topological
indexes vs electronic, geometrical, and combined molecu-
lar descriptors in QSAR/QSPR research. Journal of Chemical
Information and Computer Sciences, 33(6), 835– 857.
Keane, M. T., Kenny, E. M., Delaney, E., & Smyth, B. (2021). If only
we had better counterfactual explanations: Five key deficits to
rectify in the evaluation of counterfactual xai techniques. arXiv
preprint arXiv:2103.01035.
Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F.,
& Sayres, R. (2018). Interpretability beyond feature attribution:
Quantitative testing with concept activation vectors (TCAV).
Paper presented at the Proceedings of the 35th International
Conference on Machine Learning, Proceedings of Machine
Learning Research. https://proce edings.mlr.press/ v80/kim18d.
html
Kırboğa, K., Kucuksille, E. U., & Köse, U. (2022). Ignition of small
molecule inhibitors in Friedreich's ataxia with explainable arti-
ficial intelligence.
Kuhn, H. W., & Tucker, A. W. (1953). Contributions to the theory of
games. Princeton University Press.
Lavecchia, A. (2015). Machine- learning approaches in drug discov-
ery: Methods and applications. Drug Discovery Today, 20(3),
318– 331. https://doi.org/10.1016/j.drudis.2014.10.012
Lee, D. D., Pham, P., Largman, Y., & Ng, A. (2009). Advances in neu-
ral information processing systems 22.
Lerman, J. J. S. L. R. O. (2013). Big data and its exclusions. Stan. L.
Rev. Online, 66, 55.
Li, Z., Ivanov, A. A., Su, R., Gonzalez- Pecchi, V., Qi, Q., Liu, S., …
Pham, C. (2017). The OncoPPi network of cancer- focused
protein– protein interactions to inform biological insights and
therapeutic strategies. Nature Communications, 8(1), 1– 14.
Liao, Q. V., Gruen, D., & Miller, S. (2020). Questioning the AI:
Informing design practices for explainable AI user experiences.
Paper presented at the Proceedings of the 2020 CHI Conference
on Human Factors in Computing Systems.
Lin, H. C., Wang, Z., Hu, Y. H., Simon, K., & Buu, A. (2022).
Characteristics of statewide prescription drug monitoring
programs and potentially inappropriate opioid prescribing
to patients with non- cancer chronic pain: A machine learn-
ing application. Preventive Medicine, 161, 107116. https://doi.
org/10.1016/j.ypmed.2022.107116
Lin, X., Quan, Z., Wang, Z.- J., Ma, T., & Zeng, X. (2020). KGNN:
Knowledge graph neural network for drug- drug interaction
prediction.
Lipinski, C. A., Lombardo, F., Dominy, B. W., & Feeney, P. J. (2001).
Experimental and computational approaches to estimate solu-
bility and permeability in drug discovery and development set-
tings. Advanced Drug Delivery Reviews, 46(3), 3– 26.
Lipton, Z. (2016). The mythos of model interpretability. Communications
of the ACM, 61, 36– 43. https://doi.org/10.1145/3233231
Lipton, Z. (2017). The doctor just won't accept that!
Lipton, Z. C. J. Q. (2018). The mythos of model interpretability: In
machine learning, the concept of interpretability is both im-
portant and slippery. Queue, 16(3), 31– 57.
Lundberg, S. M., Erion, G., Chen, H., DeGrave, A., Prutkin, J. M.,
Nair, B., … Lee, S.- I. (2020). From local explanations to global
understanding with explainable AI for trees. Nature Machine
Intelligence, 2(1), 56– 67.
Lundberg, S. M., & Lee, S.- I. (2017a). A unified approach to interpret-
ing model predictions.
Lundberg, S. M., & Lee, S.- I. (2017b). A unified approach to inter-
preting model predictions. Paper presented at the Proceedings
of the 31st International Conference on Neural Information
Processing Systems, Long Beach, California, USA.
Lundberg, S. M., & Lee, S.- I. (2017c). A unified approach to inter-
preting model predictions. ArXiv, abs/1705.07874 https://doi.
org/10.48550/ arXiv.1705.07874
Ma, J., Sheridan, R. P., Liaw, A., Dahl, G. E., & Svetnik, V. (2015).
Deep neural nets as a method for quantitative structure– activity
relationships. Journal of Chemical Information and Modeling,
55(2), 263– 274. https://doi.org/10.1021/ci500 747n
Ma, P., Liu, R. X., Gu, W. R., Dai, Q., Gan, Y., Cen, J., … Chen, Y.
C. (2022). Construction and interpretation of prediction model
of Teicoplanin trough concentration via machine learning.
Frontiers in Medicine, 9, 808969. https://doi.org/10.3389/
fmed.2022.808969
Mak, K.- K., Balijepalli, M. K., & Pichika, M. R. (2022). Success stories
of AI in drug discovery— Where do things stand? Expert Opinion
on Drug Discovery, 17(1), 79– 92. https://doi.org/10.1080/17460
441.2022.1985108
Mater, A. C., & Coote, M. L. (2019). Deep learning in chemistry.
Journal of Chemical Information and Modeling, 59(6), 2545–
2559. https://doi.org/10.1021/acs.jcim.9b00266
McGrath, S., Mehta, P., Zytek, A., Lage, I., & Lakkaraju, H. J.
(2020). When does uncertainty matter?: Understanding the
impact of predictive uncertainty in ML assisted decision
making.
Morgan, H. L. (1965). The generation of a unique machine descrip-
tion for chemical structures- a technique developed at chemi-
cal abstracts service. Journal of Chemical Documentation, 5(2),
107– 113.
Nazar, M., Alam, M. M., Yafi, E., & Su'ud, M. M. (2021). A sys-
tematic review of human– computer interaction and explain-
able artificial intelligence in healthcare with artificial intelli-
gence techniques. IEEE Access, 9, 153316– 153348. https://doi.
org/10.1109/ACCESS.2021.3127881
Nguyen, A., Yosinski, J., & Clune, J. (2015). Deep neural networks
are easily fooled: High confidence predictions for unrecognizable
images. Paper presented at the Proceedings of the IEEE confer-
ence on computer vision and pattern recognition.
Páez, A. (2019). The pragmatic turn in explainable artificial intelli-
gence (XAI). Minds and Machines, 29(3), 441– 459. https://doi.
org/10.1007/s1102 3- 019- 09502 - w
Pfeifer, B., Saranti, A., & Holzinger, A. (2022). GNN- SubNet: Disease
subnetwork detection with explainable graph neural networks.
Bioinformatics, 38(Supplement_2), ii120– ii126. https://doi.
org/10.1093/bioin forma tics/btac478
Polzer, A., Fleiß, J., Ebner, T., Kainz, P., Koeth, C., & Thalmann,
S. (2022). Validation of AI- based information systems for sen-
sitive use cases: Using an XAI approach in pharmaceutical
engineering.
Preece, A., Harborne, D., Braines, D., Tomsett, R., & Chakraborty, S.
J. (2018). Stakeholders in Explainable AI.
Probst, D., & Reymond, J.- L. (2018). A probabilistic molecular finger-
print for big data settings. Journal of Cheminformatics, 10(1), 1– 12.
Rasmussen, C. E. (2003). Gaussian processes in machine learning.
Paper presented at the Summer school on machine learning.
Rebane, J., Samsten, I., Pantelidis, P., & Papapetrou, P. (2021, 7- 9
June 2021). Assessing the clinical validity of attention- based
and SHAP temporal explanations for adverse drug event pre-
dictions. Paper presented at the 2021 IEEE 34th International
Symposium on Computer- Based Medical Systems (CBMS).
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016a). “Why should I trust
you?”: Explaining the predictions of any classifier.
Ribeiro, M. T., Singh, S., & Guestrin, C. (2016b). "Why should i trust
you?" Explaining the predictions of any classifier. Paper presented
at the Proceedings of the 22nd ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining.
Riniker, S., & Landrum, G. A. (2013). Open- source platform to
benchmark fingerprints for ligand- based virtual screening.
Journal of Cheminformatics, 5(1), 1– 17.
Rodríguez- Pérez, R., & Bajorath, J. (2019). Interpretation of com-
pound activity predictions from complex machine learning
models using local approximations and shapley values. Journal
of Medicinal Chemistry, 63(16), 8761– 8777.
Rodríguez- Pérez, R., & Bajorath, J. (2020a). Interpretation of com-
pound activity predictions from complex machine learn-
ing models using local approximations and Shapley values.
Journal of Medicinal Chemistry, 63(16), 8761– 8777. https://doi.
org/10.1021/acs.jmedc hem.9b01101
Rodríguez- Pérez, R., & Bajorath, J. (2020b). Interpretation of
machine learning models using shapley values: Application
to compound potency and multi- target activity predic-
tions. Journal of Computer- Aided Molecular Design, 34(10),
1013– 1026.
Rodríguez- Pérez, R., & Bajorath, J. (2021). Explainable machine
learning for property predictions in compound optimization.
Journal of Medicinal Chemistry, 64(24), 17744– 17752. https://
doi.org/10.1021/acs.jmedc hem.1c01789
Rogers, D., & Hahn, M. (2010). Extended- connectivity fingerprints.
Journal of Chemical Information and Modeling, 50(5), 742– 754.
Rosenfeld, A., Richardson, A. J. A. A., & Systems, M.- A. (2019).
Explainability in human– agent systems. Autonomous Agents
and Multi- Agent Systems., 33(6), 673– 705.
Rudin, C. (2019). Stop explaining black box machine learning mod-
els for high stakes decisions and use interpretable models in-
stead. Nature Machine Intelligence, 1(5), 206– 215. https://doi.
org/10.1038/s4225 6- 019- 0048- x
Saeed, W., & Omlin, C. (2023). Explainable AI (XAI): A system-
atic meta- survey of current challenges and future opportu-
nities. Knowledge- Based Systems, 263, 110273. https://doi.
org/10.1016/j.knosys.2023.110273
Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K., & Müller, K.- R.
(2019). Explainable AI: Interpreting, explaining and visualizing
deep learning. https://doi.org/10.1007/978- 3- 030- 28954 - 6_1
Savage, N. (2021). Tapping into the drug discovery potential of AI.
Retrieved from https://www.nature.com/artic les/d4374 7- 021-
00045 - 7
Schneider, G. (2018). Automating drug discovery. Nature Reviews.
Drug Discovery, 17(2), 97– 113. https://doi.org/10.1038/
nrd.2017.232
Schneider, P., Walters, W. P., Plowright, A. T., Sieroka, N., Listgarten,
J., Goodnow, R. A., Jr., … Schneider, G. (2020). Rethinking
drug design in the artificial intelligence era. Nature Reviews.
Drug Discovery, 19(5), 353– 364. https://doi.org/10.1038/s4157
3- 019- 0050- 3
Sheridan, R. P. (2019). Interpretation of QSAR models by coloring
atoms according to changes in predicted activity: How robust
is it? Journal of Chemical Information and Modeling, 59(4),
1324– 1337.
Shimazaki, T., & Tachikawa, M. (2022). Collaborative approach be-
tween explainable artificial intelligence and simplified chemi-
cal interactions to explore active ligands for cyclin- dependent
kinase 2. ACS Omega, 7(12), 10372– 10381. https://doi.
org/10.1021/acsom ega.1c06976
Stelling, J., Klamt, S., Bettenbrock, K., Schuster, S., & Gilles, E. D.
(2002). Metabolic network structure determines key aspects of
functionality and regulation. Nature, 420(6912), 190– 193.
Stokes, J. M., Yang, K., Swanson, K., Jin, W., Cubillos- Ruiz, A.,
Donghia, N. M., … Collins, J. J. (2020). A deep learning approach
to antibiotic discovery. Cell, 180(4), 688– 702.e613. https://doi.
org/10.1016/j.cell.2020.01.021
Stumpfe, D., & Bajorath, J. (2020). Current trends, overlooked is-
sues, and unmet challenges in virtual screening. Journal of
Chemical Information and Modeling, 60(9), 4112– 4115. https://
doi.org/10.1021/acs.jcim.9b01101
Suh, J., Yoo, S., Park, J., Cho, S. Y., Cho, M. C., Son, H., & Jeong, H.
(2020). Development and validation of an explainable artificial
intelligence- based decision- supporting tool for prostate biopsy.
BJU International, 126(6), 694– 703. https://doi.org/10.1111/
bju.15122
Swartout, W., Paris, C., & Moore, J. (1991). Explanations in knowl-
edge systems: Design for explainable expert systems. IEEE
Expert, 6(3), 58– 64.
Swartout, W. R., & Moore, J. D. (1993). Explanation in second gen-
eration expert systems. In Second generation expert systems (pp.
543– 585). Springer.
Thomas Altmann, J. B., Dankers, C., Dassen, T., Fritz, N., Gruber,
S., Kopper, P., Kronseder, V., Wagner, M., & Renkl, E. (2020).
Limitations of interpretable machine learning methods.
Tjoa, E., & Guan, C. (2020). A survey on explainable artificial intelli-
gence (XAI): Toward Medical Xai. IEEE Transactions on Neural
Networks and Learning Systems, 32(11), 4793– 4813.
Todeschini, R., & Consonni, V. (2010). New local vertex invariants
and molecular descriptors based on functions of the vertex
degrees. MATCH Communications in Mathematical and in
Computer Chemistry, 64(2), 359– 372.
Upadhyay, S., Joshi, S., & Lakkaraju, H. (2021). Towards robust and
reliable algorithmic recourse. 34, 16926– 16937.
Vamathevan, J., Clark, D., Czodrowski, P., Dunham, I., Ferran, E.,
Lee, G., … Zhao, S. (2019). Applications of machine learning
in drug discovery and development. Nature Reviews. Drug
Discovery, 18(6), 463– 477. https://doi.org/10.1038/s4157
3- 019- 0024- 5
Vangala, S. R., Bung, N., Krishnan, S. R., & Roy, A. (2022). An interpre-
table machine learning model for selectivity of small molecules
against homologous protein family. Future Medicinal Chemistry,
14(20), 1441– 1453. https://doi.org/10.4155/fmc- 2022- 0075
Vo, T. H., Nguyen, N. T. K., Kha, Q. H., & Le, N. Q. K. (2022). On
the road to explainable AI in drug- drug interactions pre-
diction: A systematic review. Computational and Structural
Biotechnology Journal, 20, 2112– 2123. https://doi.org/10.1016/j.
csbj.2022.04.021
Wang, Q., Huang, K., Chandak, P., Zitnik, M., & Gehlenborg, N.
(2022). Extending the nested model for user- centric XAI: A de-
sign study on GNN- based drug repurposing. IEEE Transactions
on Visualization and Computer Graphics, 29, 1266– 1276.
https://doi.org/10.1109/tvcg.2022.3209435
Wang, Q., Li, M., Wang, X., Parulian, N., Han, G., Ma, J., …
Onyshkevych, B. (2021). COVID- 19 literature knowledge graph
construction and drug repurposing report generation.
Ward, I. R., Wang, L., Lu, J., Bennamoun, M., Dwivedi, G., &
Sanfilippo, F. M. (2021). Explainable artificial intelligence for
pharmacovigilance: What features are important when pre-
dicting adverse outcomes? Computer Methods and Programs
in Biomedicine, 212, 106415. https://doi.org/10.1016/j.
cmpb.2021.106415
West, D. M. (2018). The future of work: Robots, AI, and automation.
Brookings Institution Press.
Wójcikowski, M., Siedlecki, P., & Ballester, P. J. (2019). Building
machine- learning scoring functions for structure- based
prediction of intermolecular binding affinity. In W. F. de
Azevedo Jr (Ed.), Docking screens for drug discovery (pp. 1– 12).
Springer New York.
Wojtuch, A., Jankowski, R., & Podlewska, S. (2021). How can SHAP
values help to shape metabolic stability of chemical com-
pounds? Journal of Cheminformatics, 13(1), 74. https://doi.
org/10.1186/s1332 1- 021- 00542 - y
Xiong, Z., Wang, D., Liu, X., Zhong, F., Wan, X., Li, X., … Zheng,
M. (2020). Pushing the boundaries of molecular representa-
tion for drug discovery with the graph attention mechanism.
Journal of Medicinal Chemistry, 63(16), 8749– 8760. https://doi.
org/10.1021/acs.jmedc hem.9b00959
Xu, F., Uszkoreit, H., Du, Y., Fan, W., Zhao, D., & Zhu, J. (2019).
Explainable AI: A brief survey on history, research areas, ap-
proaches and challenges. Paper presented at the CCF interna-
tional conference on natural language processing and Chinese
computing.
Yu, Z., Ji, H. H., Xiao, J. W., Wei, P., Song, L., Tang, T. T., … Jia, Y.
T. (2021). Predicting adverse drug events in Chinese pediatric
inpatients with the associated risk factors: A machine learn-
ing study. Frontiers in Pharmacology, 12, 659099. https://doi.
org/10.3389/fphar.2021.659099
Zeng, X., Song, X., Ma, T., Pan, X., Zhou, Y., Hou, Y., … Cheng, F. (2020).
Repurpose open data to discover therapeutics for COVID- 19
using deep learning. Journal of Proteome Research, 19(11), 4624–
4636. https://doi.org/10.1021/acs.jprot eome.0c00316
Zeng, X., Tu, X., Liu, Y., Fu, X., & Su, Y. (2022). Toward better drug
discovery with knowledge graph. Current Opinion in Structural
Biology, 72, 114– 126. https://doi.org/10.1016/j.sbi.2021.09.003
Zhang, Q., Yang, Y., Liu, Y., Wu, Y. N., & Zhu, S.- C. J. (2018).
Unsupervised learning of neural networks to explain neural
networks.
Zhavoronkov, A., Ivanenkov, Y. A., Aliper, A., Veselov, M. S.,
Aladinskiy, V. A., Aladinskaya, A. V., … Aspuru- Guzik, A.
(2019). Deep learning enables rapid identification of potent
DDR1 kinase inhibitors. Nature Biotechnology, 37(9), 1038–
1040. https://doi.org/10.1038/s4158 7- 019- 0224- x
Zhu, J., Liapis, A., Risi, S., Bidarra, R., & Youngblood, G. M.
(2018). Explainable AI for designers: A human- centered per-
spective on mixed- initiative co- creation. Paper presented at
the 2018 IEEE Conference on Computational Intelligence
and Games (CIG).
Zhu, X. Q., Hu, J. Q., Xiao, T., Huang, S. Q., Shang, D. W., & Wen, Y.
G. (2022). Integrating machine learning with electronic health
record data to facilitate detection of prolactin level and phar-
macovigilance signals in olanzapine- treated patients. Frontiers
in Endocrinology, 13, 1011492. https://doi.org/10.3389/
fendo.2022.1011492
How to cite this article: Kırboğa, K. K., Abbasi,
S., & Küçüksille, E. U. (2023). Explainability and
white box in drug discovery. Chemical Biology &
Drug Design, 00, 1–17. https://doi.org/10.1111/
cbdd.14262
17470285, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/cbdd.14262 by Bilecik Seyh Edebali, Wiley Online Library on [07/06/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
... Such interpretability gap could be ad-dressed by Explainable Artificial Intelligence (XAI), which tries to offer clear and intelligible justifications for the pre-dictions made by AI and ML models [9]. In addition to increasing trust and acceptance, XAI also makes it possible for researchers to spot any biases, inaccuracies, or limits in the underlying data or model architecture by making it possible for humans to understand the logic behind the model's predictions. ...
... SHAP introduces a quantitative metric for discerning the significance of molecular features, offering a nuanced understanding of how each characteristic contributes to the overall predictive model in drug discovery [20]. This along with the LIME is capable of refining interpretability by producing locally faithful explanations, unraveling predictions at the granularity of individual instances [9]. This collaborative approach not only illuminates global feature importance but also enhances the model's transparency on a case-by-case basis. ...
... Model complexity and interpretability still need to be balanced. It is crucial to provide explanations that dissect the complex decision-making of these advanced models since drug discovery entails complex biological interactions and enormous datasets [9]. To achieve this balance, novel methods must be developed that transform the complicated models' high-dimensional interactions into understandable insights, enabling researchers and subject-matter experts to understand and believe AI-driven forecasts [78]. ...
Article
Full-text available
The field of drug discovery has experienced a remarkable transformation with the advent of artificial intelligence (AI) and machine learning (ML) technologies. However, as these AI and ML models are becoming more complex, there is a growing need for transparency and interpretability of the models. Explainable Artificial Intelligence (XAI) is a novel approach that addresses this issue and provides a more interpretable understanding of the predictions made by machine learning models. In recent years, there has been an increasing interest in the application of XAI techniques to drug discovery. This review article provides a comprehensive overview of the current state-of-the-art in XAI for drug discovery, including various XAI methods, their application in drug discovery, and the challenges and limitations of XAI techniques in drug discovery. The article also covers the application of XAI in drug discovery, including target identification, compound design, and toxicity prediction. Furthermore, the article suggests potential future research directions for the application of XAI in drug discovery. This review article aims to provide a comprehensive understanding of the current state of XAI in drug discovery and its potential to transform the field.
... The development of more interpretable AI models and the use of explainable AI techniques are crucial steps towards overcoming this challenge. Explainable AI (XAI) techniques, such as SHAP and LIME, have been proposed to address this issue [196]. These methods aim to provide transparency and accountability in AI systems, which is crucial in sensitive areas like healthcare [197]. ...
Article
Full-text available
Traditional drug discovery struggles to keep pace with the ever-evolving threat of infectious diseases. New viruses and antibiotic-resistant bacteria, all demand rapid solutions. Artificial Intelligence (AI) offers a promising path forward through accelerated drug repurposing. AI allows researchers to analyze massive datasets, revealing hidden connections between existing drugs, disease targets, and potential treatments. This approach boasts several advantages. First, repurposing existing drugs leverages established safety data and reduces development time and costs. Second, AI can broaden the search for effective therapies by identifying unexpected connections between drugs and potential new targets. Finally, AI can help mitigate limitations by predicting and minimizing side effects, optimizing drugs for repurposing, and navigating intellectual property hurdles. The article explores specific AI strategies like virtual screening, target identification, structure base drug design and natural language processing. Real-world examples highlight the potential of AI-driven drug repurposing in discovering new treatments for infectious diseases.
... Deep-learning models combined with explainable artificial intelligence have potential for broad applications in precision medicine, from enhancing disease diagnosis to facilitating drug discovery [69][70][71]. Deep-learning models offer more exact and efficient diagnosis for diseases requiring analysis of medical images (i.e., cancer, dementia), compared with human experts [72]. Explainable artificial-intelligence approaches to deep-learning models of medical images often include some form of visual explanation highlighting the image segments the model used to make the diagnosis [73,74]. ...
Article
Full-text available
This review synthesizes the literature on explaining machine-learning models for digital health data in precision medicine. As healthcare increasingly tailors treatments to individual characteristics, the integration of artificial intelligence with digital health data becomes crucial. Leveraging a topic-modeling approach, this paper distills the key themes of 27 journal articles. We included peer-reviewed journal articles written in English, with no time constraints on the search. A Google Scholar search, conducted up to 19 September 2023, yielded 27 journal articles. Through a topic-modeling approach, the identified topics encompassed optimizing patient healthcare through data-driven medicine, predictive modeling with data and algorithms, predicting diseases with deep learning of biomedical data, and machine learning in medicine. This review delves into specific applications of explainable artificial intelligence, emphasizing its role in fostering transparency, accountability, and trust within the healthcare domain. Our review highlights the necessity for further development and validation of explanation methods to advance precision healthcare delivery.
... LIME is model agnostic; therefore, it may be used with any machine learning model. The technique attempts to comprehend the model by dissecting the input of the data samples and seeing how the predictions vary [24,31]. The explainer-LIME approach was utilised to construct estimation values in this study. ...
Article
Full-text available
Cyclooxygenase-2 (COX-2) inhibitors are nonsteroidal anti-inflammatory drugs that treat inflammation, pain and fever. This study determined the interaction mechanisms of COX-2 inhibitors and the molecular properties needed to design new drug candidates. Using machine learning and explainable AI methods, the inhibition activity of 1488 molecules was modelled, and essential properties were identified. These properties included aromatic rings, nitrogen-containing functional groups and aliphatic hydrocarbons. They affected the water solubility, hydrophobicity and binding affinity of COX-2 inhibitors. The binding mode, stability and ADME properties of 16 ligands bound to the Cyclooxygenase active site of COX-2 were investigated by molecular docking, molecular dynamics simulation and MM-GBSA analysis. The results showed that ligand 339,222 was the most stable and effective COX-2 inhibitor. It inhibited prostaglandin synthesis by disrupting the protein conformation of COX-2. It had good ADME properties and high clinical potential. This study demonstrated the potential of machine learning and bioinformatics methods in discovering COX-2 inhibitors. Graphical abstract This study uses machine learning, bioinformatics and explainable artificial intelligence (XAI) methods to discover and design new drugs that can reduce inflammation by inhibiting COX-2. The activity and properties of various molecules are modelled and analysed. The best molecule is selected, and its interaction with the enzyme is investigated. The results show how this molecule can block the enzyme and prevent inflammation. XAI methods are used to explain the molecular features and mechanisms involved.
... This study applied an annotation method for the model using the LIME library. LIME is a method used to produce fast and understandable annotations to the model [59,60]. LimeTabularExplainer, LIME's tabular annotation module, was chosen to help us understand the predictions of classi cation models. ...
Preprint
Full-text available
Ankylosing spondylitis (AS), an autoimmune disease, has the HLA-B27 gene in more than 90% of its patients. This study investigated the ability of health parameters to predict the presence of the HLA-B-27 gene and clinical and demographic data used in diagnosing AS. For this purpose, various classification models were evaluated, and the best-performing RFC model was selected. In addition, the model's predictions are understood and explained using XAI techniques such as SHAP and LIME. The model development results show that the RFC model performs best (Accuracy:0.75, F1 Score:0.74, Recall:0.75, Precision:0.75, Brier Score:0.25, AUC: 0.76), and XAI techniques provide the ability to explain the decisions of this model. Among the health parameters, WBC, Hematocrit, uric acid, and gender were found to show the strongest association with HLA-B-27. This study aims to understand the genetic predisposition of AS and to illuminate the potential of XAI techniques in medical diagnosis. The study's strengths include comprehensive model evaluation, explainability of model decisions, and revealing the relationship between health parameters and HLA-B-27. In addition, this study considered ethical dimensions like the confidentiality of personal health data and the privacy of patients.
... 196 Nevertheless, advances continue to be made by way of text-based explanations, visualizations, explanations by example or simplification, and feature relevance. 194,197 These techniques have increasingly been applied to biomedical sciences and healthcare, 178,198,199 such as for drug discovery 200,201 and prediction of drug-drug interactions. 202 216 PubMedGPT/BioMedLM, 217 GeneGPT, 218 BioGPT, 219 PubMedBERT, 220 BioLinkBERT, 221 Galactica, 222 and BioMegatron. ...
Article
INTRODUCTION Experimental models are essential tools in neurodegenerative disease research. However, the translation of insights and drugs discovered in model systems has proven immensely challenging, marred by high failure rates in human clinical trials. METHODS Here we review the application of artificial intelligence (AI) and machine learning (ML) in experimental medicine for dementia research. RESULTS Considering the specific challenges of reproducibility and translation between other species or model systems and human biology in preclinical dementia research, we highlight best practices and resources that can be leveraged to quantify and evaluate translatability. We then evaluate how AI and ML approaches could be applied to enhance both cross‐model reproducibility and translation to human biology, while sustaining biological interpretability. DISCUSSION AI and ML approaches in experimental medicine remain in their infancy. However, they have great potential to strengthen preclinical research and translation if based upon adequate, robust, and reproducible experimental data. Highlights There are increasing applications of AI in experimental medicine. We identified issues in reproducibility, cross‐species translation, and data curation in the field. Our review highlights data resources and AI approaches as solutions. Multi‐omics analysis with AI offers exciting future possibilities in drug discovery.
... Deep learning models, such as multilayer perceptrons or Long-Short Term Memory networks (25,44,45), improve the representation of the data and extract features found to be significant to the endpoint being modeled. However, deep models are complex, requiring many parameters, and interpretability remains an unsolved problem (46). In this context, an advantage of the analysis presented here is that the statistical quantities are not based on fitted parameters, but are rather calculated directly from the data, and the prediction process follows well-defined steps from the calculation of epistatic divergence components to the assessment of mutant combinations. ...
Article
Full-text available
The identification of catalytic RNAs is typically achieved through primarily experimental means. However, only a small fraction of sequence space can be analyzed even with high-throughput techniques. Methods to extrapolate from a limited data set to predict additional ribozyme sequences, particularly in a human-interpretable fashion, could be useful both for designing new functional RNAs and for generating greater understanding about a ribozyme fitness landscape. Using information theory, we express the effects of epistasis (i.e., deviations from additivity) on a ribozyme. This representation was incorporated into a simple model of the epistatic fitness landscape, which identified potentially exploitable combinations of mutations. We used this model to theoretically predict mutants of high activity for a self-aminoacylating ribozyme, identifying potentially active triple and quadruple mutants beyond the experimental data set of single and double mutants. The predictions were validated experimentally, with nine out of nine sequences being accurately predicted to have high activity. This set of sequences included mutants that form a previously unknown evolutionary ‘bridge’ between two ribozyme families that share a common motif. Individual steps in the method could be examined, understood, and guided by a human, combining interpretability and performance in a simple model to predict ribozyme sequences by extrapolation.
... In clinical settings, XAI can be integrated into decision support systems, providing transparent explanations for predictions and assisting healthcare professionals in making accurate diagnoses and treatment plans [38]. In drug discovery, XAI models help researchers comprehend the relationships between molecular features, pathways, and drug responses, leading to the development of safer and more effective therapeutics [39]. XAI also supports precision medicine by enabling personalized risk prediction and treatment selection with understandable rationales. ...
Article
The advancements in genomics and biomedical technologies have generated vast amounts of biological and physiological data, which present opportunities for understanding human health. Deep learning (DL) and machine learning (ML) are frontiers and interdisciplinary fields of computer science that consider comprehensive computational models and provide integral roles for disease diagnosis and therapy investigation. DL-based algorithms can discover the intrinsic hierarchies in the training data to show great promise for extracting features and learning patterns from complex datasets and performing various analytical tasks. This review comprehensively discusses the wide-ranging DL approaches for intelligent healthcare systems (IHS) in genomics and biomedicine. This paper explores advanced concepts in deep learning (DL) and discusses the workflow of utilizing role-based algorithms in genomics and biomedicine to integrate intelligent healthcare systems (IHS). The aim is to overcome biomedical obstacles like patient disease classification, core biomedical processes, and empowering patient-disease integration. The paper also highlights how DL approaches are well-suited for addressing critical challenges in these domains, offering promising solutions for improved healthcare outcomes. We also provided a concise concept of DL architectures and model optimization in genomics and bioinformatics at the molecular level to deal with biomedicine classification, genomic sequence analysis, protein structure classification, and prediction. Finally, we discussed DL's current challenges and future perspectives in genomics and biomedicine for future directions.
Article
Clinical trials are primarily conducted to estimate causal effects, but the data collected can also be invaluable for additional research, such as identifying prognostic measures of disease or biomarkers that predict treatment efficacy. However, these exploratory settings are prone to false discoveries (type‐I errors) due to the multiple comparisons they entail. Unfortunately, many methods fail to address this issue, in part because the algorithms used are generally designed to optimize predictions and often only provide the measures used for variable selection, such as machine learning model importance scores, as a byproduct. To address the resulting unclear uncertainty in the selection sets, the knockoff framework offers a model‐agnostic, robust approach to variable selection with guaranteed type‐I error control. Here, we review the knockoff framework in the setting of clinical data, highlighting main considerations using simulation studies. We also extend the framework by introducing a novel knockoff generation method that addresses two main limitations of previously suggested methods relevant for clinical development settings. With this new method, we empirically obtain tighter bounds on type‐I error control and gain an order of magnitude in computational efficiency in mixed data settings. We demonstrate comparable selections to those of the competing method for identifying prognostic biomarkers for C‐reactive protein levels in patients with psoriatic arthritis in four clinical trials. Our work increases access to the knockoff framework for variable selection from clinical trial data. Hereby, this paper helps to address the current replicability crisis which can result in unnecessary research efforts, increased patient burden, and avoidable costs.
Article
Full-text available
Background and aim Available evidence suggests elevated serum prolactin (PRL) levels in olanzapine (OLZ)-treated patients with schizophrenia. However, machine learning (ML)-based comprehensive evaluations of the influence of pathophysiological and pharmacological factors on PRL levels in OLZ-treated patients are rare. We aimed to forecast the PRL level in OLZ-treated patients and mine pharmacovigilance information on PRL-related adverse events by integrating ML and electronic health record (EHR) data. Methods Data were extracted from an EHR system to construct an ML dataset in 672×384 matrix format after preprocessing, which was subsequently randomly divided into a derivation cohort for model development and a validation cohort for model validation (8:2). The eXtreme gradient boosting (XGBoost) algorithm was used to build the ML models, the importance of the features and predictive behaviors of which were illustrated by SHapley Additive exPlanations (SHAP)-based analyses. The sequential forward feature selection approach was used to generate the optimal feature subset. The co-administered drugs that might have influenced PRL levels during OLZ treatment as identified by SHAP analyses were then compared with evidence from disproportionality analyses by using OpenVigil FDA. Results The 15 features that made the greatest contributions, as ranked by the mean (|SHAP value|), were identified as the optimal feature subset. The features were gender_male, co-administration of risperidone, age, co-administration of aripiprazole, concentration of aripiprazole, concentration of OLZ, progesterone, co-administration of sulpiride, creatine kinase, serum sodium, serum phosphorus, testosterone, platelet distribution width, α-L-fucosidase, and lipoprotein (a). The XGBoost model after feature selection delivered good performance on the validation cohort with a mean absolute error of 0.046, mean squared error of 0.0036, root-mean-squared error of 0.060, and mean relative error of 11%. Risperidone and aripiprazole exhibited the strongest associations with hyperprolactinemia and decreased blood PRL according to the disproportionality analyses, and both were identified as co-administered drugs that influenced PRL levels during OLZ treatment by SHAP analyses. Conclusions Multiple pathophysiological and pharmacological confounders influence PRL levels associated with effective treatment and PRL-related side-effects in OLZ-treated patients. Our study highlights the feasibility of integration of ML and EHR data to facilitate the detection of PRL levels and pharmacovigilance signals in OLZ-treated patients.
Article
Full-text available
Whether AI explanations can help users achieve specific tasks efficiently (i.e., usable explanations) is significantly influenced by their visual presentation. While many techniques exist to generate explanations, it remains unclear how to select and visually present AI explanations based on the characteristics of domain users. This paper aims to understand this question through a multidisciplinary design study for a specific problem: explaining graph neural network (GNN) predictions to domain experts in drug repurposing, i.e., reuse of existing drugs for new diseases. Building on the nested design model of visualization, we incorporate XAI design considerations from a literature review and from our collaborators' feedback into the design process. Specifically, we discuss XAI-related design considerations for usable visual explanations at each design layer: target user, usage context, domain explanation, and XAI goal at the domain layer; format, granularity, and operation of explanations at the abstraction layer; encodings and interactions at the visualization layer; and XAI and rendering algorithm at the algorithm layer. We present how the extended nested model motivates and informs the design of DrugExplorer, an XAI tool for drug repurposing. Based on our domain characterization, DrugExplorer provides path-based explanations and presents them both as individual paths and meta-paths for two key XAI operations, why and what else. DrugExplorer offers a novel visualization design called MetaMatrix with a set of interactions to help domain users organize and compare explanation paths at different levels of granularity to generate domain-meaningful insights. We demonstrate the effectiveness of the selected visual presentation and DrugExplorer as a whole via a usage scenario, a user study, and expert interviews. From these evaluations, we derive insightful observations and reflections that can inform the design of XAI visualizations for other scientific applications.
Article
Full-text available
Artificial intelligence (AI) can unveil novel personalized treatments based on drug screening and whole-exome sequencing (WES) experiments. However, the "black box" nature of AI limits the potential of this approach to be translated into clinical practice. In contrast, explainable AI (XAI) focuses on making AI results understandable to humans. Here, we present a novel XAI method, called multi-dimensional module optimization (MOM), that associates drug screening with genetic events while guaranteeing that predictions are interpretable and robust. We applied MOM to an acute myeloid leukemia (AML) cohort of 319 ex-vivo tumor samples with 122 screened drugs and WES. MOM returned a therapeutic strategy based on the FLT3, CBFβ-MYH11, and NRAS status, which predicted AML patient response to Quizartinib, Trametinib, Selumetinib, and Crizotinib. We successfully validated the results in three different large-scale screening experiments. We believe that XAI will help healthcare providers and drug regulators better understand AI medical decisions.
Article
Full-text available
Background Although human protein kinases are known to mediate most signal transduction in cells, and their dysfunction can result in inflammatory diseases and cancers, it remains a challenge to find effective kinase inhibitors as drugs for these diseases. One major challenge is the compensatory upregulation of related kinases following the inhibition of a critical kinase. To circumvent the compensatory effect, it is desirable to have inhibitors that inhibit all the kinases belonging to the same family, instead of targeting only a few kinases. However, finding inhibitors that target a whole kinase family is laborious and time consuming in the wet lab. Results In this paper, we present a computational approach that takes advantage of interpretable deep learning models to address this challenge. Specifically, we first collected 9,037 inhibitor bioassay results (with 3,991 active and 5,046 inactive pairs) for eight kinase families (EGFR, Jak, GSK, CLK, PIM, PKD, Akt, and PKG) from the ChEMBL25 Database and the Metz Kinase Profiling Data. We generated 238 binary moiety features for each inhibitor and used the features as input to train eight deep neural network (DNN) models to predict whether an inhibitor is active for each kinase family. We then employed SHapley Additive exPlanations (SHAP) to analyze the importance of each moiety feature in each classification model, identifying moieties located in the common kinase hinge sites across the eight kinase families, as well as moieties that are specific to some kinase families. We finally validated these identified moieties using experimental crystal structures to reveal their functional importance in kinase inhibition. Conclusion With the SHAP methodology, we identified two common moieties for the eight kinase families, 9 EGFR-specific moieties, and 6 Akt-specific moieties that bear functional importance in kinase inhibition. Our results suggest that SHAP has the potential to help find effective pan-kinase-family inhibitors.
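The per-family classification and SHAP analysis described above can be sketched with a generic feed-forward classifier on binary moiety features and model-agnostic KernelSHAP. The binary feature matrix and activity labels below are simulated placeholders; in the study, 238 moiety features per inhibitor were used and one DNN was trained per kinase family.

```python
# Minimal sketch: active/inactive classifier on binary moiety features,
# moiety importance estimated with KernelSHAP on the "active" probability.
import numpy as np
import shap
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(7)
X = rng.integers(0, 2, size=(1000, 238)).astype(float)            # binary moiety features
y = (X[:, 0] + X[:, 5] + rng.random(1000) > 1.5).astype(int)      # proxy activity label

clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300, random_state=0)
clf.fit(X, y)

# Explain the predicted probability of the "active" class
predict_active = lambda data: clf.predict_proba(data)[:, 1]
background = shap.sample(X, 50)                   # small background set for speed
explainer = shap.KernelExplainer(predict_active, background)
shap_values = explainer.shap_values(X[:50], nsamples=200)

# Moieties with the largest mean |SHAP| are candidate family-relevant moieties
top_moieties = np.argsort(np.abs(shap_values).mean(axis=0))[::-1][:10]
print("top moiety feature indices:", top_moieties)
```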
Article
The past decade has seen significant progress in artificial intelligence (AI), which has resulted in algorithms being adopted for resolving a variety of problems. However, this success has come with increasing model complexity and the use of black-box AI models that lack transparency. In response, Explainable AI (XAI) has been proposed to make AI more transparent and thus advance its adoption in critical domains. Although several reviews of XAI topics in the literature have identified challenges and potential research directions of XAI, these challenges and research directions are scattered. This study, hence, presents a systematic meta-survey of challenges and future research directions in XAI organized in two themes: (1) general challenges and research directions of XAI and (2) challenges and research directions of XAI based on the phases of the machine learning life cycle: design, development, and deployment. We believe that our meta-survey contributes to the XAI literature by providing a guide for future exploration in the XAI area.
Article
Viral diseases have been a major health concern in recent years. Antiviral peptides (AVPs) are a type of antimicrobial peptide (AMP) with high potential to defend the human body against various viral diseases. Despite the large-scale production of antiviral vaccines and drugs, viral infections remain a prominent cause of human disease. The discovery of AVPs as antiviral agents offers an effective way to treat virus-affected cells. Recently, the development of peptide-based therapeutic agents via machine learning methods has become a major area of interest due to its promising results. In this paper, we developed an intelligent and computationally efficient learning approach for the reliable identification of AVPs. Novel evolutionary descriptors are explored by embedding discrete wavelet transform and k-segmentation approaches into the position-specific scoring matrix. Moreover, SHapley Additive exPlanations (SHAP)-based global interpretation analysis is employed to choose optimal features by measuring the contribution of each feature in the extracted vectors. In the next phase, the selected feature spaces are examined using five different classifiers, namely XGBoost (XGB), k-nearest neighbor (KNN), Extra Trees classifier (ETC), Support Vector Machine (SVM), and AdaBoost (ADA). Furthermore, to boost the discriminative power of the proposed model, the predicted labels of all classifiers are given to an optimized genetic algorithm to build an ensemble learner. The proposed study reported classification rates of 97.33% and 95.57% on training and independent samples, respectively, roughly 5% higher accuracy than available predictors. Our model is expected to be a helpful approach for researchers and may play a significant role in academic research and drug development. The source code and all datasets are publicly available at https://github.com/wangphd0/pAVP_PSSMDWT-EnC.
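Two downstream steps of this pipeline, SHAP-based global feature selection followed by an ensemble of the five base classifiers, can be sketched as follows. A plain soft-voting ensemble stands in for the genetic-algorithm-optimized combiner, and random features stand in for the PSSM/DWT descriptors.

```python
# Minimal sketch: mean-|SHAP| feature selection, then a five-classifier ensemble.
import numpy as np
import shap
from xgboost import XGBClassifier
from sklearn.ensemble import ExtraTreesClassifier, AdaBoostClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.standard_normal((600, 200))             # stand-in for PSSM-DWT descriptors
y = (X[:, :3].sum(axis=1) > 0).astype(int)      # stand-in AVP / non-AVP labels

# Global SHAP importance from a tree model, then keep the top-k features
xgb_model = XGBClassifier(n_estimators=200, max_depth=4).fit(X, y)
shap_values = shap.TreeExplainer(xgb_model).shap_values(X)
top_k = np.argsort(np.abs(shap_values).mean(axis=0))[::-1][:50]
X_sel = X[:, top_k]

# Soft-voting ensemble of the five base learners on the selected features
ensemble = VotingClassifier(
    estimators=[
        ("xgb", XGBClassifier(n_estimators=200, max_depth=4)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("etc", ExtraTreesClassifier(n_estimators=200)),
        ("svm", SVC(probability=True)),
        ("ada", AdaBoostClassifier(n_estimators=200)),
    ],
    voting="soft",
)
print("CV accuracy:", cross_val_score(ensemble, X_sel, y, cv=5).mean())
```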
Article
Aim: In the early stages of drug discovery, various experimental and computational methods are used to measure the specificity of small molecules against a target protein. The selectivity of small molecules remains a challenge, as poor selectivity leads to off-target side effects. Methods: We have developed a multitask deep learning model for predicting selectivity on closely related homologs of the target protein. The model has been tested on the Janus kinase and dopamine receptor families of proteins. Results & conclusion: The feature-based representation (extended connectivity fingerprint 4, ECFP4) with Extreme Gradient Boosting performed better than the deep neural network models on most evaluation metrics. Both the Extreme Gradient Boosting and deep neural network models outperformed the graph-based models. Furthermore, to decipher the model's decisions on selectivity, the important fragments associated with each homologous protein were identified.
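The best-performing representation/model combination reported above, ECFP4 fingerprints with gradient boosting, can be sketched with RDKit and XGBoost. The SMILES strings and selectivity labels below are illustrative placeholders, not the study's dataset, and only a single homolog's label is modelled.

```python
# Minimal sketch: ECFP4 (Morgan radius-2) fingerprints + XGBoost classifier
# for a selectivity label against one target homolog.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from xgboost import XGBClassifier

smiles = ["CCOc1ccccc1", "CC(=O)Nc1ccc(O)cc1", "c1ccncc1", "CCN(CC)CCOC(=O)c1ccccc1"]
labels = np.array([1, 0, 0, 1])                 # 1 = selective for the target homolog

def ecfp4(smi, n_bits=2048):
    """Morgan fingerprint with radius 2 (ECFP4) as a dense 0/1 vector."""
    mol = Chem.MolFromSmiles(smi)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=n_bits)
    return np.array(fp)

X = np.array([ecfp4(s) for s in smiles])
model = XGBClassifier(n_estimators=100, max_depth=3)
model.fit(X, labels)
print("predicted selectivity:", model.predict(X))
```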
Article
Motivation: The tremendous success of graph neural networks (GNNs) has already had a major impact on systems biology research. For example, GNNs are currently being used for drug target recognition in protein-drug interaction networks, as well as for cancer gene discovery and more. Important aspects whose practical relevance is often underestimated are comprehensibility, interpretability, and explainability. Results: In this work, we present a novel graph-based deep learning framework for disease subnetwork detection via explainable GNNs. Each patient is represented by the topology of a protein-protein interaction (PPI) network, and the nodes are enriched with multi-omics features from gene expression and DNA methylation. In addition, we propose a modification of the GNNExplainer that provides model-wide explanations for improved disease subnetwork detection. Availability and implementation: The proposed methods and tools are implemented in the GNN-SubNet Python package, which we have made available on our GitHub for the international research community (https://github.com/pievos101/GNN-SubNet). Supplementary information: Supplementary data are available at Bioinformatics online.
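The modelling setup described above, one graph per patient over a shared PPI topology with multi-omics node features, can be sketched with a plain graph-classification GCN in PyTorch Geometric. The graphs below are tiny random placeholders, and the paper's modified GNNExplainer step is omitted.

```python
# Minimal sketch: graph classification over per-patient PPI graphs with a GCN.
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool

class PatientGCN(torch.nn.Module):
    def __init__(self, in_dim, hidden=64, n_classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.lin = torch.nn.Linear(hidden, n_classes)

    def forward(self, x, edge_index, batch):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)            # one vector per patient graph
        return self.lin(x)

# Toy data: 10 patients, 30 PPI nodes each, 2 omics features per node
edge_index = torch.randint(0, 30, (2, 60))        # shared PPI topology (illustrative)
graphs = [Data(x=torch.randn(30, 2), edge_index=edge_index,
               y=torch.randint(0, 2, (1,))) for _ in range(10)]
loader = DataLoader(graphs, batch_size=5)

model = PatientGCN(in_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for batch in loader:
    opt.zero_grad()
    out = model(batch.x, batch.edge_index, batch.batch)
    loss = F.cross_entropy(out, batch.y)
    loss.backward()
    opt.step()
```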
Article
Exploring efficient and high-accuracy computational drug repositioning methods has become a popular and attractive topic in drug development. This technology can systematically identify potential drug-disease interactions, which could greatly alleviate the pressure of the high cost and long timelines of traditional drug research and discovery. However, many current computational drug repositioning approaches lack interpretability in predicting drug-disease associations, which hinders subsequent in-depth research. To this end, we propose a novel computational framework, called EDEN, for exploring explainable drug repositioning from the disease information network (DIN). EDEN is a graph neural network framework that learns the local semantics and global structure of the DIN and models drug-disease associations within the DIN by maximizing the mutual information of both in an end-to-end manner. In this way, the learned biomedical entity and link embeddings retain the ability to perform drug repositioning while preserving the semantic structure of external knowledge, thereby making interpretation possible. Meanwhile, we also propose a matching score based on the final embeddings to generate predictive drug repositioning explanations. Empirical results on a real-world dataset show that EDEN outperforms other state-of-the-art baselines on most metrics. Further studies demonstrate the effectiveness of our approach's explainability.
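The kind of embedding-based matching score described above can be illustrated with a small sketch: given learned drug and disease embeddings (random placeholders here, not EDEN's actual embeddings), candidate diseases for a drug are ranked by a sigmoid-transformed inner product.

```python
# Minimal sketch: rank repositioning candidates by an embedding matching score.
import numpy as np

rng = np.random.default_rng(3)
drug_emb = rng.standard_normal((100, 64))        # 100 drugs, 64-dim embeddings
disease_emb = rng.standard_normal((40, 64))      # 40 diseases, same dimension

def matching_scores(drug_idx):
    """Association scores between one drug and all diseases."""
    logits = disease_emb @ drug_emb[drug_idx]
    return 1.0 / (1.0 + np.exp(-logits))          # sigmoid of the inner product

scores = matching_scores(0)
print("top repositioning candidates for drug 0:", np.argsort(scores)[::-1][:5])
```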
Article
Unnecessary/unsafe opioid prescribing has become a major public health concern in the U.S. Statewide prescription drug monitoring programs (PDMPs) with varying characteristics have been implemented to improve safe prescribing practice. Yet, no studies have comprehensively evaluated the effectiveness of PDMP characteristics in reducing opioid-related potentially inappropriate prescribing (PIP) practices. The objective of the study is to apply machine learning methods to evaluate PDMP effectiveness by examining how different PDMP characteristics are associated with opioid-related PIPs for non-cancer chronic pain (NCCP) treatment. This was a retrospective observational study that included 802,926 adult patients who were diagnosed with NCCP, obtained opioid prescriptions, and were continuously enrolled in plans of a major U.S. insurer for over a year. Four opioid-related PIP outcomes were examined: dosage ≥50 MME/day, dosage ≥90 MME/day, days supply ≥7 days, and benzodiazepine-opioid co-prescription. Machine learning models were applied, including logistic regression, least absolute shrinkage and selection operator (LASSO) regression, classification and regression trees, random forests, and gradient boosting modeling (GBM). The SHapley Additive exPlanations (SHAP) method was applied to interpret model results. The results show that among 1,886,146 NCCP opioid-related claims, 22.8% had an opioid dosage ≥50 MME/day and 8.9% ≥90 MME/day, 70.3% had days supply ≥7 days, and 10.3% occurred when a benzodiazepine had been filled ≤7 days earlier. GBM had superior model performance. We identified the most salient PDMP characteristics that predict opioid-related PIPs (e.g., broader access to patient prescription history, monitoring Schedule IV controlled substances), which could be informative to states considering the redesign of PDMPs.
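The model comparison and SHAP interpretation workflow described above can be sketched on a simulated claims-style dataset with a binary PIP outcome; the PDMP characteristics and covariates below are random placeholders.

```python
# Minimal sketch: compare several classifiers by cross-validated AUC,
# then interpret the gradient boosting model with tree SHAP values.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
X = rng.standard_normal((2000, 15))                  # PDMP characteristics + covariates
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 1, 2000) > 0).astype(int)   # PIP outcome

models = {
    "logistic": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200),
    "gbm": GradientBoostingClassifier(n_estimators=200),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.3f}")

# Interpret the gradient boosting model with TreeExplainer SHAP values
gbm = models["gbm"].fit(X, y)
shap_values = shap.TreeExplainer(gbm).shap_values(X)
print("mean |SHAP| per feature:", np.abs(shap_values).mean(axis=0).round(3))
```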