An Optimal Bayesian Intervention Policy in Response
to Unknown Dynamic Cell Stimuli
Seyed Hamid Hosseini^a, Mahdi Imani^a
^a Northeastern University, 360 Huntington Ave, Boston, MA 02115, U.S.
Abstract
Interventions in gene regulatory networks (GRNs) aim to restore normal functions
of cells experiencing abnormal behavior, such as uncontrolled cell proliferation.
The dynamic, uncertain, and complex nature of cellular processes poses signifi-
cant challenges in determining the best interventions. Most existing intervention
methods assume that cells are unresponsive to therapies, resulting in stationary
and deterministic intervention solutions. However, cells in unhealthy conditions
can dynamically respond to therapies through internal stimuli, leading to the re-
currence of undesirable conditions. This paper proposes a Bayesian intervention
policy that adaptively responds to cell dynamic responses according to the latest
available information. The GRNs are modeled using a Boolean network with per-
turbation (BNp), and the fight between the cell and intervention is modeled as a
two-player zero-sum game. Assuming an incomplete knowledge of cell stimuli,
a recursive approach is developed to keep track of the posterior distribution of
cell responses. The proposed Bayesian intervention policy takes action accord-
ing to the posterior distribution and a set of Nash equilibrium policies associated
with all possible cell responses. Analytical results demonstrate the superiority of
the proposed intervention policy against several existing intervention techniques.
Meanwhile, the performance of the proposed policy is investigated through com-
prehensive numerical experiments using the p53-MDM2 negative feedback loop
regulatory network and melanoma network. The results demonstrate the empirical
convergence of the proposed policy to the optimal Nash equilibrium policy.
Keywords: Gene Regulatory Networks, Two-Player Zero-Sum Game, Bayesian
intervention, Boolean networks, Nash Equilibrium.
1. Introduction
Recent genomics advances have deepened our understanding of complex bi-
ological systems, particularly gene regulatory networks (GRNs) [1, 2, 3, 4, 5, 6].
GRNs consist of several interacting genes whose activities control cellular pro-
cesses, including DNA repair, stress response, and complex diseases like can-
cer [7]. In genomics intervention, the objective is to design effective intervention
strategies that can alter the undesirable behavior of unhealthy cells (e.g., those
associated with chronic diseases) and shift them into desirable ones.
Boolean networks have emerged as a powerful class of models for character-
izing the temporal dynamics of GRNs [8, 9, 10, 11, 12, 13]. Several interven-
tion strategies have been developed for Boolean network models in recent years.
These include structural interventions, which aim to make a single-time, long-
lasting change in the interaction between two or more genes [14, 15, 16, 17, 18],
and dynamic interventions that perturb (e.g., overexpress or suppress) the activity
of targeted genes over time [14, 15, 16, 17]. The most well-known method is the
optimal stationary intervention derived in [19], which is later extended to include
constraints [20, 21] and asynchronicity of the GRNs [22, 13]. Meanwhile, several
intervention approaches are developed for GRNs with states observed indirectly
through gene-expression data [23, 24, 25, 26, 27, 28], including robust interven-
tion methods for domains with partially-known dynamics and costs [29, 30].
Most existing intervention methods are built on the assumption that cells are
isolated and non-responsive to therapies. However, the dynamic and intelligent
responses of cells to therapies, triggered by internal stimuli, often result in the
short-term success of interventions at early stages and the recurrence of the un-
healthy condition afterward. This paper models GRNs using Boolean networks
with perturbation (BNp) [31, 32], and models the cell dynamic responses to inter-
ventions through a two-player zero-sum game [33, 34, 35]. There are two players
in the game: the cell and the intervention, each with opposing goals. The cell
aims to maintain the cell condition in unhealthy states using its internal stimuli,
while the intervention’s objective is to deviate the system from unhealthy condi-
tions through therapies. Assuming incomplete information about the possible cell
responses to interventions, this paper develops a recursive method for computing
the posterior distribution of the cell responses. Given the quantified uncertainty
in cell responses, we develop a Bayesian intervention policy. The proposed pol-
icy utilizes the combination of the Nash equilibrium policies for different cell
responses and the posterior associated with them. The policy is fully adaptive;
as new data appears, the posterior distribution of cell responses and the proposed
intervention policy are updated.
The main contributions of this paper are as follows:
• Modeling the aggressive and dynamic responses of unhealthy cells during the intervention process, which enables deriving intervention solutions by accounting for and predicting possible cell responses to therapies.
• Developing an adaptive Bayesian intervention policy that can probabilistically reason about cell responses and incorporate such knowledge to make better intervention decisions.
• Analytically demonstrating the superiority of the proposed policy compared to existing intervention methods, along with numerical results indicating the empirical convergence of the proposed policy to the optimal Nash policy.
We analyze the performance of the proposed intervention policy using the
p53-MDM2 and melanoma networks. The p53-MDM2 network is a crucial reg-
ulatory system that responds to cellular stresses such as DNA damage [36, 37].
The melanoma regulatory network also plays a crucial role in the development
and progression of melanoma, a highly aggressive form of skin cancer [21, 38].
Through a comprehensive set of numerical experiments using these two networks,
we compare the performance of the proposed policy with state-of-the-art interven-
tion methods.
The article is organized as follows: The GRN model is briefly described in
Section 2. Section 3 includes formulating the intervention process as a two-player
zero-sum game, followed by the optimal Nash equilibrium policy for a two-player
zero-sum game. The proposed Bayesian intervention policy and its matrix-form
implementation are presented in Sections 4 and 5, respectively. The analytical and
numerical results are presented in Section 6 and Section 7, respectively. Finally,
Section 8 contains the concluding remarks.
2. Background
In this paper, a Boolean network with perturbation model [32, 39] is used
to capture the dynamics of gene regulatory networks. The BNp model effec-
tively incorporates the stochastic nature of GRNs and accounts for the uncertainty
coming from unmodeled parts of the systems. Consider a GRN consisting of d
components. The state process can be represented as {x_k; k = 0, 1, ...}, where x_k ∈ {0,1}^d denotes the activation or inactivation state of the genes at time k. The genes' state is influenced by a series of internal and external inputs/stimuli. At each discrete time point, the state of the genes evolves according to the following Boolean signal model [40]:

x_k = f(x_{k−1}) ⊕ a_{k−1} ⊕ u_{k−1} ⊕ n_k,   k = 1, 2, ...,   (1)

where {a_k; k = 0, 1, ...} refers to a set of external interventions/therapies, {u_k; k = 0, 1, ...} represents internal inputs regulated by the cell, n_k ∈ {0,1}^d represents Boolean transition noise at time k, "⊕" denotes component-wise modulo-2 addition, and f is the network function. The noise value n_k(j) = 1 alters the state of the jth gene at time step k, whereas for n_k(j) = 0, the jth state follows the value predicted by the network function. The noise process n_k is assumed to have independent components modeled by a Bernoulli distribution with parameter p > 0. The Bernoulli parameter p represents the noise intensity, with higher values representing more chaotic systems and smaller values indicating nearly deterministic models. Note that the rest of the paper is applicable to a general class of Boolean network models of the form f(x_{k−1}, a_{k−1}, u_{k−1}, n_k).
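For concreteness, the following minimal Python sketch simulates one BNp transition of the form (1); the toy network function, the dimension, and the noise level are illustrative placeholders rather than quantities taken from this paper's experiments.

```python
import numpy as np

def bnp_step(x, a, u, net_fn, p, rng):
    """One BNp transition of Eq. (1):
    x_k = f(x_{k-1}) XOR a_{k-1} XOR u_{k-1} XOR n_k, with n_k ~ Bernoulli(p)^d."""
    noise = rng.random(x.size) < p          # independent Bernoulli(p) gene flips
    return net_fn(x) ^ a ^ u ^ noise        # component-wise modulo-2 addition

# Toy 3-gene network function (hypothetical): each gene copies its left neighbor.
net_fn = lambda x: np.roll(x, 1)
rng = np.random.default_rng(0)
x = np.array([1, 0, 1], dtype=bool)
a = np.zeros(3, dtype=bool)                 # no intervention
u = np.zeros(3, dtype=bool)                 # no internal cell stimulus
x_next = bnp_step(x, a, u, net_fn, p=0.05, rng=rng)
```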
The network function in GRNs is often represented through a Boolean logic
model or a pathway diagram model [41, 40]. The Boolean logic model captures
the genes’ activities and interactions using logical operators such as AND, OR,
XOR, and NOT, while the pathway diagram model parameterizes suppressive and
activating interactions among genes to capture their dynamics. These models have
shown success in capturing the temporal changes in gene activities and causal
interactions among genes.
3. Battle of Cell and Intervention
3.1. Two-Player Zero-Sum Game
We represent the battle between the cell and intervention as a two-player zero-
sum game [42, 33, 34, 35]. This can be characterized by a tuple ⟨X, A, U, R^a, T⟩, where X = {0,1}^d is the state space, A is the intervention space, U is the cell control space, R^a is the intervention reward function, and T is the state transition probability function. T : X × A × U × X → [0, 1] is such that p(x′ | x, a, u) represents the probability of moving to state x′ according to the external and internal inputs a and u in state x. Also, R^a(x, a, u, x′) denotes the immediate intervention reward gained if the system moves from state x to state x′ according to the joint intervention and cell actions (a, u).
3.2. Optimal Nash Intervention Policy under Known Cell Responses
The diagram representing the fight between cell and intervention is shown
in Fig. 1. For cells in cancerous conditions, the intervention objective is to de-
crease cell proliferation, whereas cells aim to increase such proliferation by fight-
ing against interventions. The opposite objectives of the intervention and cell can be expressed by the cell reward R^u taking the negative of the intervention reward, i.e., R^u(x, a, u, x′) = −R^a(x, a, u, x′).
Figure 1: The fight between intervention and the cell dynamic response according to its
internal stimuli.
This paper focuses on stationary Markov Nash equilibria in GRNs modeled by
the infinite-horizon discounted Markov game. Let U contain a finite set of stimuli/actions that the cell could perform during the intervention process against therapies. Let also A be the set of actions/therapies available during the intervention process. We define the intervention policy π^a(a | x), representing the probability of taking action a ∈ A in any given state x ∈ X. Similarly, the cell policy π^u(u | x) specifies the probability of selecting input u ∈ U in state x ∈ X. For the joint stochastic policy (π^a, π^u), the expected value functions of the intervention and the cell can be defined as:

V^a_{π^a, π^u}(x) = E[ Σ_{t ≥ 0} γ^t R^a(x_t, a_t, u_t, x_{t+1}) | a_{0:∞} ∼ π^a, u_{0:∞} ∼ π^u, x_0 = x ],
V^u_{π^a, π^u}(x) = E[ Σ_{t ≥ 0} γ^t R^u(x_t, a_t, u_t, x_{t+1}) | a_{0:∞} ∼ π^a, u_{0:∞} ∼ π^u, x_0 = x ],   (2)

for x ∈ X; where 0 < γ < 1 is a discount factor that prioritizes early-stage rewards compared to future ones. Given that the cell and intervention reward functions are negatives of each other, we have V^a_{π^a, π^u}(x) = −V^u_{π^a, π^u}(x), for any x ∈ X. Due to the interplay between state values for the cell and intervention, this problem differs from a Markov decision process (MDP). The optimal solution for a two-player zero-sum game can be expressed through the Markov game. This is expressed as the optimal Nash equilibrium policy π* = (π^{a*}, π^{u*}), which for any joint policy π = (π^a, π^u) and x ∈ X satisfies [33]:

V^a_{π^{a*}, π^{u*}}(x) ≥ V^a_{π^a, π^{u*}}(x)   and   V^u_{π^{a*}, π^{u*}}(x) ≥ V^u_{π^{a*}, π^u}(x).   (3)
The optimal Nash equilibrium policy is the policy from which neither the cell nor the intervention has any motivation to deviate. This policy can be expressed according to the min-max theorem as [43]:

(π^{a*}, π^{u*}) = argmax_{π^a} argmin_{π^u} V^a_{π^a, π^u}(x) = argmin_{π^u} argmax_{π^a} V^a_{π^a, π^u}(x),   for all x ∈ X.   (4)

Based on equation (2), any pair (π^a, π^u) that achieves the supremum and infimum values in equation (4) forms an optimal Nash equilibrium.
4. Bayesian Intervention Policy under Unknown Cell Responses
4.1. Intervention Challenges of Unknown Cell Space
If the cell space U, representing the internal cell stimuli, is fully known, then
the optimal Nash policy could be achieved as a solution for the optimization in
(4). However, in practice, the cell’s internal stimuli are often unknown, preventing
the computation of the optimal Nash policy. Therefore, this paper aims to de-
rive an effective intervention policy that can be implemented despite incomplete
knowledge about cell space. We present a systematic approach to probabilistically
reason about the possible cell responses using the latest available data and use this
knowledge for effective intervention selection.
Let U_1, ..., U_M be the set of all possible cell spaces. This set depends on the size of the regulatory network and the prior biological knowledge regarding the cell responses. Given a regulatory network consisting of d genes, there are 2^d possible cell actions. In this case, there are \binom{2^d}{1} cell spaces containing 1 cell action, \binom{2^d}{2} sets with 2 cell actions, and \binom{2^d}{m} sets containing m cell actions. This set can be large for large regulatory networks, but as described in the following paragraph, the posteriors of many models approach zero as more data are observed.
If U_i is the true cell space, the optimal space-specific Nash policy can be expressed as (π^{a*,U_i}, π^{u*,U_i}), where this policy can be computed using the optimization problem in (4) corresponding to the cell space U_i. The Nash policy obtained under cell space U_i might significantly differ from the one obtained under U_j ≠ U_i. Thus, given limited or no knowledge about the true cell space, the space-specific intervention policies are not directly implementable. In fact, executing a wrong (non-optimal) intervention policy corresponding to U_j ≠ U* could lead to poor intervention performance and the dominance of the cell.
4.2. Probability Model over Cell Spaces
This paper constructs a probabilistic model over the cell spaces. Let p0(i)be
the prior probability of the ith cell space Ui. The prior information about the set
of cell spaces can be represented in a single vector as:
p_0 = [P(U_1), ..., P(U_M)]^T.   (5)

If no prior biological knowledge about the cell space is available, a uniform prior can be considered over the cell spaces, i.e., p_0 = [1/M, ..., 1/M]^T.
Let p_{k−1} = [p_{k−1}(1), ..., p_{k−1}(M)] be the posterior probability over the cell spaces obtained according to the sequence of observed states x_{0:k−1} obtained upon taking interventions a_{0:k−2}. If intervention a_{k−1} is taken at time step k−1 and the state x_k is observed at time step k, the posterior probability of the cell spaces at time step k can be expressed as:

p_k(i) = P(U = U_i | a_{0:k−1}, x_{0:k})
       = p(x_k, U_i | a_{0:k−1}, x_{0:k−1}) / p(x_k | a_{0:k−1}, x_{0:k−1})
       = P(x_k | a_{0:k−1}, x_{0:k−1}, U_i) P(U = U_i | a_{0:k−2}, x_{0:k−1}) / [ Σ_{j=1}^{M} P(x_k | a_{0:k−1}, x_{0:k−1}, U_j) P(U = U_j | a_{0:k−2}, x_{0:k−1}) ]
       = p(x_k | a_{0:k−1}, x_{0:k−1}, U_i) p_{k−1}(i) / [ Σ_{j=1}^{M} p(x_k | a_{0:k−1}, x_{0:k−1}, U_j) p_{k−1}(j) ],   (6)
for i = 1, ..., M. The numerator term in (6) specifies the probability of observing the next state x_k given the sequence of interventions and states and the cell space U_i. Further simplification of this term through marginalization of the joint distribution of the state x_k and the unobserved cell action u_{k−1} at time step k leads to:

p(x_k | U_i, a_{0:k−1}, x_{0:k−1}) = Σ_{u ∈ U_i} p(x_k, u_{k−1} = u | U_i, a_{0:k−1}, x_{0:k−1})
    = Σ_{u ∈ U_i} p(x_k | u_{k−1} = u, a_{k−1}, x_{k−1}) p(u_{k−1} = u | U_i, x_{k−1})
    = Σ_{u ∈ U_i} (p / (1 − p))^{||f(x_{k−1}) ⊕ a_{k−1} ⊕ u ⊕ x_k||_1} (1 − p)^d  π^{u*,U_i}(u | x_{k−1}),   (7)

where π^{u*,U_i}(u_{k−1} = u | x_{k−1}) = p(u_{k−1} = u | U = U_i, x_{k−1}) is the probability that the cell takes action u_{k−1} = u at state x_{k−1} if the true cell action space is U_i. The first line in the last expression in (7) is obtained using the Markovian properties of the state transition and the Bernoulli process noise. Replacing (7) into (6), the posterior probability of the cell spaces can be recursively computed using the last taken intervention and the last observed state.
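The recursion in (6)-(7) can be sketched compactly in Python. The helper below assumes the space-specific Nash cell policies π^{u*,U_i} have already been computed and stored as arrays, and `state_index` is a hypothetical function mapping a Boolean state vector to its row index.

```python
import numpy as np

def posterior_update(p_prev, x_prev, a_prev, x_new, cell_spaces, cell_policies,
                     net_fn, p, state_index):
    """Posterior over candidate cell spaces, Eqs. (6)-(7).
    cell_spaces[i]  : list of candidate cell actions (Boolean arrays) in U_i
    cell_policies[i]: array of shape (2^d, |U_i|) holding pi^{u*,U_i}(u | x)."""
    d = x_prev.size
    lik = np.zeros(len(cell_spaces))
    for i, (U_i, pi_u) in enumerate(zip(cell_spaces, cell_policies)):
        for j, u in enumerate(U_i):
            flips = int(np.sum(net_fn(x_prev) ^ a_prev ^ u ^ x_new))  # ||f(x) + a + u + x'||_1
            lik[i] += p**flips * (1 - p)**(d - flips) * pi_u[state_index(x_prev), j]
    post = lik * p_prev                      # multiply by the previous posterior
    return post / post.sum()                 # normalize, Eq. (6)
```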
4.3. Bayesian Intervention Policy
Let p_k be the posterior probability over the cell spaces obtained according to the states x_{0:k} and the sequence of interventions a_{0:k−1}. The proposed Bayesian intervention policy at time step k can be expressed as:

μ^{a,B}_k(a | x_k) := p(a_k = a | a_{0:k−1}, x_{0:k})
    = Σ_{i=1}^{M} p(a_k = a, U = U_i | a_{0:k−1}, x_{0:k})
    = Σ_{i=1}^{M} p(a_k = a | U_i, a_{0:k−1}, x_{0:k}) p(U = U_i | a_{0:k−1}, x_{0:k})
    = Σ_{i=1}^{M} p(a_k = a | U_i, a_{0:k−1}, x_{0:k}) p_k(i)
    = Σ_{i=1}^{M} π^{a*,U_i}(a | x_k) p_k(i),   (8)

for a ∈ A; where the cell space is augmented and marginalized out in the second line. One can see that if the uncertainty over the cell spaces goes to zero, the Bayesian policy μ^{a,B}(. | x_k) becomes the optimal Nash equilibrium policy under the known cell space, π^{a*,U*}(. | x_k).
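As a sketch, the mixture in (8) is a posterior-weighted average of the precomputed space-specific Nash intervention policies; the array layout and helper names below are illustrative assumptions.

```python
import numpy as np

def bayesian_policy(nash_policies, posterior, state_idx):
    """Eq. (8): mu^{a,B}_k(. | x_k) = sum_i pi^{a*,U_i}(. | x_k) p_k(i).
    nash_policies: array of shape (M, 2^d, |A|); posterior: length-M vector."""
    mix = np.einsum('i,ia->a', posterior, nash_policies[:, state_idx, :])
    return mix / mix.sum()                     # guard against round-off

def sample_intervention(mix, rng):
    """Draw an action index a_k ~ mu^{a,B}_k(. | x_k)."""
    return rng.choice(len(mix), p=mix)
```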
The Bayesian policy in (8) is stochastic and provides the best intervention solution given the available data. Let {u_1, ..., u_N} be all unique cell actions in the set of cell spaces, i.e., {u_1, ..., u_N} = U_1 ∪ ... ∪ U_M ⊆ {0,1}^d. The Bayesian modeling of the cell defense policy at time step k can be expressed as:

μ^{u,B}_k(u | x_k) = p(u_k = u | a_{0:k−1}, x_{0:k})
    = Σ_{i=1}^{M} p(u_k = u, U = U_i | a_{0:k−1}, x_{0:k})
    = Σ_{i=1}^{M} p(u_k = u | U_i, a_{0:k−1}, x_{0:k}) p(U = U_i | a_{0:k−1}, x_{0:k})
    = Σ_{i=1}^{M} p(u_k = u | U_i, a_{0:k−1}, x_{0:k}) p_k(i)
    = Σ_{i=1}^{M} π^{u*,U_i}(u | x_k) p_k(i),   (9)

for u ∈ {u_1, ..., u_N}. Note that the cell defense response in (9) represents the intervention's belief about the cell policy, since the cell performs the optimal Nash policy corresponding to the true cell space.
The Bayesian policy in (8) yields optimality with respect to the posterior distribution of the cell spaces. The schematic diagram of the proposed Bayesian intervention policy is shown in Fig. 2. As the next intervention is performed and the next state is observed, the posterior distribution over the cell spaces is updated, and the optimal Bayesian policy can then be recomputed according to the new posterior and the newly observed state. The analysis of the proposed Bayesian policy and its comparison with state-of-the-art intervention policies are described in Section 6.
5. Matrix-Form Formulation of the Proposed Bayesian Intervention Policy
This section provides an efficient and recursive computation of the proposed
Bayesian intervention policy. The process is divided into offline and online steps.
The offline step consists of computing the space-specific optimal Nash policies
associated with all cell spaces. Upon termination of the offline step, the online
step computes the posterior distribution of all cell spaces given the last observed
state, followed by the calculation of the Bayesian intervention policy. The details
of these two steps are outlined below.
Figure 2: The schematic diagram of the proposed Bayesian intervention policy.
5.1. Offline Step Computation
The offline step computes the space-specific optimal Nash equilibrium policy for all cell spaces, i.e., {U_1, ..., U_M}. This is achieved according to the value iteration method for a two-player zero-sum game [33]. For the ith cell space U_i, we define the state joint-action value function for any state value function V : X → R as:

Q^{a,U_i}_V(x, a, u) = E_{x′ ∼ P(. | x, a, u)}[ R^a(x, a, u, x′) + γ V(x′) ],   (10)

for x ∈ X, a ∈ A, and u ∈ U_i. Q^{a,U_i}_V(x, ., .) can be seen as a matrix in R^{|A| × |U_i|}, with elements representing the expected discounted accumulated reward for the intervention when the joint actions (a, u) are performed at state x and the policy associated with the state value function V is followed.
We define the joint-action transition matrix associated with (a, u) in R^{2^d × 2^d} as:

(M(a, u))_{lj} = P(x_k = x^j | x_{k−1} = x^l, a_{k−1} = a, u_{k−1} = u)
             = p^{||f(x^l) ⊕ a ⊕ u ⊕ x^j||_1} (1 − p)^{d − ||f(x^l) ⊕ a ⊕ u ⊕ x^j||_1},   (11)

for l, j = 1, ..., 2^d, a ∈ A, and u ∈ U_i, where ||.||_1 is the absolute L-1 norm of a vector. Under zero noise and stochasticity, f(x^l) ⊕ a ⊕ u represents the state of the genes in the next time step. Thus, ||f(x^l) ⊕ a ⊕ u ⊕ x^j||_1 counts the number of flips caused by the noise once the system moves from state x^l to state x^j. The transition probability in (11) is computed based on the noise characteristics of each variable, modeled as independent Bernoulli variables with parameter p.
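A direct, if brute-force, sketch of (11) enumerates all 2^d states and fills the matrix entry by entry; the function below is illustrative and assumes Boolean vectors for the actions.

```python
import numpy as np
from itertools import product

def transition_matrix(a, u, net_fn, p, d):
    """Joint-action transition matrix M(a,u) of Eq. (11):
    (M)_{lj} = p^m (1-p)^(d-m), where m is the number of noise flips from x^l to x^j."""
    states = np.array(list(product([0, 1], repeat=d)), dtype=bool)   # all 2^d states
    M = np.empty((2**d, 2**d))
    for l, x in enumerate(states):
        pred = net_fn(x) ^ a ^ u                   # noise-free successor f(x^l) + a + u
        for j, x_next in enumerate(states):
            m = int(np.sum(pred ^ x_next))         # Hamming distance = number of flips
            M[l, j] = p**m * (1 - p)**(d - m)
    return M
```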
The matrix-form representation of the intervention reward function associated with a and u can be expressed as:

(R^a(a, u))_{lj} = R^a(x^l, a, u, x^j),   for l, j = 1, ..., 2^d.   (12)

The expected intervention reward in state x^l after taking actions (a, u) and before observing the next state can be computed as:

R^a(x^l, a, u) = E_{x′ | x^l, a, u}[ R^a(x^l, a, u, x′) ]
             = Σ_{j=1}^{2^d} P(x_k = x^j | x_{k−1} = x^l, a_{k−1} = a, u_{k−1} = u) R^a(x^l, a, u, x^j),   (13)

for l = 1, ..., 2^d. The expected reward in (13) can be rewritten according to (11) and (12) as:

R^a(x^l, a, u) = Σ_{j=1}^{2^d} (R^a(a, u))_{lj} (M(a, u))_{lj}.   (14)

We define the expected intervention reward function in vector form as R^a_{a,u} = [R^a(x^1, a, u), ..., R^a(x^{2^d}, a, u)]^T. This vector can be computed using the following matrix-form computation:

R^a_{a,u} = (R^a(a, u) ⊙ M(a, u)) 1_{2^d × 1},   (15)

for a ∈ A and u ∈ U_i; where 1_{2^d × 1} is a vector of size 2^d with all elements equal to 1, and ⊙ is the Hadamard product.
According to the controlled transition matrix M(a, u) and the vector-form reward function R^a_{a,u}, the Q-values defined in (10) can be calculated as:

[ Q^{a,U_i}_V(x^1, a, u), ..., Q^{a,U_i}_V(x^{2^d}, a, u) ]^T = R^a_{a,u} + γ M(a, u) V,   (16)

for a ∈ A and u ∈ U_i and any given state value function V.
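In code, (15)-(16) reduce to a Hadamard product, a row sum, and a matrix-vector product; the sketch below handles one joint action (a, u) and assumes dense 2^d × 2^d arrays.

```python
import numpy as np

def q_values(R_mat, M_mat, V, gamma):
    """Eqs. (15)-(16) for a single joint action (a, u):
    R_mat[l, j] = R^a(x^l, a, u, x^j),  M_mat = M(a, u)."""
    r_vec = (R_mat * M_mat).sum(axis=1)     # (R^a(a,u) o M(a,u)) 1, Eq. (15)
    return r_vec + gamma * M_mat @ V        # length-2^d vector of Q(x^l, a, u), Eq. (16)
```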
Let π^a denote a collection of 2^d probability vectors over A (one per state), and π^u a collection of 2^d probability vectors over U_i. Consider Q^{a,U_i}_V(x, ., .) as the payoff matrix of a matrix-form zero-sum game. We define the Bellman operator T for any x ∈ X as [33]:

(T[V])(x) = Value[ Q^{a,U_i}_V(x, ., .) ]
          = max_{π^a} min_{π^u} Σ_{a ∈ A} Σ_{u ∈ U_i} π^a(a | x) π^u(u | x) Q^{a,U_i}_V(x, a, u),   (17)

which should meet the conditions Σ_{a ∈ A} π^a(a | x) = Σ_{u ∈ U_i} π^u(u | x) = 1. The solution of the min-max optimization in (17) can be obtained using a linear programming technique.
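One standard way to compute Value[Q] in (17) is the classical linear program for zero-sum matrix games; the sketch below uses scipy.optimize.linprog and returns the game value together with the maximizing intervention mixture (the cell's minimizing mixture can be read off the LP dual variables or obtained by solving the transposed game).

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Value and maximin policy of the zero-sum matrix game with payoff Q
    (rows: intervention actions, columns: cell actions), as used in Eq. (17)."""
    n_a, n_u = Q.shape
    # Variables z = [pi^a(1..n_a), v]; maximize v  <=>  minimize -v.
    c = np.zeros(n_a + 1)
    c[-1] = -1.0
    # For every cell action u:  v - sum_a pi^a(a) Q[a, u] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n_u, 1))])
    b_ub = np.zeros(n_u)
    A_eq = np.hstack([np.ones((1, n_a)), np.zeros((1, 1))])   # sum_a pi^a(a) = 1
    b_eq = np.ones(1)
    bounds = [(0, None)] * n_a + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:n_a]          # game value, pi^a(. | x)
```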
The Bellman operator T is γ-contractive in the L-∞ norm, and the unique solution to the Bellman equation corresponds to the optimal value function, denoted as V* = T[V*] [33]. This fixed-point solution represents an optimal Nash equilibrium for the Markov game associated with the cell space U_i. Therefore, starting from any arbitrary V, we can repeatedly apply V_{t+1} = T[V_t] for t = 0, 1, ..., and compute a fixed-point solution for the value vector.

Let V_0 = [0, ..., 0]^T denote the initial value vector with all elements set to 0. During the rth iteration of the value iteration method, the new vector V_{r+1} is obtained by applying the Bellman operator to the previous value V_r as:

V_{r+1}(x^l) = Value[ Q^{a,U_i}_{V_r}(x^l, ., .) ],   for l = 1, ..., 2^d,   (18)

where Q^{a,U_i}_{V_r}(x^l, ., .) consists of the Q-values for all joint pairs (a, u). In practice, the iterations continue until the maximum difference between the elements of the value vectors in two consecutive iterations becomes smaller than a predetermined threshold ϵ > 0, expressed as:

max_{l ∈ {1,...,2^d}} |V_T(l) − V_{T−1}(l)| < ϵ.
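Putting (16)-(18) together gives the value-iteration loop below; this is a sketch that reuses the hypothetical matrix_game_value helper from the previous snippet and assumes the reward and transition matrices are stored as dense arrays indexed by action.

```python
import numpy as np

def shapley_value_iteration(R, M, gamma, eps):
    """Value iteration for the zero-sum Markov game, Eq. (18).
    R, M: arrays of shape (|A|, |U_i|, 2^d, 2^d) holding R^a(a,u) and M(a,u)."""
    n_a, n_u, n_states, _ = M.shape
    V = np.zeros(n_states)
    while True:
        V_prev = V.copy()
        r_vec = (R * M).sum(axis=-1)                              # Eq. (15) for all (a, u)
        Q = r_vec + gamma * np.einsum('aulj,j->aul', M, V_prev)   # Eq. (16)
        Q = np.transpose(Q, (2, 0, 1))                            # shape (2^d, |A|, |U_i|)
        V = np.array([matrix_game_value(Q[l])[0] for l in range(n_states)])  # Eq. (18)
        if np.max(np.abs(V - V_prev)) < eps:                      # stopping criterion
            return V, Q
```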
Let V_T = V* be the fixed-point solution obtained after conducting the value iteration method. The Q-values associated with V* can be computed as:

[ Q^{a,U_i}_{V*}(x^1, a, u), ..., Q^{a,U_i}_{V*}(x^{2^d}, a, u) ]^T = R^a_{a,u} + γ M(a, u) V*,   for a ∈ A, u ∈ U_i.   (19)

After computation of the optimal Q-values, the optimal policies for the intervention and the cell can be calculated as:

( π^{a*,U_i}(. | x), π^{u*,U_i}(. | x) ) = argmax_{π^a} argmin_{π^u} Σ_{a ∈ A} Σ_{u ∈ U_i} π^a(a | x) π^u(u | x) Q^{a,U_i}_{V*}(x, a, u),   (20)

for any x ∈ X, where π^{a*,U_i}(a | x) and π^{u*,U_i}(u | x) are non-negative numbers that add up to 1 for any x ∈ X. The solution to the Nash equilibrium policy in (20) can be obtained using a linear programming technique. Repeating the above process for all cell spaces leads to the computation of the space-specific Nash policies in the offline step.
Algorithm 1 Bayesian Intervention Policy
1: Input: intervention space A; cell spaces U_1, ..., U_M; intervention reward (R^a(a, u))_{lj} = R^a(x^l, a, u, x^j); controlled transition matrix M(a, u); threshold ϵ > 0.
Offline Step
2: for U_i ∈ {U_1, ..., U_M} do
3:   Set V = 0_{2^d × 1}.
4:   repeat
5:     V′ = V.
6:     [ Q^{a,U_i}_{V′}(x^1, a, u), ..., Q^{a,U_i}_{V′}(x^{2^d}, a, u) ]^T = (R^a(a, u) ⊙ M(a, u)) 1_{2^d × 1} + γ M(a, u) V′, for a ∈ A and u ∈ U_i.
7:     Bellman operator: V(x^l) = Value[ Q^{a,U_i}_{V′}(x^l, ., .) ], for l = 1, ..., 2^d   [Eq. (17)]
8:   until max_{l ∈ {1,...,2^d}} |V(x^l) − V′(x^l)| < ϵ
9:   For any given x ∈ X, use a linear programming approach over Q^{a,U_i}_V(x, ., .) to obtain π^{a*,U_i}(. | x) and π^{u*,U_i}(. | x).
10: end for
Online Step
11: Input: initial state x_0, and initial probability of the cell spaces p_0 = [P(U_1), ..., P(U_M)].
12: for k = 0, 1, 2, ... do
13:   Compute the Bayesian intervention policy μ^{a,B}_k(a | x_k) = Σ_{i=1}^{M} π^{a*,U_i}(a | x_k) p_k(i), for a ∈ A, and select an action accordingly: a_k ∼ μ^{a,B}_k(. | x_k).
14:   Apply the intervention a_k and receive the next system state x_{k+1}.
15:   Posterior update:
      p_{k+1}(i) = [ Σ_{u ∈ U_i} (p / (1 − p))^{||f(x_k) ⊕ a_k ⊕ u ⊕ x_{k+1}||_1} π^{u*,U_i}(u | x_k) p_k(i) ] / [ Σ_{j=1}^{M} Σ_{u ∈ U_j} (p / (1 − p))^{||f(x_k) ⊕ a_k ⊕ u ⊕ x_{k+1}||_1} π^{u*,U_j}(u | x_k) p_k(j) ],   i = 1, ..., M.
16: end for
5.2. Online Step Computation
This section describes a recursive and online computation of the Bayesian in-
tervention policy, obtained according to the space-specific Nash equilibrium poli-
cies computed during the offline step. Let p_k contain the posterior probabilities of the cell spaces and x_k be the system state at time step k. An intervention at time step k can be selected according to the Bayesian policy in (8) as:

a_k ∼ μ^{a,B}_k(. | x_k),   (21)

where

μ^{a,B}_k(a | x_k) = Σ_{i=1}^{M} π^{a*,U_i}(a | x_k) p_k(i),   for a ∈ A.   (22)

Upon performing the intervention a_k and observing the next state x_{k+1}, the posterior distribution of the cell spaces can be updated using (6) and (7) as:

p_{k+1}(i) = [ Σ_{u ∈ U_i} (p / (1 − p))^{||f(x_k) ⊕ a_k ⊕ u ⊕ x_{k+1}||_1} π^{u*,U_i}(u | x_k) p_k(i) ] / [ Σ_{j=1}^{M} Σ_{u ∈ U_j} (p / (1 − p))^{||f(x_k) ⊕ a_k ⊕ u ⊕ x_{k+1}||_1} π^{u*,U_j}(u | x_k) p_k(j) ],   (23)

for i = 1, ..., M.
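The online loop can be sketched by chaining the hypothetical helpers from the previous snippets (bayesian_policy, sample_intervention, posterior_update); env_step stands in for the true, unknown cell dynamics and is an assumption of this sketch.

```python
import numpy as np

def run_online(x0, p0, actions, nash_policies, cell_spaces, cell_policies,
               net_fn, p, state_index, env_step, n_steps, rng):
    """Online step of Algorithm 1: draw a_k ~ mu^{a,B}_k(.|x_k), apply it,
    observe x_{k+1}, and update the posterior over cell spaces via Eq. (23)."""
    x, post = x0, p0.copy()
    for _ in range(n_steps):
        mix = bayesian_policy(nash_policies, post, state_index(x))    # Eq. (22)
        a = actions[sample_intervention(mix, rng)]
        x_next = env_step(x, a)             # the (unknown) cell responds inside env_step
        post = posterior_update(post, x, a, x_next, cell_spaces, cell_policies,
                                net_fn, p, state_index)               # Eq. (23)
        x = x_next
    return post
```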
The diagram in Fig. 3 represents the processes of the computation of the pro-
posed intervention policy in the offline and online steps. Algorithm 1 provides the
details of the computations in both steps. Meanwhile, the complexity of each step
is provided in Table 1. The offline step has a computational complexity of order O(2^{2d} × |A| × max_{i=1,...,M} |U_i| × L), where the 2^{2d} factor is due to the transition matrices involved, L represents the number of steps of the value iteration method before termination, |A| is the size of the intervention space, and |U_i| is the size of the ith cell space. In the online step, the computation of the Bayesian intervention has a complexity of order O(M), whereas the posterior update's complexity is of order O(M × max_{i=1,...,M} |U_i|). Overall, the complexity of the online step is significantly lower than that of the offline step, enabling a recursive computation of the proposed intervention policy.
6. Performance Analysis and Comparison with State-of-the-Art Methods
This section analyzes the performance of the proposed Bayesian intervention
policy with the system under no intervention and some of the existing interven-
tion policies.

Figure 3: The schematic diagram of processes in the offline and online steps of the proposed Bayesian intervention policy.

Table 1: Computational complexity of the proposed Bayesian intervention policy.
Offline step (cell space U_i): O(2^{2d} × |A| × |U_i| × L_i)
Bayesian intervention: O(M)
Posterior update: O(M × max{|U_1|, ..., |U_M|})

First, consider a system with no intervention under the aggressive
response of cells, e.g., representing uncontrolled cancerous conditions. The best
cell policy under no intervention is deterministic. Let π^u : X → U be a deterministic cell policy, which assigns a cell action in U to each system state. The optimal cell response under no intervention can be computed as:

π^{u*,a=0}(x) = argmin_{π^u} E[ Σ_{t=0}^{∞} γ^t R^a(x_t, a_t = 0, u_t, x_{t+1}) | x_0 = x, u_{0:∞} ∼ π^u ],   (24)

where π^u ∈ (U)^{2^d} and the minimization is used since the reward of the intervention is the negative of the cell reward function. The steady-state probability under no intervention can be expressed as:

Π^∗_{a=0}(j) = lim_{k→∞} P(x_k = x^j | u_{0:∞} ∼ π^{u*,a=0}, a_{0:∞} = 0),   (25)

for j = 1, ..., 2^d. One can see Π^∗_{a=0} as the long-term probability of the visitation of various states under no intervention.
Most conventional intervention methods assume non-responsive cells [19], wherein cells lack defense mechanisms to counteract interventions (i.e., U = {}). In this scenario, the Markov game can be represented by an MDP with a single agent/player, and since the intervention is derived under the assumption of no competition from cell responses, the optimal intervention policy becomes deterministic. This policy can be expressed as:

π^{a*,u=0}(x) = argmax_{π^a} E[ Σ_{t=0}^{∞} γ^t R^a(x_t, a_t, u_t = 0, x_{t+1}) | x_0 = x, a_{0:∞} ∼ π^a ],   (26)

where the maximization is over all deterministic intervention policies, i.e., (A)^{2^d}. The cell's aggressive response to the naive and deterministic intervention in (26) can be expressed as:

π^{u*,π^{a*,u=0}}(x) = argmin_{π^u} E[ Σ_{t=0}^{∞} γ^t R^a(x_t, a_t, u_t, x_{t+1}) | x_0 = x, a_{0:∞} ∼ π^{a*,u=0}, u_{0:∞} ∼ π^u ],   for x ∈ X.   (27)

The expected value function for the intervention under the no-cell-response policy in (26) and the cell response in (27) can be expressed through V^a_{π^{a*,u=0}, π^{u*,π^{a*,u=0}}}. The intervention gain obtained under this policy compared to the no-intervention case can be expressed as:

V^a_{π^{a*,u=0}, π^{u*,π^{a*,u=0}}}(x) − V^a_{0, π^{u*,a=0}}(x) ≥ 0,   (28)
for any x ∈ X. The positivity of the difference in the state values indicates that the intervention helps the system experience less undesirable conditions, compared to the case with no intervention. Meanwhile, the comparison with the optimal Nash policy (π^{a*,U*}, π^{u*,U*}) can be expressed as:

V^a_{π^{a*,u=0}, π^{u*,π^{a*,u=0}}}(x) ≤ V^a_{π^{a*,u=0}, π^{u*,U*}}(x) ≤ V^a_{π^{a*,U*}, π^{u*,U*}}(x),   (29)

for any x ∈ X, where the inequalities are obtained due to the fact that deviation of the intervention from the optimal Nash policy leads to a reduction in the intervention performance (see (3)). More specifically, if the intervention policy deviates from the Nash policy, the cell can take advantage of this and further shift the system toward undesirable conditions. Note that conventional intervention policies can achieve the same performance level as the optimal Nash policy if and only if the optimal Nash policy is deterministic, i.e., π^{a*,U*}(π^{a*,u=0}(x) | x) = 1, for all x ∈ X.
In this part, the difference between the state-value function of the proposed
Bayesian intervention policy and the optimal Nash policy is investigated. The
proposed Bayesian policy is adaptive, meaning that its policy becomes updated
according to the latest observed states. We represent the Bayesian policy after
time step k as μ^{a,B}_{k:∞} := [μ^{a,B}_k, μ^{a,B}_{k+1}, ...], where μ^{a,B}_{k+1} yields optimality with respect to the information up to time step k+1. Thus, we can express the difference between the state-value functions of the proposed Bayesian policy and the optimal Nash policy as:

V^a_{μ^{a,B}_{k:∞}, π^{u*,U*}}(x_k) − V^a_{π^{a*,U*}, π^{u*,U*}}(x_k) ≤ 0.   (30)
It can be shown that the state value function of the Bayesian policy becomes close
to the optimal Nash policy as time progresses. In fact, for a sufficiently large value
of k, the posterior distribution over the cell spaces is expected to become peaked
over the true cell space, and according to (8), the Bayesian policy becomes the
same as the optimal Nash policy. In particular, the difference between the pro-
posed Bayesian policy at time step k and the optimal Nash policy can be expressed as follows:

KL( π^{a*,U*}(. | x_k), μ^{a,B}_k(. | x_k) ) = Σ_{a ∈ A} π^{a*,U*}(a | x_k) log [ π^{a*,U*}(a | x_k) / μ^{a,B}_k(a | x_k) ]
    = Σ_{a ∈ A} π^{a*,U*}(a | x_k) [ log π^{a*,U*}(a | x_k) − log μ^{a,B}_k(a | x_k) ],   (31)
where KL indicates the Kullback-Leibler divergence. The KL approaches zero if
the posterior peaks over a single cell space (i.e., the true cell space). Finally, unlike
existing deterministic intervention policies, the stochastic nature of the proposed
policy aligns with the stochastic nature of the optimal Nash policy. This stochastic-
ity prevents the cell from predicting a single deterministic intervention in different
cases, helping to ensure short-term and long-term success during the intervention
process.
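The policy distance in (31) is an ordinary discrete KL divergence over the intervention space; a small sketch (with an epsilon guard added purely for numerical safety) is:

```python
import numpy as np

def policy_kl(pi_star, mu_b, eps=1e-12):
    """Eq. (31): KL( pi^{a*,U*}(.|x_k) || mu^{a,B}_k(.|x_k) ) over the intervention space A."""
    pi_star = np.asarray(pi_star, dtype=float)
    mu_b = np.asarray(mu_b, dtype=float)
    mask = pi_star > 0                         # terms with pi = 0 contribute 0
    return float(np.sum(pi_star[mask] *
                        (np.log(pi_star[mask]) - np.log(mu_b[mask] + eps))))
```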
7. Numerical Experiments
In this section, the performance of the proposed intervention policy is assessed
through two well-known gene regulatory networks: the p53-MDM2 Boolean net-
work model and the melanoma regulatory network.
7.1. P53-MDM2 Negative Feedback Loop Network
This paper utilizes a simplified p53-MDM2 Boolean network [44] with DNA
double-strand break (DNA-DSB) for the experiment. This network has been widely
studied for assessing the performance of various intervention policies. The p53
tumor suppressor is a crucial transcription factor that regulates essential cellular
processes, including DNA repair, cell cycle control, apoptosis, angiogenesis, and
senescence [45]. Fig. 4(a) illustrates the diagram of this network, where solid and
blunt arrows indicate activating and suppressive interactions, respectively. The
network consists of four genes, ATM, p53, WIP1, and MDM2, along with DNA-DSB, which is an external stress to the cell. The system state is represented using the following vector: x_k = [ATM_k, p53_k, WIP1_k, MDM2_k]. The Boolean model described in (1) represents the state transition of the healthy system as:

x_k = overline{ [  0   0  −1   0
                  +1   0  −1  −1
                   0  +1   0   0
                  −1  +1  +1   0 ] x_{k−1} + [ dna_dsb, 0, 0, 0 ]^T } ⊕ a_{k−1} ⊕ u_{k−1} ⊕ n_k,   (32)

where overline{v} denotes the function that maps each element of the vector v greater than 0 to 1 and all other elements to 0.
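A sketch of the network function in (32) follows; the connectivity matrix encodes the signs shown above, and dna_dsb defaults to 1 because the experiments below consider the stressed condition. The resulting function can be plugged into the bnp_step sketch from Section 2.

```python
import numpy as np

# Signed regulatory matrix of Eq. (32); rows/columns ordered as [ATM, p53, WIP1, MDM2].
C = np.array([[ 0,  0, -1,  0],
              [ 1,  0, -1, -1],
              [ 0,  1,  0,  0],
              [-1,  1,  1,  0]])

def p53_mdm2_fn(x, dna_dsb=1):
    """Network function f(x): threshold the signed regulatory input of Eq. (32),
    mapping strictly positive sums to 1 and everything else to 0."""
    b = np.array([dna_dsb, 0, 0, 0])
    return (C @ x.astype(int) + b) > 0     # Boolean vector of noise-free next states
```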
Figure 4: (a) The pathway diagram for the p53-MDM2 Boolean network. (b) The aver-
age reward gained by the Bayesian intervention policy, naive intervention policy, and the
Baseline. (c) The average absolute difference of the rewards.
In cells under normal conditions, the stress response is zero (i.e., dna_dsb =
0), whereas under stressed conditions, the stress is present (i.e., dna_dsb = 1).
For non-stressed cells, the genes' states are mostly at rest, i.e., the system remains
in the "0000" state. In stressed conditions, the activation and inactivation of p53
help the system control the genes’ activities and cell proliferation. However, when
p53, a tumor suppressor gene, undergoes a loss of function, other genes can exhibit
excessive activations and cell proliferation, leading to transitioning from a healthy
to a cancerous condition.
The cell defensive responses are modeled using single-gene and double-gene
perturbations. This represents realistic situations in which cells have the capability
to respond to therapies by altering the states of multiple genes simultaneously.
Therefore, the possible cell responses can be expressed through the following 7
actions:
u_1 = [0 0 0 0]^T, u_2 = [1 0 0 0]^T, u_3 = [0 0 1 0]^T, u_4 = [0 0 0 1]^T,
u_5 = [1 0 1 0]^T, u_6 = [1 0 0 1]^T, u_7 = [0 0 1 1]^T.   (33)
The cell might utilize one or multiple stimuli in response to interventions. In our
experiment, we consider the following cell space to be true but unknown:
U* = {u_2, u_6},   (34)
where u2alters the state value of ATM, and u6simultaneously alters the state of
ATM and MDM2.
Toward modeling the possible cell spaces, we consider cell spaces to contain
any subset of one, two, and three elements from the above 7 possible cell actions
in (33). This leads to M = \binom{7}{1} + \binom{7}{2} + \binom{7}{3} = 63 possible cell spaces. Among them, 7 contain a single action, denoted by U_1 to U_7; 21 contain two actions, indicated by U_8 to U_28; and 35 consist of three actions, indicated by U_29 to U_63. Note that the true cell space in (34) is the 17th space (i.e., U* = U_17), which is unknown during the intervention.
The space of intervention (i.e., drugs/therapies) is assumed to be:
A = { a_1 = [0 0 0 0]^T, a_2 = [1 0 0 0]^T, a_3 = [0 0 0 1]^T },   (35)
where the first intervention a1corresponds to no therapy, whereas the second and
third interventions alter the state value of the ATM and MDM2 genes, respectively.
Intervention aims to reduce cell proliferation in cancerous situations and re-
store the system to a normal condition. For the p53-MDM2 network, this can be
achieved by reducing the activation of ATM, WIP1, and MDM2. This can be ex-
pressed through the following intervention reward function:
R^a(x, a, u, x′) = −x′(1) − x′(3) − x′(4).   (36)
The activation of each of ATM, MDM2, and WIP1 yields a negative reward of
-1, resulting in an immediate reward ranging from -3 to 0. The objective of the
intervention is to maximize cumulative intervention rewards by maintaining ATM,
MDM2, and WIP1 in an inactivated state. Conversely, the cell with the opposing
reward seeks to increase the activation of these genes and drive the system closer
to states leading to uncontrolled cell proliferation.
We consider the optimal Nash policy associated with the true cell space (i.e., π^{u*,U*} and π^{a*,U*}) as a Baseline policy. The Baseline provides the best intervention outcomes that could be achieved by any intervention policy (since it assumes full knowledge of the true cell space). The following parameters are used for the numerical experiments: p = 0.05, γ = 0.95, ϵ = 0.01, and the initial state "1011", representing the cancerous condition.
The average reward over 100 independent runs obtained by the proposed Baye-
sian intervention policy, the naive intervention policy, and the Baseline is pre-
sented in Fig. 4(b). As can be seen, the reward gained by the proposed Bayesian
policy becomes closest to the Baseline after a few steps (i.e., a few numbers of
interventions). The performance of the naive intervention policy is notably poor,
with an average 2 out of 3 genes remaining activated. In contrast, the Bayesian
intervention policy demonstrates a significant improvement by effectively deac-
tivating approximately 2.4 of the genes, which highlights the superiority of the
proposed approach. Furthermore, Fig. 4(c) shows the average absolute difference
between the rewards obtained by the Baseline and the proposed Bayesian pol-
icy and the Baseline and the naive intervention policy. As can be seen, a much
smaller absolute reward difference is achieved for the proposed intervention pol-
icy. In particular, the absolute reward difference approaches zero for the proposed
Bayesian policy as time progresses, which means the proposed method achieves
intervention performance (i.e., reward) similar to the Baseline. On the other hand,
one can see the poor performance of the naive policy with a large absolute reward
difference over time.
The prior and average posterior probability over cell spaces is shown in Fig.
5(a). A uniform prior is considered over cell spaces (blue bars). The average pos-
teriors after 20 steps are shown with red bars. As can be seen, the proposed method
has been almost able to discern the true cell space, i.e., U17. Aside from the true
cell space, another cell space (i.e., U12 ={u1,u6}) has a large posterior proba-
bility. This set shares a single cell action with the true cell space, making it prob-
abilistically indistinguishable from the true cell space, given 20 observed states.
Furthermore, the average posterior of the true cell space over time is shown in
Fig. 5(b). The average posterior of the true cell space is increasing over time. The
reason for not approaching 1 is the existence of another cell space, U12, with a
similar space-specific Nash policy.
Figure 5: (a) The prior and posterior (after 20 steps) probability over cell spaces. (b) The
average posterior of the true cell space over time.
Fig. 6(a) represents the probability assigned to each intervention (a1,a2, and
a3) by both the optimal Nash equilibrium policy and the proposed Bayesian policy
in a single run. It can be seen that the proposed Bayesian policy and Baseline
behave similarly after a few initial steps. In fact, the average result reveals that
the Bayesian intervention policy empirically converges toward the optimal Nash
intervention policy after approximately 7 steps.
In this part, the KL divergence is used as a distance measure between the
optimal Nash equilibrium policy and the proposed Bayesian intervention policy.
Fig. 6(b) represents the average KL divergence computed over 100 independent
runs. The results indicate that these two policies become close to each other not
only in individual runs (as shown in Fig. 6(a)), but also on average. This indicates
the empirical convergence of the proposed policy to the optimal Nash policy as
more interventions are taken, and more data are observed.
In this part of the experiment, we investigate the reason for obtaining a large
posterior probability for a non-true cell space in Fig. 5(a). Fig. 7(a) illustrates the
space-specific Nash policies under the true cell space Uand the cell space U12.
The blue bars represent the probability assigned to each intervention at the 16
states under the true cell space’s Nash equilibrium policy, while the red bars rep-
resent the corresponding probabilities under the Nash policy associated with U12.
One can see the similarity between these two policies in different states.
Figure 6: (a) The proposed Bayesian intervention policy and the optimal Nash equilibrium
intervention policy (both stochastic) in one single run. (b) The average KL divergence
between the true Nash intervention policy and the proposed Bayesian intervention policy.
The average rate of state visitations under the proposed Bayesian policy is
shown in Fig. 7(b). One can see that the states {x^1, x^2, x^{10}, x^{12}} are the most
frequently visited states. At these most visited states, we can see the similarity
between the space-specific Nash policies associated with Uand U12 in Fig. 7(a).
This explains the reason behind the similar performance of the proposed Bayesian
policy to the Baseline, despite a large posterior probability for a non-true cell
space.
This section analyzes the impact of the system stochasticity on the perfor-
mance of the proposed Bayesian policy. Fig. 8(a) illustrates the average posterior
of the true cell space under two levels of state stochasticity. The solid line corre-
sponds to the small noise level, characterized by a Bernoulli process noise with
p= 0.001, whereas the dashed line represents a higher noise level with p= 0.15.
The results indicate that when there is less randomness in the system (low stochas-
ticity), the average posterior of the true cell space becomes closer to 1. However,
when the stochasticity level increases (high stochasticity), there is greater uncer-
tainty in determining the true cell space. Therefore, as expected, the proposed
method performs better for less chaotic systems.
Fig. 8(b) shows the average reward obtained by the proposed Bayesian in-
tervention policy and the naive intervention policy under low and high levels of
stochasticity. The average rewards obtained by both policies have more fluctuation
under a larger stochasticity level. The results indicate that the naive intervention
policy performs poorly when the stochasticity level is low. Under a high stochas-
Figure 7: (a) The space-specific Nash equilibrium intervention policy associated with U
and U12. (b) The average state visitation rate in 100 independent runs under the proposed
Bayesian intervention policy.
ticity level, it takes longer for the proposed policy to achieve a performance similar
to that of the optimal Nash equilibrium policy. However, the final average reward
obtained by the proposed policy under low and high stochasticity levels is similar.
This demonstrates that the proposed Bayesian policy exhibits greater robustness
compared to the naive policy. In fact, in more chaotic systems characterized by
higher levels of noise, decision-making becomes more challenging for both cells
and intervention, resulting in similar performance regardless of changes in the
noise level.
This section of numerical experiments investigates the robustness of the pro-
posed policy with respect to different cell and intervention spaces. Table 2 presents
the average reward obtained by various policies across 9 pairs of intervention and
true cell spaces. The Bayesian policy and the Baseline outperform the naive pol-
icy in all cases. For a fixed intervention space (i.e., the results in a single row),
a reduction in the reward can be seen for cell spaces with larger elements. This
is due to the greater power of cells with larger cell space to resist intervention.
Given a fixed true cell space (a column in the table), a stronger intervention space
yields a larger or similar average reward. The improvement in the result is more
significant when the size of the intervention space has increased from 2 to 3, and
less significant once it is increased to 4.
Figure 8: (a) Average posterior of the true cell space for systems with low (p= 0.001) and
high (p= 0.15) levels of stochasticity. (b) The average reward gained by the Bayesian
intervention policy and naive intervention policy under low (p= 0.001) and high (p=
0.15) levels of stochasticity.
7.2. Melanoma Regulatory Network
In this part of the numerical experiment, we evaluate the effectiveness of the
proposed Bayesian intervention policy using the melanoma regulatory network.
Melanoma is a deadly type of skin cancer arising from melanocytes’ malignant
conversion [21, 46, 47]. In this paper, we consider a well-known Boolean network
model of the melanoma network [21], which is widely studied in deriving genomics
interventions. Fig. 9(a) illustrates the regulatory relationships among the genes
in the network. This network consists of a total of 10 genes and 1,024 states. The
state vector shows the activation/inactivation of the following genes in sequential
order: WNT5A, pirin, S100P, RET1, MMP3, PHOC, MART1, HADHB, synu-
clein, and STC2. The network function can be expressed as:
f(x_k) = [f_1(x_k), f_2(x_k), ..., f_10(x_k)]^T,

where f_1 (WNT5A) is a Boolean function of S100P, MMP3, and PHOC; f_2 (pirin) of WNT5A, S100P, and MMP3; f_3 (S100P) of MART1; f_4 (RET1) of WNT5A, pirin, and RET1; f_5 (MMP3) of RET1 and synuclein; f_6 (PHOC) of RET1, MART1, and STC2; f_7 (MART1) of MART1; f_8 (HADHB) of WNT5A, MMP3, and synuclein; f_9 (synuclein) of RET1, MART1, and STC2; and f_10 (STC2) of S100P. The explicit logical expressions for these functions follow the melanoma Boolean network model in [21].

Table 2: Average steady-state reward gained by different policies under different intervention sets and true cell spaces

                                   U* = {u_2}                  U* = {u_2, u_6}             U* = {u_2, u_6, u_7}
A = {a_1, a_2}                     Baseline: −0.402 ± 0.008    Baseline: −1.044 ± 0.013    Baseline: −1.802 ± 0.021
                                   Bayesian: −0.415 ± 0.026    Bayesian: −1.057 ± 0.036    Bayesian: −1.885 ± 0.039
                                   Naive:    −1.319 ± 0.010    Naive:    −2.207 ± 0.012    Naive:    −2.602 ± 0.011

A = {a_1, a_2, a_3}                Baseline: −0.288 ± 0.011    Baseline: −0.627 ± 0.016    Baseline: −0.833 ± 0.026
                                   Bayesian: −0.297 ± 0.028    Bayesian: −0.637 ± 0.041    Bayesian: −0.846 ± 0.052
                                   Naive:    −1.188 ± 0.010    Naive:    −1.941 ± 0.011    Naive:    −2.131 ± 0.009

A = {a_1, a_2, a_3, [0 0 1 1]^T}   Baseline: −0.193 ± 0.008    Baseline: −0.565 ± 0.018    Baseline: −0.725 ± 0.028
                                   Bayesian: −0.209 ± 0.032    Bayesian: −0.602 ± 0.053    Bayesian: −0.744 ± 0.062
                                   Naive:    −1.051 ± 0.012    Naive:    −1.740 ± 0.014    Naive:    −1.969 ± 0.012
The intervention objective is to reduce the activation of two genes: WNT5A and pirin. This can be expressed using the following intervention reward function:

R^a(x, a, u, x′) = 2 − x′(1) − x′(2),   (37)

where the reward of 2 is reached if both genes are inactivated, 1 if one of them is activated, and 0 when both genes are activated.
Figure 9: (a) The pathway diagram for the melanoma regulatory network. (b) The aver-
age reward gained by the Bayesian intervention policy, naive intervention policy, and the
Baseline.
In our experiment, we consider modeling cell responses using single-gene per-
turbations, which lead to 11 distinct cell actions denoted as u1to u11. The action
u1represents no cell stimuli, and u2to u11 correspond to gene 1 to gene 10 stim-
uli, respectively. Similar to the previous experiment, cell spaces are assumed to
contain one, two, or three cell actions, resulting in 231 possible cell spaces. We
use the following true (unknown) cell space in our experiment:
U* = U_48 = {u_5, u_8},   (38)

where the cell has the capability to alter the state value of RET1 or MART1.
The intervention space contains three possible actions as A={a1,a2,a3}, where
a1indicates no intervention, and a2and a3represent interventions targeting RET1
and PHOC, respectively. All the parameters are the same as in the previous exper-
iment. The initial state is randomly selected from states with activated WNT5A
and pirin.
Fig. 9(b) represents the average reward obtained by the proposed Bayesian
intervention policy, naive intervention policy, and the Baseline. The average re-
ward achieved by the Bayesian policy gradually converges towards the Baseline
after a few steps. In contrast, the naive intervention policy performs poorly, with
an average reward of approximately half of the Bayesian policy. This difference
highlights the superiority of the Bayesian approach to probabilistically model the
cell space and fight back against internal cell responses through stochastic policy.
Figure 10: (a) The average posterior of the true cell space over time. (b) The average KL
divergence between the true Nash intervention policy and the proposed Bayesian inter-
vention policy.
Fig. 10(a) illustrates the average posterior of the true cell space over time. As
can be seen, the true cell space has the largest posterior probability, and its prob-
ability approaches 1 after about 15 steps. Furthermore, Fig. 10(b) shows the average KL divergence between the true Nash equilibrium intervention policy and the proposed Bayesian intervention policy. The KL divergence approaching zero indicates the empirical convergence of the Bayesian policy to the optimal Nash policy.
8. Conclusion
This paper develops a Bayesian intervention policy for gene regulatory net-
works (GRNs) that takes into account cell defensive responses. The temporal dy-
namics of GRNs are modeled using a Boolean network with perturbation (BNp)
model, and the interaction between the cell and the intervention is formulated as
a two-player zero-sum game. Given incomplete information about cell responses,
this paper provides a recursive and probabilistic method to capture the posterior
distribution of cell defensive responses. The Bayesian policy is introduced using
the combination of the cell-specific Nash policies for each cell space and the pos-
terior distribution associated with them. Our analytical results demonstrate the
superiority of the proposed intervention policy against several existing interven-
tion techniques. Meanwhile, the superiority of the proposed intervention policy
is demonstrated through comprehensive numerical experiments using the p53-
MDM2 negative feedback loop regulatory network and melanoma network.
Our future studies will explore the extension of the proposed game-theoretic
intervention policy to practical settings, including studying the partial observabil-
ity of the genes’ state through noisy gene-expression data, as well as addressing
scalability issues related to large gene regulatory networks and cell stimuli spaces.
Acknowledgment
The authors acknowledge the support of the National Institute of Health award
1R21EB032480-01, National Science Foundation awards IIS-2311969 and IIS-
2202395, ARMY Research Laboratory award W911NF2320179, ARMY Research
Office award W911NF2110299, and Office of Naval Research award N00014-23-
1-2850.
References
[1] H. Lähdesmäki, I. Shmulevich, O. Yli-Harja, On learning gene regulatory
networks under the Boolean network model, Machine learning 52 (2003)
147–167.
[2] A. Paul, J. Sil, Optimized time-lag differential method for constructing gene
regulatory network, Information Sciences 478 (2019) 222–238.
[3] Ž. Pušnik, M. Mraz, N. Zimic, M. Moškon, Review and assessment of Boolean approaches for inference of gene regulatory networks, Heliyon (2022).
[4] Z. Zou, H. Chen, P. Poduval, Y. Kim, M. Imani, E. Sadredini, R. Cammarota,
M. Imani, BioHD: an efficient genome sequence search platform using hy-
perdimensional memorization, in: Proceedings of the 49th Annual Interna-
tional Symposium on Computer Architecture, 2022, pp. 656–669.
[5] W.-P. Lee, Y.-T. Hsiao, Inferring gene regulatory networks using a hybrid
ga–pso approach with numerical constraints and network decomposition, In-
formation Sciences 188 (2012) 80–99.
[6] M. Alali, M. Imani, Inference of regulatory networks through temporally
sparse data, Frontiers in control engineering 3 (2022) 1017256.
[7] E. R. Dougherty, R. Pal, X. Qian, M. L. Bittner, A. Datta, Stationary and
structural control in gene regulatory networks: basic concepts, International
Journal of Systems Science 41 (2010) 5–16.
[8] A. Yerudkar, E. Chatzaroulas, C. Del Vecchio, S. Moschoyiannis, Sampled-
data control of probabilistic Boolean control networks: A deep reinforce-
ment learning approach, Information Sciences 619 (2023) 374–389.
[9] M. Takizawa, K. Kobayashi, Y. Yamashita, Design of reduced-order and
pinning controllers for probabilistic Boolean networks using reinforcement
learning, Applied Mathematics and Computation 457 (2023) 128211.
[10] S. Dai, B. Li, J. Lu, J. Zhong, Y. Liu, A unified transform method for general
robust property of probabilistic Boolean control networks, Applied Mathe-
matics and Computation 457 (2023) 128137.
[11] J. A. Aledo, E. Goles, M. Montalva-Medel, P. Montealegre, J. C. Valverde,
Symmetrizable Boolean networks, Information Sciences 626 (2023) 787–
804.
[12] A. Ravari, S. F. Ghoreishi, M. Imani, Optimal inference of hidden Markov
models through expert-acquired data, IEEE Transactions on Artificial Intel-
ligence (2024).
[13] C. Su, J. Pang, CABEAN: a software for the control of asynchronous
Boolean networks, Bioinformatics 37 (2021) 879–881.
[14] L. Van den Broeck, M. Gordon, D. Inzé, C. Williams, R. Sozzani, Gene
regulatory network inference: connecting plant biology and mathematical
modeling, Frontiers in genetics 11 (2020) 457.
[15] D. Mercatelli, L. Scalambra, L. Triboli, F. Ray, F. M. Giorgi, Gene regulatory
network inference resources: A practical overview, Biochimica et Biophys-
ica Acta (BBA)-Gene Regulatory Mechanisms 1863 (2020) 194430.
[16] Y. You, Z. Hua, An intelligent intervention strategy for patients to prevent
chronic complications based on reinforcement learning, Information Sci-
ences 612 (2022) 1045–1065.
[17] J. Zhong, Y. Liu, J. Lu, W. Gui, Pinning control for stabilization of Boolean
networks under knock-out perturbation, IEEE Transactions on Automatic
Control 67 (2021) 1550–1557.
[18] S. H. Hosseini, M. Imani, Learning to fight against cell stimuli: A game
theoretic perspective, in: 2023 IEEE Conference on Artificial Intelligence
(CAI), IEEE, 2023, pp. 285–287.
[19] R. Pal, A. Datta, E. R. Dougherty, Optimal infinite-horizon control for prob-
abilistic Boolean networks, IEEE Transactions on Signal Processing 54
(2006) 2375–2387.
[20] B. Faryabi, J.-F. Chamberland, G. Vahedi, A. Datta, E. R. Dougherty, Opti-
mal intervention in asynchronous genetic regulatory networks, IEEE Journal
of Selected Topics in Signal Processing 2 (2008) 412–423.
[21] X. Qian, E. R. Dougherty, Intervention in gene regulatory networks via
phenotypically constrained control policies based on long-run behavior,
IEEE/ACM Transactions on Computational Biology and Bioinformatics 9
(2011) 123–136.
[22] Q. Liu, Y. He, J. Wang, Optimal control for probabilistic Boolean networks
using discrete-time Markov decision processes, Physica A: Statistical Me-
chanics and its Applications 503 (2018) 1297–1307.
[23] M. Imani, U. M. Braga-Neto, Control of gene regulatory networks using
Bayesian inverse reinforcement learning, IEEE/ACM transactions on com-
putational biology and bioinformatics 16 (2018) 1250–1261.
[24] M. Imani, U. M. Braga-Neto, Control of gene regulatory networks with
noisy measurements and uncertain inputs, IEEE Transactions on Control of
Network Systems 5 (2017) 760–769.
[25] M. Imani, U. M. Braga-Neto, Finite-horizon LQR controller for partially-
observed Boolean dynamical systems, Automatica 95 (2018) 172–179.
[26] M. Imani, U. Braga-Neto, Multiple model adaptive controller for partially-
observed Boolean dynamical systems, in: 2017 American Control Confer-
ence (ACC), IEEE, 2017, pp. 1103–1108.
[27] M. Imani, M. Imani, S. F. Ghoreishi, Optimal Bayesian biomarker selection
for gene regulatory networks under regulatory model uncertainty, in: 2022
American Control Conference (ACC), IEEE, 2022, pp. 1379–1385.
[28] M. Imani, U. Braga-Neto, Point-based value iteration for partially-observed
Boolean dynamical systems with finite observation space, in: 2016 IEEE
55th Conference on Decision and Control (CDC), IEEE, 2016, pp. 4208–
4213.
[29] M. Imani, S. F. Ghoreishi, U. M. Braga-Neto, Bayesian control of large mdps
with unknown dynamics in data-poor environments, Advances in neural
information processing systems 31 (2018).
[30] M. Imani, U. Braga-Neto, Optimal control of gene regulatory networks
with unknown cost function, in: 2018 Annual American Control Confer-
ence (ACC), IEEE, 2018, pp. 3939–3944.
[31] I. Shmulevich, E. R. Dougherty, W. Zhang, From Boolean to probabilistic
Boolean networks as models of genetic regulatory networks, Proceedings of
the IEEE 90 (2002) 1778–1792.
[32] L. E. Chai, S. K. Loh, S. T. Low, M. S. Mohamad, S. Deris, Z. Zakaria, A
review on the computational approaches for gene regulatory network con-
struction, Computers in biology and medicine 48 (2014) 55–65.
[33] K. Zhang, Z. Yang, T. Başar, Multi-agent reinforcement learning: A selective
overview of theories and algorithms, Handbook of reinforcement learning
and control (2021) 321–384.
[34] K. Zhang, S. Kakade, T. Basar, L. Yang, Model-based multi-agent RL in
zero-sum Markov games with near-optimal sample complexity, Advances in
Neural Information Processing Systems 33 (2020) 1166–1178.
[35] K. Zhang, Z. Yang, T. Basar, Policy optimization provably converges to
nash equilibria in zero-sum linear quadratic games, Advances in Neural
Information Processing Systems 32 (2019).
[36] I. Bose, B. Ghosh, The p53-MDM2 network: from oscillations to apoptosis, Journal of Biosciences 32 (2007) 991–997.
[37] W. Abou-Jaoudé, M. Chaves, J.-L. Gouzé, A theoretical exploration of birhythmicity in the p53-Mdm2 network, PLoS ONE 6 (2011) e17075.
[38] J. S. Chauhan, M. Hölzel, J.-P. Lambert, F. M. Buffa, C. R. Goding, The MITF regulatory network in melanoma, Pigment Cell & Melanoma Research 35 (2022) 517–533.
[39] A. Ravari, S. F. Ghoreishi, M. Imani, Structure-based inverse reinforcement learning for quantification of biological knowledge, in: 2023 IEEE Conference on Artificial Intelligence (CAI), IEEE, 2023.
[40] A. Ravari, S. F. Ghoreishi, M. Imani, Optimal recursive expert-enabled infer-
ence in regulatory networks, IEEE Control Systems Letters 7 (2023) 1027–
1032.
[41] M. Alali, M. Imani, Reinforcement learning data-acquiring for causal inference of regulatory networks, in: 2023 American Control Conference (ACC), IEEE, 2023.
[42] L. S. Shapley, Stochastic games, Proceedings of the National Academy of Sciences 39 (1953) 1095–1100.
[43] A. Rubinstein, H. W. Kuhn, O. Morgenstern, J. Von Neumann, Theory of Games and Economic Behavior, Princeton University Press, 2007.
[44] E. Batchelor, A. Loewer, G. Lahav, The ups and downs of p53: understanding protein dynamics in single cells, Nature Reviews Cancer 9 (2009) 371–377.
[45] S. Nag, J. Qin, K. S. Srivenugopal, M. Wang, R. Zhang, The MDM2-p53 pathway revisited, The Journal of Biomedical Research 27 (2013) 254–271.
[46] J. Paluncic, Z. Kovacevic, P. J. Jansson, D. Kalinowski, A. M. Merlot, M. L.-
H. Huang, H. C. Lok, S. Sahni, D. J. Lane, D. R. Richardson, Roads to
melanoma: Key pathways and emerging players in melanoma progression
and oncogenic signaling, Biochimica et Biophysica Acta (BBA) - Molecular
Cell Research 1863 (2016) 770–784.
[47] W. Guo, H. Wang, C. Li, Signal pathways of melanoma and targeted therapy,
Signal Transduction and Targeted Therapy 6 (2021) 424.