
Random Boolean Network Models and the Yeast Transcriptional Network

Authors:
  • Institute for Systems Biology, Seattle WA United States

Abstract

The recently measured yeast transcriptional network is analyzed in terms of simplified Boolean network models, with the aim of determining feasible rule structures, given the requirement of stable solutions of the generated Boolean networks. We find that, for ensembles of generated models, those with canalyzing Boolean rules are remarkably stable, whereas those with random Boolean rules are only marginally stable. Furthermore, substantial parts of the generated networks are frozen, in the sense that they reach the same state, regardless of initial state. Thus, our ensemble approach suggests that the yeast network shows highly ordered dynamics.
Stuart Kauffman*, Carsten Peterson†‡, Björn Samuelsson†, and Carl Troein†
*Department of Cell Biology and Physiology, University of New Mexico Health Sciences Center, Albuquerque, NM 87131; and
†Complex Systems Division, Department of Theoretical Physics, Lund University, Sölvegatan 14A, S-223 62 Lund, Sweden
Communicated by Philip W. Anderson, Princeton University, Princeton, NJ, October 6, 2003 (received for review June 30, 2003)
genetic networks | dynamical systems
The regulatory network for Saccharomyces cerevisiae was
recently measured (1) for 106 of the 141 known transcription
factors by determining the bindings of transcription factor
proteins to promoter regions on the DNA. Associating the
promoter regions with genes yields a network of directed gene-gene
interactions. As described in refs. 1 and 2, the significance
of measured bindings with regard to inferring putative interactions
is quantified in terms of P values. Lee et al. (1) did not
infer interactions having P values above a threshold value,
P_th = 0.001, for most of their analysis. Small threshold values, P_th,
correspond to a small number of inferred interactions with high
quality, whereas larger values correspond to more inferred
connections, but of lower quality. It was found that for the
P_th = 0.001 network, the fan-out from each transcription factor to its
regulated targets is substantial, on the average 38 (1). From the
underlying data (http://web.wi.mit.edu/young/regulatory_network),
one finds that fairly few signals feed into each regulated target;
on the average 1.9. The experiments yield the regulatory network
architecture but yield neither the interaction rules at the nodes,
nor the dynamics of the system, nor its final states.
With no direct experimental results on the states of the system,
there is, of course, no systematic method to pin down the
interaction rules, not even within the framework of simplified
and coarse-grained genetic network models; e.g., ones where the
rules are Boolean. One can nevertheless attempt to investigate
to what extent the measured architecture can, based on criteria
of stability, select between classes of Boolean models (3).
We generate ensembles of different model networks on the
given architecture and analyze their behavior with respect to
stability. In a stable system, small initial perturbations should not
grow in time. This aspect is investigated by monitoring how the
Hamming distances between different initial states evolve in a
Derrida plot (4). If small Hamming distances diverge in time, the
system is unstable and vice versa. Based on this criterion, we find
that synchronously updated random Boolean networks (with a
flat rule distribution) are marginally stable on the transcriptional
network of yeast.
By using a subset of Boolean rules, nested canalyzing functions
(see Methods and Models), the ensemble of networks exhibits
remarkable stability. The notion of nested canalyzing functions
is introduced to provide a natural way of generating canalyzing
rules, which are abundant in biology (5). Furthermore, it turns
out that for these networks, there exists a fair amount of forcing
structures (3), where nonnegligible parts of the networks are
frozen to fixed final states regardless of the initial conditions.
Also, we investigate the consequences of rewiring the network
while retaining the local properties; the number of inputs and
outputs for each node (6).
To accomplish the above, some tools and techniques were
developed and used. To include more interactions besides those
in the P_th = 0.001 network (1), we investigated how network
properties, local and global, change as P_th is increased. We found
a transition slightly above P_th = 0.005, indicating the onset of
noise in the form of biologically irrelevant inferred connections.
In ref. 5, extensive literature studies revealed that, for eukaryotes,
the rules seem to be canalyzing. We developed a
convenient method to generate a distribution of canalyzing rules
that fits well with the list of rules presented by Harris et al. (5).
Methods and Models
Choosing Network Architecture. Lee et al. (1) calculated P values
as measures of confidence in the presence of an interaction. With
further elucidation of noise levels, one might increase the
threshold for P values from the value 0.001 used in Lee et al. (1).
To this end, we computed various network properties, to investigate
whether there is any value of P_th for which these properties
exhibit a transition that can be interpreted as the onset of noise.
In Fig. 1, the number of nodes, mean connectivity, mean pairwise
distance (radius), and fraction of node pairs connected are
shown. As can be seen, there appears to be a transition slightly
above P_th = 0.005. In what follows, we therefore focus on the
network defined by P_th = 0.005. Furthermore, we (recursively)
remove genes that have no outputs to other genes, because these
are not relevant for the network dynamics. The resulting network
is shown in Fig. 2.
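The recursive removal of genes without outputs to other genes can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `prune_output_free_nodes` and the edge-list representation are assumptions, and self-couplings are assumed not to count as outputs to other genes.

```python
def prune_output_free_nodes(edges):
    """Recursively drop nodes that have no outputs to other nodes.

    `edges` is an iterable of (source, target) pairs (hypothetical format).
    A self-loop is not counted as an output to another node.
    Returns the surviving node set and the surviving edge set.
    """
    edges = {(s, t) for s, t in edges}
    nodes = {n for e in edges for n in e}
    while True:
        # Nodes that still regulate at least one other surviving node.
        has_output = {s for s, t in edges if s != t}
        removable = nodes - has_output
        if not removable:
            return nodes, edges
        nodes -= removable
        # Removing nodes may strip other nodes of their last output,
        # so the loop repeats until nothing changes.
        edges = {(s, t) for s, t in edges if s in nodes and t in nodes}
```

Removing one node can leave its regulators without outputs, which is why the pruning is repeated until the node set stops shrinking.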
Generating Rules. Lee et al. (1) determined the architecture of the
network but not the specific rules for the interactions. To
investigate the dynamics on the measured architecture, we
repeatedly assign a random Boolean rule to each node in the
network. We use two rule distributions; one null hypothesis and
one distribution that agrees with rules compiled from the
literature (ref. 5; see also Supporting Text, which is published as
supporting information on the PNAS web site). In both cases, we
ensure that every rule depends on all of its inputs because the
dependence should be consistent with the network architecture.
As a null hypothesis, we use a flat distribution among all
Boolean functions that depend on all inputs. For rules with a few
inputs, this will create rules that can be expressed with normal
Boolean functions in a convenient way. In the case of many
inputs, most rules are unstructured and the result of toggling one
input value will appear random.
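One simple way to draw from a flat distribution among Boolean functions that depend on all inputs is rejection sampling: draw a uniformly random truth table and discard it if some input never affects the output. The sketch below assumes a truth-table encoding in which bit j of the table index is the value of input j; the function names are illustrative, and the paper does not state how its sampling was implemented.

```python
import random

def depends_on_all_inputs(table, k):
    """table[x] is the output for input bit pattern x (bit j = input j)."""
    for j in range(k):
        # Input j is irrelevant if toggling it never changes the output.
        if all(table[x] == table[x ^ (1 << j)] for x in range(2 ** k)):
            return False
    return True

def sample_flat_rule(k, rng=random):
    """Uniform draw among k-input Boolean functions that depend on every input."""
    while True:
        table = [rng.random() < 0.5 for _ in range(2 ** k)]
        if depends_on_all_inputs(table, k):
            return table
```

Because every truth table is proposed with equal probability, the accepted tables are uniformly distributed over the functions that pass the dependence check.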
‡To whom correspondence should be addressed. E-mail: carsten@thep.lu.se.
© 2003 by The National Academy of Sciences of the USA
PNAS, December 9, 2003, vol. 100, no. 25, 14796–14799; www.pnas.org/cgi/doi/10.1073/pnas.2036429100
In biological systems, the distribution of rules is likely to be
structured. Indeed, all of the rules compiled by Harris et al. (5)
are canalyzing (3); a canalyzing Boolean function (3) has at least
one input, such that for at least one input value, the output value
is fixed. It is not straightforward to generate biologically relevant
canalyzing functions. A canalyzing rule implies some structure,
but the function of the noncanalyzing inputs (when the canalyzing
inputs are clamped to their noncanalyzing values) could be
as disordered as the full set of random Boolean rules. However,
the canalyzing structure is repeated in a nested fashion for almost
all rules in the study by Harris et al. (5). Hence, we introduce the
concept of nested canalyzing functions (see Appendix), which can
be used to generate distributions of canalyzing rules. Actually, of
the 139 rules of Harris et al. (5), only 6 are not nested canalyzing
functions (see Tables 1 and 2, which are published as supporting
information on the PNAS web site).
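The definition of a canalyzing function can be checked directly on a truth table. The sketch below lists every input and input value for which the output is fixed; the encoding (bit j of the table index is input j) and the function name are assumptions made for illustration.

```python
def canalyzing_inputs(table, k):
    """Return (input index, canalyzing value, canalyzed output) triples.

    table[x] is the output for bit pattern x, where bit j of x is input j.
    An empty result means the function is not canalyzing.
    """
    found = []
    for j in range(k):
        for value in (False, True):
            # Outputs observed while input j is clamped to `value`.
            outputs = {table[x] for x in range(2 ** k)
                       if bool((x >> j) & 1) == value}
            if len(outputs) == 1:
                found.append((j, value, outputs.pop()))
    return found
```

For a two-input AND, both inputs are canalyzing with value false and canalyzed output false, whereas XOR has no canalyzing input.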
A special case of nested canalyzing functions is the recently
introduced notion of chain functions (ref. 7; see Appendix).
Chain functions are the most abundant form of nested canalyzing
functions, although 32 of the 139 rules in the study by Harris et al.
(5) fall outside this class.
It turns out that the rule distribution of nested canalyzing
functions in the study by Harris et al. (5) can be well described
by a model with only one parameter (see Appendix). Hence, we
use this model to mimic the compiled rule distribution. The free
parameter determines the degree of asymmetry between active
and inactive states and its value reflects the fact that most genes
are inactive at any given time in a gene regulatory system.
Analyzing the Dynamics. A biological system is subject to a
substantial amount of noise, making robustness a necessary
feature of any model. We expect a transcriptional network to be
stable, in that a random disturbance cannot be allowed to grow
uncontrollably. Gene expression levels can be approximated as
Boolean, because genes tend to be either active or inactive. This
approximation for genetic networks is presumably easier to
handle for stability issues than for general dynamical properties.
Using synchronous updates is computationally and conceptually
convenient, although it may at first sight appear unrealistic.
However, in instances of strong stability, the update order should
not be very important.
To study the time development of small fluctuations in this
discrete model with synchronous updating, we investigate how
the Hamming distance between two states evolves with time. In
a Derrida plot (4), pairs of initial states are sampled at defined
initial distances, H(0), from the entire state space, and their
mean Hamming distance, H(t), after a fixed time, t, is plotted
against the initial distance, H(0). The slope in the low-H region
indicates the fate of a small disturbance. If the curve is above/below
the line H(t) = H(0), it reflects instability/stability in the
sense that a small disturbance tends to increase/decrease during
the next t time steps (see Fig. 3).
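A single point of a Derrida plot can be estimated by sampling pairs of states at a fixed initial Hamming distance and averaging their distance after t synchronous updates. The sketch below is illustrative only: the names `step` and `derrida_point`, the adjacency-list and truth-table formats, and the convention that a node without inputs takes the constant value `tables[n][0]` are all assumptions.

```python
import random

def step(state, inputs, tables):
    """One synchronous update.

    inputs[n] lists the input nodes of node n; tables[n][x] is the output
    of node n for input bit pattern x (bit j = state of inputs[n][j]).
    """
    new = []
    for n in range(len(state)):
        x = sum(int(state[src]) << j for j, src in enumerate(inputs[n]))
        new.append(tables[n][x])
    return new

def derrida_point(h0, inputs, tables, samples=1000, t=1, rng=random):
    """Mean Hamming distance H(t) for random state pairs with H(0) = h0."""
    n = len(inputs)
    total = 0
    for _ in range(samples):
        a = [rng.random() < 0.5 for _ in range(n)]
        b = list(a)
        for i in rng.sample(range(n), h0):  # flip exactly h0 nodes
            b[i] = not b[i]
        for _ in range(t):
            a, b = step(a, inputs, tables), step(b, inputs, tables)
        total += sum(x != y for x, y in zip(a, b))
    return total / samples
```

Evaluating `derrida_point` for a range of `h0` values and comparing the result with the line H(t) = H(0) gives the stability criterion described above.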
It is not uncommon that transcription factors control their own
expression. In some cases, genes up-regulate themselves, with
the effect that their behavior becomes less linear and more
switch-like. This action is readily mimicked in a Boolean network.
However, in the other case, where a transcription factor
down-regulates itself, the system will be stabilized in a model
with continuous variables, provided that the time delay of the
self-interaction is not too large. Boolean networks can only
model the limit of large time delays, which gives rise to nodes that
in a nonbiological manner repeatedly flip between no activity
and full activity without requiring any external input. Thus, the
self-interactions need to be treated as a special case in the
Boolean approximation. To this end, we consider three different
alternatives: (i) view the self-interactions as internal parts of the
rules (all self-interactions are removed); (ii) remove the possibility
for self-interactions to be down-regulating; and (iii) no
special treatment of self-interactions.
It is natural to use alternative i as a reference point to understand
the effect of the self-interactions in alternatives ii and iii.
Fig. 1. Topological properties of the yeast regulatory network described by
Lee et al. (1) for different P value thresholds excluding nodes with no outputs:
number of nodes (solid line), mean connectivity (dotted line), mean pairwise
distance (radius) (dotted-solid line), and fraction of node pairs that are
connected (dashed line). The right y axis corresponds to the number of nodes,
whereas the other quantities are indicated on the left y axis. Self-couplings
were excluded, but the figure looks similar when they are included. The
dashed vertical line marks the threshold, P_th = 0.005.
Fig. 2. The P_th = 0.005 network excluding nodes with no outputs to other
nodes. The filled areas in the arrowheads are proportional to the probability
of each coupling to be in a forcing structure when the nested canalyzing rules
are used on the network without self-interactions. This probability ranges
from approximately one-fourth, for the inputs to YAP6, to one, for the inputs
to one-input nodes. Nodes that will reach a frozen state (on or off) in the
absence of down-regulating self-interactions, regardless of the choice of
rules, are shown as dashes. For the other nodes, the grayscale indicates the
probability of being frozen in the absence of self-interactions, ranging from
97% (bold black) to 99.9% (light gray).
We want to examine how the geometry of networks influences
the dynamics. It is known (3) that the distributions of in- and
out-connectivities of the nodes strongly affect the dynamics in
Boolean networks, but how important is the overall architecture?
If for each node, we preserve the connectivities, but
otherwise rewire the network randomly (6), how is the dynamics
affected? For a Derrida plot with t = 1, there is no change. If we
only take a single time step from a random state, the outputs will
not have time to be used as inputs. There will be correlations
between nodes, but the measured quantity H(1) is a mean over
all nodes, and this is not affected by these correlations. Hence,
H(1) is not changed by the rewiring. To obtain a better picture
of the dynamics, we need to increase t. However, if we go high
enough in t to probe larger structures in the networks, we lose
sight of the transient effects of a perturbation.
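Rewiring that preserves the number of inputs and outputs of every node is commonly done by repeated edge swaps in the spirit of ref. 6. The sketch below is one such scheme under stated assumptions: the function name `rewire`, the rejection of swaps that would create duplicate edges or self-couplings, and the cap on attempts are illustrative choices, not details taken from the paper.

```python
import random

def rewire(edges, swaps, rng=random):
    """Degree-preserving randomization by repeated edge swaps.

    Two edges (a -> b) and (c -> d) are replaced by (a -> d) and (c -> b),
    which leaves every node's in-degree and out-degree unchanged.
    Assumes `edges` contains no duplicates and at least two edges.
    """
    edges = list(edges)
    present = set(edges)
    done = attempts = 0
    while done < swaps and attempts < 100 * swaps:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (a, d), (c, b)
        # Assumption: reject swaps that create self-couplings or duplicates.
        if a == d or c == b or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges
```

Since each swap only exchanges the targets of two edges, the local connectivities are retained while larger structures are randomized.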
To remedy this situation, we opt to select a fixed initial
Hamming distance, H(0), and examine the expectation value of
the distance as a function of time, by using the nested canalyzing
rules. As noise entering the biological network would act on the
current state of the system rather than on an entirely random
one, we select one of the states to be a fixed point of the
dynamics, and let the probability of any given fixed point be
proportional to the size of its attractor basin. A graph of H(t)
shows the relaxation behavior of the perturbed system where the
self-interactions have been removed (see Fig. 4a). We investigate
the role of the self-interactions both in terms of relaxation of a
perturbed fixed point (see Fig. 4b) and in terms of probabilities
for random trajectories to arrive at distinct fixed points and
cycles.
The assumption that the typical state of these networks is a
fixed point can be motivated. A forcing connection (3) is a pair
of connected nodes, such that control over a single input to one
node is sufficient to force the output of the other node to one of
the Boolean values. With canalyzing rules, this outcome is
fulfilled when the canalyzed output of the first node is a
canalyzing input to the second. The condition of forcing structures
implies stability, because a (forcing) signal traveling
through such a structure will block out other inputs and is
thereby likely to cause information loss. Abundant forcing
structures should tend to favor fixed points.
Results and Discussion
Despite the absence of knowledge about initial and final states,
we have been able to get a hint about possible interaction rules
within a Boolean network framework for the yeast transcriptional
network. Our findings are as follows: (i) Canalyzing
Boolean rules confer far more stability than rules drawn from a
flat distribution as is clear from the Derrida plots in Fig. 3. Yet,
even a flat distribution of Boolean functions yields marginal
stability; (ii) The dynamical behavior around fixed points is more
stable for the measured network than for the rewired ones,
although only in the early time evolution (two to three time
steps) of the systems (see Fig. 4a). The behavior at this time scale
can be expected to depend largely on small network motifs,
whose numbers are systematically changed by the rewiring (6);
(iii) The removal of self-couplings increases the stability in these
networks. However, the relaxation is only changed significantly
if we allow the toggling of self-interacting nodes (see Fig. 4b).
This finding means that a node with a switch-like self-interaction
is not likely to be toggled by its inputs during the relaxation, nor
do the down-regulating self-interactions alter the relaxation.
This result means that the overall properties of relaxation to
fixed points can be investigated regardless of how the self-
interactions should be modeled; (iv) The number of attractors
and their length distribution are strongly dependent on how the
self-interactions are modeled. The average numbers of distinct
fixed points per rule assignment found in 1,000 trials of different
trajectories are 1.02, 4.33, and 3.79, respectively, for the three
self-interaction models. The numbers of two-cycles are 0.02,
0.09, and 0.38, respectively. Longer cycles are less common; in
total they sum up to 0.03, 0.11, and 0.11, respectively; and (v)
Forcing structures (3) are prevalent for this architecture with
canalyzing rules, as is evident from Fig. 2. On average, 56% of
the couplings belong to forcing structures. As a consequence,
most nodes will be forced to a fixed state regardless of the initial
state of the network. Even the highly connected nodes (in the
center of the network) will be forced to a fixed state for a vast
majority of the random rule assignments. In most cases, the
whole network will be forced to a specific fixed state. At first
glance, this might seem nonbiological. However, in the real
world, there are more inputs to the system than the measured
transcription factors, and to study a process such as the cell cycle,
one may need to consider additional components of the system.
With more inputs, such strong stability of the measured part
of the network may be necessary for robustness of the entire
system.
Fig. 3. Evolution of different Hamming distances, H(0) with one time step to
H(1) [Derrida plots (4)] for random rules (dark gray) and nested canalyzing
rules (light gray) with and without self-couplings (dashed borders), respectively.
(Down-regulating self-couplings are allowed.) The bands correspond to
1σ variation among the different rule assignments generated on the architecture
in Fig. 2. Statistics were gathered from 1,000 starts on each of 1,000
rule assignments.
Fig. 4. The average time evolution of perturbed fixed points for nested canalyzing rules, starting from Hamming distance, H(0) = 5; impact of the network
architecture (a) and impact of the self-interactions (b). The lines marked with circles in both figures correspond to the network in Fig. 2 without self-interactions.
The gray lines in a show the relaxation for 26 different rewired architectures with no self-interactions, with 1σ errors of the calculated means indicated by the
line widths. The black lines in b correspond to the network in Fig. 2 with self-interactions. The upper line shows the case when it is allowed to toggle nodes with
self-interactions as a state at H(0) = 5 is picked, whereas the lower line shows the relaxation if this toggling is not allowed. The widths of these lines show the
difference between allowing self-interactions to be repressive or not repressive.
Future reverse engineering projects in transcriptional networks
may be based on the restricted pool of nested canalyzing
rules, which have been shown to generate very robust networks
in this case. It should be pointed out that the notion of nested
canalyzing functions is not intrinsically Boolean. For instance,
the same concept can be applied to nested sigmoids.
Appendix: Nested Canalyzing Functions
The notion of nested canalyzing functions is a natural extension
of canalyzing functions. Consider a K-input Boolean rule, R, with
inputs $i_1, \ldots, i_K$ and output $o$. R is canalyzing on the input $i_m$ if
there are Boolean values, $I_m$ and $O_m$, such that $i_m = I_m \Rightarrow o = O_m$.
$I_m$ is the canalyzing value, and $O_m$ is the canalyzed value for
the output.
For each canalyzing rule, R, renumber the inputs in a way such
that R is canalyzing on $i_1$. Then, there are Boolean values, $I_1$ and
$O_1$, such that $i_1 = I_1 \Rightarrow o = O_1$. To investigate the case
$i_1 = \text{not } I_1$, fix $i_1$ to this value. This defines a new rule $R_1$ with
$K - 1$ inputs; $i_2, \ldots, i_K$. In most cases, when picking R from compiled
data, $R_1$ is also canalyzing. Then, renumber the inputs in order for $R_1$ to
be canalyzing on $i_2$. Fixing $i_2 = \text{not } I_2$ renders a rule $R_2$ with the
inputs $i_3, \ldots, i_K$. As long as the rules $R, R_1, R_2, \ldots$ are canalyzing,
we can repeat this procedure until we find $R_{K-1}$, which has only
one input, $i_K$, and, hence, is trivially canalyzing. Such a rule R is
a nested canalyzing function and can be described by the
canalyzing input values, $I_1, \ldots, I_K$, together with their respective
canalyzed output values, $O_1, \ldots, O_K$, and an additional value,
$O_{\text{default}}$. The output is given by
$$
o = \begin{cases}
O_1 & \text{if } i_1 = I_1 \\
O_2 & \text{if } i_1 \neq I_1 \text{ and } i_2 = I_2 \\
O_3 & \text{if } i_1 \neq I_1 \text{ and } i_2 \neq I_2 \text{ and } i_3 = I_3 \\
\;\vdots \\
O_K & \text{if } i_1 \neq I_1 \text{ and } \cdots \text{ and } i_{K-1} \neq I_{K-1} \text{ and } i_K = I_K \\
O_{\text{default}} & \text{if } i_1 \neq I_1 \text{ and } \cdots \text{ and } i_K \neq I_K.
\end{cases}
$$
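The case-by-case definition of the output amounts to a first-match lookup: the output is the canalyzed value of the first input that takes its canalyzing value, and the default value if no input does. A minimal sketch, with the function name and list-based arguments chosen for illustration:

```python
def nested_canalyzing_output(inputs, I, O, O_default):
    """Evaluate a nested canalyzing function.

    inputs: the values i_1..i_K, in the nesting order
    I:      the canalyzing input values I_1..I_K
    O:      the canalyzed output values O_1..O_K
    Returns O_m for the first m with i_m == I_m, else O_default.
    """
    for i_m, I_m, O_m in zip(inputs, I, O):
        if i_m == I_m:
            return O_m
    return O_default
```

For example, with I = (false, true), O = (false, true), and a default of false, the rule is "not i_1 blocks the output; otherwise the output follows i_2".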
The notion of chain functions in Gat-Viks and Shamir (7) is
equivalent to nested canalyzing functions that can be written in
the form $I_1 = \cdots = I_{K-1} = \text{false}$.
We want to generate a distribution of rules with K inputs, such
that all rules depend on every input. The dependency requirement
is fulfilled if and only if $O_{\text{default}} = \text{not } O_K$. Then, it remains
to choose values for $I_1, \ldots, I_K$ and $O_1, \ldots, O_K$. These values are
independently and randomly chosen with the probabilities
$$
P(I_m = \text{true}) = P(O_m = \text{true}) = \frac{\exp(-2^{-m}\alpha)}{1 + \exp(-2^{-m}\alpha)}
$$
for $m = 1, \ldots, K$. For all generated distributions, we let $\alpha = 7$.
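The one-parameter rule distribution can be sampled in a few lines. The sketch below assumes the probability has the logistic form exp(-2^(-m) alpha) / (1 + exp(-2^(-m) alpha)), with the single parameter (called `alpha` here) set to 7; the function names and the tuple return format are illustrative and not taken from the paper.

```python
import math
import random

ALPHA = 7.0  # value of the free parameter assumed for all distributions

def p_true(m, alpha=ALPHA):
    """Probability that I_m (or O_m) is true, for m = 1, ..., K."""
    z = math.exp(-alpha * 2.0 ** (-m))
    return z / (1.0 + z)

def sample_nested_canalyzing_rule(k, alpha=ALPHA, rng=random):
    """Draw (I, O, O_default) for a k-input nested canalyzing rule, k >= 1."""
    I = [rng.random() < p_true(m, alpha) for m in range(1, k + 1)]
    O = [rng.random() < p_true(m, alpha) for m in range(1, k + 1)]
    # The rule depends on every input only if O_default = not O_K.
    O_default = not O[-1]
    return I, O, O_default
```

Under this form, `p_true(1)` is about 0.03 and the probability rises toward 1/2 as m grows, so the outermost canalyzing and canalyzed values are almost always false, in line with most genes being inactive.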
The described scheme is sufficient to generate a well defined
rule distribution, but each rule has more than one representation
in $I_1, \ldots, I_K$ and $O_1, \ldots, O_K$. In Supporting Text, we describe how
to obtain a unique representation, which is applied to the rules
compiled in Harris et al. (5). This result enables us to present a
firm comparison between the generated distribution and the list
of rules in Harris et al. (5). (See Fig. 5, which is published as
supporting information on the PNAS web site.)
We thank Stephen Harris for providing details underlying ref. 5. C.T.
thanks the Swedish National Research School in Genomics and Bioinformatics
for support. This work was initiated at the Kavli Institute for
Theoretical Physics (Santa Barbara, CA) (C.P. and S.K.) and was
supported in part by National Science Foundation Grant PHY99-07949.
1. Lee, T. I., Rinaldi, N. J., Robert, F., Odom, D. T., Bar-Joseph, Z., Gerber, G. K., Hannett, N. M., Harbison, C. T., Thompson, C. M., Simon, I., et al. (2002) Science 298, 799–804.
2. Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J., Stoughton, R., Armour, C. D., Bennett, H. A., Coffey, E., Dai, H., He, Y. D., et al. (2000) Cell 102, 109–126.
3. Kauffman, S. A. (1993) Origins of Order: Self-Organization and Selection in Evolution (Oxford Univ. Press, Oxford).
4. Derrida, B. & Weisbuch, G. (1986) J. Physique 47, 1297–1303.
5. Harris, S. E., Sawhill, B. K., Wuensche, A. & Kauffman, S. (2002) Complexity 7, 23–40.
6. Maslov, S. & Sneppen, K. (2002) Science 296, 910–913.
7. Gat-Viks, I. & Shamir, R. (2003) Bioinformatics 19, Suppl. 1, i108–i117.
... If the subfunction which is evaluated when the canalizing variable does not receive its canalizing input is also canalizing, the function is 2-canalizing, etc. 15 . If all n variables of a function become eventually canalizing, the function is n-canalizing, also known as nested canalizing 16 . The number of variables that become eventually canalizing is known as the canalizing depth 15 . ...
... For unbiased networks with high in-degree (e.g., K = 5, p = 0.5), the MAE was very close to the maximally observed value of 0.25, even when using fourth-order Taylor approximations. Low-degree functions with a high absolute bias exhibit the highest degree of canalization, irrespective of whether canalization is measured on the variable level 14,16 or the function level 10,34 (Fig. 7). The amount of canalization in N-K Kauffman networks correlates thus highly with their approximability. ...
... When f C is not canalizing, then the integer k is the canalizing depth of f 15 . If k = n (i.e., if all variables are become eventually canalizing), then f is a nested canalizing function (NCF) 16 . By 17 , every nonzero Boolean function f(x 1 , …, x n ) can be uniquely written as ...
Article
Full-text available
Biological networks, such as gene regulatory networks, possess desirable properties. They are more robust and controllable than random networks. This motivates the search for structural and dynamical features that evolution has incorporated into biological networks. A recent meta-analysis of published, expert-curated Boolean biological network models has revealed several such features, often referred to as design principles. Among others, the biological networks are enriched for certain recurring network motifs, the dynamic update rules are more redundant, more biased, and more canalizing than expected, and the dynamics of biological networks are better approximable by linear and lower-order approximations than those of comparable random networks. Since most of these features are interrelated, it is paramount to disentangle cause and effect, that is, to understand which features evolution actively selects for, and thus truly constitute evolutionary design principles. Here, we compare published Boolean biological network models with different ensembles of null models and show that the abundance of canalization in biological networks can almost completely explain their recently postulated high approximability. Moreover, an analysis of random N–K Kauffman models reveals a strong dependence of approximability on the dynamical robustness of a network.
... They argued for the ubiquity of these functions in biological networks. Furthermore, in the same year, Kauffman et al. 3 provided a simplified definition of the chain function class based on the canalyzing input values. Their work revealed that out of the 139 rules (or BFs) compiled by Harris et al. 27 , 132 were NCFs and amongst those 107 were chain functions. ...
... We begin with a definition given in Akutsu et al. 28 and then move on to an alternative version which is of importance to us and then finally to the definition of Kauffman et al. 3 (based on canalyzing inputs). 2. At all but the last layer, a positive variable is followed by an AND ( ∧ ) operator, whereas a negative variable is followed by an OR ( ∨ ) operator. ...
... In this manner, the positive variables may be moved to the last position, thereby restoring the apparent inconsistency between f 3 and f 4 . Lastly, Kauffman et al. 3 defined the chain functions as a specific constraint on the canalyzing inputs of the NCFs as follows: ...
Article
Full-text available
Boolean networks (BNs) have been extensively used to model gene regulatory networks (GRNs). The dynamics of BNs depend on the network architecture and regulatory logic rules (Boolean functions (BFs)) associated with nodes. Nested canalyzing functions (NCFs) have been shown to be enriched among the BFs in the large-scale studies of reconstructed Boolean models. The central question we address here is whether that enrichment is due to certain sub-types of NCFs. We build on one sub-type of NCFs, the chain functions (or chain-0 functions) proposed by Gat-Viks and Shamir. First, we propose two other sub-types of NCFs, namely, the class of chain-1 functions and generalized chain functions, the union of the chain-0 and chain-1 types. Next, we find that the fraction of NCFs that are chain-0 (also holds for chain-1) functions decreases exponentially with the number of inputs. We provide analytical treatment for this and other observations on BFs. Then, by analyzing three different datasets of reconstructed Boolean models we find that generalized chain functions are significantly enriched within the NCFs. Lastly we illustrate that upon imposing the constraints of generalized chain functions on three different GRNs we are able to obtain biologically viable Boolean models.
... In this work, due to the size and complexity of the GRN of P. aeruginosa CCBH 4851 (Chagas et al., 2022), there was a need to reduce the GRN into a core sub-network. Kauffman et al. (2003) describe that the definition of the core subnetwork for attractor calculation should consider all nodes with at least one outgoing edge, which can change the network state from one timestep t to the next. Therefore, for this work, the core An example of basins of attraction state graph from literature. ...
... In more than 130 curated GRN models, the majority of regulatory rules used are called Nested Canalizing Functions (Kadelka et al., 2020), which has increased interest in studying these Boolean behaviors and their impact on GRN dynamics and controllability (Murrugarra and Dimitrova, 2015). It has been observed that Boolean networks governed by canalizing functions are typically more stable than those governed by random functions (Kauffman et al., 2003), as they have a smaller number of attractors and are more robust to perturbations (Dimitrova et al., 2022). In general, the greater the quantity and prevalence of canalization, the more stable the dynamics of the model (Karlsson and Hörnquist, 2007). ...
... These 46 nodes always have conserved initial states, as it is through the incoming edge that the state of a gene can be modified. The initial conditions of the core sub-network (which in this work are the binarized bulk RNA-seq expression data) can be defined as a starting point for the trajectory simulation, but the landscape of attractors does not depend on the initial conditions (Kauffman et al., 2003). The adopted Boolean function assignment for all nodes in the network is the Nested Canalizing Functions (NCFs). ...
Article
Full-text available
Introduction Pseudomonas aeruginosa infections are one of the leading causes of death in immunocompromised patients with cystic fibrosis, diabetes, and lung diseases such as pneumonia and bronchiectasis. Furthermore, P. aeruginosa is one of the main multidrug-resistant bacteria responsible for nosocomial infections worldwide, including the multidrug-resistant CCBH4851 strain isolated in Brazil. Methods One way to analyze their dynamic cellular behavior is through computational modeling of the gene regulatory network, which represents interactions between regulatory genes and their targets. For this purpose, Boolean models are important predictive tools to analyze these interactions. They are one of the most commonly used methods for studying complex dynamic behavior in biological systems. Results and discussion Therefore, this research consists of building a Boolean model of the gene regulatory network of P. aeruginosa CCBH4851 using data from RNA-seq experiments. Next, the basins of attraction are estimated, as these regions and the transitions between them can help identify the attractors, representing long-term behavior in the Boolean model. The essential genes of the basins were associated with the phenotypes of the bacteria for two conditions: biofilm formation and polymyxin B treatment. Overall, the Boolean model and the analysis method proposed in this work can identify promising control actions and indicate potential therapeutic targets, which can help pinpoint new drugs and intervention strategies.
... Various mathematical models have been proposed to characterize GRNs, including graphical models [9], Bayesian network models [10], information theory methods [11], and neural network models [12]. Boolean network models (BNs) were first proposed by Kauffman in the 1970s to study GRNs [13,14]. The fundamental premise of a BN is that genes exhibit switch-like behavior, assigning each gene a Boolean state (i.e. ...
... Reconstructing a BN that is precise in both network topology and logical rules is essential for understanding and controlling the dynamic behavior of a GRN [14,17]. While prior information is valuable for manual GRN reconstruction, it might not capture the context-specific or dynamic aspects of the system [18,19]. ...
Article
Reconstructing the topology of gene regulatory networks from gene expression data has been extensively studied. With the abundance of functional transcriptomic data available, it is now feasible to systematically decipher regulatory interaction dynamics in a logic form such as a Boolean network (BN) framework, which qualitatively indicates how multiple regulators aggregate to affect a common target gene. However, inferring both the network topology and gene interaction dynamics simultaneously is still a challenging problem since gene expression data are typically noisy and data discretization is prone to information loss. We propose a new method for BN inference from time-series transcriptional profiles, called LogicGep. LogicGep formulates the identification of Boolean functions as a symbolic regression problem that learns the Boolean function expression and solves it efficiently through multi-objective optimization using an improved gene expression programming algorithm. To avoid overly emphasizing dynamic characteristics at the expense of topology structure ones, as traditional methods often do, a set of promising Boolean formulas for each target gene is evolved first, and a feed-forward neural network trained with continuous expression data is subsequently employed to pick out the final solution. We validated the efficacy of LogicGep using multiple datasets including both synthetic and real-world experimental data. The results elucidate that LogicGep adeptly infers accurate BN models, outperforming other representative BN inference algorithms in both network topology reconstruction and the identification of Boolean functions. Moreover, the execution of LogicGep is hundreds of times faster than other methods, especially in the case of large network inference.
... At this critical point, the system neither freezes nor evolves randomly, resulting in complex gene state evolution. As living systems also exhibit critical behaviors, studying RBNs sheds light on the evolution of natural selection [11][12][13]. However, actual living systems appear to maintain an edge of chaos even with an average in-degree surpassing the critical point [11,12]. ...
... Therefore, agents' future states can be determined not only by interactions with others but also by their internal rules. In this context, we specifically examine the self-interaction of artificial genes, mirroring the observed self-interaction in real genes [13,24]. This internal process distinguishes individual artificial genes, giving them a unique internal function. ...
Article
Random Boolean Networks (RBNs) model complex networks with numerous variables, serving as a tool for gene expression and genetic regulation modeling. RBNs exhibit phase transitions, contingent on node degrees. Given the significance of phase transitions in collective behaviors, the study explores the relationship between RBNs and actual living system networks, which also display critical behaviors. Notably, living systems exhibit such behaviors even beyond the predicted critical point in RBNs. This paper introduces a novel RBNs model incorporating a rewiring process for edge connections/disconnections. In contrast to prior studies, our model includes artificial genes occasionally adding self-loops and creating an instant and temporal lookup table. Consequently, our proposed model demonstrates the edge of chaos at higher node degrees. It serves as an abstract RBNs model generating noisy behaviors from internal agent processes without external parameter tuning.
... In this report, we used a synchronous update mode for the network vertices, wherein all vertices are updated simultaneously. While an asynchronous update mode may align better with biological realism, the choice of update mode is not crucial given the computational and conceptual advantages of synchronous updates and the enhanced system stability achieved by utilizing nested canalized transfer functions (Kauffman et al., 2003). In the synchronous update mode, the system progresses in consecutive temporal states (Eq. ...
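The synchronous update mode with nested canalizing rules mentioned in the excerpt above can be illustrated with a minimal sketch. The three-node wiring, the rule choices, and all names (`nested_canalizing`, `sync_step`, `inputs_of`, `rules`) are hypothetical and chosen only for illustration; they are not taken from any of the cited papers.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]
Rule = Callable[[Sequence[int]], int]


def nested_canalizing(canal_in: Sequence[int],
                      canal_out: Sequence[int],
                      default: int) -> Rule:
    """Build a nested canalizing rule.

    Inputs are inspected in order. The first input i whose value equals
    canal_in[i] fixes the output to canal_out[i]; if no input takes its
    canalizing value, the output is `default`.
    """
    def rule(inputs: Sequence[int]) -> int:
        for x, a, b in zip(inputs, canal_in, canal_out):
            if x == a:
                return b
        return default
    return rule


def sync_step(state: State,
              inputs_of: List[List[int]],
              rules: List[Rule]) -> State:
    """Synchronous update: every node reads the OLD state, then all switch at once."""
    return tuple(
        rules[i]([state[j] for j in inputs_of[i]])
        for i in range(len(state))
    )


# Hypothetical three-node example (illustrative wiring only).
inputs_of = [[1, 2], [0, 2], [2]]
rules = [
    nested_canalizing([1, 1], [1, 1], 0),  # node 0: OR of nodes 1 and 2
    nested_canalizing([0, 0], [0, 0], 1),  # node 1: AND of nodes 0 and 2
    nested_canalizing([1], [1], 0),        # node 2: copies its own state
]

if __name__ == "__main__":
    state: State = (0, 1, 1)
    for _ in range(3):
        print(state)  # (0, 1, 1), then (1, 0, 1), then (1, 1, 1)
        state = sync_step(state, inputs_of, rules)
```

In this toy network the trajectory reaches the fixed point (1, 1, 1) after two steps. OR and AND are both nested canalizing, which is why they can be written with the same constructor.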
... of criticality, such as long-range spatiotemporal correlations and heightened sensitivity to stimuli, also contribute to biological functions [16,17], which may potentially lead to several general trade-offs such as robustness and accuracy [18], robustness and evolvability [19], and robustness and flexibility [20]. In bacterial metabolism, the coordinated behavior of individual biomolecules is crucial for sustaining high growth rates and minimizing lag time during nutrient shifts, highlighting an inherent trade-off between growth rate and physiological adaptation [21]. ...
Article
The metabolic network plays a crucial role in regulating bacterial metabolism and growth, but it is subject to inherent molecular stochasticity. Previous studies have utilized flux balance analysis and the maximum entropy method to predict metabolic fluxes and growth rates, while the underlying principles governing bacterial metabolism and growth, especially the criticality hypothesis, remain unclear. In this study, we employ a maximum entropy approach to investigate the universality in various constraint-based metabolic networks of Escherichia coli. Our findings reveal the existence of universal scaling relations across different nutritional environments and metabolic network models, similarly to the universality observed in physics. By analyzing single-cell data, we confirm that metabolism of E. coli operates close to the state with maximum Fisher information, which serves as a signature of criticality. This critical state provides functional advantages such as high sensitivity and long-range correlation. Moreover, we demonstrate that a metabolic system operating at criticality takes a compromise solution between growth and adaptation, thereby serving as a survival strategy in fluctuating environments.
Article
Boolean models of gene regulatory networks (GRNs) have gained widespread traction as they can easily recapitulate cellular phenotypes via their attractor states. Their overall dynamics are embodied in a state transition graph (STG). Indeed, two Boolean networks (BNs) with the same network structure and attractors can have drastically different STGs depending on the type of Boolean functions (BFs) employed. Our objective here is to systematically delineate the effects of different classes of BFs on the structural features of the STG of reconstructed Boolean GRNs while keeping network structure and biological attractors fixed, and explore the characteristics of BFs that drive those features. Using $10$ reconstructed Boolean GRNs, we generate ensembles that differ in BFs and compute from their STGs the dynamics’ rate of contraction or ‘bushiness’ and rate of ‘convergence’, quantified with measures inspired from cellular automata (CA) that are based on the garden-of-Eden (GoE) states. We find that biologically meaningful BFs lead to higher STG ‘bushiness’ and ‘convergence’ than random ones. Obtaining such ‘global’ measures gets computationally expensive with larger network sizes, stressing the need for feasible proxies. So we adapt Wuensche’s $Z$-parameter in CA to BFs in BNs and provide four natural variants, which, along with the average sensitivity of BFs computed at the network level, comprise our descriptors of local dynamics and we find some of them to be good proxies for bushiness. Finally, we provide an excellent proxy for the ‘convergence’ based on computing transient lengths originating at random states rather than GoE states.
Article
Discrete dynamical systems serve as useful formal models to study diffusion phenomena in social networks. Several recent papers have studied the algorithmic and complexity aspects of some decision problems on synchronous Boolean networks, which are discrete dynamical systems whose underlying graphs are directed, and may contain directed cycles. Such problems can be regarded as reachability problems in the phase space of the corresponding dynamical system. Previous work has shown that some of these decision problems become efficiently solvable for systems on directed acyclic graphs (DAGs). Motivated by this line of work, we investigate a number of decision problems for dynamical systems whose underlying graphs are DAGs. We show that computational intractability (i.e., PSPACE-completeness) results for reachability problems hold even for dynamical systems on DAGs. We also identify some restricted versions of dynamical systems on DAGs for which the reachability problem can be solved efficiently. In addition, we show that a decision problem (namely, Convergence), which is efficiently solvable for dynamical systems on DAGs, becomes PSPACE-complete for Quasi-DAGs (i.e., graphs that become DAGs by the removal of a single edge). In the process of establishing the above results, we also develop several structural properties of the phase spaces of dynamical systems on DAGs.
Article
Gene regulatory networks (GRNs) play a central role in cellular decision-making. Understanding their structure and how it impacts their dynamics constitutes thus a fundamental biological question. GRNs are frequently modeled as Boolean networks, which are intuitive, simple to describe, and can yield qualitative results even when data are sparse. We assembled the largest repository of expert-curated Boolean GRN models. A meta-analysis of this diverse set of models reveals several design principles. GRNs exhibit more canalization, redundancy, and stable dynamics than expected. Moreover, they are enriched for certain recurring network motifs. This raises the important question why evolution favors these design mechanisms.
Article
Control rules governing transcription of eukaryotic genes can be modeled as Boolean functions, and these rules are strongly biased toward large numbers of “canalizing” inputs. The ensemble of networks with the observed canalizing bias predicts cells are in an ordered regime with convergent flow in transcription state space, a percolating subnetwork of genes fixed on or off and isolated islands of twinkling genes turning on or off, and a near power-law distribution of cascades of gene activity changes following perturbations. The data suggest that a given cell state or type can be represented as an attractor of transcriptional activity or flow over time. © 2002 Wiley Periodicals, Inc.
Article
Random Boolean nets are systems of randomly connected binary units (or spins). Each spin σ_i can take two possible values (σ_i = 0 or 1). It receives, at time t, K binary input signals coming from K connected spins and updates its state according to a deterministic Boolean function of the K inputs. We compare the time evolution of the overlaps between different configurations for the two following models: Kauffman's model, for which the connections and Boolean function of each spin are randomly chosen at time t = 0 and remain unchanged at later times; the annealed model, for which these parameters are randomly reset at each time step. The numerical simulations for both models agree remarkably well with the theoretical predictions available for the second model.
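The comparison described in the abstract above can be sketched in a few lines. The following is an illustrative simulation under stated assumptions, not the authors' code: all function names are hypothetical, Boolean functions are drawn without bias (each truth-table entry is 0 or 1 with probability 1/2), and inputs are drawn uniformly with repetition allowed. Under those assumptions the standard annealed estimate for the overlap a between two configurations is a(t+1) = (1 + a(t)^K) / 2, since a node's output is guaranteed to agree only when all K of its inputs agree and otherwise agrees with probability 1/2.

```python
import random
from typing import List, Tuple

Net = Tuple[List[List[int]], List[List[int]]]


def random_net(n: int, k: int, rng: random.Random) -> Net:
    """Draw K random inputs and a random truth table for each of n nodes."""
    inputs = [[rng.randrange(n) for _ in range(k)] for _ in range(n)]
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables


def step(state: List[int], inputs: List[List[int]],
         tables: List[List[int]]) -> List[int]:
    """One synchronous update of all nodes."""
    new_state = []
    for ins, table in zip(inputs, tables):
        idx = 0
        for j in ins:  # pack the K input bits into a truth-table index
            idx = (idx << 1) | state[j]
        new_state.append(table[idx])
    return new_state


def overlap(a: List[int], b: List[int]) -> float:
    """Fraction of nodes that have the same value in both configurations."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def overlap_trajectory(n: int, k: int, steps: int, annealed: bool,
                       flip: float = 0.5, seed: int = 0) -> List[float]:
    """Track the overlap of two configurations evolved under the same network.

    annealed=False: wiring and functions are drawn once (Kauffman's model).
    annealed=True:  wiring and functions are redrawn at every time step.
    `flip` is the probability that a node initially differs between the two.
    """
    rng = random.Random(seed)
    a = [rng.randint(0, 1) for _ in range(n)]
    b = [x ^ int(rng.random() < flip) for x in a]
    inputs, tables = random_net(n, k, rng)
    traj = [overlap(a, b)]
    for _ in range(steps):
        if annealed:
            inputs, tables = random_net(n, k, rng)
        a, b = step(a, inputs, tables), step(b, inputs, tables)
        traj.append(overlap(a, b))
    return traj


def annealed_map(a: float, k: int) -> float:
    """Annealed estimate of the next overlap for unbiased random functions."""
    return (1 + a ** k) / 2


if __name__ == "__main__":
    for mode in (False, True):
        print("annealed" if mode else "quenched",
              overlap_trajectory(2000, 3, 10, annealed=mode))
```

Iterating `annealed_map` from the initial overlap and comparing it against both simulated trajectories reproduces the kind of comparison the abstract describes; finite-size fluctuations of order 1/sqrt(n) are expected.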
Article
This article is written as a prolegomena, both to a research program, and a forthcoming book discussing the same issues in greater detail (Kauffman, 1991). The suspicion that evolutionary theory needs broadening is widespread. To accomplish this, however, will not be easy. The new framework I shall discuss here grows out of the realization that complex systems of many kinds exhibit high spontaneous order. This implies that such order is available to evolution and selective forces for further molding. But it also implies, quite profoundly, that the spontaneous order in such systems may enable, guide and limit selection. Therefore, the spontaneous order in complex systems implies that selection may not be the sole source of order in organisms, and that we must invent a new theory of evolution which encompasses the marriage of selection and self-organization.
Article
Molecular networks guide the biochemistry of a living cell on multiple levels: Its metabolic and signaling pathways are shaped by the network of interacting proteins, whose production, in turn, is controlled by the genetic regulatory network. To address topological properties of these two networks, we quantified correlations between connectivities of interacting nodes and compared them to a null model of a network, in which all links were randomly rewired. We found that for both interaction and regulatory networks, links between highly connected proteins are systematically suppressed, whereas those between a highly connected and low-connected pairs of proteins are favored. This effect decreases the likelihood of cross talk between different functional modules of the cell and increases the overall robustness of a network by localizing effects of deleterious perturbations.
Article
We have determined how most of the transcriptional regulators encoded in the eukaryote Saccharomyces cerevisiae associate with genes across the genome in living cells. Just as maps of metabolic networks describe the potential pathways that may be used by a cell to accomplish metabolic processes, this network of regulator-gene interactions describes potential pathways yeast cells can use to regulate global gene expression programs. We use this information to identify network motifs, the simplest units of network architecture, and demonstrate that an automated process can use motifs to assemble a transcriptional regulatory network structure. Our results reveal that eukaryotic cellular functions are highly connected through networks of transcriptional regulators that regulate other transcriptional regulators.
Article
One of the grand challenges of system biology is to reconstruct the network of regulatory control among genes and proteins. High throughput data, particularly from expression experiments, may gradually make this possible in the future. Here we address two key ingredients in any such ‘reverse engineering’ effort: The choice of a biologically relevant, yet restricted, set of potential regulation functions, and the appropriate score to evaluate candidate regulatory relations. We propose a set of regulation functions which we call chain functions, and argue for their ubiquity in biological networks. We analyze their complexity and show that their number is exponentially smaller than all Boolean functions of the same dimension. We define two new scores: one evaluating the fitness of a candidate set of regulators of a particular gene, and the other evaluating a candidate function. Both scores use established statistical methods. Finally, we test our methods on experimental gene expression data from the yeast galactose pathway. We show the utility of using chain functions and the improved inference using our scores in comparison to several extant scores. We demonstrate that the combined use of the two scores gives an extra advantage. We expect both chain functions and the new scores to be helpful in future attempts to infer regulatory networks.
Book
Stuart Kauffman here presents a brilliant new paradigm for evolutionary biology, one that extends the basic concepts of Darwinian evolution to accommodate recent findings and perspectives from the fields of biology, physics, chemistry and mathematics. The book drives to the heart of the exciting debate on the origins of life and maintenance of order in complex biological systems. It focuses on the concept of self-organization: the spontaneous emergence of order widely observed throughout nature. Kauffman here argues that self-organization plays an important role in the emergence of life itself and may play as fundamental a role in shaping life's subsequent evolution as does the Darwinian process of natural selection. Yet until now no systematic effort has been made to incorporate the concept of self-organization into evolutionary theory. The construction requirements which permit complex systems to adapt remain poorly understood, as is the extent to which selection itself can yield systems able to adapt more successfully. This book explores these themes. It shows how complex systems, contrary to expectations, can spontaneously exhibit stunning degrees of order, and how this order, in turn, is essential for understanding the emergence and development of life on Earth. Topics include the new biotechnology of applied molecular evolution, with its important implications for developing new drugs and vaccines; the balance between order and chaos observed in many naturally occurring systems; new insights concerning the predictive power of statistical mechanics in biology; and other major issues. Indeed, the approaches investigated here may prove to be the new center around which biological science itself will evolve. The work is written for all those interested in the cutting edge of research in the life sciences.
Article
Ascertaining the impact of uncharacterized perturbations on the cell is a fundamental problem in biology. Here, we describe how a single assay can be used to monitor hundreds of different cellular functions simultaneously. We constructed a reference database or "compendium" of expression profiles corresponding to 300 diverse mutations and chemical treatments in S. cerevisiae, and we show that the cellular pathways affected can be determined by pattern matching, even among very subtle profiles. The utility of this approach is validated by examining profiles caused by deletions of uncharacterized genes: we identify and experimentally confirm that eight uncharacterized open reading frames encode proteins required for sterol metabolism, cell wall function, mitochondrial respiration, or protein synthesis. We also show that the compendium can be used to characterize pharmacological perturbations by identifying a novel target of the commonly used drug dyclonine.
Harris, S. E., Sawhill, B. K., Wuensche, A. & Kauffman, S. (2002) Complexity 7, 23–40.
Maslov, S. & Sneppen, K. (2002) Science 296, 910–913.